metadata
language:
- en
license: apache-2.0
tags:
- speech
- phone-recognition
- ipa
- ctc
- pronunciation-assessment
- mhubert
base_model: utter-project/mHuBERT-147
pipeline_tag: audio-classification
datasets:
- timit-asr/timit_asr
- buckeye
metrics:
- per
model-index:
- name: mHuBERT-147-ipa-ctc-ft
  results:
  - task:
      type: phone-recognition
      name: Phone Recognition
    dataset:
      name: TIMIT
      type: timit-asr/timit_asr
      split: test
    metrics:
    - name: Phone Error Rate
      type: per
      value: 0.0896
  - task:
      type: phone-recognition
      name: Phone Recognition
    dataset:
      name: Buckeye
      type: buckeye
      split: validation
    metrics:
    - name: Phone Error Rate
      type: per
      value: 0.1987
mHuBERT-147 IPA CTC FT
Fine-tuned English IPA phone-recognition model initialized from
utter-project/mHuBERT-147 and trained with a BiLSTM CTC head.
This repository contains the full fine-tuned model:
- mHuBERT-147 backbone
- BiLSTM CTC head
- audio preprocessor config
- model size: 97.2M parameters total (94.4M backbone + 2.85M CTC head)
Training setup:
- initialized from utter-project/mHuBERT-147
- trained on TIMIT train + Buckeye train
Validation results:
- TIMIT test: PER = 0.0896
- Buckeye validation: PER = 0.1987
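The card does not say how PER was computed (for example, corpus-level versus averaged per utterance, or whether any phone folding was applied). As a rough illustration only, the sketch below assumes the common definition: total Levenshtein edits between predicted and reference phone sequences, divided by the total number of reference phones. The function names and the toy sequences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences."""
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        row = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            row.append(min(prev_row[j] + 1,          # deletion
                           row[j - 1] + 1,           # insertion
                           prev_row[j - 1] + cost))  # substitution or match
        prev_row = row
    return prev_row[-1]


def phone_error_rate(refs, hyps):
    """Corpus-level PER: total edits divided by total reference phones."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_ref = sum(len(r) for r in refs)
    return total_edits / total_ref


# Toy data (not taken from TIMIT or Buckeye): one deleted phone in 8 reference phones.
refs = [["h", "ə", "l", "oʊ"], ["w", "ɜ", "l", "d"]]
hyps = [["h", "ə", "l", "oʊ"], ["w", "ɜ", "d"]]
per = phone_error_rate(refs, hyps)
print(per)
```

Under this definition, a PER of 0.0896 corresponds to roughly 9 edits per 100 reference phones.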
The output vocabulary is the same IPA set as in istomin9192/mHuBERT-147-ipa-head,
with one extra CTC blank symbol at the last output index.
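To make the index layout concrete, here is a small sketch. It assumes the phone map lists only the IPA phones with contiguous ids starting at 0, so the blank lands at index N for N phones; the three-phone map and the helper name are hypothetical, and the value stored in the model config remains the authoritative blank index.

```python
# Hypothetical three-phone map standing in for the real IPA phone map.
id2phone = {0: "p", 1: "æ", 2: "t"}


def expected_blank_id(id2phone):
    """Blank index if phones use ids 0..N-1 and the blank is appended last."""
    ids = sorted(id2phone)
    if ids != list(range(len(ids))):
        raise ValueError("phone ids are not contiguous from 0")
    return len(ids)


blank_id = expected_blank_id(id2phone)
num_outputs = blank_id + 1  # phones plus the single CTC blank
print(blank_id, num_outputs)
```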
Minimal loading example:
import json

import librosa
import torch
from transformers import AutoFeatureExtractor, AutoModel

repo_id = "istomin9192/mHuBERT-147-ipa-ctc-ft"
wav_file = "example.wav"  # placeholder: path to your own audio file

# trust_remote_code=True lets transformers run the model code shipped with the repository.
feature_extractor = AutoFeatureExtractor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
model.eval()

# ipa_map.json is expected in the working directory.
with open("ipa_map.json", "r", encoding="utf-8") as f:
    id2phone = {int(k): v for k, v in json.load(f)["id2phone"].items()}

# Load the audio as 16 kHz mono.
wav, sr = librosa.load(wav_file, sr=16000, mono=True)
inputs = feature_extractor(wav, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0]  # first (only) item in the batch

# Greedy CTC decoding: take the best class per frame, merge repeats, drop blanks.
pred_ids = logits.argmax(dim=-1).tolist()
blank_id = model.config.architecture["blank_id"]

phones = []
prev = blank_id
for pid in pred_ids:
    if pid != blank_id and pid != prev:
        phones.append(id2phone[pid])
    prev = pid  # updated on every frame, so a blank separates repeated phones

print(phones)
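The collapse step at the end of the example can be checked in isolation on toy data. The sketch below is an equivalent formulation using itertools.groupby: merge consecutive repeats first, then drop blanks. The helper name, the blank index 4, and the frame ids are hypothetical and chosen only for illustration.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, blank_id):
    """Merge consecutive repeated ids, then remove blanks."""
    merged = [pid for pid, _ in groupby(frame_ids)]
    return [pid for pid in merged if pid != blank_id]


BLANK = 4  # hypothetical blank index for this toy example
frames = [4, 1, 1, 4, 1, 2, 2, 4, 4, 0]
collapsed = ctc_greedy_collapse(frames, BLANK)
print(collapsed)
```

The phone with id 1 appears twice in the result because a blank frame separates its two runs; without that blank, the repeated frames would merge into a single phone.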