---
language:
- en
license: apache-2.0
tags:
- speech
- phone-recognition
- ipa
- ctc
- pronunciation-assessment
- mhubert
base_model: utter-project/mHuBERT-147
pipeline_tag: audio-classification
datasets:
- timit-asr/timit_asr
- buckeye
metrics:
- per
model-index:
- name: mHuBERT-147-ipa-ctc-ft
  results:
  - task:
      type: phone-recognition
      name: Phone Recognition
    dataset:
      name: TIMIT
      type: timit-asr/timit_asr
      split: test
    metrics:
    - name: Phone Error Rate
      type: per
      value: 0.0896
  - task:
      type: phone-recognition
      name: Phone Recognition
    dataset:
      name: Buckeye
      type: buckeye
      split: validation
    metrics:
    - name: Phone Error Rate
      type: per
      value: 0.1987
---

# mHuBERT-147 IPA CTC FT

Fine-tuned English IPA phone-recognition model, initialized from `utter-project/mHuBERT-147` and trained with a BiLSTM CTC head.

This repository contains the full fine-tuned model:

- mHuBERT-147 backbone
- BiLSTM CTC head
- audio preprocessor config

Total size: `97.2M` parameters (`94.4M` backbone + `2.85M` CTC head).

Training setup:

- initialized from `utter-project/mHuBERT-147`
- trained on the TIMIT train set + the Buckeye train set

Evaluation results:

- TIMIT test set: `PER = 0.0896`
- Buckeye validation set: `PER = 0.1987`

The output vocabulary is the same IPA phone set as in `istomin9192/mHuBERT-147-ipa-head`, plus one extra CTC blank symbol at the last output index.
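Because the blank is appended after the IPA phone set, a set of V phones gives V + 1 outputs and, assuming the phone ids are contiguous from 0, the blank id is V. Greedy CTC decoding merges consecutive identical ids and then drops blanks, so a phone that genuinely repeats only survives if a blank separates the two occurrences. The sketch below illustrates that rule with a hypothetical three-phone inventory standing in for the real `ipa_map.json`; `ctc_greedy_collapse` is an illustrative helper, not part of this repository, and in real use the authoritative blank id is `model.config.architecture["blank_id"]`.

```python
from typing import Dict, List


def ctc_greedy_collapse(frame_ids: List[int], id2phone: Dict[int, str]) -> List[str]:
    """Collapse per-frame argmax ids into a phone sequence.

    Assumption: id2phone covers ids 0..V-1 and the CTC blank is id V,
    i.e. the extra symbol at the last output index.
    """
    blank_id = len(id2phone)  # blank sits right after the last phone id
    phones: List[str] = []
    prev = blank_id
    for pid in frame_ids:
        # Emit a phone only when the id changes and is not the blank.
        if pid != blank_id and pid != prev:
            phones.append(id2phone[pid])
        prev = pid  # a blank resets prev, allowing the same phone again
    return phones


# Hypothetical toy inventory; the blank id is therefore 3.
toy_id2phone = {0: "k", 1: "æ", 2: "t"}

# Frames: k k <blank> æ æ t <blank> t
frames = [0, 0, 3, 1, 1, 2, 3, 2]
print(ctc_greedy_collapse(frames, toy_id2phone))  # ['k', 'æ', 't', 't']
```

Without the blank between the last two frames (`[..., 2, 2]`), the two `t` frames would merge into a single `t`.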
Minimal loading example:

```python
import json

import librosa
import torch
from transformers import AutoFeatureExtractor, AutoModel

repo_id = "istomin9192/mHuBERT-147-ipa-ctc-ft"
wav_file = "example.wav"  # path to your speech recording; it is resampled to 16 kHz below

# The repository ships custom model code, hence trust_remote_code=True.
feature_extractor = AutoFeatureExtractor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
model.eval()

# Map output ids to IPA phones; ipa_map.json is expected in the working directory.
with open("ipa_map.json", "r", encoding="utf-8") as f:
    id2phone = {int(k): v for k, v in json.load(f)["id2phone"].items()}

# Load mono audio at the 16 kHz rate the feature extractor expects.
wav, _ = librosa.load(wav_file, sr=16000, mono=True)
inputs = feature_extractor(wav, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0]  # (num_frames, vocab_size incl. blank)

# Greedy CTC decoding: best id per frame, merge repeats, drop blanks.
pred_ids = logits.argmax(dim=-1).tolist()
blank_id = model.config.architecture["blank_id"]

phones = []
prev = blank_id
for pid in pred_ids:
    if pid != blank_id and pid != prev:
        phones.append(id2phone[pid])
    prev = pid

print(phones)
```
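The PER figures above are phone-level edit distances: (substitutions + deletions + insertions) divided by the number of reference phones. The sketch below scores a decoded `phones` list against a reference transcription with a plain Levenshtein distance. It is an illustration only, not necessarily the scoring script behind the reported numbers; any phone-set normalisation applied before scoring is not reproduced, and `phone_error_rate` and the example sequences are hypothetical.

```python
from typing import List, Sequence


def phone_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """PER = (substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference must contain at least one phone")
    # prev_row[j]: edit distance between the previous reference prefix and hypothesis[:j]
    prev_row: List[int] = list(range(len(hypothesis) + 1))
    for i, ref_phone in enumerate(reference, start=1):
        row = [i] + [0] * len(hypothesis)
        for j, hyp_phone in enumerate(hypothesis, start=1):
            cost = 0 if ref_phone == hyp_phone else 1
            row[j] = min(
                prev_row[j] + 1,         # deletion: reference phone missing
                row[j - 1] + 1,          # insertion: extra hypothesis phone
                prev_row[j - 1] + cost,  # match or substitution
            )
        prev_row = row
    return prev_row[-1] / len(reference)


# Hypothetical reference for "cat" versus a decoded sequence with
# one substitution (æ -> ɛ) and one insertion (extra t).
reference = ["k", "æ", "t"]
hypothesis = ["k", "ɛ", "t", "t"]
print(phone_error_rate(reference, hypothesis))  # 0.6666666666666666
```

A corpus-level PER usually sums the edit counts and reference lengths over all utterances before dividing, rather than averaging per-utterance rates.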