| base_model: unsloth/csm-1b | |
| library_name: peft | |
| license: mit | |
| datasets: | |
| - Dev372/Cardiology_Medical_STT_Dataset | |
| language: | |
| - en | |
| pipeline_tag: text-to-speech | |
| tags: | |
| - cardiology | |
| - medical | |
| - transformers | |
| # Model Card for Cardiology-TTS | |
| ## Model Details | |
| This model is a fine-tuned version of the Conversational Speech Model (CSM-1B), adapted with LoRA for parameter-efficient fine-tuning. | |
| It was trained on a 1,530-sample dataset of cardiology texts paired with audio and is designed to generate high-quality speech from cardiology-related text. | |
| It builds on the text-to-speech capabilities of the original CSM-1B model and adapts them to domain-specific cardiology terminology. | |
| It is intended for speech generation in English, especially for clinical and educational contexts. | |
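
To make "parameter-efficient" concrete: LoRA keeps each adapted weight matrix frozen and trains two small low-rank factors instead. The sketch below counts the trainable parameters this adds for a single layer. The layer size and rank are hypothetical illustration values, and the function names are made up for this example; this card does not state the adapter's actual rank or target modules.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds for one (d_out x d_in) weight matrix.

    LoRA learns A (rank x d_in) and B (d_out x rank); the original
    weight matrix stays frozen.
    """
    if min(d_in, d_out, rank) <= 0:
        raise ValueError("dimensions and rank must be positive")
    return rank * (d_in + d_out)


def trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Ratio of LoRA parameters to the frozen matrix's parameter count."""
    return lora_trainable_params(d_in, d_out, rank) / (d_in * d_out)


# Hypothetical example (not this adapter's real configuration):
# a 2048 x 2048 projection adapted with rank 16.
print(lora_trainable_params(2048, 2048, 16))
print(f"{trainable_fraction(2048, 2048, 16):.4%}")
```

The adapter's real settings would be found in its `adapter_config.json` rather than inferred from this sketch.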
| ## Uses | |
| ### Direct Use | |
| - Text-to-Speech (TTS) generation for cardiology educational content, medical reports, or clinical explanations. | |
| - Integrating spoken content into healthcare apps, e-learning platforms, or patient-facing tools for cardiology topics. | |
| - Researching and prototyping domain-specific TTS applications using small medical datasets. | |
| ## Bias, Risks, and Limitations | |
| - Small training dataset (1,530 samples): the model may not generalize well to rare medical terms, long passages, or medical domains outside cardiology. | |
| - English-only support: the model is not trained on other languages. | |
| - TTS artifacts: some generated audio may contain unnatural pauses, mispronunciations, or clipping on challenging sentences. | |
| - Not for diagnostic purposes: the model outputs speech for educational and illustrative purposes and should not be used for medical diagnosis or patient instructions. | |
| - Model size and resources: CSM-1B is large; it requires a GPU for real-time inference and significant VRAM for batch synthesis. | |
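
The clipping limitation listed above can be screened for before a clip is used. The following is a minimal sketch, assuming the generated waveform is a sequence of floats in the range [-1, 1] sampled at 24 kHz (the rate used when writing the file in this card's example). The function names and the 0.999 threshold are hypothetical choices for illustration, not part of the model or any library.

```python
def clipping_ratio(samples, threshold=0.999):
    """Fraction of samples whose magnitude is at or above `threshold`.

    Accepts any iterable of floats, including a 1-D NumPy array.
    """
    total = 0
    clipped = 0
    for sample in samples:
        total += 1
        if abs(float(sample)) >= threshold:
            clipped += 1
    return clipped / total if total else 0.0


def duration_seconds(samples, sample_rate=24000):
    """Length of the clip in seconds, assuming a 24 kHz sample rate."""
    return len(samples) / sample_rate


# Toy waveform: two of the four samples sit at full scale.
toy = [0.0, 0.5, 1.0, -1.0]
print(clipping_ratio(toy))
print(duration_seconds(toy))
```

A non-zero ratio does not prove audible distortion; it only flags clips worth listening to before they reach users.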
| ## How to Get Started with the Model | |
| Use the code below to get started with the model. | |
| ```python | |
| import torch | |
| from transformers import CsmForConditionalGeneration, AutoProcessor | |
| import soundfile as sf | |
| from peft import PeftModel | |
| model_id = "unsloth/csm-1b" | |
| device = "cuda" if torch.cuda.is_available() else "cpu" | |
| processor = AutoProcessor.from_pretrained(model_id) | |
| base_model = CsmForConditionalGeneration.from_pretrained(model_id, device_map=device) | |
| model = PeftModel.from_pretrained(base_model, "khazarai/Cardiology-TTS") | |
| text = "The coronary arteries are patent with no significant stenosis." | |
| speaker_id = 0 | |
| conversation = [ | |
| {"role": str(speaker_id), "content": [{"type": "text", "text": text}]}, | |
| ] | |
| audio_values = model.generate( | |
| **processor.apply_chat_template( | |
| conversation, | |
| tokenize=True, | |
| return_dict=True, | |
| ).to(device), | |
| max_new_tokens=200, | |
| # play with these parameters to tweak results | |
| # depth_decoder_top_k=0, | |
| # depth_decoder_top_p=0.9, | |
| # depth_decoder_do_sample=True, | |
| # depth_decoder_temperature=0.9, | |
| # top_k=0, | |
| # top_p=1.0, | |
| # temperature=0.9, | |
| # do_sample=True, | |
| ######################################################### | |
| output_audio=True | |
| ) | |
| audio = audio_values[0].to(torch.float32).cpu().numpy() | |
| sf.write("example.wav", audio, 24000) | |
| ``` | |
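
The `max_new_tokens` value in the example above caps how much audio is generated. The helper below is a rough sketch for choosing that value from a target duration. It assumes each new token corresponds to one audio frame at 12.5 frames per second; that rate is an assumption about the CSM audio codec and is not stated in this card, so verify it against the output length you actually get. The function names are hypothetical.

```python
import math

# Assumption (not from this card): one generated token per audio frame,
# at 12.5 frames per second.
ASSUMED_FRAMES_PER_SECOND = 12.5


def tokens_for_seconds(seconds, frames_per_second=ASSUMED_FRAMES_PER_SECOND):
    """Smallest max_new_tokens that covers `seconds` of audio."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return math.ceil(seconds * frames_per_second)


def seconds_for_tokens(max_new_tokens, frames_per_second=ASSUMED_FRAMES_PER_SECOND):
    """Approximate upper bound on audio length for a given token budget."""
    return max_new_tokens / frames_per_second


# Under the assumed frame rate, the example's budget of 200 tokens:
print(seconds_for_tokens(200))
# And the budget needed for roughly 10 seconds of speech:
print(tokens_for_seconds(10))
```

If a sentence is cut off mid-word, raising `max_new_tokens` is the first thing to try; generation can still stop earlier than the budget allows.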
| ## Training Details | |
| ### Training Data | |
| - Dataset: Dev372/Cardiology_Medical_STT_Dataset | |
| 1,530 samples of cardiology-related text paired with audio. | |
| ### Framework versions | |
| - PEFT 0.15.2 |