	---
	base_model: unsloth/csm-1b
	pipeline_tag: text-to-speech
	tags:
	- base_model:adapter:unsloth/csm-1b
	- lora
	- transformers
	- unsloth
	license: apache-2.0
	language:
	- el
	new_version: moiraai2024/GreekTTS-1.5
	---


	# Description
	Website: https://moira-ai.com/

	Email: moira.ai2024@gmail.com

	Report: https://moiraai2024.github.io/GreekTTS-demo/

	Moira.AI GreekTTS is a state-of-the-art text-to-speech model fine-tuned specifically for Greek. It is a LoRA adapter trained on Greek speech data on top of the sesame/csm-1b architecture (loaded through unsloth/csm-1b) and generates high-quality, natural-sounding speech.

	The model is designed for lifelike, expressive speech in applications such as virtual assistants, audiobooks, and accessibility tools. Its large-scale transformer backbone provides fluid prosody and accurate pronunciation of Greek text.

	Key Features:

	- Fine-tuned specifically for Greek TTS.
	- Built on the robust sesame/csm-1b model, ensuring high-quality performance.
	- Capable of generating natural-sounding, expressive Greek speech.
	- Ideal for integration into applications requiring high-quality, human-like text-to-speech synthesis in Greek.

	Explore the model and see how it can enhance your Greek TTS applications!


	# How to use it
	Create a conda environment and install Unsloth, following the Unsloth conda install guide: https://docs.unsloth.ai/get-started/install-and-update/conda-install


	```bash
	conda create --name unsloth_env \
	    python=3.11 \
	    pytorch-cuda=12.1 \
	    pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers \
	    -y
	```

	```
	conda activate unsloth_env
	```
	```
	pip install unsloth
	```
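Before running the inference code, it can help to confirm that the packages imported by the snippets on this page are present in the active environment. The following is a minimal sketch using only the standard library. The helper name `missing_packages` and the `REQUIRED` list are illustrative choices for this example and are not part of Unsloth.

```python
from importlib.util import find_spec

# Top-level modules imported by the snippets in this README.
REQUIRED = ["torch", "unsloth", "transformers", "peft", "soundfile", "IPython"]


def missing_packages(names=REQUIRED):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Any name reported as missing needs to be installed into `unsloth_env` before continuing.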

	```python
	from unsloth import FastModel
	from transformers import CsmForConditionalGeneration
	from peft import PeftModel
	import torch

	# Report the GPU in use and how much memory is already reserved.
	gpu_stats = torch.cuda.get_device_properties(0)
	start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
	max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
	print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
	print(f"{start_gpu_memory} GB of memory reserved.")

	# --- 1. Load the Base Unsloth Model and Processor ---
	# This setup must be identical to the one used in the training script.
	print("Loading the base model and processor...")
	model, processor = FastModel.from_pretrained(
	    model_name = "unsloth/csm-1b",
	    max_seq_length = 2048,
	    dtype = None,  # None lets Unsloth auto-detect the dtype
	    auto_model = CsmForConditionalGeneration,
	    load_in_4bit = False,
	)

	# --- 2. Identify and Load Your Best LoRA Checkpoint ---
	# !!! IMPORTANT: Change this path to your best checkpoint folder !!!
	# (The one you found in trainer_state.json)
	final_int = 94_764
	best_checkpoint_path = "./training_outputs_second_run/checkpoint-" + str(final_int)

	print(f"\nLoading the LoRA adapter from: {best_checkpoint_path}")

	# Attach the trained adapter weights to the base model.
	# (PeftModel.from_pretrained wraps the model; it does not merge the weights.)
	model = PeftModel.from_pretrained(model, best_checkpoint_path)

	print("\nFine-tuned model is ready for inference!")
	# Unsloth automatically handles moving the model to the GPU
	```
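Because the checkpoint path has to be edited by hand, a quick check before calling `PeftModel.from_pretrained` can catch a wrong folder early. PEFT saves an adapter together with an `adapter_config.json` file, so the sketch below only tests for that file. The function name `looks_like_lora_checkpoint` is a hypothetical helper for this example, not a PEFT or Unsloth API.

```python
from pathlib import Path


def looks_like_lora_checkpoint(path):
    """Return True if `path` is a directory that contains adapter_config.json."""
    folder = Path(path)
    return folder.is_dir() and (folder / "adapter_config.json").is_file()


if __name__ == "__main__":
    checkpoint = "./training_outputs_second_run/checkpoint-94764"
    if not looks_like_lora_checkpoint(checkpoint):
        print(f"No adapter_config.json found in {checkpoint}; check the path.")
```

The check does not validate the adapter weights themselves; it only confirms that the folder has the layout PEFT expects.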

	```python
	from transformers import AutoProcessor
	processor = AutoProcessor.from_pretrained("unsloth/csm-1b")
	```

	```python
	greek_sentences = [
	    "Σου μιλάααανε!",
	    "Γεια σας, είμαι η Μίρα και σήμερα θα κάνουμε μάθημα Ελληνικων.",
	    "Ημουν εξω με φιλους και τα επινα. Μου αρεσει πολυ η μπυρα αλφα!",
	    "Όταν ξανά άνοιξα τα μάτια διαπίστωσα ότι ήμουν ξαπλωμένος σε ένα μαλακό στρώμα από κουβέρτες",
	]
	```

	```python
	from IPython.display import Audio, display
	import soundfile as sf
	```

	```python
	# --- Configure the Generation ---

	sentence_index = 1  # which entry of greek_sentences to synthesize
	text_to_synthesize = greek_sentences[sentence_index]

	print(f"\nSynthesizing text: '{text_to_synthesize}'")

	speaker_id = 0
	inputs = processor(f"[{speaker_id}]{text_to_synthesize}", add_special_tokens=True).to("cuda")

	audio_values = model.generate(
	    **inputs,
	    max_new_tokens=125, # 125 tokens is 10 seconds of audio; increase this for longer speech
	    # play with these parameters to tweak results
	    # depth_decoder_top_k=0,
	    # depth_decoder_top_p=0.9,
	    # depth_decoder_do_sample=True,
	    # depth_decoder_temperature=0.9,
	    # top_k=0,
	    # top_p=1.0,
	    # temperature=0.9,
	    # do_sample=True,
	    #########################################################
	    output_audio=True
	)
	```
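The comment on `max_new_tokens` gives the ratio 125 tokens for 10 seconds of audio, that is, 12.5 tokens per second. A small helper can turn a target duration into a token budget. This is a sketch based only on that stated ratio; the name `max_new_tokens_for` is made up for this example.

```python
import math

# From the comment in the generation call: 125 tokens correspond to 10 seconds.
TOKENS_PER_SECOND = 125 / 10


def max_new_tokens_for(seconds):
    """Return the token budget for `seconds` of audio, rounded up."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return math.ceil(seconds * TOKENS_PER_SECOND)


if __name__ == "__main__":
    for duration in (5, 10, 30):
        print(duration, "s ->", max_new_tokens_for(duration), "tokens")
```

The result can be passed as `max_new_tokens` when a sentence is expected to run longer than 10 seconds.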

	```python
	audio = audio_values[0].to(torch.float32).cpu().numpy()
	sf.write("example_without_context.wav", audio, 24000)
	display(Audio(audio, rate=24000))
	```
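If `soundfile` is not available, the standard-library `wave` module can also write the result. The sketch below makes two assumptions that are not stated in this README: the audio is mono, and its samples are floats in the range [-1.0, 1.0] (values outside that range are clipped). The function name `write_wav_int16` is a hypothetical helper for this example.

```python
import struct
import wave


def write_wav_int16(path, samples, sample_rate=24000):
    """Write mono float samples to a 16-bit PCM WAV file; return its duration in seconds."""
    frames = bytearray()
    for value in samples:
        # Clip to [-1.0, 1.0], then scale to the signed 16-bit range.
        clipped = max(-1.0, min(1.0, float(value)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return (len(frames) // 2) / sample_rate
```

With the `audio` array from the snippet above, `write_wav_int16("example_without_context.wav", audio)` would write the file at the same 24000 Hz rate and return the clip length in seconds.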

	# 📖 How to Cite This Model
	```
	@misc{moira2025greektts10,
	  title = {GreekTTS-1.0: A State-of-the-Art System for Greek Text-to-Speech Synthesis},
	  author = {Moira.AI},
	  year = {2025},
	  month = {sep},
	  day = {22},
	  url = {https://moira-ai.com/},
	  note = {Demo report: https://moiraai2024.github.io/GreekTTS-demo/}
	}
	```