---
license: apache-2.0
language:
- el
base_model:
- unsloth/orpheus-3b-0.1-ft
pipeline_tag: text-to-speech
---

# Description
Website: https://moira-ai.com/

Email: moira.ai2024@gmail.com

Report: https://moiraai2024.github.io/GreekTTS-1.5-demo/

Welcome to Moira.AI **GreekTTS-1.5**, a state-of-the-art Greek text-to-speech system designed to deliver exceptional naturalness and intelligibility in speech synthesis. Building on our previous work in Greek TTS, GreekTTS-1.5 marks a significant leap forward in quality, accessibility, and performance.

GreekTTS-1.5 is built on the **Orpheus foundation model** and fine-tuned with **Low-Rank Adaptation (LoRA)**, a parameter-efficient method that adapts the base model to a custom, high-quality Greek speech corpus. The result is a model that consistently outperforms existing baselines, with fluid prosody, accurate pronunciation, and expressive speech generation.

Whether you're developing virtual assistants, audiobooks, accessibility tools, or any other application that requires natural-sounding Greek speech, GreekTTS-1.5 provides a high-fidelity solution ready for integration.

Key Features:

- Built on the robust Orpheus foundation model for high-quality performance.
- Fine-tuned using LoRA for efficient adaptation to Greek speech data.
- Produces natural, expressive, and intelligible Greek speech.
- Designed specifically for Greek, a low-resource language in TTS.
- Ideal for integration into applications requiring human-like speech synthesis.
- Open-source and extensible for future research and development.

**Explore GreekTTS-1.5 and take your Greek TTS applications to the next level.**

# How to use it
Install Unsloth in a fresh conda environment, following the official guide: https://docs.unsloth.ai/get-started/install-and-update/conda-install


```bash
conda create --name unsloth_env \
    python=3.11 \
    pytorch-cuda=12.1 \
    pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers \
    -y
```

```bash
conda activate unsloth_env
pip install unsloth
pip install snac  # SNAC codec used by the inference code below
```

```python
import torch
from unsloth import FastLanguageModel
from transformers import AutoTokenizer
from snac import SNAC
from IPython.display import display, Audio

gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")



# --- Define Constants and Configuration ---
print("\n⏳ Defining constants...")
# Model paths
BASE_MODEL_NAME = "unsloth/orpheus-3b-0.1-ft"
# 🔴 CRITICAL: UPDATE THIS PATH 🔴
# This must be the path to the LoRA adapters you saved during training.
# This should be inside the `output_dir` you set, e.g., "orpheus_training_dir/checkpoint-6000"
LORA_ADAPTERS_PATH = "training_dir_latest/checkpoint-1688" 


# --- Load Your Fine-Tuned Orpheus Model ---
print("\n⏳ Loading models...")
# Load the base Orpheus model
model, _ = FastLanguageModel.from_pretrained(
    model_name = BASE_MODEL_NAME,
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = False,
)
# Load your fine-tuned LoRA adapters on top
model.load_adapter(LORA_ADAPTERS_PATH)
print("✅ Loaded fine-tuned LoRA adapters.")

# Load the Orpheus tokenizer
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_NAME)

# Load the SNAC model (the "Vocal Cords")
snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz")
```


```python

# Special token IDs 
tokeniser_length = 128256
start_of_speech = tokeniser_length + 1
end_of_speech = tokeniser_length + 2
start_of_human = tokeniser_length + 3
end_of_human = tokeniser_length + 4
start_of_ai = tokeniser_length + 5
end_of_ai =  tokeniser_length + 6
pad_token = 128263
start_of_text = 128000
end_of_text = 128009
print("✅ Constants defined.")
```
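
The special-token IDs above are what frame each prompt before generation. As a minimal sketch (plain Python lists instead of tensors, with placeholder text IDs that are purely hypothetical), the wrapping and left-padding steps used by the inference code could look like this. The helper names `frame_prompt` and `left_pad` are illustrative and not part of any library.

```python
# Token IDs copied from the constants in this card.
START_OF_HUMAN = 128259
END_OF_TEXT = 128009
END_OF_HUMAN = 128260
PAD_TOKEN = 128263


def frame_prompt(text_ids):
    """Wrap tokenized text as: start_of_human, text, end_of_text, end_of_human."""
    return [START_OF_HUMAN] + list(text_ids) + [END_OF_TEXT, END_OF_HUMAN]


def left_pad(batch, pad_id=PAD_TOKEN):
    """Left-pad every sequence to the longest one and build attention masks."""
    max_len = max(len(seq) for seq in batch)
    padded, masks = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        padded.append([pad_id] * n_pad + list(seq))
        masks.append([0] * n_pad + [1] * len(seq))  # 0 = padding, 1 = real token
    return padded, masks


# Hypothetical text IDs; 128000 is the start-of-text ID listed in the constants.
batch = [frame_prompt([128000, 11, 12, 13]), frame_prompt([128000, 21])]
padded, masks = left_pad(batch)
print(masks)  # [[1, 1, 1, 1, 1, 1, 1], [0, 0, 1, 1, 1, 1, 1]]
```

Padding goes on the left so that every prompt in a batch ends at the same position, which is where the model starts generating speech tokens.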

```python
golden_test_set_prompts =  [
        "Στις 15 Μαΐου του 2024, το προϊόν κόστιζε 19,50€.",
        "Έκανα post στο Instagram και μετά πήγα για shopping στο mall.",
        "Ο λογαριασμός της Δ.Ε.Η. πρέπει να πληρωθεί, π.χ. μέσω τραπέζης.", # Tests abbreviations: Δ.Ε.Η. and π.χ.
        "Η εκστρατεία προσέλκυσε χιλιάδες εθελοντές.",
        "Η Μαρία Παπαδοπούλου συνάντησε τον Γιάννη Οικονόμου.",
        "Μια πάπια, μα ποια πάπια Μια πάπια με παπιά.",
        "Ο παπάς ο παχύς, έφαγε παχιά φακή. Γιατί παπά παχύ, έφαγες παχιά φακή;",
        "Άσπρη πέτρα ξέξασπρη κι απ' τον ήλιο ξεξασπρότερη.",
        "Ο μπαμπάς πήγε στην αντάρα για να βρει τα αγκάθια.", # Tests μπ, ντ, γκ
        "Οι τρεις ιερείς είδαν το υλικό.", # Tests οι, ει, ι, υ (all pronounced /i/)
        "Έφαγα τζατζίκι και τσάι στην πλατεία.", # Tests τσ, τζ
        "Ο νόμος είναι σαφής.", # NOmos (law)
        "Ο νομός Αττικής είναι μεγάλος.", # noMOS (prefecture)
        "Η παγκοσμιοποίηση επηρεάζει την οικονομία.", # Tests stress on long words
    ]
```


```python
# --- Configure the Generation ---

def infer(prompts, chosen_voice):

    FastLanguageModel.for_inference(model) # Enable native 2x faster inference

    # Keep the SNAC decoder on the CPU; the audio codes are decoded there
    snac_model.to("cpu")
    
    prompts_ = [(f"{chosen_voice}: " + p) if chosen_voice else p for p in prompts]
    
    all_input_ids = []
    
    for prompt in prompts_:
      input_ids = tokenizer(prompt, return_tensors="pt").input_ids
      all_input_ids.append(input_ids)
    
    start_token = torch.tensor([[ 128259]], dtype=torch.int64) # Start of human
    end_tokens = torch.tensor([[128009, 128260]], dtype=torch.int64) # End of text, End of human
    
    all_modified_input_ids = []
    for input_ids in all_input_ids:
      modified_input_ids = torch.cat([start_token, input_ids, end_tokens], dim=1) # SOH SOT Text EOT EOH
      all_modified_input_ids.append(modified_input_ids)
    
    all_padded_tensors = []
    all_attention_masks = []
    max_length = max([modified_input_ids.shape[1] for modified_input_ids in all_modified_input_ids])
    for modified_input_ids in all_modified_input_ids:
      padding = max_length - modified_input_ids.shape[1]
      padded_tensor = torch.cat([torch.full((1, padding), 128263, dtype=torch.int64), modified_input_ids], dim=1)
      attention_mask = torch.cat([torch.zeros((1, padding), dtype=torch.int64), torch.ones((1, modified_input_ids.shape[1]), dtype=torch.int64)], dim=1)
      all_padded_tensors.append(padded_tensor)
      all_attention_masks.append(attention_mask)
    
    all_padded_tensors = torch.cat(all_padded_tensors, dim=0)
    all_attention_masks = torch.cat(all_attention_masks, dim=0)
    
    
    input_ids = all_padded_tensors.to("cuda")
    attention_mask = all_attention_masks.to("cuda")
    generated_ids = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        max_new_tokens=1200,
        do_sample=True,
        temperature=0.6,
        top_p=0.95,
        repetition_penalty=1.1,
        num_return_sequences=1,
        eos_token_id=128258,  # end of speech
        use_cache=True,
    )
    token_to_find = 128257    # start of speech
    token_to_remove = 128258  # end of speech
    
    token_indices = (generated_ids == token_to_find).nonzero(as_tuple=True)
    
    if len(token_indices[1]) > 0:
        last_occurrence_idx = token_indices[1][-1].item()
        cropped_tensor = generated_ids[:, last_occurrence_idx+1:]
    else:
        cropped_tensor = generated_ids
    
    
    processed_rows = []
    
    for row in cropped_tensor:
        masked_row = row[row != token_to_remove]
        processed_rows.append(masked_row)
    
    code_lists = []
    
    for row in processed_rows:
        row_length = row.size(0)
        new_length = (row_length // 7) * 7
        trimmed_row = row[:new_length]
        trimmed_row = [t - 128266 for t in trimmed_row]
        code_lists.append(trimmed_row)
    
    
    def redistribute_codes(code_list):
      layer_1 = []
      layer_2 = []
      layer_3 = []
      for i in range(len(code_list) // 7):
        layer_1.append(code_list[7*i])
        layer_2.append(code_list[7*i+1]-4096)
        layer_3.append(code_list[7*i+2]-(2*4096))
        layer_3.append(code_list[7*i+3]-(3*4096))
        layer_2.append(code_list[7*i+4]-(4*4096))
        layer_3.append(code_list[7*i+5]-(5*4096))
        layer_3.append(code_list[7*i+6]-(6*4096))
      codes = [torch.tensor(layer_1).unsqueeze(0),
             torch.tensor(layer_2).unsqueeze(0),
             torch.tensor(layer_3).unsqueeze(0)]
    
      # codes = [c.to("cuda") for c in codes]
      audio_hat = snac_model.decode(codes)
      return audio_hat
    
    my_samples = []
    for code_list in code_lists:
      samples = redistribute_codes(code_list)
      my_samples.append(samples)
    if len(prompts) != len(my_samples):
      raise Exception("Number of prompts and samples do not match")
    else:
      for i in range(len(my_samples)):
        print(prompts[i])
        samples = my_samples[i]
        display(Audio(samples.detach().squeeze().to("cpu").numpy(), rate=24000))
    # Clean up to save RAM
    del my_samples, samples

```
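
The `redistribute_codes` step above is the least obvious part of the pipeline: every 7 generated audio codes form one frame, and each position in the frame carries its own offset of a multiple of 4096. The sketch below restates that de-interleaving in plain Python so the mapping is easy to check by hand. It assumes the codes have already had 128266 subtracted, as in the code above; the function name `split_into_layers` is illustrative only.

```python
CODEBOOK_SIZE = 4096  # per-position offset used in the decoding code above


def split_into_layers(codes):
    """Split a flat list of audio codes (7 per frame) into three SNAC layers."""
    usable = (len(codes) // 7) * 7  # drop an incomplete trailing frame
    layer_1, layer_2, layer_3 = [], [], []
    for start in range(0, usable, 7):
        frame = codes[start:start + 7]
        # Position k within a frame carries an offset of k * CODEBOOK_SIZE.
        local = [code - k * CODEBOOK_SIZE for k, code in enumerate(frame)]
        layer_1.append(local[0])                                   # 1 code per frame
        layer_2.extend([local[1], local[4]])                       # 2 codes per frame
        layer_3.extend([local[2], local[3], local[5], local[6]])   # 4 codes per frame
    return layer_1, layer_2, layer_3


# One hand-built frame whose per-position values are 1..7 before offsets.
frame = [value + k * CODEBOOK_SIZE for k, value in enumerate([1, 2, 3, 4, 5, 6, 7])]
print(split_into_layers(frame))  # ([1], [2, 5], [3, 4, 6, 7])
```

Each frame therefore contributes one code to the coarsest layer, two to the middle layer, and four to the finest layer, which is the shape the SNAC decoder receives in the code above.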

```python
# --- Run inference ---
for prompt in golden_test_set_prompts:
    prompts = [prompt]
    print(prompts)
    chosen_voice = None # None for single-speaker
    infer(prompts, chosen_voice)
```
```
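
The `infer` function only plays the audio inline in a notebook. If you want to keep the result on disk, one option is to write the waveform to a WAV file. The sketch below is an assumption rather than part of the released code: it uses only Python's standard `wave` module, expects mono float samples in the range [-1.0, 1.0], and defaults to the 24 kHz rate used by the playback call above. The helper name `save_wav` and the output file name are hypothetical.

```python
import array
import math
import os
import sys
import tempfile
import wave


def save_wav(path, samples, sample_rate=24000):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = array.array(
        "h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples)
    )
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())


if __name__ == "__main__":
    # Demo: a 0.1 s, 440 Hz tone written to a temporary directory.
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 24000) for n in range(2400)]
    out_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
    save_wav(out_path, tone)
    print(out_path)
```

To use it with the model output, `infer` would need to return or expose its decoded waveform; the flattened values (for example `samples.detach().squeeze().cpu().numpy().tolist()`) could then be passed to `save_wav`.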

# 📖 How to Cite This Model
```
@misc{moira2025greektts15,
  title        = {GreekTTS-1.5: A State-of-the-Art System for Greek Text-to-Speech Synthesis},
  author       = {Moira.AI},
  year         = {2025},
  month        = {oct},
  day          = {12},
  url          = {https://moira-ai.com/},
  note         = {Demo report: https://moiraai2024.github.io/GreekTTS-1.5-demo/}
}
```