
Model Card for ringover/ringover-summaries-llama3b-instruct-v1.2

Model Details

Model Description

This model is a LoRA (Low-Rank Adaptation) adapter for Llama-3.2-3B-Instruct, fine-tuned for multilingual (French, English, and Spanish) summarization of phone call transcripts. It has been optimized to handle long-form dialogue and to extract key information across multiple European languages.

  • Training date: January 2026
  • Model type: LoRA adapter (PEFT)
  • Language(s) (NLP): Multilingual (fine-tuned on French, English, and Spanish)
  • Fine-tuned from model: meta-llama/Llama-3.2-3B-Instruct

Quick Start

Since this is a LoRA adapter, you must load the base model first, then apply these adapters on top.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "ringover/ringover-summaries-llama3b-instruct-v1.2-lora"

# Load the base model and its tokenizer
base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load the LoRA adapter on top of the base model and move it to the GPU
ft_model = PeftModel.from_pretrained(base_model, adapter_id)
ft_model.to("cuda")
ft_model.eval()

# Ready for inference (append the transcript to the instruction)
inputs = tokenizer("Summarize the following phone call transcript: ", return_tensors="pt").to("cuda")
outputs = ft_model.generate(**inputs, max_new_tokens=700)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
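
The snippet above passes a bare instruction string to the tokenizer. Llama-3.2-3B-Instruct is a chat model, so in practice the transcript is usually wrapped in chat messages and rendered with the tokenizer's chat template. The sketch below shows one way to build those messages. The helper name `build_summary_messages`, the system prompt wording, and the `language` argument are illustrative assumptions, since this card does not document the prompt used during fine-tuning.

```python
def build_summary_messages(transcript: str, language: str = "en") -> list[dict]:
    """Wrap a call transcript in chat messages (hypothetical prompt wording)."""
    if not transcript.strip():
        raise ValueError("transcript is empty")
    system = (
        "You summarize phone call transcripts. "
        f"Write the summary in this language: {language}."
    )
    user = f"Summarize the following phone call transcript:\n\n{transcript.strip()}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


messages = build_summary_messages(
    "Agent: Hello, how can I help?\nCustomer: I would like to change my plan.",
    language="en",
)

# With tokenizer and ft_model loaded as in the Quick Start, the messages could
# then be rendered and sent to the model roughly like this:
#   inputs = tokenizer.apply_chat_template(
#       messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
#   ).to("cuda")
#   outputs = ft_model.generate(**inputs, max_new_tokens=700)
#   # Decode only the newly generated tokens, not the prompt
#   new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
#   print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

If the adapter was trained with a different prompt layout, matching that layout at inference time will likely give better summaries than the wording assumed here.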

Training Details

Training Data

- 14,724 transcriptions in total
- Train and test dataset: 13,724 transcriptions; eval dataset: 1,000 transcriptions
- 95% of transcriptions are ≤ 8,535 tokens
- Maximum length: 33,201 tokens
- Language distribution:
    
    Counter({'fr': 11079,
    'es': 3176,
    'en': 1393,
    'ca': 49,
    'it': 28,
    'pt': 13,
    'de': 3,
    'pl': 1})
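
Length statistics like the ones above usually drive the choice of a maximum sequence length for training. As a minimal sketch, assuming the token count of each transcript has already been computed with the base model's tokenizer, a nearest-rank percentile gives the cutoff that covers a chosen share of the data. The function name and the toy numbers are illustrative and are not the real dataset.

```python
import math


def length_cutoff(token_counts: list[int], coverage: float = 0.95) -> int:
    """Smallest length L such that at least `coverage` of the counts are <= L."""
    if not token_counts:
        raise ValueError("token_counts is empty")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    ordered = sorted(token_counts)
    # 1-based nearest rank; the small epsilon guards against float rounding
    rank = max(1, math.ceil(coverage * len(ordered) - 1e-9))
    return ordered[rank - 1]


# Toy token counts, for illustration only
toy_counts = [120, 450, 800, 1500, 2300, 3100, 4200, 5600, 8535, 33201]
cutoff = length_cutoff(toy_counts, coverage=0.9)
too_long = sum(1 for n in toy_counts if n > cutoff)
print(f"cutoff={cutoff} tokens, {too_long} transcript(s) would be truncated")
```

Picking the cutoff from a percentile rather than from the maximum keeps memory use bounded while truncating only the longest few transcripts.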

Training Procedure

This model was fine-tuned using the SFTTrainer from the trl library.

  • Framework : PyTorch & Hugging Face Transformers

  • Library : PEFT (Parameter-Efficient Fine-Tuning)

  • Precision: BF16

Training Hyperparameters

    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=2e-5,  # 1e-4 was too high

    logging_steps=50,
    warmup_ratio=0.1,

    eval_strategy="steps",
    eval_steps=200,
    save_strategy="steps",
    save_steps=400,

    report_to="tensorboard",
    load_best_model_at_end=True,
    save_total_limit=1,

    metric_for_best_model="eval_loss",
    greater_is_better=False,

    # metric_for_best_model="eval_rougeL",
    # greater_is_better=True,

    fp16=True,
    lr_scheduler_type="cosine",
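
To make the effect of `warmup_ratio=0.1` and `lr_scheduler_type="cosine"` concrete, the sketch below reproduces the usual shape of that schedule: a linear ramp to the peak learning rate, then a half-cosine decay to zero. It is a plain-Python approximation, not the trainer's own code, and `total_steps = 1000` is an arbitrary illustrative value because the card does not state the number of optimizer steps.

```python
import math


def cosine_lr(step: int, total_steps: int, peak_lr: float = 2e-5, warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step` for linear warmup followed by cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr
        return peak_lr * step / max(1, warmup_steps)
    # Progress through the decay phase, clipped to [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


total_steps = 1000  # hypothetical; depends on dataset size and number of devices

# Samples consumed per optimizer step on one device:
# per_device_train_batch_size * gradient_accumulation_steps
effective_batch = 2 * 4
print("samples per optimizer step per device:", effective_batch)

for step in (0, 50, 100, 550, 1000):
    print(step, f"{cosine_lr(step, total_steps):.2e}")
```

With these settings the learning rate only reaches 2e-5 after the first tenth of training, which is one reason a lower peak can be more stable than the 1e-4 mentioned in the comment above.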

LoraConfig(
    r=16,  # rank
    lora_alpha=32,  # alpha value
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "down_proj", "up_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
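
For a sense of scale, the `LoraConfig` above can be turned into a trainable-parameter count. Each adapted linear layer with `in` inputs and `out` outputs gains `r * (in + out)` weights, and the low-rank update is scaled by `lora_alpha / r`. The layer dimensions below are assumed from the published Llama-3.2-3B configuration (hidden size 3072, MLP size 8192, 28 layers, 8 key/value heads of dimension 128); they are not stated in this card and should be checked against the base model's `config.json`.

```python
R = 16           # LoRA rank, from the LoraConfig above
LORA_ALPHA = 32  # from the LoraConfig above

# Assumed Llama-3.2-3B dimensions (verify against the base model's config.json)
HIDDEN = 3072
INTERMEDIATE = 8192
KV_DIM = 8 * 128  # num_key_value_heads * head_dim
NUM_LAYERS = 28

# (in_features, out_features) of each targeted projection in one decoder layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes: dict, rank: int, num_layers: int) -> int:
    """Total LoRA weights: matrices A (rank x in) and B (out x rank) per target module."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers


scaling = LORA_ALPHA / R
trainable = lora_param_count(TARGET_SHAPES, R, NUM_LAYERS)
print(f"scaling = {scaling}, trainable LoRA parameters = {trainable:,}")
```

Under these assumed dimensions the adapter holds roughly 24 million weights, a small fraction of the 3B-parameter base model.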

Evaluation

Metrics

Multi-dimensional evaluation approach:

  • Base metrics: ROUGE, BLEU, BERTScore, LLM-as-a-judge (GPT-4o mini)

  • Language count metric: DetectLang

  • Lexical metrics (fine-tuned summary vs. gold summary): BLEU brevity penalty, chrF, METEOR, BLEURT

  • Factuality metrics (fine-tuned summary vs. context): AlignScore, UniEval
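
As an illustration of what the lexical metrics measure, the sketch below implements two of them in plain Python: a simplified ROUGE-L F1 based on the longest common subsequence, and the BLEU brevity penalty. It assumes lowercase whitespace tokenization and no stemming, so its scores will differ from those of the evaluation libraries behind the reported results. The function names and the example sentences are illustrative.

```python
import math


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-L F1 on lowercase whitespace tokens."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def brevity_penalty(candidate_len: int, reference_len: int) -> float:
    """BLEU brevity penalty: 1 if the candidate is longer than the reference, else exp(1 - r/c)."""
    if candidate_len == 0:
        return 0.0
    if candidate_len > reference_len:
        return 1.0
    return math.exp(1 - reference_len / candidate_len)


gold = "the customer asked to upgrade the plan"
generated = "the customer asked for a plan upgrade"
print("ROUGE-L F1:", round(rouge_l_f1(generated, gold), 3))
print("Brevity penalty:", round(brevity_penalty(len(generated.split()), len(gold.split())), 3))
```

ROUGE-L rewards word order shared with the gold summary, while the brevity penalty only reacts to summaries that are shorter than the reference, which is why the two are reported side by side.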

Results

See Ringover Summarization Doc
