Instructions to use ringover/ringover-summaries-llama3b-instruct-v1.2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use ringover/ringover-summaries-llama3b-instruct-v1.2 with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
model = PeftModel.from_pretrained(base_model, "ringover/ringover-summaries-llama3b-instruct-v1.2")

Transformers

How to use ringover/ringover-summaries-llama3b-instruct-v1.2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ringover/ringover-summaries-llama3b-instruct-v1.2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly (loading an adapter repository this way requires `peft` to be installed)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("ringover/ringover-summaries-llama3b-instruct-v1.2", dtype="auto")

Local Apps

vLLM

How to use ringover/ringover-summaries-llama3b-instruct-v1.2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ringover/ringover-summaries-llama3b-instruct-v1.2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ringover/ringover-summaries-llama3b-instruct-v1.2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/ringover/ringover-summaries-llama3b-instruct-v1.2

SGLang

How to use ringover/ringover-summaries-llama3b-instruct-v1.2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ringover/ringover-summaries-llama3b-instruct-v1.2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ringover/ringover-summaries-llama3b-instruct-v1.2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ringover/ringover-summaries-llama3b-instruct-v1.2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ringover/ringover-summaries-llama3b-instruct-v1.2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use ringover/ringover-summaries-llama3b-instruct-v1.2 with Docker Model Runner:
```
docker model run hf.co/ringover/ringover-summaries-llama3b-instruct-v1.2
```

Model Card for ringover/ringover-summaries-llama3b-instruct-v1.2

Model Details

Model Description

This model is a LoRA (Low-Rank Adaptation) adapter for Llama-3.2-3B-Instruct, fine-tuned for multilingual (French, English, Spanish) summarization of phone call transcripts. It is optimized to handle long-form dialogue and extract key information across multiple European languages.

Training date: January 2026
Model type: LoRA adapter (PEFT)
Language(s) (NLP): Multilingual (fine-tuned on FR, EN, ES)
Fine-tuned from model: meta-llama/Llama-3.2-3B-Instruct

Quick Start

Since this is a LoRA adapter, you must load the base model first, then apply these adapters on top.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base_model_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "ringover/ringover-summaries-llama3b-instruct-v1.2"

base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load the LoRA adapter on top of the base model
ft_model = PeftModel.from_pretrained(base_model, adapter_id)
ft_model.eval()

# Ready for inference: keep the inputs on the same device as the model
inputs = tokenizer("Summarizing the following phone call transcript ", return_tensors="pt").to(ft_model.device)
outputs = ft_model.generate(**inputs, max_new_tokens=700)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
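
Because the base model is an instruct model, the prompt can also be built with the tokenizer's chat template instead of a raw string. The following is a minimal sketch, assuming a PEFT model and tokenizer loaded as in the Quick Start. The `build_messages` and `summarize` helpers and the prompt wording are hypothetical and may differ from the prompt format used during fine-tuning.

```python
def build_messages(transcript: str) -> list[dict]:
    """Wrap a transcript in chat messages (hypothetical prompt wording)."""
    return [
        {"role": "system", "content": "You summarize phone call transcripts."},
        {
            "role": "user",
            "content": "Summarize the following phone call transcript:\n\n"
            + transcript.strip(),
        },
    ]


def summarize(model, tokenizer, transcript: str, max_new_tokens: int = 700) -> str:
    """Generate a summary with an already loaded model and tokenizer."""
    # Render the messages to plain text with the model's chat template.
    prompt = tokenizer.apply_chat_template(
        build_messages(transcript), tokenize=False, add_generation_prompt=True
    )
    # The template already inserts special tokens, so do not add them again.
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    inputs = inputs.to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With the objects from the Quick Start, a call would look like `summarize(ft_model, tokenizer, transcript_text)`, where `transcript_text` is a placeholder for your own transcript string.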

Training Details

Training Data

- 14,724 transcriptions in total
- Train and test datasets: 13,724 transcriptions; eval dataset: 1,000 transcriptions
- 95% of transcriptions are ≤ 8,535 tokens
- Max length: 33,201 tokens
- Language distribution:

    Counter({'fr': 11079,
    'es': 3176,
    'en': 1393,
    'ca': 49,
    'it': 28,
    'pt': 13,
    'de': 3,
    'pl': 1})
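
The per-language counts are easier to compare as percentages. The short sketch below converts the listed counts into shares of their own total. The `language_shares` helper is hypothetical and only illustrates the arithmetic. French accounts for roughly 70% of the listed counts.

```python
from collections import Counter

# Per-language transcription counts as listed in the card.
language_counts = Counter({
    "fr": 11079, "es": 3176, "en": 1393, "ca": 49,
    "it": 28, "pt": 13, "de": 3, "pl": 1,
})


def language_shares(counts: Counter) -> dict[str, float]:
    """Return each language's share of the listed counts, in percent."""
    total = sum(counts.values())
    # most_common() orders languages from most to least frequent.
    return {lang: 100.0 * n / total for lang, n in counts.most_common()}


shares = language_shares(language_counts)
for lang, pct in shares.items():
    print(f"{lang}: {pct:.2f}%")
```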

Training Procedure

This model was fine-tuned using the SFTTrainer from the trl library.

Framework: PyTorch and Hugging Face Transformers
Library: PEFT (Parameter-Efficient Fine-Tuning)
Precision: BF16

Training Hyperparameters

    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=2e-5,  # 1e-4 was too high

    logging_steps=50,
    warmup_ratio=0.1,

    eval_strategy="steps",
    eval_steps=200,
    save_strategy="steps",
    save_steps=400,

    report_to="tensorboard",
    load_best_model_at_end=True,
    save_total_limit=1,

    metric_for_best_model="eval_loss",
    greater_is_better=False,

    # metric_for_best_model="eval_rougeL",
    # greater_is_better=True,

    fp16=True,
    lr_scheduler_type="cosine",
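
To make these settings concrete, the sketch below derives the effective batch size and approximate optimizer-step counts they imply. The number of devices and the training-set size are assumptions (neither is stated in this card), the `schedule_summary` helper is hypothetical, and the Trainer's exact rounding may differ slightly. It also checks that `save_steps` is a multiple of `eval_steps`, which Transformers requires when `load_best_model_at_end=True`.

```python
import math


def schedule_summary(
    num_train_examples: int,
    per_device_batch_size: int = 2,
    grad_accum_steps: int = 4,
    num_devices: int = 1,  # assumption: not stated in the card
    num_epochs: int = 3,
    warmup_ratio: float = 0.1,
    eval_steps: int = 200,
    save_steps: int = 400,
) -> dict:
    """Approximate the step counts implied by the training arguments."""
    # Checkpoints must coincide with evaluations to pick the best model.
    if save_steps % eval_steps != 0:
        raise ValueError("save_steps must be a multiple of eval_steps")
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_train_examples / effective_batch)
    total_steps = steps_per_epoch * num_epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch_size": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
        "num_evaluations": total_steps // eval_steps,
    }


# Hypothetical training-set size, for illustration only.
summary = schedule_summary(num_train_examples=12_000)
print(summary)
```

On a single device, each optimizer step therefore sees 2 × 4 = 8 sequences.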

    LoraConfig(
        r=16,  # rank
        lora_alpha=32,  # alpha value
        lora_dropout=0.1,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "down_proj", "up_proj"],
        bias="none",
        task_type="CAUSAL_LM",
    )
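
As a rough guide to the adapter's size, the sketch below counts the trainable parameters this LoRA configuration adds. Each targeted linear layer gains two low-rank matrices, contributing `r * (in_features + out_features)` weights. The layer dimensions are assumptions taken from the base model's published configuration, not from this card, and the helper names are hypothetical.

```python
# Assumed Llama-3.2-3B dimensions (from the base model's published config;
# not stated in this card).
HIDDEN = 3072
INTERMEDIATE = 8192
NUM_LAYERS = 28
KV_DIM = 1024  # 8 key/value heads x 128 head dimension

# (in_features, out_features) of each targeted projection in one layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(r: int) -> int:
    """Trainable LoRA weights: r * (in + out) per targeted module."""
    per_layer = sum(r * (n_in + n_out) for n_in, n_out in TARGET_SHAPES.values())
    return per_layer * NUM_LAYERS


def lora_scaling(r: int, lora_alpha: int) -> float:
    """Default LoRA scaling applied to the low-rank update."""
    return lora_alpha / r


print(lora_param_count(r=16))
print(lora_scaling(r=16, lora_alpha=32))
```

Under these assumptions the adapter holds about 24 million trainable weights, a small fraction of the roughly 3 billion in the base model, and the low-rank update is scaled by `lora_alpha / r = 2`.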

Evaluation

Metrics

Multi-dimensional evaluation approach:

Base metrics: ROUGE, BLEU, BERTScore, LLM-as-a-judge (GPT-4o-mini)
Language count metric: DetectLang
Lexical metrics (fine-tuned summary vs. gold summary): BLEU_details_Brevity_Penalty, chrF, METEOR, BLEURT
Factuality metrics (fine-tuned summary vs. context): AlignScore, UniEval
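
Of the lexical metrics, the BLEU brevity penalty is the simplest to illustrate: it penalizes summaries that are shorter than the reference. The sketch below follows the standard BLEU definition and assumes lengths are token counts. The evaluation code actually used for this model is not shown in this card and may compute it through a library.

```python
import math


def brevity_penalty(candidate_len: int, reference_len: int) -> float:
    """Standard BLEU brevity penalty for one candidate/reference pair."""
    if candidate_len <= 0:
        # An empty candidate receives the maximum penalty.
        return 0.0
    if candidate_len > reference_len:
        # Candidates longer than the reference are not penalized.
        return 1.0
    return math.exp(1.0 - reference_len / candidate_len)


# A summary half as long as the gold summary is scaled by exp(-1).
print(brevity_penalty(candidate_len=50, reference_len=100))
```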

Results

See Ringover Summarization Doc
