amigo-lora

A LoRA adapter that gives Qwen3.5-2B the voice of a warm, patient companion for an older adult. It speaks in short, kind sentences, takes interest in the person's day, and keeps them company. Built for amigo, a local and private voice companion, during the Hugging Face Build Small Hackathon.

What it teaches

This adapter shapes how the model talks: warm, brief, in a Peruvian Spanish register or plain English. It carries no facts about the person. Their name, family, health, and routine live in the running app's profile and memory, never in the weights, so that information stays private and stays current.
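
A minimal sketch of that separation, assuming the app keeps the profile as a plain dictionary and folds it into the system message at request time. The `PERSONA` text, the `build_messages` helper, and the profile fields are hypothetical illustrations, not amigo's actual code.

```python
# Hypothetical persona text; the app's real system persona is not shown in this card.
PERSONA = "You are a warm, patient companion. Reply in one to three short sentences."


def build_messages(profile: dict, user_text: str) -> list:
    """Combine the fixed persona with per-person facts kept outside the weights."""
    # Facts are read from the app's profile store at request time, so they can be
    # edited or deleted without retraining the adapter.
    facts = "; ".join(f"{key}: {value}" for key, value in profile.items())
    system = PERSONA if not facts else f"{PERSONA}\nKnown about the person: {facts}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_text},
    ]


# Example profile values are invented for illustration.
messages = build_messages({"name": "Rosa", "routine": "walks at 7am"}, "Buenos días")
```

Because the personal details only ever appear in the prompt, changing or removing them is a data update in the app rather than a change to the adapter.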

Why a LoRA on a 2B

The goal is to hold the companion voice on a small model that runs fast on a laptop CPU, hardware comparable to what a phone has. The difference is easy to hear. Ask the plain 2B "me siento un poco solo hoy" ("I feel a bit lonely today") and it answers cheerfully but misses the feeling. With this adapter, it acknowledges the loneliness and offers company before asking how the person is.

Training

Base: Qwen/Qwen3.5-2B
Method: QLoRA (4-bit), rank 16, alpha 32, dropout 0.05
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Schedule: 3 epochs, learning rate 2e-4
Data: 366 curated dialogue pairs (186 Spanish, 180 English), each wrapped with the app's exact system persona, plus a held-out set of 45. The pairs carry the voice, never personal facts.
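
The numbers above can be worked through in a few lines. This is a sketch assuming standard LoRA scaling (alpha divided by rank, not the rsLoRA variant). The batch size and layer shape are assumptions chosen only for illustration, since the card does not state them.

```python
import math

# Values stated in the card.
RANK, ALPHA = 16, 32
EPOCHS = 3
N_SPANISH, N_ENGLISH = 186, 180

# Assumptions for illustration only: the card gives neither the batch size
# nor the shapes of the adapted layers.
BATCH_SIZE = 8
D_IN, D_OUT = 2048, 2048


def lora_scaling(alpha: int, rank: int) -> float:
    """Standard LoRA multiplies the low-rank update B @ A by alpha / rank."""
    return alpha / rank


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added to one linear layer: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


n_train = N_SPANISH + N_ENGLISH
steps_per_epoch = math.ceil(n_train / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

print(lora_scaling(ALPHA, RANK))              # 2.0
print(n_train, steps_per_epoch, total_steps)  # 366 46 138
print(lora_params(D_IN, D_OUT, RANK))         # 65536
```

With a dataset this small, the total number of optimizer steps stays in the low hundreds under the assumed batch size, which is consistent with the card's observation that extra epochs quickly lead to verbatim recall.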

This configuration was chosen from a four-variant grid that varied adapter capacity, epoch count, and language scope. Bilingual training kept the Spanish voice intact, five epochs overfit (the model recalled training replies verbatim), and a smaller attention-only adapter underfit.

Usage

PEFT (transformers):

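from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen3.5-2B"
# Tokenizer from the base model; the adapter repo also ships tokenizer files.
tok = AutoTokenizer.from_pretrained(base)
# Load the base weights first, then attach the LoRA adapter on top of them.
model = AutoModelForCausalLM.from_pretrained(base)
model = PeftModel.from_pretrained(model, "pebeto/amigo-lora")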

llama.cpp (CPU): apply the bundled GGUF adapter on top of a Qwen3.5-2B GGUF.

llama-cli -m Qwen3.5-2B-Q4_K_M.gguf --lora amigo-lora-Q8_0.gguf \
  -p "Eres un compañero amable y paciente..."

Give it the system persona it trained on (warm companion, one to three short sentences, no lists). The voice depends on that prompt being present.
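
One way to make sure the persona is never dropped is a small guard applied before every request. This is a sketch under assumptions: `PERSONA_HINT` only paraphrases the description above and is not the exact text the adapter trained on, and `with_persona` is a hypothetical helper name.

```python
# Paraphrase of the persona described in this card, not the exact training prompt.
PERSONA_HINT = "You are a warm companion. Answer in one to three short sentences. Do not use lists."


def with_persona(messages: list, persona: str = PERSONA_HINT) -> list:
    """Return a copy of `messages` that starts with a system message."""
    if messages and messages[0].get("role") == "system":
        # The caller already supplied a system prompt; leave it untouched.
        return list(messages)
    return [{"role": "system", "content": persona}] + list(messages)


chat = with_persona([{"role": "user", "content": "me siento un poco solo hoy"}])
```

The resulting list can be passed to whichever chat interface is in use. The input list is not mutated, so the guard is safe to call repeatedly.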

Files

| File | Purpose |
| --- | --- |
| `adapter_model.safetensors`, `adapter_config.json` | the PEFT adapter |
| `amigo-lora-Q8_0.gguf` | the same adapter for llama.cpp |
| `chat_template.jinja`, `tokenizer*` | the chat format |
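
A downloaded `adapter_config.json` can be checked against the hyperparameters this card states. The sketch below assumes the usual PEFT field names (`r`, `lora_alpha`, `lora_dropout`, `target_modules`). The sample JSON is hand-written for illustration and is not the real file's contents, and `check_adapter_config` is a hypothetical helper.

```python
import json

# Values this card states for the adapter.
EXPECTED = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": {
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    },
}


def check_adapter_config(config: dict) -> list:
    """Return the names of fields that differ from the card's stated values."""
    mismatches = []
    for key, expected in EXPECTED.items():
        actual = config.get(key)
        if key == "target_modules" and isinstance(actual, list):
            actual = set(actual)  # module order in the file does not matter
        if actual != expected:
            mismatches.append(key)
    return mismatches


# Hand-written sample, NOT copied from the repository.
sample = json.loads("""
{
  "r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                     "gate_proj", "up_proj", "down_proj"]
}
""")
print(check_adapter_config(sample))  # []
```

For the real file, load it with `json.load` from the downloaded path and pass the result to the same function.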

Limitations

Small model. Factual reliability is limited. Pair it with retrieval for anything current, and read its claims with care.
No built-in memory. It knows nothing about a person unless the prompt provides it.
Language focus. Spanish is the primary target, in a Peruvian register. English is plainer and lighter.

License

Apache-2.0, following the base model Qwen/Qwen3.5-2B.

GGUF

Model size: 10.9M params
Architecture: qwen35
Quantization: 8-bit

Model tree for pebeto/amigo-lora

Base model: Qwen/Qwen3.5-2B-Base
Finetuned: Qwen/Qwen3.5-2B
Adapter: pebeto/amigo-lora (this model)
