amigo-lora
A LoRA adapter that gives Qwen3.5-2B the voice of a warm, patient companion for an older adult. It speaks in short, kind sentences, takes interest in the person's day, and keeps them company. Built for amigo, a local and private voice companion, during the Hugging Face Build Small Hackathon.
What it teaches
This adapter shapes how the model talks: warm, brief, in a Peruvian Spanish register or plain English. It carries no facts about the person. Their name, family, health, and routine live in the running app's profile and memory, never in the weights, so that information stays private and stays current.
Why a LoRA on a 2B model
The point is to keep the companion voice on a small model that runs fast on a laptop CPU, hardware in the same modest class as a phone. The difference is easy to hear. Ask the base 2B model, without the adapter, "me siento un poco solo hoy" ("I feel a little lonely today") and it answers cheerfully but misses the feeling. With this adapter, it acknowledges the loneliness and offers company before asking how the person is.
Training
- Base: Qwen/Qwen3.5-2B
- Method: QLoRA (4-bit), rank 16, alpha 32, dropout 0.05
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Schedule: 3 epochs, learning rate 2e-4
- Data: 366 curated dialogue pairs (186 Spanish, 180 English), each wrapped with the app's exact system persona, plus a held-out set of 45. The pairs carry the voice, never personal facts.
This configuration was chosen from a four-variant grid that varied adapter capacity, epoch count, and language scope. Bilingual training kept the Spanish voice intact, five epochs overfit (the model recalled training replies verbatim), and a smaller attention-only adapter underfit.
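For reference, the rank, alpha, and dropout listed under Training plug into the standard LoRA formulation below. This is the generic form with PEFT's default scaling α/r (not rsLoRA), not something documented specifically for this adapter; d_in and d_out stand for each target projection's input and output widths, which this card does not list.

$$
h = W_0 x + \frac{\alpha}{r}\, B A \,\operatorname{dropout}_p(x), \qquad A \in \mathbb{R}^{r \times d_{\text{in}}},\; B \in \mathbb{R}^{d_{\text{out}} \times r}
$$

$$
r = 16,\quad \alpha = 32 \;\Rightarrow\; \frac{\alpha}{r} = 2, \qquad p = 0.05, \qquad \text{trainable parameters per projection} = r\,(d_{\text{in}} + d_{\text{out}}) = 16\,(d_{\text{in}} + d_{\text{out}})
$$

Under QLoRA, the base weight W_0 stays frozen in 4-bit form and only A and B are trained; dropout is active only during training.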
Usage
PEFT (transformers):
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen3.5-2B"
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)
# Load the LoRA adapter weights on top of the base model
model = PeftModel.from_pretrained(model, "pebeto/amigo-lora")
```
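llama.cpp (CPU): apply the bundled GGUF adapter on top of a Qwen3.5-2B GGUF base model.
```shell
# -p carries the Spanish persona ("You are a kind, patient companion..."), truncated here
llama-cli -m Qwen3.5-2B-Q4_K_M.gguf --lora amigo-lora-Q8_0.gguf \
  -p "Eres un compañero amable y paciente..."
```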
Give it the system persona it trained on (warm companion, one to three short sentences, no lists). The voice depends on that prompt being present.
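Because the voice depends on the persona and the adapter holds no personal facts, each request has to carry both. Below is a minimal sketch of how an app might assemble the chat messages, assuming a hypothetical `build_messages` helper: `PERSONA` is a placeholder (the real persona text is not reproduced in this card), and the "About the person" layout and the name "Rosa" are illustrative assumptions, not amigo's actual prompt format.

```python
from typing import Optional

# Placeholder only: the real persona text ships with the amigo app.
PERSONA = "<system persona used in training>"


def build_messages(
    user_text: str,
    profile: Optional[dict[str, str]] = None,
    history: Optional[list[dict[str, str]]] = None,
) -> list[dict[str, str]]:
    """Hypothetical helper: persona first, then prior turns, then the new user turn."""
    system = PERSONA
    if profile:
        # Personal facts come from the app's profile at request time;
        # the adapter weights themselves contain none.
        facts = "\n".join(f"- {key}: {value}" for key, value in profile.items())
        system += "\n\nAbout the person:\n" + facts
    messages = [{"role": "system", "content": system}]
    messages.extend(history or [])  # earlier turns, oldest first; input list is not mutated
    messages.append({"role": "user", "content": user_text})
    return messages


# "me siento un poco solo hoy" = "I feel a little lonely today"
msgs = build_messages("me siento un poco solo hoy", profile={"name": "Rosa"})
for m in msgs:
    print(m["role"], "|", m["content"].splitlines()[0])
```

The resulting list is in the standard role/content shape, so it can be passed to the tokenizer's `apply_chat_template` or sent as `messages` to an OpenAI-compatible server without changes.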
Files
| File | Purpose |
|---|---|
| `adapter_model.safetensors`, `adapter_config.json` | The PEFT adapter |
| `amigo-lora-Q8_0.gguf` | The same adapter for llama.cpp |
| `chat_template.jinja`, `tokenizer*` | The chat format |
Limitations
- Small model. Factual reliability is limited. Pair it with retrieval for anything current, and read its claims with care.
- No built-in memory. It knows nothing about a person unless the prompt provides it.
- Language focus. Spanish is the primary target, in a Peruvian register. English is plainer and lighter.
License
Apache-2.0, following the base model Qwen/Qwen3.5-2B.