---
base_model: Qwen/Qwen3.5-2B
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
language:
- es
- en
tags:
- base_model:adapter:Qwen/Qwen3.5-2B
- lora
- sft
- trl
- transformers
- companion
- voice-assistant
datasets:
- pebeto/amigo-companion-voice
---

# amigo-lora

A LoRA adapter that gives **Qwen3.5-2B** the voice of a warm, patient companion for an older adult. It speaks in short, kind sentences, takes interest in the person's day, and keeps them company. Built for **amigo**, a local and private voice companion, during the Hugging Face Build Small Hackathon.

## What it teaches

This adapter shapes **how the model talks**: warm, brief, in a Peruvian Spanish register or plain English. It carries no facts about the person. Their name, family, health, and routine live in the running app's profile and memory, never in the weights, so that information stays private and stays current.

## Why a LoRA on a 2B

The point is to hold the companion voice on a small model that runs fast on a laptop CPU, the kind of hardware a phone has. The difference is easy to hear. Ask the plain 2B *"me siento un poco solo hoy"* and it answers cheerfully but misses the feeling. With this adapter it acknowledges the loneliness and offers company before asking how the person is.

## Training

- **Base:** `Qwen/Qwen3.5-2B`
- **Method:** QLoRA (4-bit), rank 16, alpha 32, dropout 0.05
- **Target modules:** `q_proj k_proj v_proj o_proj gate_proj up_proj down_proj`
- **Schedule:** 3 epochs, learning rate 2e-4
- **Data:** 366 curated dialogue pairs (186 Spanish, 180 English), each wrapped with the app's exact system persona, plus a held-out set of 45. The pairs carry the voice, never personal facts.

It was chosen from a four-variant grid that varied capacity, epochs, and language scope. Bilingual training kept the Spanish voice intact, five epochs overfit (verbatim recall), and a smaller attention-only adapter underfit.

## Usage

**PEFT (transformers):**

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen3.5-2B"
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)
model = PeftModel.from_pretrained(model, "pebeto/amigo-lora")
```

**llama.cpp (CPU):** apply the bundled GGUF adapter on top of a Qwen3.5-2B GGUF.

```bash
llama-cli -m Qwen3.5-2B-Q4_K_M.gguf --lora amigo-lora-Q8_0.gguf \
  -p "Eres un companero amable y paciente..."
```

Give it the system persona it trained on (warm companion, one to three short sentences, no lists). The voice depends on that prompt being present.

## Files

| File | Purpose |
|------|---------|
| `adapter_model.safetensors`, `adapter_config.json` | the PEFT adapter |
| `amigo-lora-Q8_0.gguf` | the same adapter for llama.cpp |
| `chat_template.jinja`, `tokenizer*` | the chat format |

## Limitations

- **Small model.** Factual reliability is limited. Pair it with retrieval for anything current, and read its claims with care.
- **No built-in memory.** It knows nothing about a person unless the prompt provides it.
- **Language focus.** Spanish is the primary target, in a Peruvian register. English is plainer and lighter.

## License

Apache-2.0, following the base model `Qwen/Qwen3.5-2B`.