Reflections — Qwen3-1.7B Actionability Classifier (LoRA Adapter)

LoRA adapter for Qwen/Qwen3-1.7B that classifies whether the latest sentence in a smart-glasses conversation transcript warrants proactive assistant intervention (label 1) or should be ignored (label 0).

This adapter is the on-device gate inside the Reflections open-source smart-glasses assistant. It runs in ~200 ms on Apple Silicon (MPS) and lets the pipeline reject low-value turns without paying the latency or cost of a frontier LLM call.

Use case

Reflections streams audio + video from MentraOS smart glasses, transcribes with Soniox, attributes speakers with an active-speaker-detection model, then asks this classifier whether the latest finalized sentence is actionable. When P(actionable) >= GLASSES_GATE_THRESHOLD (default 0.25), the pipeline escalates to Claude Haiku with tools (web search, Google Maps, Google Calendar). Otherwise the turn is dropped silently.

The classifier is not a general chat model. It is trained to output a single label (0 or 1) given five structured context inputs:

Transcript — recent speaker-attributed turns, with the target sentence marked.
Memory — short summary of prior sessions (read from memory.md).
Available tools — names of tools the agent could call this turn (e.g. send_message, create_calendar_event).
Location — a coarse description + lat/lon (used by maps-style tools).
Entity list — known people in the wearer's life, with facts (allergies, preferences, etc.).
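
As a rough picture of the five inputs listed above, the sketch below groups them into one typed record. The class name GateInputs and every field name are hypothetical and chosen for illustration only. It is a container, not the prompt format: the actual rendering is defined by packages/proactivity/render.py in Reflections.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GateInputs:
    """Hypothetical container for the five context inputs (names are illustrative)."""

    transcript: list[str]                    # recent speaker-attributed turns, oldest first
    target_index: int                        # position in `transcript` of the sentence to score
    memory: str = ""                         # short summary of prior sessions
    available_tools: tuple[str, ...] = ()    # names of tools callable this turn
    location: str = ""                       # coarse description plus lat/lon
    entities: dict[str, list[str]] = field(default_factory=dict)  # person -> known facts

    def target_sentence(self) -> str:
        """Return the sentence the classifier is asked to label."""
        return self.transcript[self.target_index]


example = GateInputs(
    transcript=["A: How was the trip?", "B: Remind me to call the dentist tomorrow."],
    target_index=1,
    available_tools=("create_calendar_event",),
)
print(example.target_sentence())
```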

Prompts are rendered into Qwen's ChatML format and the score is softmax(logits)[1] over the two-token vocabulary {0, 1} at the <label> position.
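
The two-token softmax can be illustrated without any model. The sketch below assumes you already have the two raw logits for the tokens "0" and "1" at the label position; the function name p_actionable is only illustrative. A softmax restricted to two logits equals a sigmoid of their difference, which the code uses in a numerically stable form.

```python
import math


def p_actionable(logit_0: float, logit_1: float) -> float:
    """Two-way softmax probability of label 1 from the logits of tokens "0" and "1"."""
    # softmax([l0, l1])[1] == sigmoid(l1 - l0); branch on the sign to avoid overflow.
    diff = logit_1 - logit_0
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


print(p_actionable(2.0, 2.0))  # equal logits give 0.5
```

Because only the two label logits enter the softmax, probability mass the model places on any other token is ignored, and the score depends only on the gap between the two logits.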

How to use

The adapter is intended to be loaded onto the Qwen3-1.7B base model with PEFT:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

BASE = "Qwen/Qwen3-1.7B"
ADAPTER = "rushilsaraf/qwen3-actionable-v2-adapter"

tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, ADAPTER)
model.eval()

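# In the live Reflections pipeline, prompts are rendered by
# packages/proactivity/render.py and scored roughly as follows
# (assuming the labels "0" and "1" are single tokens in the tokenizer):
#   input_ids = tokenizer(prompt, return_tensors="pt").input_ids
#   tok0, tok1 = tokenizer.convert_tokens_to_ids(["0", "1"])
#   logits = model(input_ids).logits[0, -1]
#   p_actionable = torch.softmax(logits[[tok0, tok1]], dim=0)[1].item()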

For end-to-end use, install Reflections and run python -m apps.viewer — the LoRA loads automatically from this Hub repo (override with REFLECTIONS_LORA_MODEL_ID).
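
The override described above might be resolved along these lines. This is a minimal sketch, not the Reflections implementation: the helper name resolve_adapter_id and the choice to treat an empty value as unset are assumptions.

```python
import os

DEFAULT_ADAPTER_ID = "rushilsaraf/qwen3-actionable-v2-adapter"


def resolve_adapter_id(env=None) -> str:
    """Return the adapter repo id, preferring REFLECTIONS_LORA_MODEL_ID when it is set."""
    env = os.environ if env is None else env
    # Assumption: a blank value is treated the same as an unset variable.
    override = env.get("REFLECTIONS_LORA_MODEL_ID", "").strip()
    return override or DEFAULT_ADAPTER_ID


print(resolve_adapter_id({}))
```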

Training


Setting	Value
Training base	`unsloth/qwen3-1.7b-unsloth-bnb-4bit` (Unsloth 4-bit)
Inference base	`Qwen/Qwen3-1.7B` (float16)
Framework	Unsloth + TRL SFT
Hardware	Single T4 (free Colab tier)
Wall-clock	~50 minutes
Examples	~400 (synthetic, labeled)

Adapter config

Parameter	Value
PEFT type	LoRA
Rank (`r`)	8
LoRA alpha	16
LoRA dropout	0.05
Target modules	`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
Task type	`CAUSAL_LM`
Trainable parameters	~8.7M (0.51% of base)
Adapter size on disk	~33 MB
PEFT version	0.18.x (compatible with 0.19.x at inference)
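
The trainable-parameter figure in the table can be sanity-checked by hand. The sketch below assumes the Qwen3-1.7B layer dimensions shown in the constants (they are not stated in this card, so check them against the base model's config.json) and counts the weights LoRA adds at rank 8 to each targeted projection.

```python
# Assumed Qwen3-1.7B dimensions; verify against the base model's config.json.
HIDDEN = 2048
INTERMEDIATE = 6144
NUM_LAYERS = 28
Q_OUT = 16 * 128    # attention heads * head_dim
KV_OUT = 8 * 128    # key/value heads * head_dim
RANK = 8            # LoRA rank from the table above

# (in_features, out_features) of each targeted projection in one decoder layer.
TARGETS = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in_features) and B (out_features x rank) per module."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
total = per_layer * NUM_LAYERS
print(total)                 # 8716288 under the assumed dimensions
print(total * 4 / 2**20)     # size in MiB if each weight is stored in 4 bytes
```

Under these assumptions the count comes to about 8.7M, in line with the table. The on-disk size depends on the dtype the adapter weights were saved in, which this card does not state.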

Gate thresholds (Reflections-side)

Two distinct knobs that should not be conflated:

GLASSES_GATE_THRESHOLD (default 0.25) — the live gate used by the agent worker. Sentences scoring below this are silently dropped.
REASONING_TRIGGER (0.45) — only used by the offline smoke-test path (scripts/smoke_full_transcript.py, scripts/smoke_server.py) to decide whether to also generate a reasoning trace.

The live path never reads REASONING_TRIGGER.
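
To make the separation concrete, here is a minimal sketch of the two decisions. It is not the Reflections code: the function names are hypothetical, reading GLASSES_GATE_THRESHOLD from the environment is an assumption, and so is the direction of the REASONING_TRIGGER comparison.

```python
import os

DEFAULT_GLASSES_GATE_THRESHOLD = 0.25  # live gate default stated in this card
REASONING_TRIGGER = 0.45               # used by the offline smoke tests only


def live_gate(p_actionable: float, env=None) -> str:
    """Live path: escalate when the score meets the gate threshold, otherwise drop."""
    env = os.environ if env is None else env
    # Assumption: the threshold can be overridden through an environment variable.
    threshold = float(env.get("GLASSES_GATE_THRESHOLD", DEFAULT_GLASSES_GATE_THRESHOLD))
    return "escalate" if p_actionable >= threshold else "drop"


def smoke_wants_reasoning(p_actionable: float) -> bool:
    """Offline smoke path only: whether to also generate a reasoning trace."""
    # Assumption: a trace is generated when the score is at or above the trigger.
    return p_actionable >= REASONING_TRIGGER


print(live_gate(0.30, {}), smoke_wants_reasoning(0.30))
```

With the defaults, a score of 0.30 is escalated on the live path but would not trigger a reasoning trace in the smoke tests, which is why the two values should be tuned independently.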

Performance (held-out synthetic test set)

Metric	Value
Test accuracy	88.6%
Train accuracy	94.3%
Train–test gap	+5.7 points
Raw Qwen3-1.7B (no LoRA)	51.1%
Lift from LoRA	+37.5 points
Mean inference latency (Apple Silicon MPS)	~196 ms
p95 latency	~257 ms
Throughput	~5 classifications / sec

The benchmark is a synthetic dataset matched to the training distribution. Real-world ASR transcripts are not yet part of the evaluation set — see Limitations below.
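
The derived rows in the table follow from the measured ones. The short check below recomputes them; the throughput line assumes classifications run one at a time with no batching, so throughput is simply the inverse of mean latency.

```python
TEST_ACC = 88.6         # percent
TRAIN_ACC = 94.3        # percent
RAW_ACC = 51.1          # percent, base model without the adapter
MEAN_LATENCY_MS = 196   # milliseconds per classification

gap = round(TRAIN_ACC - TEST_ACC, 1)    # train-test gap in percentage points
lift = round(TEST_ACC - RAW_ACC, 1)     # improvement over the raw base model
throughput = 1000 / MEAN_LATENCY_MS     # sequential classifications per second

print(gap, lift, round(throughput, 1))  # 5.7 37.5 5.1
```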

Limitations

English only.
Synthetic training data ceiling. The training set of roughly 400 examples was generated to cover entity / memory / tool / location signals. Real-world ASR disfluencies are not represented.
Weak categories. Per-category breakdowns show tool_dependent at ~40% and location_dependent at ~60% accuracy. The planned remedy for the next training cycle is to add ~25 paired-negative examples per category.
Not a general classifier. The model expects the exact 5-input prompt structure produced by packages/proactivity/render.py in Reflections. Out-of-distribution prompts will produce unreliable scores.
Not safety-critical. Do not use for medical, legal, or moderation decisions. This is a latency-saving gate in front of a stronger downstream LLM, not a standalone judgment.
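
A per-category breakdown like the one mentioned under weak categories can be computed with a few lines. This is a sketch under assumptions: the record layout (category, label, prediction) is hypothetical, and the toy records are made up for illustration and are not the evaluation data behind this card.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Map each category to the fraction of records where prediction equals label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, label, prediction in records:
        total[category] += 1
        correct[category] += int(label == prediction)
    return {category: correct[category] / total[category] for category in total}


# Toy records for illustration only: (category, true label, predicted label).
toy = [
    ("tool_dependent", 1, 0),
    ("tool_dependent", 0, 0),
    ("location_dependent", 1, 1),
    ("location_dependent", 0, 0),
]
print(accuracy_by_category(toy))  # {'tool_dependent': 0.5, 'location_dependent': 1.0}
```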

License

Base model (Qwen/Qwen3-1.7B): Apache 2.0 — see the Qwen3 license.
This LoRA adapter: MIT, matching the Reflections repository.

Combined use of base + adapter remains subject to the Apache 2.0 license of the Qwen3 weights.

Citation

If you use this adapter, please reference the Reflections project and the Qwen3 base model:

@misc{qwen3-2025,
  title  = {Qwen3 Technical Report},
  author = {Qwen Team},
  year   = {2025},
  url    = {https://huggingface.co/Qwen/Qwen3-1.7B}
}
