Reflections: Qwen3-1.7B Actionability Classifier (LoRA Adapter)

LoRA adapter for Qwen/Qwen3-1.7B that classifies whether the latest sentence in a smart-glasses conversation transcript warrants proactive assistant intervention (label 1) or should be ignored (label 0).

This adapter is the on-device gate inside the Reflections open-source smart-glasses assistant. It runs in ~200 ms on Apple Silicon (MPS) and lets the pipeline reject low-value turns without paying the latency or cost of a frontier LLM call.

Use case

Reflections streams audio + video from MentraOS smart glasses, transcribes with Soniox, attributes speakers with an active-speaker-detection model, then asks this classifier whether the latest finalized sentence is actionable. When P(actionable) >= GLASSES_GATE_THRESHOLD (default 0.25), the pipeline escalates to Claude Haiku with tools (web search, Google Maps, Google Calendar). Otherwise the turn is dropped silently.

The classifier is not a general chat model. It is trained to output a single label (0 or 1) given five structured context inputs:

  1. Transcript: recent speaker-attributed turns, with the target sentence marked.
  2. Memory: short summary of prior sessions (read from memory.md).
  3. Available tools: names of tools the agent could call this turn (e.g. send_message, create_calendar_event).
  4. Location: a coarse description + lat/lon (used by maps-style tools).
  5. Entity list: known people in the wearer's life, with facts (allergies, preferences, etc.).

Prompts are rendered into Qwen's ChatML format. The score is softmax(logits)[1], where the softmax is taken over only the logits of the two label tokens {0, 1} at the <label> position, so it is the probability assigned to label 1.
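A minimal sketch of the two-token softmax described above, written in plain Python so it runs without the model. The function name p_actionable and its two float arguments are illustrative assumptions; in the real pipeline the two values would be the logits of the "0" and "1" tokens at the label position.

```python
import math


def p_actionable(logit_0: float, logit_1: float) -> float:
    """Probability of label 1 under a softmax restricted to two logits."""
    # Subtract the larger logit before exponentiating for numerical stability.
    m = max(logit_0, logit_1)
    e0 = math.exp(logit_0 - m)
    e1 = math.exp(logit_1 - m)
    return e1 / (e0 + e1)


# Equal logits mean the model has no preference between the two labels.
print(p_actionable(0.0, 0.0))  # 0.5
```

Because only two logits take part, the result equals the logistic sigmoid of (logit_1 - logit_0); the rest of the vocabulary has no effect on the score.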

How to use

The adapter is intended to be loaded onto the Qwen3-1.7B base model with PEFT:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

BASE = "Qwen/Qwen3-1.7B"
ADAPTER = "rushilsaraf/qwen3-actionable-v2-adapter"

tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, ADAPTER)
model.eval()

# In the live Reflections pipeline, prompts are rendered by
# packages/proactivity/render.py and scored as shown below, where
# tok0 and tok1 are the token IDs of the labels "0" and "1":
#   logits = model(input_ids).logits[0, -1]
#   p_actionable = torch.softmax(logits[[tok0, tok1]], dim=0)[1].item()

For end-to-end use, install Reflections and run python -m apps.viewer; the LoRA adapter loads automatically from this Hub repo (override with REFLECTIONS_LORA_MODEL_ID).
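If you load the adapter yourself and want to honor the same override, a sketch along these lines may help. It assumes REFLECTIONS_LORA_MODEL_ID is read as an environment variable, which the text above suggests but does not state; resolve_adapter_id is a hypothetical helper, not part of Reflections.

```python
import os

DEFAULT_ADAPTER = "rushilsaraf/qwen3-actionable-v2-adapter"


def resolve_adapter_id(env=None) -> str:
    """Return the override if set and non-empty, else the default Hub repo."""
    # Assumption: the override is an environment variable.
    env = os.environ if env is None else env
    value = env.get("REFLECTIONS_LORA_MODEL_ID", "").strip()
    return value or DEFAULT_ADAPTER


print(resolve_adapter_id({}))
```

The returned string can be passed as the second argument to PeftModel.from_pretrained in the snippet above.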

Training

Training base     unsloth/qwen3-1.7b-unsloth-bnb-4bit (Unsloth 4-bit)
Inference base    Qwen/Qwen3-1.7B (float16)
Framework         Unsloth + TRL SFT
Hardware          Single T4 (free Colab tier)
Wall-clock        ~50 minutes
Examples          ~400 (synthetic, labeled)

Adapter config

Parameter               Value
PEFT type               LoRA
Rank (r)                8
LoRA alpha              16
LoRA dropout            0.05
Target modules          q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Task type               CAUSAL_LM
Trainable parameters    ~8.7M (0.51% of base)
Adapter size on disk    ~33 MB
PEFT version            0.18.x (compatible with 0.19.x at inference)
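The trainable-parameter figure above can be sanity-checked with a short calculation. Each LoRA-wrapped linear layer adds r * (in_features + out_features) weights. The layer dimensions below are assumptions taken from the published Qwen3-1.7B configuration (hidden size 2048, MLP size 6144, 28 layers, 16 query heads and 8 key/value heads of dimension 128); verify them against the base model's config.json. The size estimate further assumes the adapter is stored in 32-bit floats.

```python
# Assumed Qwen3-1.7B dimensions; check against the base model's config.json.
HIDDEN = 2048
INTERMEDIATE = 6144
NUM_LAYERS = 28
Q_OUT = 16 * 128   # query heads * head_dim
KV_OUT = 8 * 128   # key/value heads * head_dim
RANK = 8           # LoRA rank from the adapter config

# (in_features, out_features) for each targeted projection in one layer.
SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank, shapes, num_layers):
    """LoRA adds A (rank x in) and B (out x rank) per wrapped layer."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * num_layers


total = lora_param_count(RANK, SHAPES, NUM_LAYERS)
print(total)                 # 8716288
print(total * 4 / 2**20)     # 33.25 (MiB, assuming float32 storage)
```

Under these assumptions the count comes to about 8.7M parameters and roughly 33 MiB, in line with the table.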

Gate thresholds (Reflections-side)

These are two distinct knobs; do not conflate them:

  • GLASSES_GATE_THRESHOLD (default 0.25): the live gate used by the agent worker. Sentences scoring below this are silently dropped.
  • REASONING_TRIGGER (0.45): only used by the offline smoke-test path (scripts/smoke_full_transcript.py, scripts/smoke_server.py) to decide whether to also generate a reasoning trace.

The live path never reads REASONING_TRIGGER.
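The separation between the two knobs can be sketched as two independent functions. This is an illustration, not the Reflections source: the function names are hypothetical, reading GLASSES_GATE_THRESHOLD from the environment is an assumption, and so is the use of >= for the reasoning trigger (the text only specifies >= for the live gate).

```python
import os

REASONING_TRIGGER = 0.45  # offline smoke tests only


def gate_threshold(env=None) -> float:
    """Live gate threshold; assumed to come from an environment variable."""
    env = os.environ if env is None else env
    return float(env.get("GLASSES_GATE_THRESHOLD", "0.25"))


def live_gate(p_actionable: float, threshold: float) -> str:
    """Live path: escalate to the downstream LLM or drop the turn silently."""
    return "escalate" if p_actionable >= threshold else "drop"


def smoke_wants_reasoning(p_actionable: float) -> bool:
    """Offline smoke path only. The >= comparison is an assumption."""
    return p_actionable >= REASONING_TRIGGER


# A score of 0.30 passes the live gate but would not trigger a reasoning trace.
print(live_gate(0.30, gate_threshold({})), smoke_wants_reasoning(0.30))
```

Keeping the two checks in separate functions mirrors the point made above: changing one value should never alter the behavior of the other path.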

Performance (held-out synthetic test set)

Metric                                        Value
Test accuracy                                 88.6%
Train accuracy                                94.3%
Train-test gap                                +5.7 points
Raw Qwen3-1.7B (no LoRA)                      51.1%
Lift from LoRA                                +37.5 points
Mean inference latency (Apple Silicon MPS)    ~196 ms
p95 latency                                   ~257 ms
Throughput                                    ~5 classifications / sec

The benchmark is a synthetic dataset matched to the training distribution. Real-world ASR transcripts are not yet part of the evaluation set; see Limitations below.
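For readers who want to reproduce the latency rows on their own hardware, the sketch below shows one way to summarize per-call timings. It assumes sequential, single-stream classification, in which case throughput is simply 1000 / mean latency in ms (1000 / 196 is about 5.1 per second, consistent with the table). The helper name summarize_latency is hypothetical, and the sample values are placeholders, not measurements.

```python
import statistics


def summarize_latency(samples_ms):
    """Mean, 95th percentile and implied sequential throughput."""
    mean_ms = statistics.fmean(samples_ms)
    # n=20 splits the data into 5% steps; the last cut point is the p95.
    p95_ms = statistics.quantiles(samples_ms, n=20)[18]
    return {
        "mean_ms": mean_ms,
        "p95_ms": p95_ms,
        "throughput_per_s": 1000.0 / mean_ms,
    }


# Placeholder timings; replace with time.perf_counter() deltas around model calls.
summary = summarize_latency([200.0] * 20)
print(summary["throughput_per_s"])  # 5.0
```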

Limitations

  • English only.
  • Synthetic training data ceiling. The ~400-example training set was generated to cover entity / memory / tool / location signals. Real-world ASR disfluencies are not represented.
  • Weak categories. Per-category breakdowns show tool_dependent at ~40% and location_dependent at ~60% accuracy. Adding ~25 paired-negative examples per category in the next training cycle is expected to narrow this gap.
  • Not a general classifier. The model expects the exact 5-input prompt structure produced by packages/proactivity/render.py in Reflections. Out-of-distribution prompts will produce unreliable scores.
  • Not safety-critical. Do not use for medical, legal, or moderation decisions. This is a latency-saving gate in front of a stronger downstream LLM, not a standalone judgment.
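A per-category breakdown like the one cited under "Weak categories" can be computed with a small grouping helper. The record layout (category, label, predicted) is a hypothetical one chosen for illustration, and the toy records below are made up; they are not the evaluation data.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Fraction of correct predictions within each category."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        totals[rec["category"]] += 1
        if rec["predicted"] == rec["label"]:
            hits[rec["category"]] += 1
    return {cat: hits[cat] / totals[cat] for cat in totals}


# Toy records for illustration only.
toy = [
    {"category": "tool_dependent", "label": 1, "predicted": 0},
    {"category": "tool_dependent", "label": 1, "predicted": 1},
    {"category": "location_dependent", "label": 0, "predicted": 0},
    {"category": "location_dependent", "label": 1, "predicted": 1},
]
print(accuracy_by_category(toy))
```

Reporting the per-category counts alongside the ratios helps when categories are small, since a handful of examples can swing the percentage considerably.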

License

Combined use of base + adapter remains subject to the Apache 2.0 license of the Qwen3 weights.

Citation

If you use this adapter, please reference the Reflections project and the Qwen3 base model:

@misc{qwen3-2025,
  title  = {Qwen3 Technical Report},
  author = {Qwen Team},
  year   = {2025},
  url    = {https://huggingface.co/Qwen/Qwen3-1.7B}
}