# Qwen3.5-2B Indonesian Meeting Summarizer
A fine-tune of unsloth/Qwen3.5-2B for structured meeting summarization in Bahasa Indonesia, distilled from gemini-2.5-flash outputs.
## Tasks

The model is trained for three meeting-summarization tasks (selected via the user prompt template):

| Task | Description | Output shape |
|---|---|---|
| `rich_summary` | Structured markdown summary with required Overview and Conclusion topics, plus Action Items when applicable | ~500–1800 words, `##` > `###` hierarchy |
| `paragraph` | Single narrative paragraph summary | ~150–500 words, plain prose |
| `title_generator` | Short meeting title (≤7 words, ≤70 chars) | one-line title |
## Training data

- Source: ~14,000 (transcript, output) pairs distilled from `gemini-2.5-flash`
- Mix: 50% `rich_summary` (3× oversampled), 28% `title_generator`, 22% `paragraph`
- Filtering: dropped teacher outputs with format violations (S1/S2 leakage, >1800 words, missing required topics, transcript mentions)
- Language: 100% Bahasa Indonesia
- Context lengths: median 4K tokens, p95 17K, p99 20K
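The 3× oversampling in the mix above amounts to duplicating `rich_summary` records before shuffling. The sketch below is a hypothetical illustration, not the author's pipeline: the record layout (dicts with a `task` key), the seed, and the toy counts are assumptions, with the counts picked so the resulting shares land near the published 50/28/22 split.

```python
import random
from collections import Counter


def oversample(records, factors, seed=0):
    """Repeat each record factors[task] times (default 1), then shuffle deterministically."""
    mixed = []
    for rec in records:
        mixed.extend([rec] * factors.get(rec["task"], 1))
    random.Random(seed).shuffle(mixed)
    return mixed


# Toy records (hypothetical counts): 10 rich_summary, 17 title_generator, 13 paragraph.
toy = (
    [{"task": "rich_summary"}] * 10
    + [{"task": "title_generator"}] * 17
    + [{"task": "paragraph"}] * 13
)
mixed = oversample(toy, {"rich_summary": 3})
counts = Counter(r["task"] for r in mixed)
for task in ("rich_summary", "title_generator", "paragraph"):
    print(task, round(counts[task] / len(mixed), 2))
# prints: rich_summary 0.5 / title_generator 0.28 / paragraph 0.22
```

Duplication (rather than loss reweighting) means each copy counts as a separate example within the single training epoch.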
## Training setup
| Param | Value |
|---|---|
| Base model | unsloth/Qwen3.5-2B |
| Method | LoRA via Unsloth |
| LoRA rank / alpha | 32 / 64 |
| Target modules | q,k,v,o,gate,up,down |
| Max sequence length | 24,576 |
| Effective batch size | 16 (1 × 16 grad accum) |
| Learning rate | 1e-4 (cosine, 5% warmup) |
| Epochs | 1 (with 3× oversampling on rich_summary) |
| Optimizer | AdamW 8-bit |
| Precision | bf16 LoRA (NOT 4-bit base — Unsloth docs warn 4-bit hurts Qwen3.5 quality) |
| Total steps | 870 |
| Hardware | 1× A100 80GB |
| Total training time | ~25 hours |
| Final eval losses | rich_summary 0.98, paragraph 1.29, title 0.61 |
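The table's numbers can be cross-checked with simple arithmetic. The sketch below computes the effective batch (1 × 16), the sequences seen over 870 steps (13,920, roughly the ~14,000 figure under Training data; whether that figure counts oversampled copies is not stated), and a linear-warmup plus cosine-decay learning-rate curve peaking at 1e-4. The formula mirrors the common cosine-with-warmup schedule; rounding the 5% warmup up to 44 steps and decaying to exactly 0 are assumptions, not confirmed details of this run.

```python
import math

TOTAL_STEPS = 870
PEAK_LR = 1e-4
WARMUP_RATIO = 0.05
MICRO_BATCH = 1
GRAD_ACCUM = 16

EFFECTIVE_BATCH = MICRO_BATCH * GRAD_ACCUM      # sequences per optimizer step
SEQUENCES_SEEN = TOTAL_STEPS * EFFECTIVE_BATCH  # sequences processed in one epoch
# Assumption: warmup length is the ratio rounded up to a whole step.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """LR at a given optimizer step: linear warmup, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for s in (0, 22, WARMUP_STEPS, 457, TOTAL_STEPS):
    print(f"step {s:4d}: lr = {lr_at(s):.2e}")
```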
## Evaluation: format pass rate on hard eval set (100 samples)

The hard eval set is stratified across length, multi-speaker, and code-switching axes.

| Task | n | Format pass | Note |
|---|---|---|---|
| `title_generator` | 30 | 100% | ship-ready |
| `paragraph` | 36 | 100% | ship-ready |
| `rich_summary` | 34 | 50% | mostly `over_1800_words` (44%) and `missing_conclusion` (24%); needs inference-time guards |
## Bugs fixed vs. the teacher

The training pipeline filtered out teacher outputs that violated explicit prompt rules. The student model now scores 0% on all of these checks (vs. 17%/8%/1% in the teacher data):

- `has_S1S2`: ✅ 0% (teacher leaked speaker tokens 17% of the time)
- `has_preamble`: ✅ 0% (no more "Berikut adalah...")
- `mentions_transcript`: ✅ 0%
- `missing_overview`: ✅ 0%
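These checks, plus the `over_1800_words` and `missing_conclusion` failures from the eval table, are simple enough to express as string rules. The sketch below is a hypothetical reimplementation for filtering or monitoring, not the author's filter: the regexes, the preamble phrases, and the Indonesian heading names accepted for Overview are assumptions (the card only confirms "Kesimpulan" for Conclusion).

```python
import re

# Assumed heading names; only "Kesimpulan" is confirmed by this card.
OVERVIEW_NAMES = ("overview", "gambaran umum", "ikhtisar")
CONCLUSION_NAMES = ("conclusion", "kesimpulan")

SPEAKER_TOKEN = re.compile(r"\bS\d+\b")  # raw speaker labels such as S1, S2
PREAMBLE = re.compile(r"^\s*berikut\s+(adalah|ini)", re.IGNORECASE)
TRANSCRIPT_WORDS = re.compile(r"\b(transkrip\w*|transcript\w*)\b", re.IGNORECASE)
H2_HEADING = re.compile(r"^##\s+(.+)$", re.MULTILINE)  # "### " lines do not match


def h2_titles(text):
    """Lowercased titles of all '## ' headings, in document order."""
    return [m.group(1).strip().lower() for m in H2_HEADING.finditer(text)]


def rich_summary_violations(text, max_words=1800):
    """Return the set of violation names found in a rich_summary output."""
    found = set()
    if SPEAKER_TOKEN.search(text):
        found.add("has_S1S2")
    if PREAMBLE.search(text):
        found.add("has_preamble")
    if TRANSCRIPT_WORDS.search(text):
        found.add("mentions_transcript")
    titles = h2_titles(text)
    # The prompt requires Overview as the first topic.
    if not titles or not titles[0].startswith(OVERVIEW_NAMES):
        found.add("missing_overview")
    if not any(t.startswith(CONCLUSION_NAMES) for t in titles):
        found.add("missing_conclusion")
    if len(text.split()) > max_words:
        found.add("over_1800_words")
    return found
```

An output passes the format check when the returned set is empty; the speaker-token rule can false-positive on product names like "S23", so treat it as a heuristic.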
## Usage

### Direct inference (transformers + peft)

```python
from transformers import AutoModelForCausalLM, AutoProcessor
import torch

model = AutoModelForCausalLM.from_pretrained(
    "acul3/qwen3.5-2b-id-meeting-summarizer",
    torch_dtype=torch.bfloat16, device_map="cuda",
)
processor = AutoProcessor.from_pretrained("acul3/qwen3.5-2b-id-meeting-summarizer")
tokenizer = processor.tokenizer

# Use the same rich_summary prompt template the model was trained on
prompt = """You are a helpful assistant expert in writing.
You answer only with the result without explanation or pretext.
Please follow the instructions word by word obediently.
Audio Transcript:
<transcript>
{transcript}
</transcript>
Speakers:
<speakers>
{speakers}
</speakers>
Analyze the audio transcript above. Make a high-level summary in markdown headings bullet points format written in Bahasa Indonesia language.
[... rest of the rich_summary prompt: see the full training prompt below ...]
"""

my_transcript = "..."  # placeholder: your meeting transcript text

# One user turn (no system role); enable_thinking=False is required (see below)
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt.format(transcript=my_transcript, speakers="")}],
    tokenize=True, add_generation_prompt=True, return_tensors="pt",
    enable_thinking=False,
).to(model.device)

# Greedy decoding; strip the prompt tokens before decoding
out = model.generate(inputs, max_new_tokens=4096, do_sample=False, use_cache=True)
print(tokenizer.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))
```
### Critical: pass `enable_thinking=False`

Qwen3.5's chat template emits `<think></think>` reasoning blocks by default. For this distilled model (trained on non-thinking outputs), always pass `enable_thinking=False` to `apply_chat_template`.
## Prompt templates (verbatim from training data)

Each prompt is a single user message that combines the fixed preamble with task-specific instructions. The preamble is identical across all 3 tasks (modulo one trailing-space difference in `title_generator`):

```text
You are a helpful assistant expert in writing.
You answer only with the result without explanation or pretext.
Please follow the instructions word by word obediently.
```
Use these prompts verbatim to match the training distribution. Do not separate the preamble into a system role; the training data always joined them into one user turn.
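A minimal sketch of assembling the chat input the way the training data did: preamble and task body joined into one `user` message, with no `system` message. The trailing space after `pretext.` for `rich_summary` and `paragraph` (and its absence for `title_generator`) follows the trailing-space note in this section; joining preamble and body with a single newline is an assumption, since exact blank-line layout is not recoverable from this card.

```python
PREAMBLE_LINES = (
    "You are a helpful assistant expert in writing.",
    "You answer only with the result without explanation or pretext.",
    "Please follow the instructions word by word obediently.",
)
TASKS = {"rich_summary", "paragraph", "title_generator"}


def preamble(task: str) -> str:
    """Preamble text; rich_summary and paragraph keep a trailing space after 'pretext.'."""
    lines = list(PREAMBLE_LINES)
    if task != "title_generator":
        lines[1] += " "
    return "\n".join(lines)


def build_messages(task: str, body: str) -> list[dict]:
    """One user turn containing preamble + task body; no system message, as in training."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    return [{"role": "user", "content": preamble(task) + "\n" + body}]
```

The returned list can be passed to `tokenizer.apply_chat_template(..., enable_thinking=False)` exactly as in the Usage example.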
### `rich_summary` prompt

```text
You are a helpful assistant expert in writing.
You answer only with the result without explanation or pretext.
Please follow the instructions word by word obediently.
Audio Transcript:
<transcript>
{TRANSCRIPT_TEXT}
</transcript>
Speakers:
<speakers>
{SPEAKERS_OR_EMPTY}
</speakers>
Analyze the audio transcript above. Make a high-level summary in markdown headings bullet points format written in Bahasa Indonesia language.
The content of the summary should follow this guideline:
- Never add any preamble sentence to the result. Only return the markdown format summary.
- NEVER write any words like "audio transcript", "transcription", or anything that implicates the audio transcription and the prompt.
- You MUST include "Overview" as a topic (translated to Bahasa Indonesia). Always put the "Overview" topic as the first topic.
- You MUST include "Conclusion" as a topic (translated to Bahasa Indonesia). Always put the "Conclusion" topic as the last topic.
- You SHOULD include "Action Items" as a topic if there are some actionables discussed in the transcript. Always put the "Action Items" after the last topic ("Conclusion"). If there are no actionables, don't create this topic.
- Add more topics outside the required topics. More topics are better, but make sure it's relevant and meaningful information.
- Exclude any jokes or irrelevant banter.
- Write the summary using this language: Bahasa Indonesia.
- Make sure your generated summary makes sense to read.
- Mention the speaker's name in the summary if it exists and when necessary. If <speakers> exists, use speaker name instead of using S1, S2, etc (never mention S1, S2, etc in the result)
- Your generated summary should be LESS than 1800 words. There are no minimum words, but make sure the result covers the whole transcription.
The result format should match this guideline:
- A Topic MUST have more than one sub-topic
- A Sub-topic MUST have more than one important discussion point
- Every important discussion MUST be followed by MULTIPLE supporting details.
- Supporting detail MUST be written in a proper sentence structure that contains clauses. Having more than 1 clause to make it a compound or Complex sentence is preferred. So users can have a deep understanding.
- Make sure important discussions/supporting details capture all crucial details such as number, date, etc.
- Always use markdown formatting like **bold** or *italic* to emphasize necessary words/phrases/numbers (pricing, date, date range, nominal, etc) so users can read and understand easily.
- Use **bold** formatting for numbers (pricing, date, nominal, etc) to highlight their importance
- Use **bold** formatting for important information that users need to be aware of
- Use *italic* formatting for technical/professional/foreign terms
- Examples:
- The event will take place on **June 15, 2024**.
- Our new product, `SuperWidget 3000`, costs **$199.99**.
- This approach can lead to *significant cost savings*.
- it would take us **6-12 months**
- You MUST always use bold/italics for important discussions so users can read them easily.
Use this markdown format:
## Overview
### sub-topic
* important discussion
* supporting detail
* supporting detail
* important discussion
* important discussion
* important discussion
* supporting detail
* supporting detail
## topic
### sub-topic
* important discussion
* important discussion
* supporting detail
* supporting detail
## topic
### sub-topic
* important discussion
* important discussion
* supporting detail
* supporting detail
(...add more topics)
## Conclusion
### sub-topic
* important discussion
* important discussion
* supporting detail
* supporting detail
## Action Items (only if exist)
1. action item 1
2. action item 2
```
### `paragraph` prompt

```text
You are a helpful assistant expert in writing.
You answer only with the result without explanation or pretext.
Please follow the instructions word by word obediently.
<transcript>
Audio Transcript:
{TRANSCRIPT_TEXT}
</transcript>
Analyze and generate a summary based on the audio transcript above written in 1 paragraph.
Instruction:
- Do not include any preamble or additional text such as 'Summary:' or 'Ringkasan:'
- NEVER write any words like "audio transcript", "transcription", or anything that implicates the audio transcription and the prompt.
- The summary should be written in Bahasa Indonesia.
- Write it as plain text in coherent paragraphs. Do not write bullet points.
- Separate paragraphs with a blank line for readability.
- You must fit your summary in 1 paragraph.
- Use professional language with proper grammar and complete sentences.
- Exclude any jokes or irrelevant banter.
- Only include information from the transcript itself, without adding anything extra.
Remember:
- Never add any preamble sentence to the result. Only return the summary.
- Make sure it makes sense to read.
- Mention the speaker's name in the summary if it exists and when necessary. If <speakers> exists, use speaker name instead of using S1, S2, etc (never mention S1, S2, etc in the result)
```

⚠️ Quirk to preserve: the training prompt puts `<transcript>` BEFORE `Audio Transcript:` (reverse order vs. `rich_summary`). The model learned this specific oddity, so do not "fix" it.
### `title_generator` prompt

⚠️ Important: `title_generator` takes a summary as input, not a transcript. The pipeline is: transcript → `rich_summary`/`paragraph` → `title_generator`.

```text
You are a helpful assistant expert in writing.
You answer only with the result without explanation or pretext.
Please follow the instructions word by word obediently.
Summary:
<summary>
{SUMMARY_TEXT}
</summary>
Analyze the summary from the meeting above. Generate a meeting title in Bahasa Indonesia based on the summary above. The title generated must be simple and represent what the meeting is about.
Follow the following instructions:
<instructions>
- Do not include any preamble or additional text such as 'Title:' or 'Judul Pertemuan:'
- You will only respond in text without explanation.
- Use professional language with proper grammar and complete sentences
- The title generated must never be more than 70 characters.
- The title generated must never be more than 7 words.
- Never refuse or ask for clarification, and instead always make a best-effort attempt
- Use professional language with proper grammar
- The title must be written in Bahasa Indonesia language
- Make the title specific and distinctive - avoid generic phrases that could apply to any meeting
- Include key topics, names, or specific focus areas mentioned in the summary when possible
</instructions>
```

Note: this prompt's preamble has no trailing space after `pretext.` (the other two have one). Subtle, but it is in the training data.
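Because `title_generator` consumes a summary, a deployment chains two calls, and the ≤7-word / ≤70-character limits are cheap to verify afterwards. The sketch below assumes a hypothetical `generate(task, text)` callable that wraps the model (for example, the `apply_chat_template` + `generate` code from the Usage section); the retry-once policy and the fallback clipping are illustrative choices, not part of this model card.

```python
from typing import Callable

MAX_TITLE_WORDS = 7
MAX_TITLE_CHARS = 70


def title_ok(title: str) -> bool:
    """True if the title is a single non-empty line within both limits."""
    t = title.strip()
    return (
        bool(t)
        and "\n" not in t
        and len(t.split()) <= MAX_TITLE_WORDS
        and len(t) <= MAX_TITLE_CHARS
    )


def clip_title(title: str) -> str:
    """Fallback: keep the first line, then drop trailing words until both limits hold."""
    lines = title.strip().splitlines() or [""]
    words = lines[0].split()[:MAX_TITLE_WORDS]
    while words and len(" ".join(words)) > MAX_TITLE_CHARS:
        words.pop()
    return " ".join(words)


def summarize_and_title(
    transcript: str, generate: Callable[[str, str], str]
) -> tuple[str, str]:
    """transcript -> rich_summary -> title; retry the title once, then clip as a last resort."""
    summary = generate("rich_summary", transcript)
    title = generate("title_generator", summary)
    if not title_ok(title):
        title = generate("title_generator", summary)
    if not title_ok(title):
        title = clip_title(title)
    return summary, title
```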
## Q: Can I hit 95% format pass without inference-time guards?

Honest answer: no, not by prompt-matching alone. Matching the training prompt exactly (as documented above) won't push `rich_summary` past ~55–60%. The 50% bottleneck isn't a prompt mismatch: on long meetings the model genuinely generates 2,000–2,200 words, and the 1800-word cap is a hard mechanical limit. The exact prompt above was used for every failing sample observed during eval.
Realistic paths to ≥95% format pass on rich_summary:
| Path | Cost | Expected pass rate |
|---|---|---|
| Inference-time guards (length trim + Kesimpulan regen) | minutes of engineering | 85–95% |
| Second epoch of training | ~25h GPU | 65–75% |
| Switch base to Qwen3.5-4B | ~50h GPU | 75–85% |
| Stack: 4B base + guards | ~50h GPU + minutes | 90–98% |
Inference-time guards are not a workaround — they are standard practice for production deployments of any size model with strict format contracts.
## Inference-time guards (recommended for production)

To bring the `rich_summary` format pass rate from ~50% to 85–95% in production:

- Length trim: if the output exceeds 1800 words, truncate it to the last complete `## Topic` section that fits within 1750 words.
- Missing-Kesimpulan regenerator: if there is no `## Kesimpulan` section, regenerate just the conclusion with a focused prompt (~200 tokens).
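A minimal sketch of the two guards as plain text processing around the model call, under stated assumptions: sections are split on lines starting with `## ` (so `###` sub-topics stay inside their parent), words are counted by whitespace, and `regen_conclusion` is a hypothetical callable that runs your focused conclusion prompt (its wording is not given in this card) and returns a `## Kesimpulan` section.

```python
import re
from typing import Callable

MAX_WORDS = 1800
TRIM_TARGET = 1750
SECTION_START = re.compile(r"^## ", re.MULTILINE)  # "### " lines do not match
KESIMPULAN = re.compile(r"^##\s+Kesimpulan\b", re.MULTILINE | re.IGNORECASE)


def word_count(text: str) -> int:
    return len(text.split())


def split_sections(text: str) -> list[str]:
    """Split markdown into chunks starting at '## ' headings (leading text is its own chunk)."""
    starts = [m.start() for m in SECTION_START.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    bounds = starts + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:]) if text[a:b].strip()]


def trim_to_sections(text: str, limit: int = MAX_WORDS, target: int = TRIM_TARGET) -> str:
    """If text exceeds `limit` words, keep the leading whole sections that fit in `target`."""
    if word_count(text) <= limit:
        return text
    sections = split_sections(text)
    kept, total = [], 0
    for section in sections:
        n = word_count(section)
        if total + n > target:
            break
        kept.append(section)
        total += n
    if not kept:  # first section alone is too long: keep it rather than return nothing
        kept = sections[:1]
    return "".join(kept).rstrip() + "\n"


def has_conclusion(text: str) -> bool:
    return KESIMPULAN.search(text) is not None


def apply_guards(text: str, regen_conclusion: Callable[[str], str]) -> str:
    """Length trim first, then append a regenerated Kesimpulan section if none is present."""
    text = trim_to_sections(text)
    if not has_conclusion(text):
        text = text.rstrip() + "\n\n" + regen_conclusion(text).strip() + "\n"
    return text
```

Running the trim before the conclusion check matters: trimming can drop a trailing `## Kesimpulan`, which the second guard then restores. The sketch appends the regenerated section at the end; if an Action Items section survived the trim, you may want to insert the conclusion before it instead, since the prompt places Action Items after the Conclusion.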
## On-device deployment

This model can be quantized to MLX 4-bit for iPhone deployment:

```bash
# On Mac with Apple Silicon
pip install mlx-optiq  # OR pip install "mlx-vlm @ git+...@pc/fix-qwen35-predicate"
mlx-optiq quantize \
  --hf-path acul3/qwen3.5-2b-id-meeting-summarizer \
  --output qwen35-2b-id-mlx-q4 \
  --q-bits 4 --q-group-size 64
```
Expected on-device size: ~1.58 GB (Q4) or ~2.5 GB (Q8). Expected on-device speed (iPhone 15 Pro+): ~30–60 tok/sec at <4K context.
## Limitations

- Length discipline on `rich_summary`: ~50% of outputs exceed the 1800-word soft cap. Use inference-time guards for production.
- Not multimodal at inference: although the base is a VL model, no vision data was used during training. Vision tower weights are present but unused.
- Indonesian only: not evaluated on other languages; code-switching with English is handled, but not Mandarin, Javanese, etc.
- Long-context recall: middle-of-transcript details may be missed on transcripts >15K tokens.
- Distilled from `gemini-2.5-flash`: inherits the teacher's content quirks; not strictly better than the teacher.
## Citation / acknowledgements

- Base: Qwen team, Qwen3.5
- Repackaging: Unsloth, `unsloth/Qwen3.5-2B`
- Fine-tuning framework: Unsloth
- Teacher: `gemini-2.5-flash`
## License
apache-2.0 (inherited from base)