Qwen3.5-2B Indonesian Meeting Summarizer

A fine-tune of unsloth/Qwen3.5-2B for structured meeting summarization in Bahasa Indonesia, distilled from gemini-2.5-flash outputs.

Tasks

The model is trained for three meeting-summarization tasks (selected via the user prompt template):

| Task | Description | Output shape |
| --- | --- | --- |
| `rich_summary` | Structured markdown summary with required Overview / Conclusion / Action Items topics | ~500–1800 words, `## > ###` hierarchy |
| `paragraph` | Single narrative paragraph summary | ~150–500 words, plain prose |
| `title_generator` | Short meeting title (≤7 words, ≤70 chars) | one-line title |

Training data

Source: ~14,000 (transcript, output) pairs distilled from gemini-2.5-flash
Mix: 50% rich_summary (3× oversampled), 28% title_generator, 22% paragraph
Filtering: dropped teacher outputs with format violations (S1/S2 leakage, >1800 words, missing required topics, transcript mentions)
Language: 100% Bahasa Indonesia
Context lengths: median 4K tokens, p95 17K, p99 20K

Training setup

| Param | Value |
| --- | --- |
| Base model | `unsloth/Qwen3.5-2B` |
| Method | LoRA via Unsloth |
| LoRA rank / alpha | 32 / 64 |
| Target modules | q, k, v, o, gate, up, down |
| Max sequence length | 24,576 |
| Effective batch size | 16 (1 × 16 grad accum) |
| Learning rate | 1e-4 (cosine, 5% warmup) |
| Epochs | 1 (with 3× oversampling on rich_summary) |
| Optimizer | AdamW 8-bit |
| Precision | bf16 LoRA (NOT a 4-bit base; Unsloth docs warn that 4-bit hurts Qwen3.5 quality) |
| Total steps | 870 |
| Hardware | 1× A100 80GB |
| Total training time | ~25 hours |
| Final eval losses | rich_summary 0.98, paragraph 1.29, title 0.61 |
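As a rough consistency check on the setup above, the step count and effective batch size imply how many sequences the run processed. The snippet below is only a sketch that multiplies the reported numbers; the variable names are illustrative and are not taken from the training script.

```python
# Values copied from the training setup table; names are illustrative.
total_steps = 870
per_device_batch = 1
grad_accum_steps = 16

effective_batch = per_device_batch * grad_accum_steps
sequences_seen = total_steps * effective_batch

print(effective_batch)  # 16
print(sequences_seen)   # 13920
```

The result, 13,920 sequences, is in line with one epoch over the ~14,000 training pairs.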

Evaluation — format pass rate on hard eval set (100 samples)

The hard eval set is stratified by transcript length, number of speakers, and amount of code-switching.

| Task | n | Format Pass | Note |
| --- | --- | --- | --- |
| `title_generator` | 30 | 100% | ship-ready |
| `paragraph` | 36 | 100% | ship-ready |
| `rich_summary` | 34 | 50% | failures are mostly `over_1800_words` (44%) and `missing_conclusion` (24%); needs inference-time guards |

Bugs fixed vs the teacher

The training pipeline filtered out teacher outputs that violated explicit prompt rules. On the checks below, the student model now shows a 0% violation rate (vs 17%/8%/1% in the teacher data):

has_S1S2: ✅ 0% (teacher leaked speaker tokens 17% of the time)
has_preamble: ✅ 0% (no more "Berikut adalah...")
mentions_transcript: ✅ 0%
missing_overview: ✅ 0%
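The four checks above can be approximated with a few regular expressions. The sketch below is not the original eval code: the check names mirror the labels in the list, but the patterns are assumptions, in particular the Indonesian keywords used to recognize the Overview heading and the word "transkrip".

```python
import re

# Assumed patterns; the rules used by the original eval pipeline are not published here.
SPEAKER_TOKEN = re.compile(r"\bS\d+\b")                      # S1, S2, ...
PREAMBLE = re.compile(r"^\s*berikut adalah", re.IGNORECASE)  # "Berikut adalah..."
TRANSCRIPT_WORDS = re.compile(r"transkrip|transcript|transcription", re.IGNORECASE)
OVERVIEW_WORDS = re.compile(r"overview|gambaran|ringkasan|ikhtisar", re.IGNORECASE)


def format_violations(summary: str) -> list[str]:
    """Return the names of the checks that a rich_summary output fails."""
    violations = []
    if SPEAKER_TOKEN.search(summary):
        violations.append("has_S1S2")
    if PREAMBLE.search(summary):
        violations.append("has_preamble")
    if TRANSCRIPT_WORDS.search(summary):
        violations.append("mentions_transcript")
    # The first H2 heading is expected to be the Overview topic.
    headings = re.findall(r"^## +(.+)$", summary, flags=re.MULTILINE)
    if not headings or not OVERVIEW_WORDS.search(headings[0]):
        violations.append("missing_overview")
    return violations
```

An empty list means the output passed all four checks; the word-count and conclusion checks from the evaluation table are not covered here.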

Usage

Direct inference (transformers + peft)

from transformers import AutoModelForCausalLM, AutoProcessor
import torch

model = AutoModelForCausalLM.from_pretrained(
    "acul3/qwen3.5-2b-id-meeting-summarizer",
    torch_dtype=torch.bfloat16, device_map="cuda",
)
processor = AutoProcessor.from_pretrained("acul3/qwen3.5-2b-id-meeting-summarizer")
tokenizer = processor.tokenizer

# Use the same rich_summary prompt template the model was trained on
prompt = """You are a helpful assistant expert in writing.
You answer only with the result without explanation or pretext.
Please follow the instructions word by word obediently.
Audio Transcript:
<transcript>
{transcript}
</transcript>

Speakers:
<speakers>
{speakers}
</speakers>

Analyze the audio transcript above. Make a high-level summary in markdown headings bullet points format written in Bahasa Indonesia language.
[... rest of the rich_summary prompt — see model's training prompt for full text ...]
"""

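# my_transcript is a placeholder: set it to your own transcript string first
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt.format(transcript=my_transcript, speakers="")}],
    tokenize=True, add_generation_prompt=True, return_tensors="pt",
    return_dict=True,  # return input_ids and attention_mask together
    enable_thinking=False,  # the model was trained on non-thinking outputs
).to(model.device)

# Greedy decoding; slice off the prompt tokens before decoding the answer
out = model.generate(**inputs, max_new_tokens=4096, do_sample=False, use_cache=True)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))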

Critical: pass `enable_thinking=False`

Qwen3.5's chat template emits <think></think> reasoning blocks by default. For this distilled model (trained on non-thinking outputs), always pass enable_thinking=False to apply_chat_template.
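If a caller forgets the flag, a defensive post-processing step can at least drop any reasoning block from the decoded text. This is a minimal sketch of such a fallback, not a substitute for `enable_thinking=False`; the helper name `strip_think` is hypothetical.

```python
import re

# Matches a <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_think(text: str) -> str:
    """Remove any <think>...</think> blocks and surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()
```

Text without a reasoning block passes through unchanged apart from trimmed whitespace.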

Prompt templates (verbatim from training data)

Each prompt is a single user message that combines the fixed system preamble + task-specific instructions. The preamble is identical across all 3 tasks (modulo one trailing-space difference in title_generator):

You are a helpful assistant expert in writing. 
You answer only with the result without explanation or pretext. 
Please follow the instructions word by word obediently.

Use these prompts verbatim to match the training distribution. Do not separate the preamble into a system role; the training data always joined them into one user turn.
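One way to follow this rule is to fill the placeholders in a single pass and wrap the result in exactly one user message. The helper below is a sketch under that assumption; `build_messages` and the shortened `demo_template` are hypothetical, and in practice the full verbatim prompt from this card would be passed in.

```python
import re


def build_messages(template: str, **fields: str) -> list[dict[str, str]]:
    """Fill {UPPER_CASE} placeholders and return one user turn (no system role)."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Unknown placeholders are left untouched rather than raising.
        return fields[name] if name in fields else match.group(0)

    # A single regex pass means braces inside a transcript are never re-scanned.
    content = re.sub(r"\{([A-Z_]+)\}", substitute, template)
    return [{"role": "user", "content": content}]


# Shortened stand-in template for illustration only.
demo_template = "Summary:\n<summary>\n{SUMMARY_TEXT}\n</summary>\n"
messages = build_messages(demo_template, SUMMARY_TEXT="Rapat membahas anggaran.")
```

The returned list can be passed directly to `apply_chat_template`. Using a regex substitution instead of `str.format` avoids errors when the filled-in text itself contains curly braces.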

`rich_summary` prompt

You are a helpful assistant expert in writing. 
You answer only with the result without explanation or pretext. 
Please follow the instructions word by word obediently.
Audio Transcript:
<transcript>
{TRANSCRIPT_TEXT}
</transcript>

Speakers:
<speakers>
{SPEAKERS_OR_EMPTY}
</speakers>

Analyze the audio transcript above. Make a high-level summary in markdown headings bullet points format written in Bahasa Indonesia language.

The content of the summary should follow this guideline:
- Never add any preamble sentence to the result. Only return the markdown format summary.
- NEVER write any words like "audio transcript", "transcription",  or anything that implicates the audio transcription and the prompt.
- You MUST include "Overview" as a topic (translated to Bahasa Indonesia). Always put the "Overview" topic as the first topic.
- You MUST include "Conclusion" as a topic (translated to Bahasa Indonesia). Always put the "Conclusion" topic as the last topic.
- You SHOULD include "Action Items" as a topic if there are some actionables discussed in the transcript. Always put the "Action Items" after the last topic ("Conclusion"). If there are no actionables, don't create this topic.
- Add more topics outside the required topics. More topics are better, but make sure it's relevant and meaningful information.
- Exclude any jokes or irrelevant banter.
- Write the summary using this language: Bahasa Indonesia.
- Make sure your generated summary makes sense to read.
- Mention the speaker's name in the summary if it exists and when necessary. If <speakers> exists, use speaker name instead of using S1, S2, etc (never mention S1, S2, etc in the result)
- Your generated summary should be LESS than 1800 words. There are no minimum words, but make sure the result covers the whole transcription.

The result format should match this guideline:
- A Topic MUST have more than one sub-topic
- A Sub-topic MUST have more than one important discussion point
- Every important discussion MUST be followed by MULTIPLE supporting details.
- Supporting detail MUST be written in a proper sentence structure that contains clauses. Having more than 1 clause to make it a compound or Complex sentence is preferred. So users can have a deep understanding.
- Make sure important discussions/supporting details capture all crucial details such as number, date, etc.
- Always use markdown formatting like **bold** or *italic* to emphasize necessary words/phrases/numbers (pricing, date, date range, nominal, etc) so users can read and understand easily.
  - Use **bold** formatting for numbers (pricing, date, nominal, etc) to highlight their importance
  - Use **bold** formatting for important information that users need to be aware of
  - Use *italic* formatting for technical/professional/foreign terms
  - Examples:
    - The event will take place on **June 15, 2024**.
    - Our new product, `SuperWidget 3000`, costs **$199.99**.
    - This approach can lead to *significant cost savings*.
    - it would take us **6-12 months**
- You MUST always use bold/italics for important discussions so users can read them easily.

Use this markdown format:
## Overview
### sub-topic
* important discussion
  * supporting detail
  * supporting detail
* important discussion
* important discussion
* important discussion
  * supporting detail
  * supporting detail
## topic
### sub-topic
* important discussion
* important discussion
  * supporting detail
  * supporting detail
## topic
### sub-topic
* important discussion
* important discussion
  * supporting detail
  * supporting detail
(...add more topics)
## Conclusion
### sub-topic
* important discussion
* important discussion
  * supporting detail
  * supporting detail
## Action Items (only if exist)
1. action item 1
2. action item 2

`paragraph` prompt

You are a helpful assistant expert in writing. 
You answer only with the result without explanation or pretext. 
Please follow the instructions word by word obediently.
<transcript>
Audio Transcript:
{TRANSCRIPT_TEXT}
</transcript>

Analyze and generate a summary based on the audio transcript above written in 1 paragraph. 

Instruction:
- Do not include any preamble or additional text such as 'Summary:' or 'Ringkasan:'
- NEVER write any words like "audio transcript", "transcription",  or anything that implicates the audio transcription and the prompt.
- The summary should be written in Bahasa Indonesia.
- Write it as plain text in coherent paragraphs. Do not write bullet points.
- Separate paragraphs with a blank line for readability.
- You must fit your summary in 1 paragraph.
- Use professional language with proper grammar and complete sentences.
- Exclude any jokes or irrelevant banter.
- Only include information from the transcript itself, without adding anything extra.

Remember:
- Never add any preamble sentence to the result. Only return the summary.
- Make sure it makes sense to read.
- Mention the speaker's name in the summary if it exists and when necessary. If <speakers> exists, use speaker name instead of using S1, S2, etc (never mention S1, S2, etc in the result)

⚠️ Quirk to preserve: the training prompt puts <transcript> BEFORE Audio Transcript: (reverse order vs rich_summary). The model learned this specific oddity — do not "fix" it.

`title_generator` prompt

⚠️ Important: title_generator takes a summary as input, not a transcript. The pipeline is: transcript → rich_summary/paragraph → title_generator.

You are a helpful assistant expert in writing. 
You answer only with the result without explanation or pretext.
Please follow the instructions word by word obediently.
Summary:
<summary>
{SUMMARY_TEXT}
</summary>

Analyze the summary from the meeting above. Generate a meeting title in Bahasa Indonesia based on the summary above. The title generated must be simple and represent what the meeting is about.

Follow the following instructions:
<instructions>
- Do not include any preamble or additional text such as 'Title:' or 'Judul Pertemuan:'
- You will only respond in text without explanation.
- Use professional language with proper grammar and complete sentences
- The title generated must never be more than 70 characters.
- The title generated must never be more than 7 words.
- Never refuse or ask for clarification, and instead always make a best-effort attempt
- Use professional language with proper grammar
- The title must be written in Bahasa Indonesia language
- Make the title specific and distinctive - avoid generic phrases that could apply to any meeting
- Include key topics, names, or specific focus areas mentioned in the summary when possible
</instructions>

Note: in this prompt, the preamble line ending in "pretext." has no trailing space, whereas the other two prompts have one. The difference is subtle, but it is in the training data.
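The two-stage pipeline and the title constraints (one line, at most 7 words, at most 70 characters) can be sketched as follows. The `generate(task, text)` callable is a hypothetical stand-in for whatever wrapper runs the model with the matching prompt; it is not part of this repository.

```python
from typing import Callable

MAX_TITLE_WORDS = 7
MAX_TITLE_CHARS = 70


def title_is_valid(title: str) -> bool:
    """Check the one-line, <=7 words, <=70 characters contract."""
    stripped = title.strip()
    if not stripped or "\n" in stripped:
        return False
    return len(stripped.split()) <= MAX_TITLE_WORDS and len(stripped) <= MAX_TITLE_CHARS


def summarize_then_title(
    transcript: str, generate: Callable[[str, str], str]
) -> tuple[str, str]:
    """Run transcript -> paragraph summary -> title using a caller-supplied generate()."""
    summary = generate("paragraph", transcript)
    # The title prompt takes the summary, not the transcript, as input.
    title = generate("title_generator", summary).strip()
    if not title_is_valid(title):
        raise ValueError(f"title violates the format contract: {title!r}")
    return summary, title
```

Raising on an invalid title is one possible policy; a caller could instead retry or truncate.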

Q: Can I hit 95% format pass without inference-time guards?

Honest answer: no, not by prompt-matching alone. Matching the training prompt exactly (as documented above) won't push rich_summary past ~55-60%. The 50% pass rate is not caused by prompt mismatch: every failure observed during eval used the exact prompt above. The model genuinely generates 2000-2200 words on long meetings, and the format check enforces the 1800-word cap mechanically.

Realistic paths to ≥95% format pass on rich_summary:

| Path | Cost | Expected pass rate |
| --- | --- | --- |
| Inference-time guards (length trim + Kesimpulan regen) | minutes of engineering | 85–95% |
| Second epoch of training | ~25h GPU | 65–75% |
| Switch base to Qwen3.5-4B | ~50h GPU | 75–85% |
| Stack: 4B base + guards | ~50h GPU + minutes | 90–98% |

Inference-time guards are not a workaround — they are standard practice for production deployments of any size model with strict format contracts.

Inference-time guards (recommended for production)

To bring rich_summary format pass rate from ~50% to 85-95% in production:

Length trim: if the output exceeds 1800 words, keep only the leading complete ## Topic sections that fit within 1750 words.
Missing-Kesimpulan regenerator: if the output has no ## Kesimpulan section, regenerate just the conclusion with a focused prompt (~200 tokens).
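The two guards can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the production implementation: words are counted by whitespace splitting, sections are split at `## ` headings, and the function names are hypothetical. Regenerating the conclusion requires a model call and is not shown; `needs_conclusion` only detects when that call is needed.

```python
import re

WORD_CAP = 1800     # cap stated in the rich_summary prompt
TRIM_BUDGET = 1750  # trim target named in the guard description


def word_count(text: str) -> int:
    """Count whitespace-separated words (an assumed counting rule)."""
    return len(text.split())


def trim_to_budget(summary: str, cap: int = WORD_CAP, budget: int = TRIM_BUDGET) -> str:
    """Keep leading '## ' sections while the running total stays within budget."""
    if word_count(summary) <= cap:
        return summary
    # Split before each H2 heading so every chunk is one complete topic.
    sections = [s for s in re.split(r"(?m)^(?=## )", summary) if s.strip()]
    kept, total = [], 0
    for section in sections:
        total += word_count(section)
        if total > budget:
            break
        kept.append(section)
    # If even the first section is over budget, nothing is kept.
    return "".join(kept).rstrip() + "\n" if kept else ""


def needs_conclusion(summary: str) -> bool:
    """True when no '## Kesimpulan' heading is present."""
    return re.search(r"(?mi)^## +kesimpulan\b", summary) is None
```

Because the conclusion is the last required topic, trimming an over-long summary will often remove it, so running `needs_conclusion` after `trim_to_budget` is a natural order for the two guards.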

On-device deployment

This model can be quantized to MLX 4-bit for iPhone deployment:

# On Mac with Apple Silicon
pip install mlx-optiq  # OR pip install "mlx-vlm @ git+...@pc/fix-qwen35-predicate"
mlx-optiq quantize \
  --hf-path acul3/qwen3.5-2b-id-meeting-summarizer \
  --output qwen35-2b-id-mlx-q4 \
  --q-bits 4 --q-group-size 64

Expected on-device size: ~1.58 GB (Q4) or ~2.5 GB (Q8). Expected on-device speed (iPhone 15 Pro+): ~30–60 tok/sec at <4K context.

Limitations

Length discipline on rich_summary: ~50% of outputs fail the format check on the hard eval set, most often by exceeding the 1800-word cap. Use inference-time guards for production.
Not multimodal at inference: although the base is a VL model, no vision data was used during training. Vision tower weights are present but unused.
Indonesian only: not evaluated on other languages. Code-switching with English is handled, but code-switching with Mandarin, Javanese, or other languages is not.
Long context recall: middle-of-transcript details may be missed on transcripts >15K tokens.
Distilled from gemini-2.5-flash: inherits teacher's content quirks; not strictly better than the teacher.

Citation / acknowledgements

Base: Qwen team — Qwen3.5
Repackaging: Unsloth — unsloth/Qwen3.5-2B
Fine-tuning framework: Unsloth
Teacher: gemini-2.5-flash

License

apache-2.0 (inherited from base)
