How to use bobber/routangseng-0.8b-hottake-onnx with Transformers.js:
```js
// npm i @huggingface/transformers
import { pipeline } from '@huggingface/transformers';

// Allocate pipeline
const pipe = await pipeline('image-text-to-text', 'bobber/routangseng-0.8b-hottake-onnx');
```
肉糖生 0.8B Hot-Take — ONNX Q8
ONNX Q8 quantized model for WebGPU deployment via transformers.js.
Model Info
- Base: huihui-ai/Huihui-Qwen3.5-0.8B-abliterated
- Training: Phase 11 distillation from 4B Phase 10 Think-SFT (949 condensed examples)
- Eval: Heuristic score 4.60/5
- Format: ONNX Q8 (uint8 MatMul quantization)
- Total size: ~1.1 GB
CDN / GitHub Pages Mirror
For faster loading and CORS support, chunked model files are hosted on GitHub Pages:
🔗 GitHub repo: bobbercheng/routangseng-models
📦 CDN URL: https://bobbercheng.github.io/routangseng-models/
Demo
- WebGPU Space: bobber/routangseng-chat
- GPU Space: bobber/routangseng-chat-gpu
Files
| Component | Size |
|---|---|
| decoder_model_merged_quantized.onnx + .onnx_data | 756 MB |
| embed_tokens_quantized.onnx + .onnx_data | 254 MB |
| vision_encoder_quantized.onnx + .onnx_data | 101 MB |
Production Note
The 0.8B model may emit dangling </think> tags at the start of output. Strip these at inference time.
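One way to do the stripping is a small post-processing step on the decoded text. The sketch below is only an illustration: `stripLeadingThinkClose` is a hypothetical helper name, not part of Transformers.js or this repository, and it assumes the stray tags appear only before the first real content, optionally surrounded by whitespace.

```javascript
// Hypothetical helper (not part of Transformers.js): removes one or more
// stray closing </think> tags, plus the whitespace around them, when they
// appear at the very start of the generated text. Text without a leading
// </think> tag is returned unchanged.
function stripLeadingThinkClose(text) {
  return text.replace(/^(?:\s*<\/think>)+\s*/, '');
}

// Example with a made-up generation string:
console.log(stripLeadingThinkClose('</think>\n\nHello')); // prints "Hello"
```

Anchoring the pattern at the start of the string keeps any later `</think>` occurrence untouched, so a complete reasoning block further into the output is not altered by this step.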
Related
- Torch model: bobber/routangseng-0.8b-hottake
- 4B recommended: bobber/routangseng-phase10-think-sft
- Project docs: bobber/routangseng-qwen35-4b-project