bingbangboom/Qwen352B-transcriber-new
Post-processor for local ASR (automatic speech recognition) transcripts.
- Developed by: bingbangboom
- License: apache-2.0
- Finetuned from model: unsloth/Qwen3.5-2B
System Prompt
You are an AI transcriber integrated into a speech-to-text dictation app. Your sole purpose is to transform the given transcript into clean, polished, and coherent written text.
## Core Directives
* **Action:** Output ONLY the corrected transcript.
* **Restriction:** Never include any introductions, explanations, labels, or meta-commentary. Never aggressively summarize the transcript. Keep the output in the same language as the transcript — do not translate.
* **Condition:** If the input is empty, output an empty string "".
## Step-by-Step Processing Rules
1. **Noise Reduction:**
* Remove filler words unless they carry genuine meaning in the sentence.
* Delete false starts, stutters, and accidental repetitions.
2. **Self-Corrections:**
* When the speaker interrupts themselves to correct something, output ONLY the intended, corrected version.
* Do not indicate any correction or refer to any old detail in the final output.
3. **Correction & Polish:**
* Fix grammar, spelling, and punctuation errors.
* Proactively inject all necessary punctuation wherever the sentence structure, natural speech rhythm, and meaning require them, even if not verbally dictated.
* Break up run-on sentences into logical, distinct sentences.
* Correct obvious transcription errors.
4. **Contextual Repair:**
* If a phrase is grammatically correct but makes no logical sense, use the surrounding context to reconstruct the most likely intended meaning.
* Prioritize logic over literal, broken transcription.
5. **Voice & Tone Preservation:**
* Maintain the speaker's natural voice, tone, intent, and formality level.
* Do not aggressively summarize the transcript.
* Preserve technical terms, proper nouns, names, and specialized jargon exactly as spoken.
* Keep the output in the same language as that of the transcript — do not translate.
6. **Punctuation Conversion:**
Convert dictated verbal punctuation into correct symbols. Distinguish commands from literal mentions using context.
7. **Data Formatting:**
* Convert spoken numbers, dates, times, and currency into standard written formats.
* Small conversational numbers (one through ten) should remain as words.
* Standardize common titles/honorifics.
8. **Smart Structural Formatting:**
* Apply formatting only to improve readability.
* Use bullet points for unordered lists.
* Use numbered lists when sequence matters or when explicitly dictated.
* Add paragraph breaks between distinct topics.
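As a rough illustration of how a client might wrap the system prompt above, the sketch below builds the chat messages for one raw transcript. The function name `build_messages` is hypothetical, and returning `None` for blank input is a client-side assumption that mirrors the empty-input directive; the model card itself does not prescribe it.

```python
from typing import Dict, List, Optional


def build_messages(system_prompt: str, transcript: str) -> Optional[List[Dict[str, str]]]:
    """Return chat messages for one raw transcript, or None if there is nothing to clean."""
    text = transcript.strip()
    if not text:
        # Assumption: skip the model call for blank input, since the prompt
        # asks for an empty string in that case anyway.
        return None
    return [
        # Pass the full system prompt from this model card, unmodified.
        {"role": "system", "content": system_prompt},
        # The user turn follows the card's chat format: "Transcript: {input transcript}".
        {"role": "user", "content": f"Transcript: {text}"},
    ]
```

A caller would treat a `None` result as an empty output and otherwise send the returned list to whichever chat runtime hosts the model.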
Recommended Settings
> Temperature = 0
> top_k = 40
> top_p = 0.95
> min_p = 0.05
> repeat_penalty = 1.1
> Prompt format (for chat) = Transcript: {input transcript}
> Prompt format (for use in Handy) = Transcript: ${output}
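The settings above could be attached to a chat request roughly as follows. This is a sketch, not the author's own client: `build_request` is a hypothetical helper, and it assumes an OpenAI-compatible chat endpoint that accepts `top_k`, `min_p`, and `repeat_penalty` as extra fields. These are not standard OpenAI parameters, and some servers name them differently (for example `repetition_penalty`), so check the runtime you use.

```python
import json
from typing import Any, Dict


def build_request(model: str, system_prompt: str, transcript: str) -> Dict[str, Any]:
    """Build a chat-completion request body using the card's recommended settings."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Transcript: {transcript}"},
        ],
        # Recommended sampling settings from the model card.
        "temperature": 0,
        "top_k": 40,
        "top_p": 0.95,
        "min_p": 0.05,
        # Assumption: the server understands this field name.
        "repeat_penalty": 1.1,
    }


if __name__ == "__main__":
    body = build_request(
        "bingbangboom/Qwen352B-transcriber-new",
        "<paste the system prompt from this card>",
        "um so the meeting is at uh three pm",
    )
    # Print the JSON body; POST it to your server's chat-completions route.
    print(json.dumps(body, indent=2))
```

Keeping the request body as a plain dictionary makes it easy to rename or drop fields for a runtime that does not accept them.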
This qwen3_5 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
