What languages are supported in audio input?

#25
by J22 - opened

E4B supports Chinese (although not quite good, frankly speaking), but 12B does not.

Please provide this information on the model card.

Can confirm Korean works for audio input — a short (~10s) Korean phone-call clip transcribes correctly via llama-server's input_audio field (Q4_K_M + the bf16 mmproj, recent master, --jinja). Quality is solid for short clips but the model card caps audio at ~30s, and longer clips degrade. Note it's also a reasoning model, so the transcription comes after a think block — keep max_tokens generous and read reasoning_content vs content separately on the OpenAI endpoint.

Sign up or log in to comment