What languages are supported in audio input?

#25

by J22 - opened 17 days ago

Discussion

J22

17 days ago

•

edited 17 days ago

E4B supports Chinese (although not quite good, frankly speaking), but 12B does not.

Please provide this information on the model card.

hero775

16 days ago

Can confirm Korean works for audio input — a short (~10s) Korean phone-call clip transcribes correctly via llama-server's input_audio field (Q4_K_M + the bf16 mmproj, recent master, --jinja). Quality is solid for short clips but the model card caps audio at ~30s, and longer clips degrade. Note it's also a reasoning model, so the transcription comes after a think block — keep max_tokens generous and read reasoning_content vs content separately on the OpenAI endpoint.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment