---
license: apache-2.0
language:
- zh
- en
metrics:
- cer
pipeline_tag: automatic-speech-recognition
tags:
- Paraformer
- FunASR
- ASR
- non-autoregressive
- speech-recognition
library_name: funasr
---

# Paraformer-zh

**Non-autoregressive end-to-end speech recognition**: 120x realtime on GPU, production-ready for Mandarin Chinese.

Paraformer is a non-autoregressive (NAR) ASR model that generates the entire output in parallel, achieving significant speedups over autoregressive models like Whisper while maintaining competitive accuracy.

## Quick Start

```python
from funasr import AutoModel

# Basic recognition
model = AutoModel(model="funasr/paraformer-zh", hub="hf", device="cuda")
result = model.generate(input="audio.wav")
print(result[0]["text"])
```

## Full Pipeline (VAD + ASR + Punctuation + Speaker Diarization)

```python
from funasr import AutoModel

model = AutoModel(
    model="funasr/paraformer-zh",
    hub="hf",
    vad_model="funasr/fsmn-vad",
    punc_model="funasr/ct-punc",
    spk_model="funasr/campplus",
    device="cuda",
)
result = model.generate(input="meeting.wav")

# Output includes timestamps, punctuation, and speaker labels
for sentence in result[0]["sentence_info"]:
    print(f"[Speaker {sentence['spk']}] {sentence['text']}")
```

## Features

- **120x realtime** on GPU (non-autoregressive parallel decoding)
- **Chinese + English** mixed recognition
- Built-in **VAD** (voice activity detection) for long audio
- **Punctuation restoration** with ct-punc model
- **Speaker diarization** with cam++ model
- Streaming and offline modes
- ONNX export supported

## Model Details

| Property | Value |
|----------|-------|
| Architecture | Paraformer (Non-autoregressive) |
| Parameters | 220M |
| Languages | Chinese, English |
| Sample Rate | 16 kHz |
| Training Data | 60,000+ hours |

## Related Models

| Model | Description | Link |
|-------|-------------|------|
| funasr/fsmn-vad | Voice Activity Detection | [HF](https://huggingface.co/funasr/fsmn-vad) |
| funasr/ct-punc | Punctuation Restoration | [HF](https://huggingface.co/funasr/ct-punc) |
| funasr/campplus | Speaker Verification | [HF](https://huggingface.co/funasr/campplus) |
| funasr/paraformer-zh-streaming | Streaming version | [HF](https://huggingface.co/funasr/paraformer-zh-streaming) |

## Links

- **GitHub**: [FunASR](https://github.com/modelscope/FunASR)
- **Docs**: [modelscope.github.io/FunASR](https://modelscope.github.io/FunASR/)
- **Paper**: [Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition](https://arxiv.org/abs/2206.08317)

## Citation

```bibtex
@inproceedings{gao2022paraformer,
  title={Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition},
  author={Gao, Zhifu and Zhang, Shiliang and McLoughlin, Ian and Yan, Zhijie},
  booktitle={INTERSPEECH},
  year={2022}
}
```
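
## Checking the Input Sample Rate

The Model Details table lists a 16 kHz sample rate. As a minimal sketch, the snippet below reads the rate from a WAV header with Python's standard `wave` module so a mismatch can be spotted before recognition. The helper names `wav_sample_rate` and `matches_model_rate` are hypothetical and not part of FunASR, and this card does not state whether FunASR resamples other rates automatically, so treat the check as a precaution rather than a requirement.

```python
import wave

# Sample rate listed in the Model Details table of this card.
EXPECTED_RATE_HZ = 16000


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def matches_model_rate(path, expected_hz=EXPECTED_RATE_HZ):
    """Return True if the file's sample rate equals the expected rate."""
    return wav_sample_rate(path) == expected_hz
```

For example, `matches_model_rate("audio.wav")` returns `False` for an 8 kHz recording, which suggests resampling the file before passing it to `model.generate`. The `wave` module only reads uncompressed PCM WAV files, so other formats need a different reader.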