How to use marksverdhei/Qwen3-Voice-Embedding-12Hz-1.7B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("feature-extraction", model="marksverdhei/Qwen3-Voice-Embedding-12Hz-1.7B", trust_remote_code=True)

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("marksverdhei/Qwen3-Voice-Embedding-12Hz-1.7B", trust_remote_code=True, dtype="auto")
Batching Behaviour
#1
by gclose19 - opened
Hello.
Thank you for providing standalone code for the speaker embeddings.
I've noticed an issue where the model does not take into account the differing lengths of inputs in a padded batch.
As a result, the embedding for a given audio segment differs depending on whether it was computed on its own or as part of a batch.
This is (presumably) unintentional behaviour?
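To illustrate the kind of effect described above, here is a minimal toy sketch. It does not use the model's code: the `embed` function is a hypothetical stand-in that simply averages over time, and all function names are made up for illustration. It only shows how pooling over a padded batch without a length mask can change the result for the shorter input.

```python
# Toy stand-in for a pooling-based speaker encoder: the "embedding" is just
# the mean over time. This is an assumption for illustration, NOT the model's code.

def embed(samples):
    """Embed a single, unpadded signal by averaging over time."""
    return sum(samples) / len(samples)

def pad_batch(signals, pad_value=0.0):
    """Right-pad every signal to the length of the longest one."""
    max_len = max(len(s) for s in signals)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in signals]
    lengths = [len(s) for s in signals]
    return padded, lengths

def embed_batch_unmasked(padded):
    """Pool over the full padded length, so padding leaks into the result."""
    return [sum(row) / len(row) for row in padded]

def embed_batch_masked(padded, lengths):
    """Pool only over the valid (unpadded) part of each row."""
    return [sum(row[:n]) / n for row, n in zip(padded, lengths)]

short_sig = [1.0, 2.0, 3.0]
long_sig = [4.0, 4.0, 4.0, 4.0, 4.0, 4.0]

padded, lengths = pad_batch([short_sig, long_sig])
single = embed(short_sig)                      # computed on its own
unmasked = embed_batch_unmasked(padded)        # batch, lengths ignored
masked = embed_batch_masked(padded, lengths)   # batch, lengths respected

print("single:", single)
print("batched, unmasked:", unmasked[0])
print("batched, masked:", masked[0])
```

In this sketch the shorter signal gets a different value when pooled over the padded batch without a mask, while the masked version matches the single-input result. Whether the actual encoder behaves this way for the same reason is an assumption that would need to be checked against its pooling code.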