fixie-ai/librispeech_asr
How to use Mrkomiljon/voiceGUARD with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("audio-classification", model="Mrkomiljon/voiceGUARD")
# Load model directly
from transformers import AutoProcessor, AutoModelForAudioClassification
processor = AutoProcessor.from_pretrained("Mrkomiljon/voiceGUARD")
model = AutoModelForAudioClassification.from_pretrained("Mrkomiljon/voiceGUARD")

This model is a fine-tuned Wav2Vec2-based audio classifier capable of distinguishing between real human voices and AI-generated voices. It has been trained on a dataset containing samples from various TTS models and real human audio recordings.
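The pipeline helper above returns, for each audio file, a list of dictionaries with `label` and `score` keys. The following is a minimal sketch of picking the top entry from such a result. The helper name `top_prediction`, the sample scores, and the label strings are hypothetical; the real label names come from the model's configuration and are not stated here.

```python
from typing import Dict, List, Union

Result = Dict[str, Union[str, float]]


def top_prediction(results: List[Result]) -> Result:
    """Return the entry with the highest score from an audio-classification result."""
    if not results:
        raise ValueError("empty result list")
    return max(results, key=lambda r: r["score"])


# Hypothetical output, in the shape returned by pipe("path_to_audio_file.wav").
# The label strings below are placeholders, not the model's actual labels.
example = [
    {"label": "LABEL_0", "score": 0.12},
    {"label": "LABEL_1", "score": 0.88},
]

best = top_prediction(example)
print(f"{best['label']} ({best['score']:.2f})")
```

In practice the `example` list would be replaced by the value returned from the pipeline call.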
Supported audio formats: .wav, .mp3, etc.
Make sure you have transformers, torch, and torchaudio installed:
pip install transformers torch torchaudio
import torch
from transformers import Wav2Vec2ForSequenceClassification, Wav2Vec2Processor
import torchaudio
# Load model and processor
model_name = "Mrkomiljon/voiceGUARD"
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_name)
processor = Wav2Vec2Processor.from_pretrained(model_name)
# Load audio
waveform, sample_rate = torchaudio.load("path_to_audio_file.wav")
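# Downmix to mono if the file has more than one channel
if waveform.shape[0] > 1:
    waveform = waveform.mean(dim=0, keepdim=True)
# Resample to 16 kHz if necessary
if sample_rate != 16000:
    resampler = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)
    waveform = resampler(waveform)
# Preprocess (drop the channel dimension to get a 1-D array)
inputs = processor(waveform.squeeze(0).numpy(), sampling_rate=16000, return_tensors="pt", padding=True)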
# Inference
with torch.no_grad():
    logits = model(**inputs).logits
predicted_ids = torch.argmax(logits, dim=-1)
# Map to label
labels = ["Real Human Voice", "AI-generated"]
prediction = labels[predicted_ids.item()]
print(f"Prediction: {prediction}")
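The example above reports only the winning class. To attach a confidence value, the logits can be passed through a softmax. The sketch below does this in plain Python so it runs without the model; the function names and the sample logits are hypothetical, and the label order is copied from the example above on the assumption that it matches the model's class indices.

```python
import math
from typing import List, Tuple

# Label order copied from the example above; assumed to match the model's class indices.
LABELS = ["Real Human Voice", "AI-generated"]


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(logits: List[float]) -> Tuple[str, float]:
    """Return the most likely label and its softmax probability."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[idx], probs[idx]


# Hypothetical logits for one clip, e.g. obtained via logits[0].tolist()
label, confidence = predict_with_confidence([-1.2, 2.3])
print(f"Prediction: {label} ({confidence:.1%})")
```

With torch tensors, `torch.softmax(logits, dim=-1)` computes the same probabilities directly.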
This project is licensed under the MIT License. See the LICENSE file for details.
Base model
facebook/wav2vec2-base