Spaces: hugofara/wavlm-phonemizer-word-detection

Hugo Farajallah committed on Sep 23, 2025

Commit

82e797f

1 Parent(s): 0cde9d4

chore(general): slight polish to the code, adds documentation.

Files changed (6)

.gitignore CHANGED

@@ -10,5 +10,7 @@ wheels/
 .venv
 # Ignored generated figures
-figures/*.png
 outputs/

 .venv
 # Ignored generated figures
 outputs/
+# The phonemizer needs the phonemes examples
+lang_dict/

README.md CHANGED

@@ -3,8 +3,20 @@
 Some simple utility script to show how WavLM works and how to use it.
 It is all based on WavLM Base + Phonemizer FR-IT
 ## Gradio interface
 To launch Gradio, just run:
 ```shell
@@ -13,7 +25,7 @@ python hf_space.py
 And click on the web link!
-## Idea
 - [x] Show activation logits of WavLM (fake model for now)
 - [ ] Compare performances with Wav2Vec 2.0-Phonemizer-FR

 Some simple utility script to show how WavLM works and how to use it.
 It is all based on WavLM Base + Phonemizer FR-IT
+## main.py
+This is the principal entry point.
+Upon running, it will either capture audio from the microphone, or from a sample file.
+Then, it will run and animate an inference, layer by layer and time step by time step.
+![Animation of the inference](images/inference_animation.gif)
 ## Gradio interface
+The Gradio interface is a web interface to demonstrate the word classification capabilities.
+![View of the Gradio interface](images/gradio_space.png)
 To launch Gradio, just run:
 ```shell
 And click on the web link!
+## Ideas
 - [x] Show activation logits of WavLM (fake model for now)
 - [ ] Compare performances with Wav2Vec 2.0-Phonemizer-FR

figures/.gitkeep DELETED

File without changes

images/gradio_space.png ADDED

images/inference_animation.gif ADDED

main.py CHANGED

@@ -1,4 +1,5 @@
 import functools
 import matplotlib.animation
 import matplotlib.pyplot as plt
@@ -54,6 +55,12 @@ def update_frame(frames, ax, matrix_plot, tokenizer=None, colorbar=None):
 def main(record_mic=False):
     audio_duration = 5
     split_length = 0.1
@@ -148,9 +155,23 @@ def main(record_mic=False):
         # blit=True
     )
     plt.show()
-    animation.save("animated.webm")
 if __name__ == "__main__":
     animation = None
-    main(record_mic=True)

 import functools
+import os
 import matplotlib.animation
 import matplotlib.pyplot as plt
 def main(record_mic=False):
+    """
+    Record an inference run of the model.
+    :param bool record_mic: True to record from the microphone, False to use a dummy file.
+    :return: Path of the output file.
+    :rtype: str
+    """
     audio_duration = 5
     split_length = 0.1
         # blit=True
     )
     plt.show()
+    # Save to file
+    dir_path = "outputs"
+    os.makedirs(dir_path, exist_ok=True)
+    if os.path.exists(f"{dir_path}/animated.webm"):
+        i = 1
+        while os.path.exists(f"{dir_path}/animated_({i}).webm"):
+            i += 1
+        file_name = f"{dir_path}/animated_({i}).webm"
+    else:
+        file_name = f"{dir_path}/animated.webm"
+    animation.save(file_name)
+    return file_name
 if __name__ == "__main__":
     animation = None
+    main(record_mic=False)
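
The save step added to `main.py` above picks the first unused name among `animated.webm`, `animated_(1).webm`, `animated_(2).webm`, and so on. As a minimal sketch of the same naming scheme, it could be factored into a standalone helper. The name `next_free_path` and its parameters are hypothetical and are not part of this commit; the sketch uses `pathlib` instead of the `os.path` calls in the diff.

```python
from pathlib import Path


def next_free_path(directory, stem="animated", suffix=".webm"):
    """Return a path inside `directory` that does not exist yet.

    Hypothetical helper, not part of the commit. The first candidate is
    `<stem><suffix>`; if it is taken, `<stem>_(1)<suffix>`,
    `<stem>_(2)<suffix>`, ... are tried in order.
    """
    out_dir = Path(directory)
    # Create the output directory (and any missing parents) if needed.
    out_dir.mkdir(parents=True, exist_ok=True)

    candidate = out_dir / f"{stem}{suffix}"
    index = 1
    # Walk the numbered names until one is free.
    while candidate.exists():
        candidate = out_dir / f"{stem}_({index}){suffix}"
        index += 1
    return candidate
```

Under these assumptions, the save step in `main()` could become `file_name = str(next_free_path("outputs"))` followed by `animation.save(file_name)`. Note that the existence check and the later write are not atomic, so two processes saving at the same moment could still choose the same name.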