Hugo Farajallah committed
Commit 82e797f · 1 Parent(s): 0cde9d4

chore(general): slight polish to the code, adds documentation.

.gitignore CHANGED
@@ -10,5 +10,7 @@ wheels/
 .venv
 
 # Ignored generated figures
-figures/*.png
 outputs/
+
+# The phonemizer needs the phonemes examples
+lang_dict/
README.md CHANGED
@@ -3,8 +3,20 @@
 Some simple utility script to show how WavLM works and how to use it.
 It is all based on WavLM Base + Phonemizer FR-IT
 
+## main.py
+
+This is the principal entry point.
+Upon running, it will either capture audio from the microphone, or from a sample file.
+Then, it will run an animate an inference, layer by layer and time step by time step.
+
+![Animation of the inference](images/inference_animation.gif)
+
 ## Gradio interface
 
+The Gradio interface is a web interface to demonstrate the word classification capabilities.
+
+![View of the Gradio interface](images/gradio_space.png)
+
 To launch Gradio, just run:
 
 ```shell
@@ -13,7 +25,7 @@ python hf_space.py
 
 And click on the web link!
 
-## Idea
+## Ideas
 
 - [x] Show activation logits of WavLM (fake model for now)
 - [ ] Compare performances with Wav2Vec 2.0-Phonemizer-FR
figures/.gitkeep DELETED
File without changes
images/gradio_space.png ADDED

Git LFS Details

  • SHA256: 128ad5c87d6a3c9892541bbd21eebed9eebe36f4fea234f5b8ba72feae442284
  • Pointer size: 131 Bytes
  • Size of remote file: 156 kB
images/inference_animation.gif ADDED

Git LFS Details

  • SHA256: 0489a14233ff15ee304bad5bd2adc01a1ba1589eb00c11c0a136ff8a688513cc
  • Pointer size: 132 Bytes
  • Size of remote file: 9.33 MB
main.py CHANGED
@@ -1,4 +1,5 @@
 import functools
+import os
 
 import matplotlib.animation
 import matplotlib.pyplot as plt
@@ -54,6 +55,12 @@ def update_frame(frames, ax, matrix_plot, tokenizer=None, colorbar=None):
 
 
 def main(record_mic=False):
+    """
+    Record an inference run of the model.
+
+    :param bool record_mic: True to record from the microphone, False to use dummy file.
+    :return str: Path of the output file.
+    """
     audio_duration = 5
     split_length = 0.1
 
@@ -148,9 +155,23 @@ def main(record_mic=False):
         # blit=True
     )
     plt.show()
-    animation.save("animated.webm")
+
+    # Save to file
+    dir_path = "outputs"
+    if not os.path.exists(dir_path) or not os.path.isdir(dir_path):
+        os.makedirs(dir_path)
+
+    if os.path.exists(f"{dir_path}/animated.webm"):
+        i = 1
+        while os.path.exists(f"{dir_path}/animated_({i}).webm"):
+            i += 1
+        file_name = f"{dir_path}/animated_({i}).webm"
+    else:
+        file_name = f"{dir_path}/animated.webm"
+    animation.save(file_name)
+    return file_name
 
 
 if __name__ == "__main__":
     animation = None
-    main(record_mic=True)
+    main(record_mic=False)