DeepBeepMeep committed
Commit a8a3e31 · 1 Parent(s): 3bac8f7

New model selection logic / removed tabs

README.md CHANGED
@@ -174,35 +174,35 @@ pip install -e .
174
 
175
  To run the text to video generator (in Low VRAM mode):
176
  ```bash
177
- python gradio_server.py
178
  #or
179
- python gradio_server.py --t2v #launch the default text 2 video model
180
  #or
181
- python gradio_server.py --t2v-14B #for the 14B model
182
  #or
183
- python gradio_server.py --t2v-1-3B #for the 1.3B model
184
 
185
  ```
186
 
187
  To run the image to video generator (in Low VRAM mode):
188
  ```bash
189
- python gradio_server.py --i2v
190
  ```
191
  To run the 1.3B Fun InP image to video generator (in Low VRAM mode):
192
  ```bash
193
- python gradio_server.py --i2v-1-3B
194
  ```
195
 
196
  To be able to input multiple images with the image to video generator:
197
  ```bash
198
- python gradio_server.py --i2v --multiple-images
199
  ```
200
 
201
  Within the application you can configure which video generator will be launched without specifying a command line switch.
202
 
203
  To run the application with the entire diffusion model loaded in VRAM (slightly faster, but requires 24 GB of VRAM for an 8-bit quantized 14B model):
204
  ```bash
205
- python gradio_server.py --profile 3
206
  ```
207
 
208
  **Troubleshooting**:\
@@ -215,7 +215,7 @@ Therefore you may have no choice but to fallback to sdpa attention, to do so:
215
  or
216
  - Launch the application this way:
217
  ```bash
218
- python gradio_server.py --attention sdpa
219
  ```
220
 
221
  ### Loras support
@@ -249,7 +249,7 @@ Each preset, is a file with ".lset" extension stored in the loras directory and
249
 
250
  Last but not least, you can pre-activate the corresponding Loras and prefill a prompt (comments only or full prompt) by specifying a preset when launching the gradio server:
251
  ```bash
252
- python gradio_server.py --lora-preset mylorapreset.lset # where 'mylorapreset.lset' is a preset stored in the 'loras' folder
253
  ```
254
 
255
  You will find prebuilt Loras on https://civitai.com/ or you will be able to build them with tools such as kohya or onetrainer.
@@ -274,7 +274,7 @@ You can define multiple lines of macros. If there is only one macro line, the ap
274
 
275
  Vace is a ControlNet 1.3B text2video model that lets you provide, on top of a text prompt, visual hints to guide the generation. It can do more than image2video, although it is not as good at simply starting a video from an image, because it is only a 1.3B model (in fact 3B) versus 14B, and it is not specialized for start frames. However, with Vace you can inject people or objects into the scene, animate a person, perform inpainting or outpainting, continue a video, ...
276
 
277
- First you need to switch the t2v model to Vace 1.3 in the Configuration Tab. Please note that Vace works well for the moment only with videos up to 5s (81 frames).
278
 
279
  Besides the usual text prompt, three new types of visual hints can be provided (and combined!):
280
  - reference Images: use this to inject people or objects into the video. You can select multiple reference Images. The image is integrated more effectively if its background is replaced with solid white. You can do that with your preferred background remover, or use the built-in background remover by checking the *Remove background* box.
@@ -296,6 +296,8 @@ There are lots of possible combinations. Some of them require to prepare some ma
296
  Vace provides, on its GitHub (https://github.com/ali-vilab/VACE/tree/main/vace/gradios), annotator / preprocessor Gradio tools that can help you build some of these materials, depending on the task you want to achieve.
297
 
298
  There is also a guide that describes the various combinations of hints (https://github.com/ali-vilab/VACE/blob/main/UserGuide.md). Good luck!
 
 
299
  ### Command line parameters for Gradio Server
300
  --i2v : launch the image to video generator\
301
  --t2v : launch the text to video generator (default defined in the configuration)\
@@ -303,6 +305,7 @@ There is also a guide that describes the various combination of hints (https://g
303
  --t2v-1-3B : launch the 1.3B model text to video generator\
304
  --i2v-14B : launch the 14B model image to video generator\
305
  --i2v-1-3B : launch the Fun InP 1.3B model image to video generator\
 
306
  --quantize-transformer bool: (default True) : enable / disable on-the-fly transformer quantization\
307
  --lora-dir path : Path of directory that contains Loras in diffusers / safetensor format\
308
  --lora-preset preset : name of the preset file (without the extension) to preload
@@ -324,8 +327,6 @@ There is also a guide that describes the various combination of hints (https://g
324
  --slg : turn on skip layer guidance for improved quality\
325
  --check-loras : filter loras that are incompatible (will take a few seconds while refreshing the lora list or while starting the app)\
326
  --advanced : turn on the advanced mode while launching the app\
327
- --i2v-settings : path to launch settings for i2v\
328
- --t2v-settings : path to launch settings for t2v\
329
  --listen : make server accessible on network\
330
  --gpu device : run Wan on the given device, for instance "cuda:1"
331
 
 
174
 
175
  To run the text to video generator (in Low VRAM mode):
176
  ```bash
177
+ python wgp.py
178
  #or
179
+ python wgp.py --t2v #launch the default text 2 video model
180
  #or
181
+ python wgp.py --t2v-14B #for the 14B model
182
  #or
183
+ python wgp.py --t2v-1-3B #for the 1.3B model
184
 
185
  ```
186
 
187
  To run the image to video generator (in Low VRAM mode):
188
  ```bash
189
+ python wgp.py --i2v
190
  ```
191
  To run the 1.3B Fun InP image to video generator (in Low VRAM mode):
192
  ```bash
193
+ python wgp.py --i2v-1-3B
194
  ```
195
 
196
  To be able to input multiple images with the image to video generator:
197
  ```bash
198
+ python wgp.py --i2v --multiple-images
199
  ```
200
 
201
  Within the application you can configure which video generator will be launched without specifying a command line switch.
202
 
203
  To run the application with the entire diffusion model loaded in VRAM (slightly faster, but requires 24 GB of VRAM for an 8-bit quantized 14B model):
204
  ```bash
205
+ python wgp.py --profile 3
206
  ```
207
 
208
  **Troubleshooting**:\
 
215
  or
216
  - Launch the application this way:
217
  ```bash
218
+ python wgp.py --attention sdpa
219
  ```
220
 
221
  ### Loras support
 
249
 
250
  Last but not least, you can pre-activate the corresponding Loras and prefill a prompt (comments only or full prompt) by specifying a preset when launching the gradio server:
251
  ```bash
252
+ python wgp.py --lora-preset mylorapreset.lset # where 'mylorapreset.lset' is a preset stored in the 'loras' folder
253
  ```
254
 
255
  You will find prebuilt Loras on https://civitai.com/ or you will be able to build them with tools such as kohya or onetrainer.
 
274
 
275
  Vace is a ControlNet 1.3B text2video model that lets you provide, on top of a text prompt, visual hints to guide the generation. It can do more than image2video, although it is not as good at simply starting a video from an image, because it is only a 1.3B model (in fact 3B) versus 14B, and it is not specialized for start frames. However, with Vace you can inject people or objects into the scene, animate a person, perform inpainting or outpainting, continue a video, ...
276
 
277
+ First, select the Vace 1.3B model in the drop-down box at the top. Please note that, for the moment, Vace works well only with videos up to 5s (81 frames).
278
 
279
  Besides the usual text prompt, three new types of visual hints can be provided (and combined!):
280
  - reference Images: use this to inject people or objects into the video. You can select multiple reference Images. The image is integrated more effectively if its background is replaced with solid white. You can do that with your preferred background remover, or use the built-in background remover by checking the *Remove background* box.
 
296
  Vace provides, on its GitHub (https://github.com/ali-vilab/VACE/tree/main/vace/gradios), annotator / preprocessor Gradio tools that can help you build some of these materials, depending on the task you want to achieve.
297
 
298
  There is also a guide that describes the various combinations of hints (https://github.com/ali-vilab/VACE/blob/main/UserGuide.md). Good luck!
299
+
300
+ You may get better results by turning on "Skip Layer Guidance" with its default configuration.
301
  ### Command line parameters for Gradio Server
302
  --i2v : launch the image to video generator\
303
  --t2v : launch the text to video generator (default defined in the configuration)\
 
305
  --t2v-1-3B : launch the 1.3B model text to video generator\
306
  --i2v-14B : launch the 14B model image to video generator\
307
  --i2v-1-3B : launch the Fun InP 1.3B model image to video generator\
308
+ --vace : launch the Vace ControlNet 1.3B model text to video generator\
309
  --quantize-transformer bool: (default True) : enable / disable on-the-fly transformer quantization\
310
  --lora-dir path : Path of directory that contains Loras in diffusers / safetensor format\
311
  --lora-preset preset : name of the preset file (without the extension) to preload
 
327
  --slg : turn on skip layer guidance for improved quality\
328
  --check-loras : filter loras that are incompatible (will take a few seconds while refreshing the lora list or while starting the app)\
329
  --advanced : turn on the advanced mode while launching the app\
 
 
330
  --listen : make server accessible on network\
331
  --gpu device : run Wan on the given device, for instance "cuda:1"
332
 
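The commit message mentions new model-selection logic, and the updated README lists `--vace` next to the other launch switches while noting that the default generator can also be configured inside the application. The code of `wgp.py` is not rendered on this page, so the following is only a hypothetical Python sketch of how such switches could be mapped to a model with `argparse`. The model keys, the `select_model` helper, the fallback order, and treating the switches as mutually exclusive are all assumptions, not the application's actual behavior.

```python
import argparse

# Hypothetical model keys: the real identifiers used by wgp.py are not shown in this diff.
SPECIFIC_FLAGS = {
    "t2v_14B": "t2v-14B",
    "t2v_1_3B": "t2v-1.3B",
    "i2v_14B": "i2v-14B",
    "i2v_1_3B": "i2v-fun-inp-1.3B",
    "vace": "vace-1.3B",
}


def build_parser() -> argparse.ArgumentParser:
    """Declare the launch switches listed in the README (mutual exclusivity is an assumption)."""
    parser = argparse.ArgumentParser(description="Model selection sketch")
    group = parser.add_mutually_exclusive_group()
    for flag in ("--t2v", "--t2v-14B", "--t2v-1-3B",
                 "--i2v", "--i2v-14B", "--i2v-1-3B", "--vace"):
        # argparse derives the attribute name by replacing "-" with "_",
        # e.g. "--t2v-1-3B" becomes args.t2v_1_3B
        group.add_argument(flag, action="store_true")
    return parser


def select_model(args: argparse.Namespace,
                 saved_t2v: str = "t2v-14B",
                 saved_i2v: str = "i2v-14B",
                 saved_default: str = "t2v-14B") -> str:
    """Pick a model: explicit switch first, then the family default, then the saved choice."""
    for dest, model in SPECIFIC_FLAGS.items():
        if getattr(args, dest):
            return model
    if args.t2v:
        return saved_t2v  # default text to video model taken from the configuration
    if args.i2v:
        return saved_i2v  # default image to video model taken from the configuration
    return saved_default  # no switch: fall back to the choice saved in the app
```

For example, `select_model(build_parser().parse_args(["--vace"]))` returns `"vace-1.3B"` in this sketch. Keeping the explicit switches in a single table means that adding a model, as this commit does with `--vace`, only needs one new entry.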
wan/utils/vace_preprocessor.py CHANGED
@@ -183,7 +183,7 @@ class VaceVideoProcessor(object):
183
  def _get_frameid_bbox_adjust_last(self, fps, frame_timestamps, h, w, crop_box, rng, max_frames= 0):
184
  import math
185
  target_fps = self.max_fps
186
- video_duration = frame_timestamps[-1][1]
187
  video_frame_duration = 1 /fps
188
  target_frame_duration = 1 / target_fps
189
 
@@ -197,9 +197,9 @@ class VaceVideoProcessor(object):
197
  frame_ids.append(frame_no)
198
  cur_time += add_frames_count * video_frame_duration
199
  target_time += target_frame_duration
200
- if cur_time > video_duration:
201
  break
202
-
203
  x1, x2, y1, y2 = [0, w, 0, h] if crop_box is None else crop_box
204
  h, w = y2 - y1, x2 - x1
205
  ratio = h / w
 
183
  def _get_frameid_bbox_adjust_last(self, fps, frame_timestamps, h, w, crop_box, rng, max_frames= 0):
184
  import math
185
  target_fps = self.max_fps
186
+ video_frames_count = len(frame_timestamps)
187
  video_frame_duration = 1 /fps
188
  target_frame_duration = 1 / target_fps
189
 
 
197
  frame_ids.append(frame_no)
198
  cur_time += add_frames_count * video_frame_duration
199
  target_time += target_frame_duration
200
+ if frame_no >= video_frames_count -1:
201
  break
202
+ frame_ids = frame_ids[:video_frames_count]
203
  x1, x2, y1, y2 = [0, w, 0, h] if crop_box is None else crop_box
204
  h, w = y2 - y1, x2 - x1
205
  ratio = h / w
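The hunks above change when frame sampling stops: instead of comparing the elapsed time with the last timestamp (`cur_time > video_duration`), the loop now stops once `frame_no` reaches the last source frame index, and the result is capped at `video_frames_count` entries. The loop body between the two hunks is not shown, so the following is a simplified, hypothetical sketch of fps resampling built around those two visible rules. The function name `select_frame_ids` and the floor-based mapping from output time to source frame are assumptions, not the code of `VaceVideoProcessor`.

```python
import math


def select_frame_ids(fps, video_frames_count, target_fps, max_frames=0):
    """Hypothetical sketch: choose source frame indices that approximate target_fps."""
    frame_ids = []
    k = 0  # index of the output frame being produced
    while max_frames <= 0 or len(frame_ids) < max_frames:
        # The k-th output frame lives at time k / target_fps; map it to the source
        # frame displayed at that instant and clamp to the last existing frame.
        frame_no = min(math.floor(k * fps / target_fps + 1e-9), video_frames_count - 1)
        frame_ids.append(frame_no)
        k += 1
        # Count-based stop condition: stop once the last source frame is reached.
        if frame_no >= video_frames_count - 1:
            break
    # Cap the list at the number of source frames.
    return frame_ids[:video_frames_count]
```

Stopping on a frame index rather than on a timestamp means the sampler never asks for a frame past the end of the clip, even when the timestamps and the nominal fps disagree. Note that the cap also applies when upsampling: for a 4-frame, 10 fps clip resampled to 20 fps, this sketch returns `[0, 0, 1, 1]`.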
gradio_server.py → wgp.py RENAMED
The diff for this file is too large to render.