DeepBeepMeep committed on
Commit ·
a8a3e31
1
Parent(s): 3bac8f7
New model selection logic / removed tabs
- README.md +14 -13
- wan/utils/vace_preprocessor.py +3 -3
- gradio_server.py → wgp.py +0 -0
README.md
CHANGED
|
@@ -174,35 +174,35 @@ pip install -e .
|
|
| 174 |
|
| 175 |
To run the text to video generator (in Low VRAM mode):
|
| 176 |
```bash
|
| 177 |
-
python
|
| 178 |
#or
|
| 179 |
-
python
|
| 180 |
#or
|
| 181 |
-
python
|
| 182 |
#or
|
| 183 |
-
python
|
| 184 |
|
| 185 |
```
|
| 186 |
|
| 187 |
To run the image to video generator (in Low VRAM mode):
|
| 188 |
```bash
|
| 189 |
-
python
|
| 190 |
```
|
| 191 |
To run the 1.3B Fun InP image to video generator (in Low VRAM mode):
|
| 192 |
```bash
|
| 193 |
-
python
|
| 194 |
```
|
| 195 |
|
| 196 |
To be able to input multiple images with the image to video generator:
|
| 197 |
```bash
|
| 198 |
-
python
|
| 199 |
```
|
| 200 |
|
| 201 |
Within the application you can configure which video generator will be launched without specifying a command line switch.
|
| 202 |
|
| 203 |
To run the application with the diffusion model loaded entirely in VRAM (slightly faster, but requires 24 GB of VRAM for an 8-bit quantized 14B model):
|
| 204 |
```bash
|
| 205 |
-
python
|
| 206 |
```
|
| 207 |
|
| 208 |
**Troubleshooting**:\
|
|
@@ -215,7 +215,7 @@ Therefore you may have no choice but to fallback to sdpa attention, to do so:
|
|
| 215 |
or
|
| 216 |
- Launch the application this way:
|
| 217 |
```bash
|
| 218 |
-
python
|
| 219 |
```
|
| 220 |
|
| 221 |
### Loras support
|
|
@@ -249,7 +249,7 @@ Each preset, is a file with ".lset" extension stored in the loras directory and
|
|
| 249 |
|
| 250 |
Last but not least, you can pre-activate the corresponding Loras and prefill a prompt (comments only or full prompt) by specifying a preset when launching the gradio server:
|
| 251 |
```bash
|
| 252 |
-
python
|
| 253 |
```
|
| 254 |
|
| 255 |
You will find prebuilt Loras on https://civitai.com/ or you will be able to build them with tools such as kohya or onetrainer.
|
|
@@ -274,7 +274,7 @@ You can define multiple lines of macros. If there is only one macro line, the ap
|
|
| 274 |
|
| 275 |
Vace is a ControlNet 1.3B text2video model that lets you provide visual hints, on top of a text prompt, to guide the generation. It can do more than image2video, although it is not as good for just starting a video from an image because it is only a 1.3B model (in fact 3B) versus 14B and it is not specialized for start frames. However, with Vace you can inject people or objects into the scene, animate a person, perform inpainting or outpainting, continue a video, ...
|
| 276 |
|
| 277 |
-
First you need to
|
| 278 |
|
| 279 |
Beside the usual Text Prompt, three new types of visual hints can be provided (and combined !):
|
| 280 |
- reference Images: use this to inject people or objects in the video. You can select multiple reference Images. The integration of the image is more efficient if the background is replaced by the full white color. You can do that with your preferred background remover or use the built in background remover by checking the box *Remove background*
|
|
@@ -296,6 +296,8 @@ There are lots of possible combinations. Some of them require to prepare some ma
|
|
| 296 |
Vace provides on its github (https://github.com/ali-vilab/VACE/tree/main/vace/gradios) annotators / preprocessors Gradio tool that can help you build some of these materials depending on the task you want to achieve.
|
| 297 |
|
| 298 |
There is also a guide that describes the various combinations of hints (https://github.com/ali-vilab/VACE/blob/main/UserGuide.md). Good luck!
|
|
|
|
|
|
|
| 299 |
### Command line parameters for Gradio Server
|
| 300 |
--i2v : launch the image to video generator\
|
| 301 |
--t2v : launch the text to video generator (default defined in the configuration)\
|
|
@@ -303,6 +305,7 @@ There is also a guide that describes the various combination of hints (https://g
|
|
| 303 |
--t2v-1-3B : launch the 1.3B model text to video generator\
|
| 304 |
--i2v-14B : launch the 14B model image to video generator\
|
| 305 |
--i2v-1-3B : launch the Fun InP 1.3B model image to video generator\
|
|
|
|
| 306 |
--quantize-transformer bool: (default True) : enable / disable on the fly transformer quantization\
|
| 307 |
--lora-dir path : Path of directory that contains Loras in diffusers / safetensor format\
|
| 308 |
--lora-preset preset : name of preset file (without the extension) to preload
|
|
@@ -324,8 +327,6 @@ There is also a guide that describes the various combination of hints (https://g
|
|
| 324 |
--slg : turn on skip layer guidance for improved quality\
|
| 325 |
--check-loras : filter loras that are incompatible (will take a few seconds while refreshing the lora list or while starting the app)\
|
| 326 |
--advanced : turn on the advanced mode while launching the app\
|
| 327 |
-
--i2v-settings : path to launch settings for i2v\
|
| 328 |
-
--t2v-settings : path to launch settings for t2v\
|
| 329 |
--listen : make server accessible on network\
|
| 330 |
--gpu device : run Wan on device for instance "cuda:1"
|
| 331 |
|
|
|
|
| 174 |
|
| 175 |
To run the text to video generator (in Low VRAM mode):
|
| 176 |
```bash
|
| 177 |
+
python wgp.py
|
| 178 |
#or
|
| 179 |
+
python wgp.py --t2v #launch the default text 2 video model
|
| 180 |
#or
|
| 181 |
+
python wgp.py --t2v-14B #for the 14B model
|
| 182 |
#or
|
| 183 |
+
python wgp.py --t2v-1-3B #for the 1.3B model
|
| 184 |
|
| 185 |
```
|
| 186 |
|
| 187 |
To run the image to video generator (in Low VRAM mode):
|
| 188 |
```bash
|
| 189 |
+
python wgp.py --i2v
|
| 190 |
```
|
| 191 |
To run the 1.3B Fun InP image to video generator (in Low VRAM mode):
|
| 192 |
```bash
|
| 193 |
+
python wgp.py --i2v-1-3B
|
| 194 |
```
|
| 195 |
|
| 196 |
To be able to input multiple images with the image to video generator:
|
| 197 |
```bash
|
| 198 |
+
python wgp.py --i2v --multiple-images
|
| 199 |
```
|
| 200 |
|
| 201 |
Within the application you can configure which video generator will be launched without specifying a command line switch.
|
| 202 |
|
| 203 |
To run the application with the diffusion model loaded entirely in VRAM (slightly faster, but requires 24 GB of VRAM for an 8-bit quantized 14B model):
|
| 204 |
```bash
|
| 205 |
+
python wgp.py --profile 3
|
| 206 |
```
|
| 207 |
|
| 208 |
**Troubleshooting**:\
|
|
|
|
| 215 |
or
|
| 216 |
- Launch the application this way:
|
| 217 |
```bash
|
| 218 |
+
python wgp.py --attention sdpa
|
| 219 |
```
|
| 220 |
|
| 221 |
### Loras support
|
|
|
|
| 249 |
|
| 250 |
Last but not least, you can pre-activate the corresponding Loras and prefill a prompt (comments only or full prompt) by specifying a preset when launching the gradio server:
|
| 251 |
```bash
|
| 252 |
+
python wgp.py --lora-preset mylorapreset.lset # where 'mylorapreset.lset' is a preset stored in the 'loras' folder
|
| 253 |
```
|
| 254 |
|
| 255 |
You will find prebuilt Loras on https://civitai.com/ or you will be able to build them with tools such as kohya or onetrainer.
|
|
|
|
| 274 |
|
| 275 |
Vace is a ControlNet 1.3B text2video model that lets you provide visual hints, on top of a text prompt, to guide the generation. It can do more than image2video, although it is not as good for just starting a video from an image because it is only a 1.3B model (in fact 3B) versus 14B and it is not specialized for start frames. However, with Vace you can inject people or objects into the scene, animate a person, perform inpainting or outpainting, continue a video, ...
|
| 276 |
|
| 277 |
+
First you need to select the Vace 1.3B model in the Drop Down box at the top. Please note that Vace works well for the moment only with videos up to 5s (81 frames).
|
| 278 |
|
| 279 |
Beside the usual Text Prompt, three new types of visual hints can be provided (and combined !):
|
| 280 |
- reference Images: use this to inject people or objects in the video. You can select multiple reference Images. The integration of the image is more efficient if the background is replaced by the full white color. You can do that with your preferred background remover or use the built in background remover by checking the box *Remove background*
|
|
|
|
| 296 |
Vace provides on its github (https://github.com/ali-vilab/VACE/tree/main/vace/gradios) annotators / preprocessors Gradio tool that can help you build some of these materials depending on the task you want to achieve.
|
| 297 |
|
| 298 |
There is also a guide that describes the various combinations of hints (https://github.com/ali-vilab/VACE/blob/main/UserGuide.md). Good luck!
|
| 299 |
+
|
| 300 |
+
It seems you will get better results if you turn on "Skip Layer Guidance" with its default configuration
|
| 301 |
### Command line parameters for Gradio Server
|
| 302 |
--i2v : launch the image to video generator\
|
| 303 |
--t2v : launch the text to video generator (default defined in the configuration)\
|
|
|
|
| 305 |
--t2v-1-3B : launch the 1.3B model text to video generator\
|
| 306 |
--i2v-14B : launch the 14B model image to video generator\
|
| 307 |
--i2v-1-3B : launch the Fun InP 1.3B model image to video generator\
|
| 308 |
+
--vace : launch the Vace ControlNet 1.3B model image to video generator\
|
| 309 |
--quantize-transformer bool: (default True) : enable / disable on the fly transformer quantization\
|
| 310 |
--lora-dir path : Path of directory that contains Loras in diffusers / safetensor format\
|
| 311 |
--lora-preset preset : name of preset file (without the extension) to preload
|
|
|
|
| 327 |
--slg : turn on skip layer guidance for improved quality\
|
| 328 |
--check-loras : filter loras that are incompatible (will take a few seconds while refreshing the lora list or while starting the app)\
|
| 329 |
--advanced : turn on the advanced mode while launching the app\
|
|
|
|
|
|
|
| 330 |
--listen : make server accessible on network\
|
| 331 |
--gpu device : run Wan on device for instance "cuda:1"
|
| 332 |
|
wan/utils/vace_preprocessor.py
CHANGED
|
@@ -183,7 +183,7 @@ class VaceVideoProcessor(object):
|
|
| 183 |
def _get_frameid_bbox_adjust_last(self, fps, frame_timestamps, h, w, crop_box, rng, max_frames= 0):
|
| 184 |
import math
|
| 185 |
target_fps = self.max_fps
|
| 186 |
-
|
| 187 |
video_frame_duration = 1 /fps
|
| 188 |
target_frame_duration = 1 / target_fps
|
| 189 |
|
|
@@ -197,9 +197,9 @@ class VaceVideoProcessor(object):
|
|
| 197 |
frame_ids.append(frame_no)
|
| 198 |
cur_time += add_frames_count * video_frame_duration
|
| 199 |
target_time += target_frame_duration
|
| 200 |
-
if
|
| 201 |
break
|
| 202 |
-
|
| 203 |
x1, x2, y1, y2 = [0, w, 0, h] if crop_box is None else crop_box
|
| 204 |
h, w = y2 - y1, x2 - x1
|
| 205 |
ratio = h / w
|
|
|
|
| 183 |
def _get_frameid_bbox_adjust_last(self, fps, frame_timestamps, h, w, crop_box, rng, max_frames= 0):
|
| 184 |
import math
|
| 185 |
target_fps = self.max_fps
|
| 186 |
+
video_frames_count = len(frame_timestamps)
|
| 187 |
video_frame_duration = 1 /fps
|
| 188 |
target_frame_duration = 1 / target_fps
|
| 189 |
|
|
|
|
| 197 |
frame_ids.append(frame_no)
|
| 198 |
cur_time += add_frames_count * video_frame_duration
|
| 199 |
target_time += target_frame_duration
|
| 200 |
+
if frame_no >= video_frames_count -1:
|
| 201 |
break
|
| 202 |
+
frame_ids = frame_ids[:video_frames_count]
|
| 203 |
x1, x2, y1, y2 = [0, w, 0, h] if crop_box is None else crop_box
|
| 204 |
h, w = y2 - y1, x2 - x1
|
| 205 |
ratio = h / w
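
Reviewer note: the two added lines in this hunk stop frame selection once the last source frame is reached and cap the number of selected ids at the source frame count. The following is a minimal standalone sketch of that behaviour, not the repository's implementation. The function name `resample_frame_ids` and its parameters are hypothetical, and the first two statements of the loop body are an assumption, since the hunk does not show lines 189 to 196.

```python
import math


def resample_frame_ids(fps, video_frames_count, target_fps, max_frames):
    """Pick source-frame indices so that playback approximates target_fps.

    Hypothetical helper for illustration only; it mirrors the variable names
    visible in the diff but is not the code from vace_preprocessor.py.
    """
    video_frame_duration = 1 / fps
    target_frame_duration = 1 / target_fps

    cur_time = 0.0     # timestamp of the most recently selected source frame
    target_time = 0.0  # timestamp the next output frame should show
    frame_no = 0
    frame_ids = []
    for _ in range(max_frames):
        # Assumed step: skip ahead by enough source frames to reach target_time.
        add_frames_count = math.ceil((target_time - cur_time) / video_frame_duration)
        frame_no += add_frames_count
        frame_ids.append(frame_no)
        cur_time += add_frames_count * video_frame_duration
        target_time += target_frame_duration
        # From the diff: stop as soon as the last source frame is reached.
        if frame_no >= video_frames_count - 1:
            break
    # From the diff: never return more ids than the source has frames.
    return frame_ids[:video_frames_count]
```

In this sketch the final index can still exceed `video_frames_count - 1` when the step does not land exactly on the last frame, so a caller may want to clamp it before indexing into the video.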
|
gradio_server.py → wgp.py
RENAMED
|
The diff for this file is too large to render.
See raw diff
|
|
|