With its latest update, Stable Diffusion WebUI Forge now fully supports the Flux.1 model, offering users an enhanced experience in AI-driven image generation. This update significantly boosts speed and precision, particularly when using the NF4 format. In this post, we’ll explore the advantages of running Flux.1 NF4 on Stable Diffusion WebUI Forge, focusing on speed improvements and how to optimize performance across different PC hardware configurations. I am writing this up because, for some reason, the GenAI gurus do not emphasize these settings in their videos at all.
NF4 vs. FP8: A Comparison of Speed and Efficiency
Flux.1 checkpoints for Forge come in two primary formats: NF4 and FP8. Each has distinct advantages, but NF4 stands out for its speed and efficiency.
- Speed Advantage: NF4 is significantly faster than FP8, especially on devices with limited VRAM. For instance, on an 8GB VRAM device like the 3070 Ti, NF4 can reduce the time per iteration from 8.3 seconds (with FP8) to just 2.15 seconds, a 3.86x speed improvement. This makes NF4 the better choice for users seeking rapid image generation.
- Memory Efficiency: NF4 checkpoint files are about half the size of their FP8 counterparts, making them more storage-efficient and faster to load.
- Precision and Dynamic Range: While FP8 can sometimes offer higher precision, NF4 generally does better at detail retention and dynamic range. This is due to NF4’s tensor compression method, which optimizes both storage and computation.
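To put the quoted timings in perspective, here is a minimal sketch that recomputes the speedup and estimates sampling time for a 20-step generation (the step count used in the examples later in this post). It only multiplies the per-iteration figures above and ignores model loading and image decoding, so treat the totals as rough.

```python
# Per-iteration times quoted above for an 8GB card (seconds per step).
FP8_SECONDS_PER_IT = 8.3
NF4_SECONDS_PER_IT = 2.15
STEPS = 20  # step count used in the example generations in this post

speedup = FP8_SECONDS_PER_IT / NF4_SECONDS_PER_IT
fp8_total = FP8_SECONDS_PER_IT * STEPS
nf4_total = NF4_SECONDS_PER_IT * STEPS

print(f"speedup: {speedup:.2f}x")
print(f"FP8: about {fp8_total:.0f} s of sampling for {STEPS} steps")
print(f"NF4: about {nf4_total:.0f} s of sampling for {STEPS} steps")
```

On these figures, a single image drops from close to three minutes of sampling to under one.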
Diffusion with Low Bits: Choosing the Right Setting
In WebUI Forge, you can force a specific weight type at load time through the “Diffusion with Low Bits” setting. The available options are Auto, nf4, fp8e4, fp4, and fp8e5.
However, in most cases, you can simply set this option to Auto, which will automatically select the optimal precision based on your downloaded checkpoint. This feature ensures that you use the most efficient setting for your hardware without manually adjusting the configuration.
Optimizing NF4 on Stable Diffusion WebUI Forge
Whatever PC you’re using, the following settings will help you optimize the performance of Flux.1 NF4 on Stable Diffusion WebUI Forge:
Swap Location:
- CPU Swap: This method offloads part of the model to CPU memory when VRAM is insufficient. It’s reliable but slower.
- Shared Memory Swap: For PCs with ample RAM, consider using shared memory swap, which can be up to 15% faster than CPU swap, although it may cause instability on some systems.
- GPU Weights Slider: Adjust the GPU weights according to your project needs. Larger values increase speed but require more VRAM. For most PC configurations, it is advisable to start with a mid-range setting and adjust based on performance.
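The trade-off behind the GPU Weights slider can be sketched with simple arithmetic. This is only a toy model under stated assumptions: the function name `split_weights`, the 6000 MB model size, and the 8192 MB card are all hypothetical, and Forge's real memory accounting is more involved.

```python
def split_weights(model_mb: int, gpu_weights_mb: int) -> tuple[int, int]:
    """Return (MB of weights kept on the GPU, MB left to swap) for a slider value."""
    on_gpu = min(model_mb, gpu_weights_mb)
    swapped = model_mb - on_gpu
    return on_gpu, swapped


# Hypothetical numbers, for illustration only.
MODEL_MB = 6000
TOTAL_VRAM_MB = 8192

for slider_mb in (3000, 5000, 7000):
    on_gpu, swapped = split_weights(MODEL_MB, slider_mb)
    headroom = TOTAL_VRAM_MB - on_gpu  # VRAM left over for the computation itself
    print(f"slider={slider_mb} MB: on GPU={on_gpu} MB, "
          f"swapped={swapped} MB, headroom={headroom} MB")
```

A higher slider value leaves less to swap, which is where the speed comes from, but it also shrinks the headroom left for the actual computation.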
Swap Method:
- Queue: This method processes layers sequentially, providing stable and predictable performance.
- Async: Ideal for powerful PCs, Async can accelerate processing but requires careful GPU memory management.
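The difference between the two swap methods can be illustrated with a toy timing model. This is a sketch of the general idea (overlapping the transfer of the next layer with the computation of the current one), not Forge's actual scheduler; the function names and the millisecond figures are hypothetical.

```python
def queue_time(load_ms, compute_ms):
    """Sequential: each layer is loaded, then computed, one after another."""
    return sum(load + compute for load, compute in zip(load_ms, compute_ms))


def async_time(load_ms, compute_ms):
    """Overlapped: while layer i computes, layer i+1 is loaded in parallel."""
    total = load_ms[0]  # the first layer must be loaded before any compute starts
    for i, compute in enumerate(compute_ms):
        next_load = load_ms[i + 1] if i + 1 < len(load_ms) else 0
        total += max(compute, next_load)  # the slower of the two sets the pace
    return total


# Hypothetical per-layer costs in milliseconds.
LOAD_MS = [2, 2, 2, 2]
COMPUTE_MS = [3, 3, 3, 3]

print("queue:", queue_time(LOAD_MS, COMPUTE_MS), "ms")
print("async:", async_time(LOAD_MS, COMPUTE_MS), "ms")
```

In this model Async is never slower than Queue, but it needs the next layer in memory while the current one is still in use, which is why it asks more of GPU memory management.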
Distilled CFG Guidance
Flux-dev is a distilled model. It is recommended to set CFG=1 and not use negative prompts. Use “Distilled CFG Guidance” instead; its default value is 3.5.
Note that when CFG=1, the negative prompt field in the UI is greyed out.
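Why a negative prompt has no effect at CFG=1 follows from the standard classifier-free guidance formula, which blends the prediction for the positive prompt (`cond`) with the one for the negative prompt (`uncond`). The sketch below uses plain lists and made-up numbers in place of real model outputs; `apply_cfg` is a hypothetical helper, not a Forge function.

```python
def apply_cfg(cond, uncond, cfg_scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]


cond = [0.5, -1.0, 2.0]       # stand-in prediction for the positive prompt
uncond_a = [0.25, 0.0, -4.0]  # stand-in prediction for one negative prompt
uncond_b = [8.0, 1.5, 0.125]  # stand-in prediction for a very different one

# At scale 1 the uncond term cancels out, so both calls return cond unchanged.
print(apply_cfg(cond, uncond_a, 1.0))
print(apply_cfg(cond, uncond_b, 1.0))

# At a higher scale the negative prompt does change the result.
print(apply_cfg(cond, uncond_a, 7.0))
```

With the scale at 1, the result equals `cond` whatever the negative prompt predicts, which is presumably why the field is disabled.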
Generating Images with NF4
Under UI, select flux.
Under Checkpoint, select flux1-dev-bnb-nf4-v2.
Astronaut in a jungle, cold color palette, muted colors, very detailed, sharp focus
Steps: 20, Sampler: Euler, Schedule type: Simple, CFG scale: 1, Distilled CFG Scale: 3.5, Seed: 12345, Size: 896x1152, Model: flux1-dev-bnb-nf4-v2
We then get a similar image:
Black Myth: Wukong has been taking the world by storm lately, so let’s see what kind of Wukong NF4 has in store for us!
Chinese mythology, the Monkey King wukong, wearing a golden hoop spell, holding a golden rod, riding a somersault cloud, soaring in the heavenly palace
Steps: 20, Sampler: Euler, Schedule type: Simple, CFG scale: 1, Distilled CFG Scale: 3.5, Seed: 3107193459, Size: 896x1152, Model hash: bea01d51bd, Model: flux1-dev-bnb-nf4-v2, Version: f2.0.1v1.10.1-previous-361-g65ec461f
Well, a happy monkey who hasn’t experienced the Black Myth.
Girl, 20 years old, HD close-up photo of face, Disney style, very detailed
Steps: 20, Sampler: Euler, Schedule type: Simple, CFG scale: 1, Distilled CFG Scale: 3.5, Seed: 3107193459, Size: 896x1152, Model hash: bea01d51bd, Model: flux1-dev-bnb-nf4-v2, Version: f2.0.1v1.10.1-previous-361-g65ec461f
European vintage style living room with black wooden furniture, brown wooden floor, large floor-to-ceiling windows, brown leather sofa, crystal chandelier, white carved plaster ceiling
Steps: 20, Sampler: Euler, Schedule type: Simple, CFG scale: 1, Distilled CFG Scale: 3.5, Seed: 2503002636, Size: 896x1152, Model hash: bea01d51bd, Model: flux1-dev-bnb-nf4-v2, Version: f2.0.1v1.10.1-previous-361-g65ec461f
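The parameter lines shown under each prompt follow a simple "Key: value" pattern, so they are easy to turn into structured settings if you want to compare or reuse them. Here is a minimal sketch using the first example's line; `parse_params` is a hypothetical helper, and the naive split on ", " would break on values that themselves contain commas.

```python
def parse_params(line: str) -> dict[str, str]:
    """Split a 'Key: value, Key: value' parameter line into a dict of strings."""
    params = {}
    for field in line.split(", "):
        key, _, value = field.partition(": ")
        params[key.strip()] = value.strip()
    return params


line = ("Steps: 20, Sampler: Euler, Schedule type: Simple, CFG scale: 1, "
        "Distilled CFG Scale: 3.5, Seed: 12345, Size: 896x1152, "
        "Model: flux1-dev-bnb-nf4-v2")

params = parse_params(line)
width, height = (int(n) for n in params["Size"].split("x"))

print(params["Sampler"], params["Seed"], width, height)
```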
Conclusion
With the latest update to Stable Diffusion WebUI Forge, the Flux.1 model, especially in the NF4 format, is faster and more practical to run than before. By matching the settings above to your hardware, you can take full advantage of NF4’s speed and efficiency and make your image generation workflow faster and more effective.