Qwen-Image is a 20B parameter MMDiT (Multimodal Diffusion Transformer) model open-sourced under the Apache 2.0 license.
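A quick back-of-the-envelope check shows why an fp8 checkpoint is practical for a model of this size: weight memory scales with bytes per parameter. This sketch treats "20B" as exactly 20e9 parameters and counts the diffusion model's weights only. The text encoder, VAE, activations and framework overhead are excluded, so real VRAM usage will be higher than these figures.

```python
# Rough weight-only memory estimate for a ~20B-parameter diffusion model.
# Assumption: exactly 20e9 parameters; text encoder, VAE and activations excluded.
PARAMS = 20e9

BYTES_PER_PARAM = {
    "bf16": 2,        # 16-bit weights
    "fp8_e4m3fn": 1,  # 8-bit float weights
}


def weight_gib(params: float, dtype: str) -> float:
    """Return the approximate size of the weights in GiB for the given dtype."""
    return params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>11}: ~{weight_gib(PARAMS, dtype):.1f} GiB")
```

Halving the bytes per parameter halves the weight footprint, from roughly 37 GiB to roughly 19 GiB.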
| Model Used | VRAM Usage | First Generation | Second Generation |
|---|---|---|---|
| fp8_e4m3fn | 86% | ≈ 94s | ≈ 71s |
| fp8_e4m3fn with lightx2v 8-step LoRA | 86% | ≈ 55s | ≈ 34s |
| Distilled fp8_e4m3fn | 86% | ≈ 69s | ≈ 36s |
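To read the table as relative gains, divide the baseline fp8_e4m3fn time by each variant's time. The figures are the approximate (≈) timings from the table and only describe those particular runs. The hardware they were measured on is not stated here, so the ratios should not be taken as general benchmarks.

```python
# Approximate timings in seconds (first run, second run), copied from the table.
TIMINGS = {
    "fp8_e4m3fn": (94, 71),
    "fp8_e4m3fn with lightx2v 8-step LoRA": (55, 34),
    "Distilled fp8_e4m3fn": (69, 36),
}

BASELINE = "fp8_e4m3fn"


def speedup(baseline_s: float, variant_s: float) -> float:
    """How many times faster the variant is than the baseline."""
    return baseline_s / variant_s


base_first, base_second = TIMINGS[BASELINE]
for name, (first, second) in TIMINGS.items():
    print(
        f"{name}: first run x{speedup(base_first, first):.2f}, "
        f"second run x{speedup(base_second, second):.2f}"
    )
```

On the second generation, both the LoRA variant and the distilled model come out at roughly twice the speed of the baseline.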
Download Workflow for Qwen-Image Official Model
Distilled version: Download Workflow for Distilled Model
1. Make sure the Load Diffusion Model node has loaded qwen_image_fp8_e4m3fn.safetensors.
2. Make sure the Load CLIP node has loaded qwen_2.5_vl_7b_fp8_scaled.safetensors.
3. Make sure the Load VAE node has loaded qwen_image_vae.safetensors.
4. Make sure the EmptySD3LatentImage node is set to the image dimensions you want.
5. Enter your prompt in the CLIP Text Encoder node. It currently supports at least English, Chinese, Korean, Japanese, and Italian.
6. (Optional) To use the lightx2v 8-step LoRA, select the LoRA node and press Ctrl + B to enable it, then modify the KSampler settings as described in step 8.
7. Click the Queue button, or use the shortcut Ctrl(cmd) + Enter, to run the workflow.
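The same steps can also be driven from a script instead of the UI. The following is a minimal sketch that makes several assumptions. It assumes ComfyUI's HTTP server is running on its default address (127.0.0.1:8188) and that it accepts an API-format workflow (node ids mapped to `class_type` and `inputs`) POSTed to `/prompt`. It further assumes the UI nodes map to the classes UNETLoader, CLIPLoader and VAELoader. The CLIPLoader `type` value `"qwen_image"`, `weight_dtype` `"default"`, the 1328×1328 example size and the multiple-of-16 check are also assumptions, and `loader_nodes`, `queue_payload` and `queue` are hypothetical helper names. The graph is only a fragment, because the sampler, decode and save nodes are omitted. To actually run a generation, export the downloaded workflow in ComfyUI's API format and pass that full graph to `queue()`.

```python
import json
import urllib.request

# Assumption: ComfyUI is running locally on its default port (8188).
COMFYUI_PROMPT_URL = "http://127.0.0.1:8188/prompt"


def loader_nodes(prompt: str, width: int = 1328, height: int = 1328) -> dict:
    """Hypothetical helper: loader, latent and text-encode part of an API-format graph.

    Keys are node ids; links to other nodes are [node_id, output_index].
    Sampler, VAE decode and save nodes are intentionally omitted.
    """
    # Assumption: EmptySD3LatentImage expects sizes in steps of 16.
    if width % 16 or height % 16:
        raise ValueError("width and height should be multiples of 16")
    return {
        "1": {"class_type": "UNETLoader",  # "Load Diffusion Model" in the UI
              "inputs": {"unet_name": "qwen_image_fp8_e4m3fn.safetensors",
                         "weight_dtype": "default"}},
        "2": {"class_type": "CLIPLoader",  # "Load CLIP"; the type value is assumed
              "inputs": {"clip_name": "qwen_2.5_vl_7b_fp8_scaled.safetensors",
                         "type": "qwen_image"}},
        "3": {"class_type": "VAELoader",  # "Load VAE"
              "inputs": {"vae_name": "qwen_image_vae.safetensors"}},
        "4": {"class_type": "EmptySD3LatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        "5": {"class_type": "CLIPTextEncode",  # prompt text, linked to Load CLIP
              "inputs": {"text": prompt, "clip": ["2", 0]}},
    }


def queue_payload(graph: dict) -> bytes:
    """Wrap an API-format graph in the JSON body that is POSTed to /prompt."""
    return json.dumps({"prompt": graph}).encode("utf-8")


def queue(graph: dict) -> dict:
    """Send a complete graph to ComfyUI, like pressing Queue in the UI."""
    request = urllib.request.Request(
        COMFYUI_PROMPT_URL,
        data=queue_payload(graph),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    # Only prints the payload; call queue() with a full exported graph to run it.
    print(queue_payload(loader_nodes("a cup of coffee on a wooden table")).decode())
```

Keeping payload construction separate from the network call makes it possible to inspect or test the JSON body without a running server.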