Files
research-bot/skills/creative/comfyui/references/workflow-format.md
simple321vip da07b1f453 v1.0.0
2026-05-26 15:59:18 +00:00

227 lines
7.8 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# ComfyUI Workflow JSON Format
## Two Formats — Only API Format Is Executable
**API format** is required for `/api/prompt` and every script in this skill.
The web UI also produces an "editor format" used for visual editing, which
**cannot** be submitted directly.
### API Format
Top-level keys are string node IDs. Each node has `class_type` and `inputs`:
```json
{
"3": {
"class_type": "KSampler",
"inputs": {
"seed": 156680208700286,
"steps": 20,
"cfg": 8,
"sampler_name": "euler",
"scheduler": "normal",
"denoise": 1.0,
"model": ["4", 0],
"positive": ["6", 0],
"negative": ["7", 0],
"latent_image": ["5", 0]
},
"_meta": {"title": "KSampler"}
},
"4": {
"class_type": "CheckpointLoaderSimple",
"inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"}
}
}
```
**Detection:** every top-level value has `class_type`. The skill's
`_common.is_api_format()` does this check.
### Editor Format (not directly executable)
Has `nodes[]` and `links[]` arrays — the visual graph. To convert: open in
ComfyUI's web UI and use **Workflow → Export (API)** (newer UI) or the
"Save (API Format)" button (older UI).
**Detection:** top-level has `"nodes"` and `"links"` keys.
## Inputs: Literals vs Links
```json
"inputs": {
"text": "a cat", // literal — modifiable
"seed": 42, // literal — modifiable
"clip": ["4", 1] // link — wiring; do NOT overwrite
}
```
Links are length-2 arrays of `[upstream_node_id, output_slot]`. The skill's
parameter injector refuses to overwrite a link with a literal (logs a
warning and skips).
## Common Node Types and Their Controllable Parameters
The full catalog lives in `scripts/_common.py` (`PARAM_PATTERNS` and
`MODEL_LOADERS`). Highlights:
### Text Prompts
| Node Class | Key Fields |
|------------|------------|
| `CLIPTextEncode` | `text` |
| `CLIPTextEncodeSDXL` | `text_g`, `text_l`, `width`, `height` |
| `CLIPTextEncodeFlux` | `clip_l`, `t5xxl`, `guidance` |
To distinguish positive from negative the skill traces `KSampler.negative`
back through Reroute / Primitive nodes to the source CLIPTextEncode. Falls
back to `_meta.title` heuristics ("negative", "neg", "anti").
### Sampling
| Node Class | Key Fields |
|------------|------------|
| `KSampler` | `seed`, `steps`, `cfg`, `sampler_name`, `scheduler`, `denoise` |
| `KSamplerAdvanced` | `noise_seed`, `steps`, `cfg`, `start_at_step`, `end_at_step` |
| `SamplerCustom` | `noise_seed`, `cfg`, `sampler`, `sigmas` |
| `SamplerCustomAdvanced` | `noise_seed` (via RandomNoise input) |
| `RandomNoise` | `noise_seed` |
| `BasicScheduler` | `steps`, `scheduler`, `denoise` |
| `KSamplerSelect` | `sampler_name` |
| `BasicGuider` / `CFGGuider` | `cfg` |
| `ModelSamplingFlux` | `max_shift`, `base_shift`, `width`, `height` |
| `SDTurboScheduler` | `steps`, `denoise` |
### Latent / Dimensions
| Node Class | Key Fields |
|------------|------------|
| `EmptyLatentImage` | `width`, `height`, `batch_size` |
| `EmptySD3LatentImage` | `width`, `height`, `batch_size` |
| `EmptyHunyuanLatentVideo` | `width`, `height`, `length`, `batch_size` |
| `EmptyMochiLatentVideo` | `width`, `height`, `length`, `batch_size` |
| `EmptyLTXVLatentVideo` | `width`, `height`, `length`, `batch_size` |
### Model Loading
| Node Class | Key Fields | Folder |
|------------|------------|--------|
| `CheckpointLoaderSimple` | `ckpt_name` | `checkpoints` |
| `LoraLoader` | `lora_name`, `strength_model`, `strength_clip` | `loras` |
| `LoraLoaderModelOnly` | `lora_name`, `strength_model` | `loras` |
| `VAELoader` | `vae_name` | `vae` |
| `ControlNetLoader` | `control_net_name` | `controlnet` |
| `CLIPLoader` | `clip_name` | `clip` |
| `DualCLIPLoader` | `clip_name1`, `clip_name2` | `clip` |
| `TripleCLIPLoader` | `clip_name1/2/3` | `clip` |
| `UNETLoader` | `unet_name` | `unet` |
| `DiffusionModelLoader` | `model_name` | `diffusion_models` |
| `UpscaleModelLoader` | `model_name` | `upscale_models` |
| `IPAdapterModelLoader` | `ipadapter_file` | `ipadapter` |
| `ADE_AnimateDiffLoaderWithContext` | `model_name`, `motion_scale` | `animatediff_models` |
### Image Input/Output
| Node Class | Key Fields |
|------------|------------|
| `LoadImage` | `image` (server-side filename, after upload) |
| `LoadImageMask` | `image`, `channel` (`red` / `green` / `blue` / `alpha`) |
| `VAEEncode` / `VAEDecode` | (no controllable fields) |
| `VAEEncodeForInpaint` | `grow_mask_by` |
| `SaveImage` | `filename_prefix` |
| `VHS_VideoCombine` | `frame_rate`, `format`, `filename_prefix`, `loop_count`, `pingpong` |
### ControlNet
| Node Class | Key Fields |
|------------|------------|
| `ControlNetApply` | `strength` |
| `ControlNetApplyAdvanced` | `strength`, `start_percent`, `end_percent` |
### IPAdapter (community pack `comfyui_ipadapter_plus`)
| Node Class | Key Fields |
|------------|------------|
| `IPAdapterAdvanced` | `weight`, `start_at`, `end_at` |
| `IPAdapter` | `weight` |
### Embeddings (referenced inside prompt strings)
ComfyUI scans prompt text for `embedding:NAME` syntax. The skill's
`_common.iter_embedding_refs()` extracts these as model dependencies.
```text
"a beautiful cat, embedding:goodvibes:1.2, embedding:art-style"
```
`extract_schema.py` and `check_deps.py` surface these in
`embedding_dependencies` / `missing_embeddings`.
## Parameter Injection Pattern
```python
import json, copy
with open("workflow_api.json") as f:
workflow = json.load(f)
wf = copy.deepcopy(workflow)
wf["6"]["inputs"]["text"] = "a beautiful sunset"
wf["7"]["inputs"]["text"] = "ugly, blurry"
wf["3"]["inputs"]["seed"] = 42
wf["3"]["inputs"]["steps"] = 30
wf["5"]["inputs"]["width"] = 1024
wf["5"]["inputs"]["height"] = 1024
```
`scripts/extract_schema.py` automates discovering which node IDs/fields
correspond to which user-facing parameters. It returns a `parameters` dict
that `run_workflow.py` reads to inject values from `--args`.
## Identifying Controllable Parameters (Heuristics)
For unknown workflows:
1. **Prompt text** — any `CLIPTextEncode.text`. Use connection tracing back
from `KSampler.positive` / `.negative` to disambiguate (don't trust
meta-title alone).
2. **Seed**`KSampler.seed` / `KSamplerAdvanced.noise_seed` / `RandomNoise.noise_seed`.
3. **Dimensions**`Empty*LatentImage.width/height` (must be multiples of 8).
4. **Steps / CFG**`KSampler.steps`, `KSampler.cfg`. Steps 2050 typical.
CFG 515 typical (Flux uses guidance, not CFG).
5. **Model / checkpoint**`CheckpointLoaderSimple.ckpt_name`. Filename must
match an installed file *exactly*.
6. **LoRA**`LoraLoader.lora_name`, `.strength_model`.
7. **Images for img2img / inpaint**`LoadImage.image`. Server-side filename
after upload.
8. **Denoise**`KSampler.denoise`. 0.01.0; 1.0 = ignore input image,
0.0 = pass through. Sweet spot for img2img: 0.40.7.
## Output Nodes
Output is produced by these node types. The skill's `OUTPUT_NODES` set
extends to common community packs.
| Node | Output Key | Content |
|------|-----------|---------|
| `SaveImage` | `images` | List of `{filename, subfolder, type}` |
| `PreviewImage` | `images` | Temporary preview (not saved) |
| `VHS_VideoCombine` | `gifs` (older) or `videos`/`video` (newer cloud) | Video file refs |
| `SaveAudio` | `audio` | Audio file refs |
| `SaveAnimatedWEBP` / `SaveAnimatedPNG` | `images` | Animated images |
| `Save3D` | `3d` | 3D asset refs |
After execution, fetch outputs from `/history/{prompt_id}` (local) or
`/api/jobs/{prompt_id}` (cloud) → `outputs``{node_id}``{key}`.
## Wrapper Variants
Some saved JSON files wrap the workflow under a `"prompt"` key (matching
the `/api/prompt` payload shape). The skill's `_common.unwrap_workflow()`
handles this — pass any of:
- raw API format: `{"3": {...}, "4": {...}}`
- wrapped: `{"prompt": {"3": {...}}, "client_id": "..."}`
It rejects editor format with a clear error and a re-export instruction.