Generating a video
Video generation is asynchronous. You create a job, receive a job ID, and poll for completion.

| Status | Description |
|---|---|
| queued | Job is waiting in queue |
| in_progress | Video is being generated |
| completed | Generation successful, video available |
| failed | Generation failed; check info.errors |
| cancelled | Job was cancelled |
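The polling flow described above can be sketched as a loop that stops on any terminal status. This is a minimal sketch, not the SDK's own helper: it assumes a caller-supplied `get_job(job_id)` function (hypothetical; wrap whatever SDK or HTTP call you use) that returns the job as a dict with a `status` field and, on failure, an `info.errors` entry.

```python
import time
from typing import Any, Callable, Dict

# Statuses from the table above; these three end the job.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_video(
    get_job: Callable[[str], Dict[str, Any]],  # hypothetical fetch function
    job_id: str,
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a video job until it is completed, failed, or cancelled."""
    waited = 0.0
    while True:
        job = get_job(job_id)
        status = job.get("status")
        if status in TERMINAL_STATUSES:
            if status == "failed":
                # Failure details are reported under info.errors.
                errors = job.get("info", {}).get("errors")
                raise RuntimeError(f"Job {job_id} failed: {errors}")
            # "completed" and "cancelled" are returned for the caller to inspect.
            return job
        if waited >= timeout:
            raise TimeoutError(f"Job {job_id} still {status!r} after {waited}s")
        sleep(poll_interval)
        waited += poll_interval
```

Injecting `sleep` keeps the loop testable without real delays; in production you would leave the default `time.sleep`.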
Parameters
| Parameter | Type | Description | Default / Range |
|---|---|---|---|
| prompt | string | Text description of the video to generate | Required |
| model | string | Model identifier | Required |
| width | integer | Video width in pixels | 1366 |
| height | integer | Video height in pixels | 768 |
| seconds | string | Length of video in seconds (1-10; some models differ, see the models table) | "6" |
| fps | integer | Frames per second | 15-60 |
| steps | integer | Diffusion steps (higher = better quality, slower) | 10-50 |
| guidance_scale | float | How closely to follow the prompt | 6.0-10.0 |
| seed | integer | Random seed for reproducibility | any |
| output_format | string | Video format (MP4, WEBM) | MP4 |
| output_quality | integer | Bitrate/quality (lower = higher quality) | 20 |
| negative_prompt | string | What to avoid in generation | - |
| frame_images | array | Keyframe images for video generation. With one image, it sets the starting frame; with two, the starting and ending frames; with more than two, frame must be specified for each image. | - |
| resolution | string | Video resolution tier (720P, 1080P). Used by Wan 2.7 models instead of width/height. | "1080P" |
| ratio | string | Aspect ratio (16:9, 9:16, 1:1, 4:3, 3:4). Used by Wan 2.7 models. | "16:9" |
| media | object | Media inputs for the request (see the Media Object section below for schema and compatibility) | - |
- `prompt` is required for all models except Kling.
- `width` and `height` fall back to their defaults unless specified; the available dimensions vary by model.
- Wan 2.7 models use `resolution` and `ratio` instead of `width`/`height`.
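The three rules above can be encoded as a small request builder. This is a hedged sketch: `build_video_params` is a hypothetical helper, and detecting Kling and Wan 2.7 models by their API-string prefix (from the models table) is an assumption. It also assumes, more strictly than the docs state, that `resolution`/`ratio` should not be sent to non-Wan 2.7 models.

```python
from typing import Any, Dict, Optional


def build_video_params(
    model: str,
    prompt: Optional[str] = None,
    width: Optional[int] = None,
    height: Optional[int] = None,
    resolution: Optional[str] = None,
    ratio: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble top-level request fields following the rules above (hypothetical helper)."""
    # Assumption: model families are identified by API-string prefix.
    is_kling = model.lower().startswith("kwaivgi/kling")
    is_wan27 = model.lower().startswith("wan-ai/wan2.7")

    if prompt is None and not is_kling:
        raise ValueError(f"prompt is required for {model}")

    params: Dict[str, Any] = {"model": model}
    if prompt is not None:
        params["prompt"] = prompt

    if is_wan27:
        if width is not None or height is not None:
            raise ValueError("Wan 2.7 models take resolution/ratio, not width/height")
        # Omitted fields fall back to the documented defaults ("1080P", "16:9").
        if resolution is not None:
            params["resolution"] = resolution
        if ratio is not None:
            params["ratio"] = ratio
    else:
        if resolution is not None or ratio is not None:
            # Sketch assumption: treat resolution/ratio as Wan-2.7-only.
            raise ValueError("this sketch only sends resolution/ratio to Wan 2.7 models")
        # Omitted width/height fall back to the defaults (1366x768).
        if width is not None:
            params["width"] = width
        if height is not None:
            params["height"] = height
    return params
```

Omitting a field rather than sending an explicit default lets the service apply its own per-model defaults.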
Media Object
The `media` object is the unified way to pass images, videos, and audio into video generation requests.
| Field | Type | Description |
|---|---|---|
| frame_images | array | Keyframe images for I2V. Each item: {input_image, frame}, where frame is "first" or "last". |
| frame_videos | array | Input video clips for video continuation (I2V). Each item: {video: "url"}. |
| reference_images | array | Reference images for character/object consistency (R2V) or visual guidance (Video Edit). |
| reference_videos | array | Reference videos for character/object consistency (R2V). Each item: {video: "url"}. |
| source_video | string | Source video URL to edit (Video Edit). |
| audio_inputs | array | Audio file URLs to drive generation, such as lip sync or beat-matched motion (T2V, I2V). Each item: "url". WAV or MP3, 3-30 s, up to 15 MB. |
Not every `media` field is supported by every model; see the Wan 2.7 quickstart for field compatibility across Wan 2.7 models. Other parameters also vary by model, so refer to the models table for details.
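The keyframe and audio item shapes from the table above can be assembled like this. `build_media` is a hypothetical helper, and the image and audio values are assumed to be URLs; only `frame_images` and `audio_inputs` are shown because their item shapes are spelled out in the table.

```python
from typing import Any, Dict, List, Optional


def build_media(
    first_frame: Optional[str] = None,
    last_frame: Optional[str] = None,
    audio_urls: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble a `media` object from optional keyframes and audio URLs."""
    media: Dict[str, Any] = {}
    frames = []
    # Keyframe items follow the documented shape {input_image, frame},
    # where frame is "first" or "last".
    if first_frame is not None:
        frames.append({"input_image": first_frame, "frame": "first"})
    if last_frame is not None:
        frames.append({"input_image": last_frame, "frame": "last"})
    if frames:
        media["frame_images"] = frames
    if audio_urls:
        # audio_inputs items are plain URL strings.
        media["audio_inputs"] = list(audio_urls)
    return media
```

The returned dict goes under the `media` key of the request body, next to `model` and `prompt`. Check the Keyframes column of the models table first: Wan 2.7 I2V lists First, Last, while several models accept only a first frame.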
Generate customized videos using the parameters above.
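A customized request body might look like the following. The field names come from the Parameters table; the model, dimensions, fps, and duration are taken from the google/veo-2.0 row of the models table, while the prompt and tuning values are illustrative. Whether a given model honors every tuning field (`steps`, `guidance_scale`, and so on) is not stated here, so treat that as an assumption. The endpoint and client are deliberately left out.

```python
import json

# Illustrative request body built from the Parameters table.
payload = {
    "model": "google/veo-2.0",
    "prompt": "A lighthouse on a rocky coast at sunset, waves crashing, cinematic",
    "width": 1280,
    "height": 720,
    "seconds": "5",          # a string, per the Parameters table
    "fps": 24,
    "steps": 30,             # production-quality range
    "guidance_scale": 8.0,   # "sweet spot" range
    "seed": 42,              # fixed seed for reproducibility
    "output_format": "MP4",
    "negative_prompt": "blurry, low quality, text overlay",
}

# Serialize for whichever HTTP client or SDK you use.
body = json.dumps(payload)
```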
Reference Images
Guide your video’s visual style with reference images. Support varies by model; see the Reference Images column in the models table.

Keyframe Control
Control specific frames in your video for precise transitions. With a single keyframe, you set one frame (typically the first) to a specific image. Depending on the model, you can also set multiple keyframes; see the Keyframes column in the models table for details.

Audio Input
For models that support audio-driven generation (such as Wan 2.7 T2V), you can pass an audio file via the `media.audio_inputs` field. The model synchronizes the generated video to the audio, which is useful for lip sync, beat-matched motion, or narration-driven scenes.
Audio constraints: WAV or MP3 format, 3–30 seconds, up to 15 MB. If the audio is longer than the video duration, it will be truncated. If shorter, the remaining portion of the video will be silent.
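The constraints above can be checked before submitting a job. This sketch assumes you already know the clip's duration and file size (for example, from your own media tooling), infers the format from the file extension, and reads "15 MB" conservatively as 15,000,000 bytes; `check_audio_input` is a hypothetical helper.

```python
import os
from typing import Optional

ALLOWED_AUDIO_EXTENSIONS = {".wav", ".mp3"}
MIN_SECONDS, MAX_SECONDS = 3.0, 30.0
MAX_BYTES = 15_000_000  # conservative reading of "15 MB"


def check_audio_input(
    filename: str, duration_s: float, size_bytes: int, video_seconds: float
) -> Optional[str]:
    """Validate an audio clip and describe how it will line up with the video.

    Raises ValueError on a constraint violation; returns None when the audio
    exactly matches the video duration.
    """
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_AUDIO_EXTENSIONS:
        raise ValueError(f"{filename}: expected WAV or MP3")
    if not MIN_SECONDS <= duration_s <= MAX_SECONDS:
        raise ValueError(f"{filename}: duration {duration_s}s outside 3-30s")
    if size_bytes > MAX_BYTES:
        raise ValueError(f"{filename}: {size_bytes} bytes exceeds 15 MB")
    if duration_s > video_seconds:
        # Longer audio is truncated to the video length.
        return f"audio truncated to {video_seconds}s"
    if duration_s < video_seconds:
        # Shorter audio leaves the rest of the video silent.
        return f"last {video_seconds - duration_s}s of video will be silent"
    return None
```

Remember that the `seconds` request parameter is a string, so convert it (for example, `float(payload["seconds"])`) before passing it as `video_seconds`.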
Guidance Scale
Controls how closely the model follows your prompt:

- 6.0-7.0: More creative, less literal
- 7.0-9.0: Sweet spot for most use cases
- 9.0-10.0: Strict adherence to prompt
- >12.0: Avoid; may cause artifacts
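The bands above can be turned into a quick sanity check on `guidance_scale`. This is a sketch with a hypothetical function name; because the listed ranges share their endpoints, it assigns each boundary value (7.0, 9.0) to the higher band, and it labels the 10-12 range, which the list does not describe, as outside the documented bands.

```python
def describe_guidance_scale(value: float) -> str:
    """Classify a guidance_scale value using the bands listed above."""
    if value > 12.0:
        # The guidance above warns that values past 12 may cause artifacts.
        raise ValueError("guidance_scale above 12.0 may cause artifacts")
    if value < 6.0:
        return "below documented range"
    if value < 7.0:
        return "creative"
    if value < 9.0:
        return "balanced"
    if value <= 10.0:
        return "strict"
    return "above documented range"
```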
Quality Control with Steps
Trade off generation time against quality:

- 10 steps: Quick testing, lower quality
- 20 steps: Standard quality, good balance
- 30-40 steps: Production quality, slower
- >50 steps: Diminishing returns
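One way to apply this trade-off is to iterate on a prompt with cheap drafts and re-render only the final version at production quality. In the sketch below, the preset names and the `with_quality` helper are hypothetical; the step counts come from the list above, with 35 chosen from the 30-40 production band.

```python
from typing import Any, Dict

# Step counts from the list above; preset names are hypothetical.
STEP_PRESETS = {"draft": 10, "standard": 20, "production": 35}


def with_quality(payload: Dict[str, Any], preset: str) -> Dict[str, Any]:
    """Return a copy of `payload` with `steps` set for the given preset."""
    if preset not in STEP_PRESETS:
        raise ValueError(f"unknown preset {preset!r}; choose from {sorted(STEP_PRESETS)}")
    updated = dict(payload)  # leave the caller's payload untouched
    updated["steps"] = STEP_PRESETS[preset]
    return updated
```

Keeping the same `seed` across draft and production runs makes the results easier to compare, although changing `steps` will still change the output.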
Supported Model Details
See our supported video models and relevant parameters below.

| Organization | Name | Model API String | Duration | Dimensions | FPS | Keyframes | Prompt | Reference Images |
|---|---|---|---|---|---|---|---|---|
| MiniMax | MiniMax 01 Director | minimax/video-01-director | 5s | 1366×768 | 25 | First | 2-3000 char | ❌ |
| MiniMax | MiniMax Hailuo 02 | minimax/hailuo-02 | 10s | 1366×768, 1920×1080 | 25 | First | 2-3000 char | ❌ |
| Google | Veo 2.0 | google/veo-2.0 | 5s | 1280×720, 720×1280 | 24 | First, Last | 2-3000 char | ❌ |
| Google | Veo 3.0 | google/veo-3.0 | 8s | 1280×720, 720×1280, 1920×1080, 1080×1920 | 24 | First | 2-3000 char | ❌ |
| Google | Veo 3.0 + Audio | google/veo-3.0-audio | 8s | 1280×720, 720×1280, 1920×1080, 1080×1920 | 24 | First | 2-3000 char | ❌ |
| Google | Veo 3.0 Fast | google/veo-3.0-fast | 8s | 1280×720, 720×1280, 1920×1080, 1080×1920 | 24 | First | 2-3000 char | ❌ |
| Google | Veo 3.0 Fast + Audio | google/veo-3.0-fast-audio | 8s | 1280×720, 720×1280, 1920×1080, 1080×1920 | 24 | First | 2-3000 char | ❌ |
| ByteDance | Seedance 1.0 Lite | ByteDance/Seedance-1.0-lite | 5s | 864×480, 736×544, 640×640, 960×416, 416×960, 1248×704, 1120×832, 960×960, 1504×640, 640×1504 | 24 | First, Last | 2-3000 char | ❌ |
| ByteDance | Seedance 1.0 Pro | ByteDance/Seedance-1.0-pro | 5s | 864×480, 736×544, 640×640, 960×416, 416×960, 1248×704, 1120×832, 960×960, 1504×640, 640×1504 | 24 | First, Last | 2-3000 char | ❌ |
| PixVerse | PixVerse v5 | pixverse/pixverse-v5 | 5s | 640×360, 480×360, 360×360, 270×360, 360×640, 960×540, 720×540, 540×540, 405×540, 540×960, 1280×720, 960×720, 720×720, 540×720, 720×1280, 1920×1080, 1440×1080, 1080×1080, 810×1080, 1080×1920 | 16, 24 | First, Last | 2-2048 char | ❌ |
| Kuaishou | Kling 2.1 Master | kwaivgI/kling-2.1-master | 5s | 1920×1080, 1080×1080, 1080×1920 | 24 | First | 2-2500 char | ❌ |
| Kuaishou | Kling 2.1 Standard | kwaivgI/kling-2.1-standard | 5s | 1920×1080, 1080×1080, 1080×1920 | 24 | First | - | ❌ |
| Kuaishou | Kling 2.1 Pro | kwaivgI/kling-2.1-pro | 5s | 1920×1080, 1080×1080, 1080×1920 | 24 | First, Last | - | ❌ |
| Kuaishou | Kling 2.0 Master | kwaivgI/kling-2.0-master | 5s | 1280×720, 720×720, 720×1280 | 24 | First | 2-2500 char | ❌ |
| Kuaishou | Kling 1.6 Standard | kwaivgI/kling-1.6-standard | 5s | 1920×1080, 1080×1080, 1080×1920 | 30, 24 | First | 2-2500 char | ❌ |
| Kuaishou | Kling 1.6 Pro | kwaivgI/kling-1.6-pro | 5s | 1920×1080, 1080×1080, 1080×1920 | 24 | First | - | ❌ |
| Wan-AI | Wan 2.2 I2V | Wan-AI/Wan2.2-I2V-A14B | - | - | - | - | - | ❌ |
| Wan-AI | Wan 2.2 T2V | Wan-AI/Wan2.2-T2V-A14B | - | - | - | - | - | ❌ |
| Wan-AI | Wan 2.7 T2V | Wan-AI/wan2.7-t2v | 2-15s | 720P, 1080P (16:9, 9:16, 1:1, 4:3, 3:4) | 30 | - | 2-5000 char | ❌ |
| Wan-AI | Wan 2.7 I2V | Wan-AI/wan2.7-i2v | 2-15s | 720P, 1080P (16:9, 9:16, 1:1, 4:3, 3:4) | 30 | First, Last | 2-5000 char | ❌ |
| Wan-AI | Wan 2.7 R2V | Wan-AI/wan2.7-r2v | 2-10s | 720P, 1080P (16:9, 9:16, 1:1, 4:3, 3:4) | 30 | - | 2-5000 char | ❌ |
| Wan-AI | Wan 2.7 Video Edit | Wan-AI/wan2.7-videoedit | 2-10s | 720P, 1080P (16:9, 9:16, 1:1, 4:3, 3:4) | 30 | - | 2-5000 char | ❌ |
| Vidu | Vidu 2.0 | vidu/vidu-2.0 | 8s | 1920×1080, 1080×1080, 1080×1920, 1280×720, 720×720, 720×1280, 640×360, 360×360, 360×640 | 24 | First, Last | 2-3000 char | ✅ |
| Vidu | Vidu Q1 | vidu/vidu-q1 | 5s | 1920×1080, 1080×1080, 1080×1920 | 24 | First, Last | 2-3000 char | ❌ |
| OpenAI | Sora 2 | openai/sora-2 | 8s | 1280×720, 720×1280 | - | First | 1-4000 char | ❌ |
| OpenAI | Sora 2 Pro | openai/sora-2-pro | 8s | 1280×720, 720×1280 | - | First | 1-4000 char | ❌ |
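Because dimensions and prompt limits differ per model, it can help to check a request against the table before submitting it. The sketch below hand-copies a small subset of rows into a lookup; `ModelSpec`, `MODEL_SPECS`, and `check_request` are hypothetical names, and this is not an official or complete registry, so it will drift if the table changes.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ModelSpec:
    dimensions: Tuple[Tuple[int, int], ...]  # allowed (width, height) pairs
    prompt_chars: Tuple[int, int]            # inclusive (min, max) prompt length


# Hand-copied subset of the models table above.
MODEL_SPECS: Dict[str, ModelSpec] = {
    "google/veo-2.0": ModelSpec(((1280, 720), (720, 1280)), (2, 3000)),
    "minimax/hailuo-02": ModelSpec(((1366, 768), (1920, 1080)), (2, 3000)),
    "kwaivgI/kling-2.1-master": ModelSpec(((1920, 1080), (1080, 1080), (1080, 1920)), (2, 2500)),
    "openai/sora-2": ModelSpec(((1280, 720), (720, 1280)), (1, 4000)),
}


def check_request(model: str, width: int, height: int, prompt: str) -> List[str]:
    """Return a list of problems; an empty list means no known conflicts."""
    spec = MODEL_SPECS.get(model)
    if spec is None:
        return [f"no spec recorded for {model}"]
    problems = []
    if (width, height) not in spec.dimensions:
        allowed = ", ".join(f"{w}x{h}" for w, h in spec.dimensions)
        problems.append(f"{width}x{height} not listed for {model} (allowed: {allowed})")
    low, high = spec.prompt_chars
    if not low <= len(prompt) <= high:
        problems.append(f"prompt length {len(prompt)} outside {low}-{high} chars")
    return problems
```

Returning a list of problems instead of raising on the first one lets you report every conflict in a single pass.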
Troubleshooting
Video doesn’t match prompt well:

- Increase `guidance_scale` to 8-10
- Make the prompt more descriptive and specific
- Add a `negative_prompt` to exclude unwanted elements

Video has artifacts:

- Reduce `guidance_scale` (keep it below 12)
- Increase `steps` to 30-40
- Adjust `fps` if motion looks unnatural

Generation is too slow:

- Reduce `steps` (try 10-20 for testing)
- Use shorter `seconds` values during development
- Lower `fps` for slower-paced scenes

Video URLs are temporary:

- Download videos immediately after completion
- Don’t rely on URLs for long-term storage
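Following the advice above, you can save a finished video to disk as soon as the job completes. This is a minimal sketch using only the Python standard library. It assumes the video URL has already been read from the completed job (the exact field name is not shown here), and `download_video` is a hypothetical helper. It streams into a `.part` file and renames it only after the copy finishes, so an interrupted download never leaves a truncated file at the final path.

```python
import shutil
import urllib.request
from pathlib import Path


def download_video(url: str, dest: Path, chunk_size: int = 1 << 20) -> Path:
    """Stream a finished video from `url` to `dest` and return the saved path."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    # Copy in chunks so large videos are never held fully in memory.
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    tmp.replace(dest)  # rename only after the full download succeeds
    return dest
```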