| Feature | Limit |
|---|---|
| Reference images | Up to nine |
| Reference videos | Up to three |
| Reference audios | Up to three |
| Frame images (first, last) | Up to two |
| Duration | 4 to 15 seconds (integer) |
| Resolutions | 480p, 720p, 1080p, 4k |
| Audio output | Generated by default |
The model ID is ByteDance/Seedance-2.0.
Text-to-video
Generate a video from a text prompt. Video generation is asynchronous: you create a job, receive a job ID, and poll for the result.
Audio is generated by default. To generate video without audio, set settings.audio to false.
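The create-then-poll flow can be sketched as follows. The request body uses the parameter names from this page. The nested `settings` shape, the `fetch_status` callable, and the status strings `completed` and `failed` are assumptions for illustration, not documented API details.

```python
import time
from typing import Callable


def build_text_to_video_request(prompt: str, seconds: int = 5,
                                resolution: str = "720p",
                                audio: bool = True) -> dict:
    """Assemble a request body using the parameter names from this page."""
    body = {
        "model": "ByteDance/Seedance-2.0",
        "prompt": prompt,
        "seconds": str(seconds),  # the parameter table lists seconds as a string
        "resolution": resolution,
    }
    if not audio:
        # Assumed nesting for the settings.audio flag.
        body["settings"] = {"audio": False}
    return body


def poll_until_done(job_id: str, fetch_status: Callable[[str], dict],
                    interval: float = 5.0, max_attempts: int = 120) -> dict:
    """Poll a caller-supplied (hypothetical) status function until the job ends."""
    for _ in range(max_attempts):
        job = fetch_status(job_id)
        # "completed" and "failed" are assumed status names.
        if job.get("status") in ("completed", "failed"):
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")
```

Injecting `fetch_status` keeps the sketch independent of any particular client library: pass in whatever function retrieves the job by ID.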
Image-to-video
Animate a still image by passing it as the first frame through media.frame_images.
First and last frame control
Pass two frame_images (one with frame: "first", one with frame: "last") to control both the starting and ending frames. The model generates smooth motion between the two keyframes.
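As an illustration, the media object for first and last frame control might be assembled like this. The item shape `{input_image, frame}` comes from the media object table on this page; the helper name and the example URLs are hypothetical.

```python
def first_last_frames(first_image: str, last_image: str) -> dict:
    """Build a media object with explicitly labelled first and last frames."""
    return {
        "frame_images": [
            {"input_image": first_image, "frame": "first"},
            {"input_image": last_image, "frame": "last"},
        ]
    }


media = first_last_frames(
    "https://example.com/start.png",  # placeholder URL
    "https://example.com/end.png",    # placeholder URL
)
```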
If you pass one image without frame, it’s used as the first frame. If you pass two without frame, they’re used as first and last in order.
Reference-guided generation
Generate video featuring specific characters, objects, or scenes by passing reference images, reference videos, or both. Seedance 2.0 maintains identity, style, and composition from the references throughout the generated video. Multiple references combine for multi-character scenes.
Audio-guided generation
Drive video generation with an audio file by passing it through media.reference_audios. The model synchronizes the generated video to the audio, which is useful for lip sync, beat-matched motion, and narration-driven scenes. Audio-guided generation requires at least one reference image or reference video to anchor the visual subject.
If no reference audio is provided, Seedance 2.0 still generates synchronized audio (dialogue, ambient sound, and effects) based on the prompt and visual content.
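A small sketch of how an audio-guided media object could be built while enforcing the anchoring rule above. The item shapes follow the media object table on this page (plain URL strings for audio and images, `{video: url}` for videos); the function name and example URLs are hypothetical.

```python
def build_audio_guided_media(reference_audios, reference_images=(),
                             reference_videos=()) -> dict:
    """Build a media object for audio-guided generation."""
    if not reference_audios:
        raise ValueError("pass at least one reference audio URL")
    if not reference_images and not reference_videos:
        # The page requires a visual anchor alongside reference audio.
        raise ValueError(
            "audio-guided generation needs at least one reference image or video"
        )
    media = {"reference_audios": list(reference_audios)}
    if reference_images:
        media["reference_images"] = list(reference_images)
    if reference_videos:
        media["reference_videos"] = [{"video": url} for url in reference_videos]
    return media


media = build_audio_guided_media(
    ["https://example.com/voice.mp3"],                  # placeholder URL
    reference_images=["https://example.com/face.png"],  # placeholder URL
)
```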
Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| prompt | string | Text description of the video to generate (2 to 3,000 characters). | Required |
| model | string | ByteDance/Seedance-2.0. | Required |
| resolution | string | Output resolution tier: 480p, 720p, 1080p, or 4k. Cannot be combined with width/height. | "720p" |
| ratio | string | Aspect ratio: 16:9, 9:16, 1:1, 4:3, 3:4, or 21:9. | "16:9" |
| width | integer | Explicit output width in pixels. Must be paired with height. | - |
| height | integer | Explicit output height in pixels. Must be paired with width. | - |
| seconds | string | Video duration in seconds, integer between 4 and 15. | "5" |
| settings.audio | boolean | Whether to generate synchronized audio. Pass via extra_body in the Python SDK. | true |
| media | object | Media inputs for the request (see below). | - |
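The constraints in the table can be checked client-side before submitting a job. The following is a minimal sketch, assuming the request is held in a plain dict keyed by the parameter names above; `validate_params` is a hypothetical helper, and the service may enforce additional rules not listed here.

```python
RESOLUTIONS = {"480p", "720p", "1080p", "4k"}
RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "21:9"}


def validate_params(params: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    errors = []
    if not 2 <= len(params.get("prompt", "")) <= 3000:
        errors.append("prompt must be 2 to 3,000 characters")
    if not params.get("model"):
        errors.append("model is required")

    has_width, has_height = "width" in params, "height" in params
    if has_width != has_height:
        errors.append("width and height must be passed together")
    if "resolution" in params and (has_width or has_height):
        errors.append("resolution cannot be combined with width/height")
    if "resolution" in params and params["resolution"] not in RESOLUTIONS:
        errors.append("unsupported resolution")
    if "ratio" in params and params["ratio"] not in RATIOS:
        errors.append("unsupported ratio")

    # seconds is documented as a string holding an integer from 4 to 15.
    try:
        seconds = int(params.get("seconds", "5"))
    except (TypeError, ValueError):
        seconds = None
    if seconds is None or not 4 <= seconds <= 15:
        errors.append("seconds must be an integer between 4 and 15")
    return errors
```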
Media object
The media object is the unified way to pass images, videos, and audio into a Seedance 2.0 request.
| Field | Type | Description |
|---|---|---|
| frame_images | array | Up to two keyframe images. Each item: {input_image, frame} where frame is "first" or "last". With one item and no frame, it’s used as the first frame. With two items and no frame, they’re used as first and last in order. |
| reference_images | array | Up to nine reference images for character, object, or scene consistency. Each item is a URL or base64-encoded image. |
| reference_videos | array | Up to three reference videos for motion or composition guidance. Each item: {video: "url"}. |
| reference_audios | array | Up to three reference audio files to drive video generation. Each item is a URL. Requires at least one reference_images or reference_videos entry. |
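The defaulting rule for frame_images can be mirrored locally, for example to preview how unlabelled items would be interpreted. This is a sketch of the rule as described in the table, not the service's implementation; `resolve_frame_labels` is a hypothetical name, and mixing labelled and unlabelled items is not covered by this page, so the positional fill-in for that case is an assumption.

```python
def resolve_frame_labels(frame_images: list[dict]) -> list[dict]:
    """Fill in missing frame labels by position: first item, then last."""
    if len(frame_images) > 2:
        raise ValueError("frame_images accepts up to two items")
    positional = ("first", "last")
    return [
        # Keep an explicit label if present, otherwise label by position.
        {**item, "frame": item.get("frame", positional[index])}
        for index, item in enumerate(frame_images)
    ]


labelled = resolve_frame_labels([
    {"input_image": "https://example.com/a.png"},  # placeholder URL
    {"input_image": "https://example.com/b.png"},  # placeholder URL
])
```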
Input compatibility
frame_images cannot be combined with any reference input. Use one of the following modes per request:
| Mode | frame_images | reference_images | reference_videos | reference_audios |
|---|---|---|---|---|
| Text-to-video | - | - | - | - |
| Image-to-video | Up to two | - | - | - |
| Reference-guided | - | Up to nine | Up to three | - |
| Audio-guided | - | Up to nine | Up to three | Up to three (requires at least one reference image or video) |
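The compatibility table translates into a short check. The sketch below classifies a media object into one of the four modes and rejects combinations the table rules out; `classify_media` and the returned mode strings are hypothetical names chosen for illustration.

```python
LIMITS = {
    "frame_images": 2,
    "reference_images": 9,
    "reference_videos": 3,
    "reference_audios": 3,
}


def classify_media(media: dict) -> str:
    """Return the generation mode implied by a media object, or raise."""
    counts = {field: len(media.get(field) or []) for field in LIMITS}
    for field, limit in LIMITS.items():
        if counts[field] > limit:
            raise ValueError(f"{field} accepts up to {limit} items")

    has_visual_refs = bool(counts["reference_images"] or counts["reference_videos"])
    has_audio = bool(counts["reference_audios"])

    if counts["frame_images"] and (has_visual_refs or has_audio):
        raise ValueError("frame_images cannot be combined with any reference input")
    if has_audio:
        if not has_visual_refs:
            raise ValueError("reference_audios requires a reference image or video")
        return "audio-guided"
    if has_visual_refs:
        return "reference-guided"
    if counts["frame_images"]:
        return "image-to-video"
    return "text-to-video"
```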
Resolutions and aspect ratios
| Aspect ratio | 480p | 720p | 1080p | 4k |
|---|---|---|---|---|
| 16:9 | 864x496 | 1280x720 | 1920x1080 | 3840x2160 |
| 4:3 | 752x560 | 1112x834 | 1664x1248 | 3326x2494 |
| 1:1 | 640x640 | 960x960 | 1440x1440 | 2880x2880 |
| 3:4 | 560x752 | 834x1112 | 1248x1664 | 2496x3328 |
| 9:16 | 496x864 | 720x1280 | 1080x1920 | 2160x3840 |
| 21:9 | 992x432 | 1470x630 | 2206x946 | 4398x1886 |
You can pass width and height directly instead of resolution and ratio.
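For reference, the table above can be encoded as a lookup that maps a ratio and resolution tier to pixel dimensions. The values are copied verbatim from the table; `output_size` is a hypothetical helper, and its defaults follow the parameter table ("16:9" and "720p").

```python
TIERS = ("480p", "720p", "1080p", "4k")

# One entry per aspect ratio, cells in the same order as TIERS.
SIZES = {
    "16:9": "864x496 1280x720 1920x1080 3840x2160",
    "4:3": "752x560 1112x834 1664x1248 3326x2494",
    "1:1": "640x640 960x960 1440x1440 2880x2880",
    "3:4": "560x752 834x1112 1248x1664 2496x3328",
    "9:16": "496x864 720x1280 1080x1920 2160x3840",
    "21:9": "992x432 1470x630 2206x946 4398x1886",
}


def output_size(ratio: str = "16:9", resolution: str = "720p") -> tuple[int, int]:
    """Look up (width, height) in pixels for a ratio and resolution tier."""
    cell = SIZES[ratio].split()[TIERS.index(resolution)]
    width, height = cell.split("x")
    return int(width), int(height)
```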
Pricing
| Mode | 480p | 720p | 1080p | 4k |
|---|---|---|---|---|
| Text-to-video, image-to-video | $0.07 / second | $0.16 / second | $0.40 / second | $0.836 / second |
| Video-to-video | from $0.13 / second | from $0.28 / second | from $0.48 / second | from $1.050 / second |
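As a worked example, a 5-second text-to-video clip at 720p would come to 5 × $0.16 = $0.80, assuming billing is simply the per-second rate multiplied by the output duration. The sketch below applies that assumption to the text-to-video and image-to-video rates; video-to-video is left out because the table gives only "from" prices. `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Per-second rates for text-to-video and image-to-video, from the table above.
RATES = {
    "480p": Decimal("0.07"),
    "720p": Decimal("0.16"),
    "1080p": Decimal("0.40"),
    "4k": Decimal("0.836"),
}


def estimate_cost(seconds: int, resolution: str = "720p") -> Decimal:
    """Estimate cost in USD, assuming cost = per-second rate x duration."""
    if not 4 <= seconds <= 15:
        raise ValueError("duration must be between 4 and 15 seconds")
    return RATES[resolution] * seconds
```

`Decimal` is used instead of `float` so that the estimate does not pick up binary rounding error.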
Prompting tips
Write descriptive prompts. Instead of “a cat walking”, try “A small black cat walks gracefully through a sunlit garden, soft bokeh background, gentle breeze rustling the flowers, cinematic slow motion.”
For multi-shot scenes, describe the transitions explicitly. Seedance 2.0 follows shot-by-shot instructions like “Shot 1: wide aerial of the city. Shot 2: cut to a close-up of the protagonist’s face.”
Next steps
- Video generation overview for the full parameter reference and supported models.
- API reference: create video for REST API details.
- API reference: get video status for polling and status codes.