Seedance 2.0 is a unified multimodal audio-video generation model from ByteDance. It accepts text, image, video, and audio inputs in any combination, and produces multi-shot videos up to 15 seconds with dual-channel synchronized audio (dialogue, ambient sound, and effects). Seedance 2.0 also supports physics-aware motion, video extension, and instruction-based editing.
| Feature | Limit |
|---|---|
| Reference images | Up to nine |
| Reference videos | Up to three |
| Reference audios | Up to three |
| Frame images (first, last) | Up to two |
| Duration | 4 to 15 seconds (integer) |
| Resolutions | 480p, 720p, 1080p |
| Audio output | Generated by default |
The model API string is ByteDance/Seedance-2.0.
Text-to-video
Generate a video from a text prompt. Video generation is asynchronous: you create a job, receive a job ID, and poll for the result.
import time

from together import Together

client = Together()

# Create the generation job; the call returns immediately with a job ID.
job = client.videos.create(
    prompt="A small cute cartoon kitten general in golden armor stands on a cliff, commanding an army of mice charging below. Epic ancient war atmosphere, dramatic clouds over snowy mountains.",
    model="ByteDance/Seedance-2.0",
    resolution="720p",
    ratio="16:9",
    seconds="5",
)
print(f"Job ID: {job.id}")

# Poll every 15 seconds until the job completes or fails.
while True:
    status = client.videos.retrieve(job.id)
    print(f"Status: {status.status}")
    if status.status == "completed":
        print(f"Video URL: {status.outputs.video_url}")
        break
    elif status.status == "failed":
        print(f"Error: {status.error}")
        break
    time.sleep(15)
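The loop above keeps polling for as long as the job stays in a non-terminal state. A minimal sketch of a reusable helper that adds a timeout is shown here. The name `wait_for_video`, its parameters, and the timeout behavior are illustrative assumptions and not part of the Together SDK; the helper relies only on the `status` field and the `"completed"` and `"failed"` values used in the example above.

```python
import time
from typing import Any, Callable


def wait_for_video(
    retrieve: Callable[[str], Any],
    job_id: str,
    interval: float = 15.0,
    timeout: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll `retrieve(job_id)` until the job completes, fails, or times out.

    `retrieve` is expected to behave like `client.videos.retrieve`: it returns
    an object with a `status` attribute. The helper name, the defaults, and
    the TimeoutError are hypothetical choices made for this sketch.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = retrieve(job_id)
        # "completed" and "failed" are the terminal states used in this guide.
        if job.status in ("completed", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} did not finish within {timeout} seconds")
        sleep(interval)
```

With a real client this could be called as `result = wait_for_video(client.videos.retrieve, job.id)`, after which `result.status` tells you whether to read `result.outputs.video_url` or `result.error`.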
Seedance 2.0 generates synchronized audio by default. To produce a silent video, set settings.audio to false.
job = client.videos.create(
    prompt="A graffiti character comes to life off a concrete wall under an urban railway bridge at night.",
    model="ByteDance/Seedance-2.0",
    resolution="720p",
    seconds="5",
    extra_body={"settings": {"audio": False}},
)
Image-to-video
Animate a still image by passing it as the first frame through media.frame_images.
import time

from together import Together

client = Together()

# Pass the still image as the first frame of the generated video.
job = client.videos.create(
    prompt="A black cat curiously gazes up at the sky. The camera slowly rises from eye level to a bird's-eye view.",
    model="ByteDance/Seedance-2.0",
    resolution="720p",
    seconds="5",
    media={
        "frame_images": [
            {
                "input_image": "https://example.com/cat.png",
                "frame": "first",
            }
        ],
    },
)
print(f"Job ID: {job.id}")

# Poll every 15 seconds until the job completes or fails.
while True:
    status = client.videos.retrieve(job.id)
    if status.status == "completed":
        print(f"Video URL: {status.outputs.video_url}")
        break
    elif status.status == "failed":
        print(f"Error: {status.error}")
        break
    time.sleep(15)
First and last frame control
Pass two frame_images (one with frame: "first", one with frame: "last") to control both the starting and ending frames. The model generates smooth motion between the two keyframes.
job = client.videos.create(
    prompt="Smooth cinematic transition with natural motion.",
    model="ByteDance/Seedance-2.0",
    resolution="720p",
    seconds="5",
    media={
        "frame_images": [
            {"input_image": "https://example.com/start.png", "frame": "first"},
            {"input_image": "https://example.com/end.png", "frame": "last"},
        ],
    },
)
If you pass one image without frame, it’s used as the first frame. If you pass two without frame, they’re used as first and last in order.
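To make the defaulting rule concrete, the following sketch mirrors it on the client side. This is an illustration of the rule as described above, not the service's implementation: `resolve_frame_roles` is a hypothetical helper, and its handling of a list where only one of two items carries `frame` (filled in by position) is an assumption this guide does not confirm.

```python
from typing import Dict, List


def resolve_frame_roles(frame_images: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return a copy of `frame_images` with the implied `frame` role filled in.

    One item without `frame` becomes "first"; two items without `frame`
    become "first" and "last" in order. Explicit roles are left untouched.
    Hypothetical helper: the API applies this rule for you.
    """
    if len(frame_images) > 2:
        raise ValueError("frame_images accepts at most two items")
    implied = ["first", "last"]
    resolved = []
    for index, item in enumerate(frame_images):
        # Keep an explicit role; otherwise fall back to the positional default.
        resolved.append({**item, "frame": item.get("frame", implied[index])})
    return resolved
```

Spelling out `frame` explicitly in your own requests, as in the example above, avoids depending on item order.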
Reference-guided generation
Generate video featuring specific characters, objects, or scenes by passing reference images, reference videos, or both. Seedance 2.0 maintains identity, style, and composition from the references throughout the generated video. Multiple references combine for multi-character scenes.
job = client.videos.create(
    prompt="A person dances on a neon-lit stage with dynamic camera motion.",
    model="ByteDance/Seedance-2.0",
    resolution="1080p",
    ratio="16:9",
    seconds="6",
    media={
        "reference_images": [
            "https://example.com/character.png",
            "https://example.com/outfit.png",
        ],
        "reference_videos": [
            {"video": "https://example.com/dance-style.mp4"},
        ],
    },
)
Audio-guided generation
Drive video generation with an audio file by passing it through media.reference_audios. The model synchronizes the generated video to the audio, which is useful for lip sync, beat-matched motion, and narration-driven scenes. Audio-guided generation requires at least one reference image or reference video to anchor the visual subject.
job = client.videos.create(
    prompt="The character raps energetically into a microphone, bobbing with the beat.",
    model="ByteDance/Seedance-2.0",
    resolution="720p",
    seconds="10",
    media={
        "reference_images": [
            "https://example.com/rapper.png",
        ],
        "reference_audios": [
            "https://example.com/rap-audio.mp3",
        ],
    },
)
If no reference audio is provided, Seedance 2.0 still generates synchronized audio (dialogue, ambient sound, and effects) based on the prompt and visual content.
Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| prompt | string | Text description of the video to generate (2 to 3,000 characters). | Required |
| model | string | ByteDance/Seedance-2.0. | Required |
| resolution | string | Output resolution tier: 480p, 720p, or 1080p. Cannot be combined with width/height. | "720p" |
| ratio | string | Aspect ratio: 16:9, 9:16, 1:1, 4:3, 3:4, or 21:9. | "16:9" |
| width | integer | Explicit output width in pixels. Must be paired with height. | - |
| height | integer | Explicit output height in pixels. Must be paired with width. | - |
| seconds | string | Video duration in seconds, integer between 4 and 15. | "5" |
| settings.audio | boolean | Whether to generate synchronized audio. Pass via extra_body in the Python SDK. | true |
| media | object | Media inputs for the request (see below). | - |
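The constraints in the parameter table can be checked before a request is sent. The sketch below is one possible pre-flight check built only from the limits listed above; `check_params` and its return format are hypothetical, and the service remains the source of truth for validation.

```python
from typing import List, Optional

ALLOWED_RESOLUTIONS = {"480p", "720p", "1080p"}
ALLOWED_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "21:9"}


def check_params(
    prompt: str,
    seconds: str = "5",
    resolution: Optional[str] = None,
    ratio: Optional[str] = None,
    width: Optional[int] = None,
    height: Optional[int] = None,
) -> List[str]:
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper that mirrors the documented limits client-side.
    """
    problems = []
    if not 2 <= len(prompt) <= 3000:
        problems.append("prompt must be 2 to 3,000 characters")
    # `seconds` is passed as a string holding an integer between 4 and 15.
    if not (seconds.isascii() and seconds.isdigit() and 4 <= int(seconds) <= 15):
        problems.append("seconds must be an integer string between 4 and 15")
    if resolution is not None and resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution}")
    if ratio is not None and ratio not in ALLOWED_RATIOS:
        problems.append(f"unsupported ratio: {ratio}")
    if (width is None) != (height is None):
        problems.append("width and height must be passed together")
    if resolution is not None and (width is not None or height is not None):
        problems.append("resolution cannot be combined with width/height")
    return problems
```

For example, `check_params("A cat", seconds="20")` reports the out-of-range duration without making a network call.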
The media object is the unified way to pass images, videos, and audio into a Seedance 2.0 request.
{
  "prompt": "...",
  "model": "ByteDance/Seedance-2.0",
  "media": {
    "frame_images": [],
    "reference_images": [],
    "reference_videos": [],
    "reference_audios": []
  }
}
| Field | Type | Description |
|---|---|---|
| frame_images | array | Up to two keyframe images. Each item: {input_image, frame} where frame is "first" or "last". With one item and no frame, it’s used as the first frame. With two items and no frame, they’re used as first and last in order. |
| reference_images | array | Up to nine reference images for character, object, or scene consistency. Each item is a URL or base64-encoded image. |
| reference_videos | array | Up to three reference videos for motion or composition guidance. Each item: {video: "url"}. |
| reference_audios | array | Up to three reference audio files to drive video generation. Each item is a URL. Requires at least one reference_images or reference_videos entry. |
frame_images cannot be combined with any reference input. Use one of the following modes per request:
| Mode | frame_images | reference_images | reference_videos | reference_audios |
|---|---|---|---|---|
| Text-to-video | - | - | - | - |
| Image-to-video | Up to two | - | - | - |
| Reference-guided | - | Up to nine | Up to three | - |
| Audio-guided | - | Up to nine | Up to three | Up to three (requires at least one reference image or video) |
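The mode rules in this table can be expressed as a small client-side check. The following is a sketch under the limits stated above; `check_media` is a hypothetical helper and does not reflect how the service validates requests internally.

```python
from typing import Any, Dict, List

# Per-field item limits taken from the media field table.
MEDIA_LIMITS = {
    "frame_images": 2,
    "reference_images": 9,
    "reference_videos": 3,
    "reference_audios": 3,
}


def check_media(media: Dict[str, List[Any]]) -> List[str]:
    """Return a list of rule violations for a `media` object (empty if none)."""
    problems = []
    counts = {field: len(media.get(field) or []) for field in MEDIA_LIMITS}
    for field, limit in MEDIA_LIMITS.items():
        if counts[field] > limit:
            problems.append(f"{field} accepts at most {limit} items, got {counts[field]}")
    has_visual_reference = counts["reference_images"] > 0 or counts["reference_videos"] > 0
    has_reference = has_visual_reference or counts["reference_audios"] > 0
    # Keyframes and references are mutually exclusive modes.
    if counts["frame_images"] and has_reference:
        problems.append("frame_images cannot be combined with reference inputs")
    # Audio needs a visual anchor: at least one reference image or video.
    if counts["reference_audios"] and not has_visual_reference:
        problems.append("reference_audios requires at least one reference image or video")
    return problems
```

An empty `media` object passes the check, which corresponds to plain text-to-video.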
Resolutions and aspect ratios
| Aspect ratio | 480p | 720p | 1080p |
|---|---|---|---|
| 16:9 | 864x496 | 1280x720 | 1920x1080 |
| 4:3 | 752x560 | 1112x834 | 1664x1248 |
| 1:1 | 640x640 | 960x960 | 1440x1440 |
| 3:4 | 560x752 | 834x1112 | 1248x1664 |
| 9:16 | 496x864 | 720x1280 | 1080x1920 |
| 21:9 | 992x432 | 1470x630 | 2206x946 |
To request dimensions outside this matrix, pass width and height directly instead of resolution and ratio.
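If you need the pixel dimensions a given resolution and ratio will produce, for example to size a player or a canvas in advance, the matrix above can be encoded as a lookup. This is a sketch: `output_size` and `DIMENSIONS` are hypothetical names, and the values are copied from the table rather than queried from the API.

```python
from typing import Dict, Tuple

# (width, height) per aspect ratio and resolution tier, copied from the matrix.
DIMENSIONS: Dict[str, Dict[str, Tuple[int, int]]] = {
    "16:9": {"480p": (864, 496), "720p": (1280, 720), "1080p": (1920, 1080)},
    "4:3": {"480p": (752, 560), "720p": (1112, 834), "1080p": (1664, 1248)},
    "1:1": {"480p": (640, 640), "720p": (960, 960), "1080p": (1440, 1440)},
    "3:4": {"480p": (560, 752), "720p": (834, 1112), "1080p": (1248, 1664)},
    "9:16": {"480p": (496, 864), "720p": (720, 1280), "1080p": (1080, 1920)},
    "21:9": {"480p": (992, 432), "720p": (1470, 630), "1080p": (2206, 946)},
}


def output_size(ratio: str = "16:9", resolution: str = "720p") -> Tuple[int, int]:
    """Return (width, height) for a ratio and resolution tier from the matrix."""
    try:
        return DIMENSIONS[ratio][resolution]
    except KeyError:
        raise ValueError(f"No entry for ratio={ratio!r}, resolution={resolution!r}") from None
```

The defaults match the documented defaults for `ratio` and `resolution`, so `output_size()` returns the 1280x720 entry.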
Pricing
| Mode | 480p | 720p |
|---|---|---|
| Text-to-video, image-to-video | $0.07 / second | $0.16 / second |
| Video-to-video | from $0.13 / second | from $0.28 / second |
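Because pricing is per second of output, the cost of a text-to-video or image-to-video job is the rate multiplied by the duration. The sketch below works that arithmetic through with `Decimal` to avoid floating-point rounding. `estimate_cost` is a hypothetical helper; it covers only the two fixed rates in the table and leaves out video-to-video, whose rates are listed as starting prices.

```python
from decimal import Decimal

# USD per second for text-to-video and image-to-video, from the pricing table.
PRICE_PER_SECOND = {"480p": Decimal("0.07"), "720p": Decimal("0.16")}


def estimate_cost(seconds: int, resolution: str = "720p") -> Decimal:
    """Estimate the USD cost of a text-to-video or image-to-video job."""
    if resolution not in PRICE_PER_SECOND:
        raise ValueError(f"No listed price for resolution {resolution!r}")
    if not 4 <= seconds <= 15:
        raise ValueError("seconds must be between 4 and 15")
    return PRICE_PER_SECOND[resolution] * seconds
```

For instance, a 5-second clip at 720p comes to 5 x $0.16 = $0.80, and a 10-second clip at 480p comes to 10 x $0.07 = $0.70.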
Prompting tips
Seedance 2.0 supports both Chinese and English prompts. Detailed prompts with subject, action, style, camera movement, and atmosphere produce the best results.
Write descriptive prompts. Instead of “a cat walking”, try “A small black cat walks gracefully through a sunlit garden, soft bokeh background, gentle breeze rustling the flowers, cinematic slow motion.”
For multi-shot scenes, describe the transitions explicitly. Seedance 2.0 follows shot-by-shot instructions like “Shot 1: wide aerial of the city. Shot 2: cut to a close-up of the protagonist’s face.”
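When shot descriptions are assembled programmatically, a small formatter keeps the "Shot N:" numbering consistent. This is a sketch of one way to do it; `build_multishot_prompt` is a hypothetical helper, and the exact "Shot N:" wording simply follows the example above rather than a required syntax.

```python
from typing import Sequence


def build_multishot_prompt(shots: Sequence[str]) -> str:
    """Join shot descriptions into one prompt with explicit shot numbering."""
    parts = []
    for number, description in enumerate(shots, start=1):
        # Normalize whitespace and trailing periods so each shot ends with one period.
        text = description.strip().rstrip(".")
        parts.append(f"Shot {number}: {text}.")
    return " ".join(parts)


prompt = build_multishot_prompt([
    "wide aerial of the city",
    "cut to a close-up of the protagonist's face",
])
```

The resulting string can be passed as `prompt`, provided it stays within the 3,000-character limit.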