Seedance 2.0 is a unified multimodal audio-video generation model from ByteDance. It accepts text, image, video, and audio inputs in flexible combinations and produces multi-shot videos up to 15 seconds long with dual-channel synchronized audio (dialogue, ambient sound, and effects). Seedance 2.0 also supports physics-aware motion, video extension, and instruction-based editing.
| Feature | Limit |
|---|---|
| Reference images | Up to nine |
| Reference videos | Up to three |
| Reference audios | Up to three |
| Frame images (first, last) | Up to two |
| Duration | 4 to 15 seconds (integer) |
| Resolutions | 480p, 720p, 1080p |
| Audio output | Generated by default |

The model API string is `ByteDance/Seedance-2.0`.
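Because these limits apply per request, it can be convenient to check a `media` payload on the client before submitting it. The sketch below is a hypothetical helper (`check_media_limits` and `MEDIA_LIMITS` are not part of the Together SDK); it assumes the caps in the table above and the `media` field names used throughout this page, and the API still performs its own validation.

```python
# Per-field item caps, taken from the limits table (assumed to be current).
MEDIA_LIMITS = {
    "reference_images": 9,
    "reference_videos": 3,
    "reference_audios": 3,
    "frame_images": 2,
}


def check_media_limits(media: dict) -> None:
    """Raise ValueError if a media list exceeds its cap or a field name is unknown.

    Hypothetical client-side helper; it does not replace server-side validation.
    """
    unknown = set(media) - set(MEDIA_LIMITS)
    if unknown:
        raise ValueError(f"unknown media fields: {sorted(unknown)}")
    for field, cap in MEDIA_LIMITS.items():
        items = media.get(field, [])
        if len(items) > cap:
            raise ValueError(f"{field}: {len(items)} items given, at most {cap} allowed")
```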

Text-to-video

Generate a video from a text prompt. Video generation is asynchronous: you create a job, receive a job ID, and poll for the result.
```python
import time
from together import Together

client = Together()

# Create the generation job; the call returns right away with a job ID.
job = client.videos.create(
    prompt="A small cute cartoon kitten general in golden armor stands on a cliff, commanding an army of mice charging below. Epic ancient war atmosphere, dramatic clouds over snowy mountains.",
    model="ByteDance/Seedance-2.0",
    resolution="720p",
    ratio="16:9",
    seconds="5",
)

print(f"Job ID: {job.id}")

# Poll until the job reaches a terminal state.
while True:
    status = client.videos.retrieve(job.id)
    print(f"Status: {status.status}")

    if status.status == "completed":
        print(f"Video URL: {status.outputs.video_url}")
        break
    elif status.status == "failed":
        print(f"Error: {status.error}")
        break

    time.sleep(15)  # wait between polls
```
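The polling loop in this example never gives up if a job stalls. A minimal sketch of a reusable poller with a timeout follows; `wait_for_video`, its 15-second interval, and its 15-minute default timeout are illustrative choices, not part of the Together SDK. It only relies on the `client.videos.retrieve(job_id)` call and the `status` and `error` fields used above. The `sleep` and `clock` parameters are there so the function can be exercised without real waiting.

```python
import time


def wait_for_video(client, job_id, poll_interval=15.0, timeout=900.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll client.videos.retrieve(job_id) until the job completes or fails.

    Returns the final status object on success. Raises RuntimeError if the job
    fails and TimeoutError if it is still running after `timeout` seconds.
    Hypothetical helper; the timeout default is an assumption.
    """
    deadline = clock() + timeout
    while True:
        status = client.videos.retrieve(job_id)
        if status.status == "completed":
            return status
        if status.status == "failed":
            raise RuntimeError(f"video job {job_id} failed: {status.error}")
        if clock() >= deadline:
            raise TimeoutError(f"video job {job_id} not finished after {timeout} s")
        sleep(poll_interval)
```

With the client from the example, `video = wait_for_video(client, job.id)` blocks until the job finishes, after which `video.outputs.video_url` holds the result.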
Seedance 2.0 generates synchronized audio by default. To produce a silent video, set `settings.audio` to `false`. In the Python SDK, pass it through `extra_body`:
```python
job = client.videos.create(
    prompt="A graffiti character comes to life off a concrete wall under an urban railway bridge at night.",
    model="ByteDance/Seedance-2.0",
    resolution="720p",
    seconds="5",
    extra_body={"settings": {"audio": False}},  # disable the default audio track
)
```
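If you switch audio on and off per request, a small wrapper can keep the `extra_body` plumbing in one place. `video_request` below is a hypothetical helper that only builds keyword arguments for `client.videos.create`; it assumes, like the example above, that `settings` travels through `extra_body`, and it merges into any `extra_body` you already pass rather than overwriting it.

```python
def video_request(prompt: str, *, audio: bool = True, **params) -> dict:
    """Build keyword arguments for client.videos.create (hypothetical helper)."""
    kwargs = {"model": "ByteDance/Seedance-2.0", "prompt": prompt, **params}
    if not audio:
        # Audio is generated by default, so the setting is only sent to turn it off.
        extra = dict(kwargs.pop("extra_body", None) or {})
        settings = dict(extra.get("settings", {}))  # copy to avoid mutating caller data
        settings["audio"] = False
        extra["settings"] = settings
        kwargs["extra_body"] = extra
    return kwargs
```

Usage: `client.videos.create(**video_request("A graffiti character comes to life...", audio=False, resolution="720p", seconds="5"))`.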

Image-to-video

Animate a still image by passing it as the first frame through `media.frame_images`.
```python
import time
from together import Together

client = Together()

job = client.videos.create(
    prompt="A black cat curiously gazes up at the sky. The camera slowly rises from eye level to a bird's-eye view.",
    model="ByteDance/Seedance-2.0",
    resolution="720p",
    seconds="5",
    media={
        "frame_images": [
            {
                "input_image": "https://example.com/cat.png",
                "frame": "first",
            }
        ],
    },
)

print(f"Job ID: {job.id}")

while True:
    status = client.videos.retrieve(job.id)
    if status.status == "completed":
        print(f"Video URL: {status.outputs.video_url}")
        break
    elif status.status == "failed":
        print(f"Error: {status.error}")
        break
    time.sleep(15)
```

First and last frame control

Pass two `frame_images` items, one with `"frame": "first"` and one with `"frame": "last"`, to control both the starting and ending frames. The model generates smooth motion between the two keyframes.
```python
job = client.videos.create(
    prompt="Smooth cinematic transition with natural motion.",
    model="ByteDance/Seedance-2.0",
    resolution="720p",
    seconds="5",
    media={
        "frame_images": [
            {"input_image": "https://example.com/start.png", "frame": "first"},
            {"input_image": "https://example.com/end.png", "frame": "last"},
        ],
    },
)
```
If you pass a single item without a `frame` field, it is used as the first frame. If you pass two items without `frame`, they are used as the first and last frames, in order.
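To make these defaults explicit in your own code (for example, to log which image becomes which keyframe), you could normalize the list before sending it. The helper below is a hypothetical sketch that applies the rules just described and rejects duplicate or unknown positions. How the API treats a mix of tagged and untagged items is not documented here, so filling untagged items by position in that case is an assumption.

```python
def normalize_frame_images(items):
    """Return copies of frame_images items, each with an explicit "frame".

    Untagged items default by position: the first to "first", the second to
    "last". Hypothetical helper; mixed tagged/untagged handling is assumed.
    """
    if not 1 <= len(items) <= 2:
        raise ValueError("frame_images takes one or two items")
    result = []
    for item, default in zip(items, ("first", "last")):
        item = dict(item)  # copy so the caller's dicts are not mutated
        item.setdefault("frame", default)
        result.append(item)
    frames = [item["frame"] for item in result]
    if any(f not in ("first", "last") for f in frames) or len(set(frames)) != len(frames):
        raise ValueError(f"invalid frame positions: {frames}")
    return result
```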

Reference-guided generation

Generate a video featuring specific characters, objects, or scenes by passing reference images, reference videos, or both. Seedance 2.0 maintains identity, style, and composition from the references throughout the generated video. You can combine multiple references, for example to place several characters in one scene.
```python
job = client.videos.create(
    prompt="A person dances on a neon-lit stage with dynamic camera motion.",
    model="ByteDance/Seedance-2.0",
    resolution="1080p",
    ratio="16:9",
    seconds="6",
    media={
        "reference_images": [
            "https://example.com/character.png",
            "https://example.com/outfit.png",
        ],
        "reference_videos": [
            {"video": "https://example.com/dance-style.mp4"},
        ],
    },
)
```
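Reference images are plain strings, while each reference video is wrapped in an object with a `video` key, as in the example above. A small builder can keep that difference in one place. `reference_media` is a hypothetical helper: it only reshapes its inputs and leaves count limits to the API.

```python
def reference_media(images=(), videos=()):
    """Assemble a media dict for reference-guided generation (hypothetical helper).

    Image entries (URLs or base64 strings) pass through unchanged; each video
    URL is wrapped as {"video": url}. Empty fields are omitted.
    """
    media = {}
    if images:
        media["reference_images"] = list(images)
    if videos:
        media["reference_videos"] = [{"video": url} for url in videos]
    if not media:
        raise ValueError("pass at least one reference image or video")
    return media
```

For instance, `media=reference_media(["https://example.com/character.png"], ["https://example.com/dance-style.mp4"])` produces the same shape as the request above with a single image.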

Audio-guided generation

Drive video generation with an audio file by passing it through `media.reference_audios`. The model synchronizes the generated video to the audio, which is useful for lip sync, beat-matched motion, and narration-driven scenes. Audio-guided generation requires at least one reference image or reference video to anchor the visual subject.
```python
job = client.videos.create(
    prompt="The character raps energetically into a microphone, bobbing with the beat.",
    model="ByteDance/Seedance-2.0",
    resolution="720p",
    seconds="10",
    media={
        "reference_images": [
            "https://example.com/rapper.png",
        ],
        "reference_audios": [
            "https://example.com/rap-audio.mp3",
        ],
    },
)
```
If no reference audio is provided, Seedance 2.0 still generates synchronized audio (dialogue, ambient sound, and effects) based on the prompt and visual content.

Parameters

| Parameter | Type | Description | Default |
|---|---|---|---|
| `prompt` | string | Text description of the video to generate (2 to 3,000 characters). | Required |
| `model` | string | `ByteDance/Seedance-2.0`. | Required |
| `resolution` | string | Output resolution tier: `480p`, `720p`, or `1080p`. Cannot be combined with `width`/`height`. | `"720p"` |
| `ratio` | string | Aspect ratio: `16:9`, `9:16`, `1:1`, `4:3`, `3:4`, or `21:9`. | `"16:9"` |
| `width` | integer | Explicit output width in pixels. Must be paired with `height`. | - |
| `height` | integer | Explicit output height in pixels. Must be paired with `width`. | - |
| `seconds` | string | Video duration in seconds, an integer from 4 to 15. | `"5"` |
| `settings.audio` | boolean | Whether to generate synchronized audio. In the Python SDK, pass it via `extra_body`. | `true` |
| `media` | object | Media inputs for the request (see below). | - |
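The constraints in this table are simple enough to check locally before a job is queued. The sketch below is a hypothetical `validate_params` helper, not an SDK function. Two assumptions: it also rejects `ratio` together with `width`/`height`, following the Resolutions section's note that explicit dimensions replace both `resolution` and `ratio`; and it counts prompt length in Python characters, which may differ from how the server counts.

```python
RESOLUTIONS = {"480p", "720p", "1080p"}
RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "21:9"}


def validate_params(prompt, seconds="5", resolution=None, ratio=None,
                    width=None, height=None):
    """Raise ValueError if the arguments break a documented constraint.

    Hypothetical client-side check; the API remains the source of truth.
    """
    if not 2 <= len(prompt) <= 3000:
        raise ValueError("prompt must be 2 to 3,000 characters")
    if not (str(seconds).isdigit() and 4 <= int(seconds) <= 15):
        raise ValueError("seconds must be a whole number from 4 to 15")
    if (width is None) != (height is None):
        raise ValueError("width and height must be passed together")
    if width is not None and (resolution is not None or ratio is not None):
        raise ValueError("use width/height instead of resolution/ratio, not both")
    if resolution is not None and resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if ratio is not None and ratio not in RATIOS:
        raise ValueError(f"unsupported ratio: {ratio}")
```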

Media object

The media object is the unified way to pass images, videos, and audio into a Seedance 2.0 request.
```json
{
  "prompt": "...",
  "model": "ByteDance/Seedance-2.0",
  "media": {
    "frame_images": [],
    "reference_images": [],
    "reference_videos": [],
    "reference_audios": []
  }
}
```
| Field | Type | Description |
|---|---|---|
| `frame_images` | array | Up to two keyframe images. Each item is `{input_image, frame}`, where `frame` is `"first"` or `"last"`. A single item without `frame` is used as the first frame; two items without `frame` are used as the first and last frames, in order. |
| `reference_images` | array | Up to nine reference images for character, object, or scene consistency. Each item is a URL or a base64-encoded image. |
| `reference_videos` | array | Up to three reference videos for motion or composition guidance. Each item is `{"video": "url"}`. |
| `reference_audios` | array | Up to three reference audio files to drive video generation. Each item is a URL. Requires at least one `reference_images` or `reference_videos` entry. |
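Reference images may be base64-encoded instead of hosted, but this page does not specify the exact encoding format. The sketch below assumes a data URI such as `data:image/png;base64,...`; if the API expects bare base64, return `encoded` on its own. `image_as_data_uri` is a hypothetical helper, and the MIME type is guessed from the file extension.

```python
import base64
import mimetypes
from pathlib import Path


def image_as_data_uri(path):
    """Encode a local image file as a base64 data URI.

    Assumption: media.reference_images accepts data URIs. Hypothetical helper.
    """
    mime, _ = mimetypes.guess_type(str(path))
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The result can go straight into the list, for example `media={"reference_images": [image_as_data_uri("character.png")]}`.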

Input compatibility

`frame_images` cannot be combined with any reference input. Use one of the following modes per request:

| Mode | `frame_images` | `reference_images` | `reference_videos` | `reference_audios` |
|---|---|---|---|---|
| Text-to-video | - | - | - | - |
| Image-to-video | Up to two | - | - | - |
| Reference-guided | - | Up to nine | Up to three | - |
| Audio-guided | - | Up to nine | Up to three | Up to three (requires at least one reference image or video) |
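The compatibility rules reduce to a few boolean checks, so a request's mode can be determined (or rejected) before it is sent. `detect_mode` below is a hypothetical sketch that mirrors the table: it treats empty lists as absent and returns the mode name as a plain string.

```python
def detect_mode(media=None):
    """Classify a media dict into one of the modes in the compatibility table.

    Raises ValueError for combinations the table rules out. Hypothetical helper.
    """
    media = media or {}
    frames = bool(media.get("frame_images"))
    images = bool(media.get("reference_images"))
    videos = bool(media.get("reference_videos"))
    audios = bool(media.get("reference_audios"))
    if frames and (images or videos or audios):
        raise ValueError("frame_images cannot be combined with reference inputs")
    if audios and not (images or videos):
        raise ValueError("reference_audios needs a reference image or video")
    if frames:
        return "image-to-video"
    if audios:
        return "audio-guided"
    if images or videos:
        return "reference-guided"
    return "text-to-video"
```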

Resolutions and aspect ratios

| Aspect ratio | 480p | 720p | 1080p |
|---|---|---|---|
| 16:9 | 864x496 | 1280x720 | 1920x1080 |
| 4:3 | 752x560 | 1112x834 | 1664x1248 |
| 1:1 | 640x640 | 960x960 | 1440x1440 |
| 3:4 | 560x752 | 834x1112 | 1248x1664 |
| 9:16 | 496x864 | 720x1280 | 1080x1920 |
| 21:9 | 992x432 | 1470x630 | 2206x946 |

To request dimensions outside this matrix, pass `width` and `height` directly instead of `resolution` and `ratio`.
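Knowing the output size up front helps when sizing a player or layout before a job finishes, or when deciding whether a target size needs explicit `width` and `height`. The lookup below copies the matrix above; `DIMENSIONS` and `output_size` are hypothetical names, and the defaults match the documented `ratio` and `resolution` defaults.

```python
# (width, height) in pixels for each aspect ratio and resolution tier.
DIMENSIONS = {
    "16:9": {"480p": (864, 496), "720p": (1280, 720), "1080p": (1920, 1080)},
    "4:3": {"480p": (752, 560), "720p": (1112, 834), "1080p": (1664, 1248)},
    "1:1": {"480p": (640, 640), "720p": (960, 960), "1080p": (1440, 1440)},
    "3:4": {"480p": (560, 752), "720p": (834, 1112), "1080p": (1248, 1664)},
    "9:16": {"480p": (496, 864), "720p": (720, 1280), "1080p": (1080, 1920)},
    "21:9": {"480p": (992, 432), "720p": (1470, 630), "1080p": (2206, 946)},
}


def output_size(ratio="16:9", resolution="720p"):
    """Return the (width, height) preset for a ratio/resolution pair."""
    try:
        return DIMENSIONS[ratio][resolution]
    except KeyError:
        raise ValueError(
            f"no preset for ratio={ratio!r}, resolution={resolution!r}"
        ) from None
```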

Pricing

| Mode | 480p | 720p |
|---|---|---|
| Text-to-video, image-to-video | $0.07 / second | $0.16 / second |
| Video-to-video | from $0.13 / second | from $0.28 / second |
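Because pricing is per second of output, the cost of a text-to-video or image-to-video job is the duration times the rate for its resolution. The sketch below models only those two modes, since video-to-video is listed with starting prices only; `estimate_cost` is a hypothetical helper, and it uses `Decimal` to avoid floating-point rounding in currency amounts.

```python
from decimal import Decimal

# Per-second USD rates for text-to-video and image-to-video, from the table.
RATE_PER_SECOND = {"480p": Decimal("0.07"), "720p": Decimal("0.16")}


def estimate_cost(seconds, resolution="720p"):
    """Estimate the USD cost of a text- or image-to-video job (hypothetical helper)."""
    if resolution not in RATE_PER_SECOND:
        raise ValueError(f"no listed rate for {resolution}")
    return RATE_PER_SECOND[resolution] * int(seconds)
```

For example, a 5-second 720p clip comes to 5 x $0.16 = $0.80, and a 15-second 720p clip to $2.40.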

Prompting tips

Seedance 2.0 supports both Chinese and English prompts. Detailed prompts with subject, action, style, camera movement, and atmosphere produce the best results.
Write descriptive prompts. Instead of “a cat walking”, try “A small black cat walks gracefully through a sunlit garden, soft bokeh background, gentle breeze rustling the flowers, cinematic slow motion.” For multi-shot scenes, describe the transitions explicitly. Seedance 2.0 follows shot-by-shot instructions like “Shot 1: wide aerial of the city. Shot 2: cut to a close-up of the protagonist’s face.”
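For multi-shot prompts built from separate pieces (a shot list in a script, for example), the "Shot 1: ... Shot 2: ..." form can be assembled programmatically. `multi_shot_prompt` is a hypothetical sketch: it numbers the shots, optionally prepends an overall style sentence, and enforces the documented 2 to 3,000 character prompt length.

```python
def multi_shot_prompt(shots, style=""):
    """Join shot descriptions into a "Shot 1: ... Shot 2: ..." prompt.

    `style` is an optional leading sentence for the overall look.
    Hypothetical helper.
    """
    parts = [
        f"Shot {i}: {shot.strip().rstrip('.')}."
        for i, shot in enumerate(shots, start=1)
    ]
    if style:
        parts.insert(0, style.strip())
    prompt = " ".join(parts)
    if not 2 <= len(prompt) <= 3000:
        raise ValueError(f"prompt is {len(prompt)} characters; allowed range is 2 to 3,000")
    return prompt
```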
