Seedance 2.0 quickstart

Seedance 2.0 is a unified multimodal audio-video generation model from ByteDance. It accepts text, image, video, and audio inputs in any combination, and produces multi-shot videos up to 15 seconds with dual-channel synchronized audio (dialogue, ambient sound, and effects). Seedance 2.0 also supports physics-aware motion, video extension, and instruction-based editing.

Feature	Limit
Reference images	Up to nine
Reference videos	Up to three
Reference audios	Up to three
Frame images (first, last)	Up to two
Duration	4 to 15 seconds (integer)
Resolutions	480p, 720p, 1080p, 4k
Audio output	Generated by default

The model API string is ByteDance/Seedance-2.0.

Text-to-video

Generate a video from a text prompt. Video generation is asynchronous: you create a job, receive a job ID, and poll for the result.

import time
from together import Together

client = Together()

job = client.videos.create(
    prompt="A small cute cartoon kitten general in golden armor stands on a cliff, commanding an army of mice charging below. Epic ancient war atmosphere, dramatic clouds over snowy mountains.",
    model="ByteDance/Seedance-2.0",
    resolution="720p",
    ratio="16:9",
    seconds="5",
)

print(f"Job ID: {job.id}")

while True:
    status = client.videos.retrieve(job.id)
    print(f"Status: {status.status}")

    if status.status == "completed":
        print(f"Video URL: {status.outputs.video_url}")
        break
    elif status.status == "failed":
        print(f"Error: {status.error}")
        break

    time.sleep(15)

import Together from "together-ai";

const together = new Together();

async function main() {
  const job = await together.videos.create({
    prompt: "A small cute cartoon kitten general in golden armor stands on a cliff, commanding an army of mice charging below. Epic ancient war atmosphere, dramatic clouds over snowy mountains.",
    model: "ByteDance/Seedance-2.0",
    resolution: "720p",
    ratio: "16:9",
    seconds: "5",
  });

  console.log(`Job ID: ${job.id}`);

  while (true) {
    const status = await together.videos.retrieve(job.id);
    console.log(`Status: ${status.status}`);

    if (status.status === "completed") {
      console.log(`Video URL: ${status.outputs.video_url}`);
      break;
    } else if (status.status === "failed") {
      console.log(`Error: ${JSON.stringify(status.error)}`);
      break;
    }

    await new Promise((resolve) => setTimeout(resolve, 15000));
  }
}

main();

curl -X POST "https://api.together.xyz/v2/videos" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ByteDance/Seedance-2.0",
    "prompt": "A small cute cartoon kitten general in golden armor stands on a cliff, commanding an army of mice charging below. Epic ancient war atmosphere, dramatic clouds over snowy mountains.",
    "resolution": "720p",
    "ratio": "16:9",
    "seconds": "5"
  }'
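The polling loops above run until the job reaches a terminal state, so a job that never finishes would block forever. The following is a minimal sketch of a reusable wait helper with a timeout. The names `wait_for_video`, `timeout_s`, and `poll_interval_s`, and the 600-second default, are illustrative assumptions and not part of the SDK. The helper accepts any callable that returns an object with a `status` attribute, so it can be exercised offline with a fake function.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable


def wait_for_video(
    retrieve: Callable[[str], Any],
    job_id: str,
    timeout_s: float = 600.0,
    poll_interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll until the job is completed or failed; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = retrieve(job_id)
        # "completed" and "failed" are the terminal states used in the examples above.
        if status.status in ("completed", "failed"):
            return status
        # Give up if the next poll would land after the deadline.
        if time.monotonic() + poll_interval_s > deadline:
            raise TimeoutError(
                f"job {job_id} is still {status.status!r} after {timeout_s}s"
            )
        sleep(poll_interval_s)


# Offline demo: a fake retrieve function stands in for client.videos.retrieve.
# The "queued" and "in_progress" names are placeholders, not confirmed API values.
_states = iter(["queued", "in_progress", "completed"])


def fake_retrieve(job_id: str) -> SimpleNamespace:
    return SimpleNamespace(id=job_id, status=next(_states))


result = wait_for_video(fake_retrieve, "job-123", sleep=lambda _: None)
print(result.status)  # completed
```

With the SDK client from the examples above, the call would look like `wait_for_video(client.videos.retrieve, job.id)`.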

Seedance 2.0 generates synchronized audio by default. To produce a silent video, set settings.audio to false.

job = client.videos.create(
    prompt="A graffiti character comes to life off a concrete wall under an urban railway bridge at night.",
    model="ByteDance/Seedance-2.0",
    resolution="720p",
    seconds="5",
    extra_body={"settings": {"audio": False}},
)

const job = await together.videos.create({
  prompt: "A graffiti character comes to life off a concrete wall under an urban railway bridge at night.",
  model: "ByteDance/Seedance-2.0",
  resolution: "720p",
  seconds: "5",
  // @ts-expect-error settings is a passthrough field
  settings: { audio: false },
});

curl -X POST "https://api.together.xyz/v2/videos" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ByteDance/Seedance-2.0",
    "prompt": "A graffiti character comes to life off a concrete wall under an urban railway bridge at night.",
    "resolution": "720p",
    "seconds": "5",
    "settings": {"audio": false}
  }'

Image-to-video

Animate a still image by passing it as the first frame through media.frame_images.

import time
from together import Together

client = Together()

job = client.videos.create(
    prompt="A black cat curiously gazes up at the sky. The camera slowly rises from eye level to a bird's-eye view.",
    model="ByteDance/Seedance-2.0",
    resolution="720p",
    seconds="5",
    media={
        "frame_images": [
            {
                "input_image": "https://example.com/cat.png",
                "frame": "first",
            }
        ],
    },
)

print(f"Job ID: {job.id}")

while True:
    status = client.videos.retrieve(job.id)
    if status.status == "completed":
        print(f"Video URL: {status.outputs.video_url}")
        break
    elif status.status == "failed":
        print(f"Error: {status.error}")
        break
    time.sleep(15)

import Together from "together-ai";

const together = new Together();

async function main() {
  const job = await together.videos.create({
    prompt: "A black cat curiously gazes up at the sky. The camera slowly rises from eye level to a bird's-eye view.",
    model: "ByteDance/Seedance-2.0",
    resolution: "720p",
    seconds: "5",
    media: {
      frame_images: [{
        input_image: "https://example.com/cat.png",
        frame: "first",
      }],
    },
  });

  console.log(`Job ID: ${job.id}`);

  while (true) {
    const status = await together.videos.retrieve(job.id);
    if (status.status === "completed") {
      console.log(`Video URL: ${status.outputs.video_url}`);
      break;
    } else if (status.status === "failed") {
      console.log(`Error: ${JSON.stringify(status.error)}`);
      break;
    }
    await new Promise((resolve) => setTimeout(resolve, 15000));
  }
}

main();

First and last frame control

Pass two frame_images (one with frame: "first", one with frame: "last") to control both the starting and ending frames. The model generates smooth motion between the two keyframes.

job = client.videos.create(
    prompt="Smooth cinematic transition with natural motion.",
    model="ByteDance/Seedance-2.0",
    resolution="720p",
    seconds="5",
    media={
        "frame_images": [
            {"input_image": "https://example.com/start.png", "frame": "first"},
            {"input_image": "https://example.com/end.png", "frame": "last"},
        ],
    },
)

const job = await together.videos.create({
  prompt: "Smooth cinematic transition with natural motion.",
  model: "ByteDance/Seedance-2.0",
  resolution: "720p",
  seconds: "5",
  media: {
    frame_images: [
      { input_image: "https://example.com/start.png", frame: "first" },
      { input_image: "https://example.com/end.png", frame: "last" },
    ],
  },
});

If you pass one image without frame, it’s used as the first frame. If you pass two without frame, they’re used as first and last in order.
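Relying on positional defaults is easy to get wrong when the list is built dynamically. As a sketch, the hypothetical helper below (`build_frame_images` is not an SDK function) makes the same first/last assignment explicit before the request is sent.

```python
from typing import Dict, List


def build_frame_images(image_urls: List[str]) -> List[Dict[str, str]]:
    """Label one or two keyframe URLs explicitly as first and last frames."""
    if not 1 <= len(image_urls) <= 2:
        raise ValueError("frame_images accepts one or two images")
    labels = ["first", "last"]
    # zip stops at the shorter list, so a single URL only gets the "first" label.
    return [
        {"input_image": url, "frame": label}
        for url, label in zip(image_urls, labels)
    ]


frames = build_frame_images(
    ["https://example.com/start.png", "https://example.com/end.png"]
)
print(frames)
```

The returned list can be passed as `media={"frame_images": frames}`.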

Reference-guided generation

Generate video featuring specific characters, objects, or scenes by passing reference images, reference videos, or both. Seedance 2.0 maintains identity, style, and composition from the references throughout the generated video. Multiple references combine for multi-character scenes.

job = client.videos.create(
    prompt="A person dances on a neon-lit stage with dynamic camera motion.",
    model="ByteDance/Seedance-2.0",
    resolution="1080p",
    ratio="16:9",
    seconds="6",
    media={
        "reference_images": [
            "https://example.com/character.png",
            "https://example.com/outfit.png",
        ],
        "reference_videos": [
            {"video": "https://example.com/dance-style.mp4"},
        ],
    },
)

const job = await together.videos.create({
  prompt: "A person dances on a neon-lit stage with dynamic camera motion.",
  model: "ByteDance/Seedance-2.0",
  resolution: "1080p",
  ratio: "16:9",
  seconds: "6",
  media: {
    reference_images: [
      "https://example.com/character.png",
      "https://example.com/outfit.png",
    ],
    reference_videos: [
      { video: "https://example.com/dance-style.mp4" },
    ],
  },
});
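Note that the item shapes differ in the example above: reference images are plain URL strings, while each reference video is an object with a `video` key. A small sketch of a builder that hides this difference is shown below. `reference_media` is a hypothetical helper name, and the audio entries follow the plain-URL shape used in the audio-guided example.

```python
from typing import Dict, Sequence


def reference_media(
    images: Sequence[str] = (),
    videos: Sequence[str] = (),
    audios: Sequence[str] = (),
) -> Dict[str, list]:
    """Build a media object from plain URL lists, omitting empty fields."""
    media: Dict[str, list] = {}
    if images:
        media["reference_images"] = list(images)
    if videos:
        # Each video entry is wrapped in an object, unlike images and audios.
        media["reference_videos"] = [{"video": url} for url in videos]
    if audios:
        media["reference_audios"] = list(audios)
    return media


media = reference_media(
    images=["https://example.com/character.png"],
    videos=["https://example.com/dance-style.mp4"],
)
print(media)
```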

Audio-guided generation

Drive video generation with an audio file by passing it through media.reference_audios. The model synchronizes the generated video to the audio, which is useful for lip sync, beat-matched motion, and narration-driven scenes. Audio-guided generation requires at least one reference image or reference video to anchor the visual subject.

job = client.videos.create(
    prompt="The character raps energetically into a microphone, bobbing with the beat.",
    model="ByteDance/Seedance-2.0",
    resolution="720p",
    seconds="10",
    media={
        "reference_images": [
            "https://example.com/rapper.png",
        ],
        "reference_audios": [
            "https://example.com/rap-audio.mp3",
        ],
    },
)

const job = await together.videos.create({
  prompt: "The character raps energetically into a microphone, bobbing with the beat.",
  model: "ByteDance/Seedance-2.0",
  resolution: "720p",
  seconds: "10",
  media: {
    reference_images: [
      "https://example.com/rapper.png",
    ],
    reference_audios: [
      "https://example.com/rap-audio.mp3",
    ],
  },
});

If no reference audio is provided, Seedance 2.0 still generates synchronized audio (dialogue, ambient sound, and effects) based on the prompt and visual content.

Parameters

Parameter	Type	Description	Default
`prompt`	string	Text description of the video to generate (2 to 3,000 characters).	Required
`model`	string	`ByteDance/Seedance-2.0`.	Required
`resolution`	string	Output resolution tier: `480p`, `720p`, `1080p`, or `4k`. Cannot be combined with `width`/`height`.	`"720p"`
`ratio`	string	Aspect ratio: `16:9`, `9:16`, `1:1`, `4:3`, `3:4`, or `21:9`.	`"16:9"`
`width`	integer	Explicit output width in pixels. Must be paired with `height`.	-
`height`	integer	Explicit output height in pixels. Must be paired with `width`.	-
`seconds`	string	Video duration in seconds: an integer from 4 to 15, passed as a string.	`"5"`
`settings.audio`	boolean	Whether to generate synchronized audio. Pass via `extra_body` in the Python SDK.	`true`
`media`	object	Media inputs for the request (see below).	-
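The constraints in this table can be checked on the client before a job is created. The sketch below is one possible pre-flight check, not the API's own validation. `check_params` and its messages are hypothetical, and the server remains the source of truth.

```python
from typing import List, Optional

RESOLUTIONS = {"480p", "720p", "1080p", "4k"}
RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "21:9"}


def check_params(
    prompt: str,
    seconds: str = "5",
    resolution: Optional[str] = None,
    ratio: Optional[str] = None,
    width: Optional[int] = None,
    height: Optional[int] = None,
) -> List[str]:
    """Return a list of problems found; an empty list means no issue was detected."""
    problems: List[str] = []
    if not 2 <= len(prompt) <= 3000:
        problems.append("prompt must be 2 to 3,000 characters")
    # seconds is a string in the API, but it must hold an integer from 4 to 15.
    if not (seconds.isascii() and seconds.isdigit() and 4 <= int(seconds) <= 15):
        problems.append("seconds must be an integer string from 4 to 15")
    if resolution is not None and resolution not in RESOLUTIONS:
        problems.append(f"unknown resolution: {resolution}")
    if ratio is not None and ratio not in RATIOS:
        problems.append(f"unknown ratio: {ratio}")
    if (width is None) != (height is None):
        problems.append("width and height must be passed together")
    if resolution is not None and (width is not None or height is not None):
        problems.append("resolution cannot be combined with width/height")
    return problems


print(check_params("A cat walks through a garden.", "5", resolution="720p", ratio="16:9"))
print(check_params("x", "3"))
```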

Media object

The media object is the unified way to pass images, videos, and audio into a Seedance 2.0 request.

{
  "prompt": "...",
  "model": "ByteDance/Seedance-2.0",
  "media": {
    "frame_images": [],
    "reference_images": [],
    "reference_videos": [],
    "reference_audios": []
  }
}

Field	Type	Description
`frame_images`	array	Up to two keyframe images. Each item: `{input_image, frame}` where `frame` is `"first"` or `"last"`. With one item and no `frame`, it’s used as the first frame. With two items and no `frame`, they’re used as first and last in order.
`reference_images`	array	Up to nine reference images for character, object, or scene consistency. Each item is a URL or base64-encoded image.
`reference_videos`	array	Up to three reference videos for motion or composition guidance. Each item: `{video: "url"}`. Each video must be between 2 and 15 seconds long.
`reference_audios`	array	Up to three reference audio files to drive video generation. Each item is a URL. Requires at least one `reference_images` or `reference_videos` entry.

Reference videos must be between 2 and 15 seconds long. Files outside this range are rejected with invalidDuration (Media seconds must be in [2, 15] seconds range). Trim longer clips before submitting the job. For example, ffmpeg -i input.mp4 -t 5 -c copy reference.mp4 produces a reference clip of about 5 seconds without re-encoding.
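When clips are trimmed from a script, the ffmpeg invocation can be assembled programmatically. The sketch below builds the same argument list as the command above and refuses target durations outside the accepted range. `trim_command` is a hypothetical helper. It only builds the command, so ffmpeg is not needed to run the example.

```python
import shlex
from typing import List

MIN_REF_SECONDS = 2
MAX_REF_SECONDS = 15


def trim_command(src: str, dst: str, seconds: int = 5) -> List[str]:
    """Build an ffmpeg command that keeps the first `seconds` of `src`."""
    if not MIN_REF_SECONDS <= seconds <= MAX_REF_SECONDS:
        raise ValueError(
            f"seconds must be in [{MIN_REF_SECONDS}, {MAX_REF_SECONDS}]"
        )
    # -t limits the output duration; -c copy copies streams without re-encoding.
    return ["ffmpeg", "-i", src, "-t", str(seconds), "-c", "copy", dst]


cmd = trim_command("input.mp4", "reference.mp4", seconds=5)
print(shlex.join(cmd))  # ffmpeg -i input.mp4 -t 5 -c copy reference.mp4
```

To execute it, pass the list to `subprocess.run(cmd, check=True)`. Trimming only helps with clips that are too long; a source shorter than 2 seconds cannot be fixed this way.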

Input compatibility

frame_images cannot be combined with any reference input. Use one of the following modes per request:

Mode	`frame_images`	`reference_images`	`reference_videos`	`reference_audios`
Text-to-video	-	-	-	-
Image-to-video	Up to two	-	-	-
Reference-guided	-	Up to nine	Up to three	-
Audio-guided	-	Up to nine	Up to three	Up to three (requires at least one reference image or video)
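The compatibility rules in this table can be expressed as a short client-side check. The sketch below classifies a media object into one of the four modes and raises on combinations the table rules out. `classify_media` is a hypothetical helper, and the error messages are illustrative rather than the API's own.

```python
from typing import Dict

LIMITS = {
    "frame_images": 2,
    "reference_images": 9,
    "reference_videos": 3,
    "reference_audios": 3,
}


def classify_media(media: Dict[str, list]) -> str:
    """Return the generation mode implied by `media`, or raise ValueError."""
    counts = {field: len(media.get(field, [])) for field in LIMITS}
    for field, limit in LIMITS.items():
        if counts[field] > limit:
            raise ValueError(f"{field} accepts at most {limit} items")

    has_visual_refs = bool(counts["reference_images"] or counts["reference_videos"])
    has_audio_refs = bool(counts["reference_audios"])

    if counts["frame_images"]:
        # Keyframes are exclusive with every kind of reference input.
        if has_visual_refs or has_audio_refs:
            raise ValueError("frame_images cannot be combined with reference inputs")
        return "image-to-video"
    if has_audio_refs:
        if not has_visual_refs:
            raise ValueError(
                "reference_audios requires at least one reference image or video"
            )
        return "audio-guided"
    if has_visual_refs:
        return "reference-guided"
    return "text-to-video"


print(classify_media({"reference_images": ["a.png"], "reference_audios": ["a.mp3"]}))
```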

Resolutions and aspect ratios

Aspect ratio	480p	720p	1080p	4k
16:9	864x496	1280x720	1920x1080	3840x2160
4:3	752x560	1112x834	1664x1248	3326x2494
1:1	640x640	960x960	1440x1440	2880x2880
3:4	560x752	834x1112	1248x1664	2496x3328
9:16	496x864	720x1280	1080x1920	2160x3840
21:9	992x432	1470x630	2206x946	4398x1886

To request dimensions outside this matrix, pass width and height directly instead of resolution and ratio.
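As an illustration of choosing between the two ways of specifying size, the sketch below looks up a target size in the matrix above and falls back to explicit `width` and `height` when there is no match. `size_params` is a hypothetical helper, and the values are copied verbatim from the table. Which custom dimensions the API accepts is not stated here, so treat the fallback as an assumption to verify.

```python
from typing import Dict, Tuple, Union

RESOLUTIONS = ("480p", "720p", "1080p", "4k")

# Rows copied from the table above: ratio -> sizes in RESOLUTIONS order.
ROWS: Dict[str, Tuple[Tuple[int, int], ...]] = {
    "16:9": ((864, 496), (1280, 720), (1920, 1080), (3840, 2160)),
    "4:3": ((752, 560), (1112, 834), (1664, 1248), (3326, 2494)),
    "1:1": ((640, 640), (960, 960), (1440, 1440), (2880, 2880)),
    "3:4": ((560, 752), (834, 1112), (1248, 1664), (2496, 3328)),
    "9:16": ((496, 864), (720, 1280), (1080, 1920), (2160, 3840)),
    "21:9": ((992, 432), (1470, 630), (2206, 946), (4398, 1886)),
}


def size_params(width: int, height: int) -> Dict[str, Union[str, int]]:
    """Prefer resolution/ratio when the size is in the matrix, else width/height."""
    for ratio, sizes in ROWS.items():
        for resolution, size in zip(RESOLUTIONS, sizes):
            if size == (width, height):
                return {"resolution": resolution, "ratio": ratio}
    return {"width": width, "height": height}


print(size_params(1280, 720))
print(size_params(1024, 576))
```

The returned dictionary can be unpacked into the request, which keeps `resolution` and `width`/`height` from being sent together.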

Pricing

Mode	480p	720p	1080p	4k
Text-to-video, image-to-video	$0.07 / second	$0.16 / second	$0.40 / second	$0.836 / second
Video-to-video	from $0.13 / second	from $0.28 / second	from $0.48 / second	from $1.050 / second
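A quick cost estimate follows from the per-second rates. The sketch below covers only the text-to-video and image-to-video row and assumes cost scales linearly with the requested duration. Video-to-video is listed with "from" prices, so it is left out. `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# USD per generated second for text-to-video and image-to-video (table above).
RATE_PER_SECOND = {
    "480p": Decimal("0.07"),
    "720p": Decimal("0.16"),
    "1080p": Decimal("0.40"),
    "4k": Decimal("0.836"),
}


def estimate_cost(resolution: str, seconds: int) -> Decimal:
    """Estimate the cost in USD, assuming linear per-second billing."""
    if not 4 <= seconds <= 15:
        raise ValueError("seconds must be between 4 and 15")
    return RATE_PER_SECOND[resolution] * seconds


print(estimate_cost("720p", 5))    # 0.80
print(estimate_cost("1080p", 6))   # 2.40
```

Decimal is used instead of float so that sums over many jobs do not accumulate binary rounding error.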

Prompting tips

Seedance 2.0 supports both Chinese and English prompts. Detailed prompts with subject, action, style, camera movement, and atmosphere produce the best results.

Write descriptive prompts. Instead of “a cat walking”, try “A small black cat walks gracefully through a sunlit garden, soft bokeh background, gentle breeze rustling the flowers, cinematic slow motion.” For multi-shot scenes, describe the transitions explicitly. Seedance 2.0 follows shot-by-shot instructions like “Shot 1: wide aerial of the city. Shot 2: cut to a close-up of the protagonist’s face.”
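For multi-shot prompts built from a list of shot descriptions, the shot-by-shot format can be produced with a small helper. This is a sketch only: `multi_shot_prompt` is a hypothetical name, and the 3,000-character check mirrors the prompt limit listed under Parameters.

```python
from typing import Sequence

MAX_PROMPT_CHARS = 3000  # upper bound on prompt length from the Parameters table


def multi_shot_prompt(shots: Sequence[str]) -> str:
    """Join shot descriptions into a single 'Shot N: ...' prompt."""
    if not shots:
        raise ValueError("at least one shot description is required")
    # Normalize trailing periods so every shot ends with exactly one.
    parts = [
        f"Shot {n}: {text.strip().rstrip('.')}."
        for n, text in enumerate(shots, start=1)
    ]
    prompt = " ".join(parts)
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    return prompt


prompt = multi_shot_prompt(
    ["wide aerial of the city", "cut to a close-up of the protagonist's face."]
)
print(prompt)
```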

Next steps

Video generation overview for the full parameter reference and supported models.
API reference: create video for REST API details.
API reference: get video status for polling and status codes.
