
Generating a video

Video generation is asynchronous. You create a job, receive a job ID, and poll for completion.
import time
from together import Together

client = Together()

# Create a video generation job
job = client.videos.create(
    prompt="A serene sunset over the ocean with gentle waves",
    model="minimax/video-01-director",
    width=1366,
    height=768,
)

print(f"Job ID: {job.id}")

# Poll until completion
while True:
    status = client.videos.retrieve(job.id)
    print(f"Status: {status.status}")

    if status.status == "completed":
        print(f"Video URL: {status.outputs.video_url}")
        break
    elif status.status == "failed":
        print("Video generation failed")
        break

    # Wait before checking again
    time.sleep(60)
Example output when the job is complete:
{
  "id": "019a0068-794a-7213-90f6-cc4eb62e3da7",
  "model": "minimax/video-01-director",
  "status": "completed",
  "info": {
    "user_id": "66f0bd504fb9511df3489b9a",
    "errors": null
  },
  "inputs": {
    "fps": null,
    "guidance_scale": null,
    "height": 768,
    "metadata": {},
    "model": "minimax/video-01-director",
    "output_quality": null,
    "prompt": "A serene sunset over the ocean with gentle waves",
    "seconds": null,
    "seed": null,
    "steps": null,
    "width": 1366
  },
  "outputs": {
    "cost": 0.28,
    "video_url": "https://api.together.ai/shrt/DwlaBdSakNRFlBxN"
  },
  "created_at": "2025-10-20T06:57:18.154804Z",
  "claimed_at": "0001-01-01T00:00:00Z",
  "done_at": "2025-10-20T07:00:12.234472Z"
}
Job Status Reference:
| Status | Description |
|---|---|
| queued | Job is waiting in queue |
| in_progress | Video is being generated |
| completed | Generation successful, video available |
| failed | Generation failed, check info.errors |
| cancelled | Job was cancelled |
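
For example, polling can be wrapped in a small helper that returns once the job reaches a terminal status (completed, failed, or cancelled). This is a minimal sketch using the same client calls shown above:
import time
from together import Together

client = Together()

def wait_for_video(job_id: str, poll_interval: int = 60):
    """Poll a video job until it reaches a terminal status."""
    while True:
        status = client.videos.retrieve(job_id)
        if status.status in ("completed", "failed", "cancelled"):
            return status
        # Still queued or in_progress: wait and check again
        time.sleep(poll_interval)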

Parameters

| Parameter | Type | Description | Default / Range |
|---|---|---|---|
| prompt | string | Text description of the video to generate | Required |
| model | string | Model identifier | Required |
| width | integer | Video width in pixels | 1366 |
| height | integer | Video height in pixels | 768 |
| seconds | string | Length of video (1-10) | "6" |
| fps | integer | Frames per second | 15-60 |
| steps | integer | Diffusion steps (higher = better quality, slower) | 10-50 |
| guidance_scale | float | How closely to follow the prompt | 6.0-10.0 |
| seed | integer | Random seed for reproducibility | any |
| output_format | string | Video format (MP4, WEBM) | MP4 |
| output_quality | integer | Bitrate/quality (lower = higher quality) | 20 |
| negative_prompt | string | What to avoid in generation | - |
| frame_images | array | Keyframe images for video generation. If size 1, starting frame; if size 2, starting and ending frames; if more than 2, a frame must be specified per image. | - |
| resolution | string | Video resolution tier (720P, 1080P). Used by Wan 2.7 models instead of width/height. | "1080P" |
| ratio | string | Aspect ratio (16:9, 9:16, 1:1, 4:3, 3:4). Used by Wan 2.7 models. | "16:9" |
| media | object | Media inputs for the request (see schema and compatibility below) | - |
  • prompt is required for all models except Kling
  • width and height fall back to their defaults unless otherwise specified; the available dimensions vary by model
  • Wan 2.7 models use resolution and ratio instead of width/height (see the sketch below)
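
For instance, a minimal Wan 2.7 text-to-video request using resolution and ratio might look like this sketch (the prompt is illustrative):
from together import Together

client = Together()

# Sketch: Wan 2.7 models take resolution and ratio rather than width/height
job = client.videos.create(
    prompt="A hot air balloon drifting over snowy mountains at dawn",
    model="Wan-AI/wan2.7-t2v",
    resolution="720P",
    ratio="16:9",
    seconds="8",
)

print(f"Job ID: {job.id}")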

Media Object

The media object is the unified way to pass images, videos, and audio into video generation requests.
{
  "prompt": "...",
  "model": "...",
  "media": {
    "frame_images": [],
    "frame_videos": [],
    "reference_images": [],
    "reference_videos": [],
    "source_video": "",
    "audio_inputs": []
  }
}
| Field | Type | Description |
|---|---|---|
| frame_images | array | Keyframe images for I2V. Each item: {input_image, frame} where frame is "first" or "last". |
| frame_videos | array | Input video clips for video continuation (I2V). Each item: {video: "url"}. |
| reference_images | array | Reference images for character/object consistency (R2V) or visual guidance (Video Edit). |
| reference_videos | array | Reference videos for character/object consistency (R2V). Each item: {video: "url"}. |
| source_video | string | Source video URL to edit (Video Edit). |
| audio_inputs | array | Audio file URLs to drive generation (lip sync, beat-matched motion, etc.) for T2V and I2V. Each item: "url". WAV or MP3, 3-30s, up to 15 MB. |
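For example, a request that pins the first and last frames through media.frame_images might look like the sketch below (the model choice and image URLs are illustrative placeholders):
from together import Together

client = Together()

# Sketch: supply first and last keyframes via the media object
# (placeholder image URLs; field support varies by model)
job = client.videos.create(
    prompt="A paper boat drifting down a rain-soaked street",
    model="Wan-AI/wan2.7-i2v",
    resolution="720P",
    ratio="16:9",
    seconds="6",
    media={
        "frame_images": [
            {"input_image": "https://example.com/start.jpg", "frame": "first"},
            {"input_image": "https://example.com/end.jpg", "frame": "last"},
        ]
    },
)

print(f"Job ID: {job.id}")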
Not all media fields are supported on every model. See the Wan 2.7 quickstart for field compatibility across Wan 2.7 models; support for these parameters also varies by model, so refer to the models table below for details.

Generate customized videos using the above parameters:
import time
from together import Together

client = Together()

job = client.videos.create(
    prompt="A futuristic city at night with neon lights reflecting on wet streets",
    model="minimax/hailuo-02",
    width=1366,
    height=768,
    seconds="6",
    fps=30,
    steps=30,
    guidance_scale=8.0,
    output_format="MP4",
    output_quality=20,
    seed=42,
    negative_prompt="blurry, low quality, distorted",
)

print(f"Job ID: {job.id}")

# Poll until completion
while True:
    status = client.videos.retrieve(job.id)
    print(f"Status: {status.status}")

    if status.status == "completed":
        print(f"Video URL: {status.outputs.video_url}")
        print(f"Cost: ${status.outputs.cost}")
        break
    elif status.status == "failed":
        print("Video generation failed")
        break

    # Wait before checking again
    time.sleep(60)

Reference Images

Guide your video’s visual style with reference images:
import time
from together import Together

client = Together()

job = client.videos.create(
    prompt="A cat dancing energetically",
    model="minimax/hailuo-02",
    width=1366,
    height=768,
    seconds="6",
    reference_images=[
        "https://cdn.pixabay.com/photo/2020/05/20/08/27/cat-5195431_1280.jpg",
    ],
)

print(f"Job ID: {job.id}")

# Poll until completion
while True:
    status = client.videos.retrieve(job.id)
    print(f"Status: {status.status}")

    if status.status == "completed":
        print(f"Video URL: {status.outputs.video_url}")
        break
    elif status.status == "failed":
        print("Video generation failed")
        break

    # Wait before checking again
    time.sleep(60)

Keyframe Control

Control specific frames in your video for precise transitions. Single keyframe: set one frame (in the example below, the first frame) to a specific image. Depending on the model, you can also set multiple keyframes; refer to the models table for details.
import base64
import requests
import time
from together import Together

client = Together()

# Download image and encode to base64
image_url = (
    "https://cdn.pixabay.com/photo/2020/05/20/08/27/cat-5195431_1280.jpg"
)
response = requests.get(image_url)
base64_image = base64.b64encode(response.content).decode("utf-8")

# Single keyframe at start
job = client.videos.create(
    prompt="Smooth transition from day to night",
    model="minimax/hailuo-02",
    width=1366,
    height=768,
    fps=24,
    frame_images=[{"input_image": base64_image, "frame": 0}],  # Starting frame
)

print(f"Job ID: {job.id}")

# Poll until completion
while True:
    status = client.videos.retrieve(job.id)
    print(f"Status: {status.status}")

    if status.status == "completed":
        print(f"Video URL: {status.outputs.video_url}")
        break
    elif status.status == "failed":
        print("Video generation failed")
        break

    # Wait before checking again
    time.sleep(60)
💡 Tip: Frame number = seconds × fps (for example, a 6-second clip at 24 fps spans 6 × 24 = 144 frames)

Audio Input

For models that support audio-driven generation (such as Wan 2.7 T2V), you can pass an audio file via the media.audio_inputs field. The model synchronizes the generated video to the audio — useful for lip sync, beat-matched motion, or narration-driven scenes.
import time
from together import Together

client = Together()

job = client.videos.create(
    prompt="A cartoon kitten general in golden armor stands on a cliff, commanding an army",
    model="Wan-AI/wan2.7-t2v",
    resolution="720P",
    ratio="16:9",
    seconds="10",
    media={"audio_inputs": ["https://example.com/audio.mp3"]},
)

print(f"Job ID: {job.id}")

# Poll until completion
while True:
    status = client.videos.retrieve(job.id)
    print(f"Status: {status.status}")

    if status.status == "completed":
        print(f"Video URL: {status.outputs.video_url}")
        break
    elif status.status == "failed":
        print("Video generation failed")
        break

    time.sleep(60)
If no audio is provided, the model automatically generates matching background music or sound effects based on the video content.
Audio constraints: WAV or MP3 format, 3–30 seconds, up to 15 MB. If the audio is longer than the video duration, it will be truncated. If shorter, the remaining portion of the video will be silent.

Guidance Scale

Controls how closely the model follows your prompt:
  • 6.0-7.0: More creative, less literal
  • 7.0-9.0: Sweet spot for most use cases
  • 9.0-10.0: Strict adherence to prompt
  • >12.0: Avoid - may cause artifacts
from together import Together

client = Together()

# Low guidance - more creative interpretation
job_creative = client.videos.create(
    prompt="an astronaut riding a horse on the moon",
    model="minimax/hailuo-02",
    guidance_scale=6.0,
    seed=100,
)

# High guidance - closer to literal prompt
job_literal = client.videos.create(
    prompt="an astronaut riding a horse on the moon",
    model="minimax/hailuo-02",
    guidance_scale=10.0,
    seed=100,
)

Quality Control with Steps

Trade off between generation time and quality:
  • 10 steps: Quick testing, lower quality
  • 20 steps: Standard quality, good balance
  • 30-40 steps: Production quality, slower
  • >50 steps: Diminishing returns
from together import Together

client = Together()

# Quick preview
job_quick = client.videos.create(
    prompt="A person walking through a forest",
    model="minimax/hailuo-02",
    steps=10,
)

# Production quality
job_production = client.videos.create(
    prompt="A person walking through a forest",
    model="minimax/hailuo-02",
    steps=40,
)

Supported Model Details

See our supported video models and relevant parameters below.
| Organization | Name | Model API String | Duration | Dimensions | FPS | Keyframes | Prompt | Reference Images |
|---|---|---|---|---|---|---|---|---|
| MiniMax | MiniMax 01 Director | minimax/video-01-director | 5s | 1366×768 | 25 | First | 2-3000 char | |
| MiniMax | MiniMax Hailuo 02 | minimax/hailuo-02 | 10s | 1366×768, 1920×1080 | 25 | First | 2-3000 char | |
| Google | Veo 2.0 | google/veo-2.0 | 5s | 1280×720, 720×1280 | 24 | First, Last | 2-3000 char | |
| Google | Veo 3.0 | google/veo-3.0 | 8s | 1280×720, 720×1280, 1920×1080, 1080×1920 | 24 | First | 2-3000 char | |
| Google | Veo 3.0 + Audio | google/veo-3.0-audio | 8s | 1280×720, 720×1280, 1920×1080, 1080×1920 | 24 | First | 2-3000 char | |
| Google | Veo 3.0 Fast | google/veo-3.0-fast | 8s | 1280×720, 720×1280, 1920×1080, 1080×1920 | 24 | First | 2-3000 char | |
| Google | Veo 3.0 Fast + Audio | google/veo-3.0-fast-audio | 8s | 1280×720, 720×1280, 1920×1080, 1080×1920 | 24 | First | 2-3000 char | |
| ByteDance | Seedance 1.0 Lite | ByteDance/Seedance-1.0-lite | 5s | 864×480, 736×544, 640×640, 960×416, 416×960, 1248×704, 1120×832, 960×960, 1504×640, 640×1504 | 24 | First, Last | 2-3000 char | |
| ByteDance | Seedance 1.0 Pro | ByteDance/Seedance-1.0-pro | 5s | 864×480, 736×544, 640×640, 960×416, 416×960, 1248×704, 1120×832, 960×960, 1504×640, 640×1504 | 24 | First, Last | 2-3000 char | |
| PixVerse | PixVerse v5 | pixverse/pixverse-v5 | 5s | 640×360, 480×360, 360×360, 270×360, 360×640, 960×540, 720×540, 540×540, 405×540, 540×960, 1280×720, 960×720, 720×720, 540×720, 720×1280, 1920×1080, 1440×1080, 1080×1080, 810×1080, 1080×1920 | 16, 24 | First, Last | 2-2048 char | |
| Kuaishou | Kling 2.1 Master | kwaivgI/kling-2.1-master | 5s | 1920×1080, 1080×1080, 1080×1920 | 24 | First | 2-2500 char | |
| Kuaishou | Kling 2.1 Standard | kwaivgI/kling-2.1-standard | 5s | 1920×1080, 1080×1080, 1080×1920 | 24 | First | - | |
| Kuaishou | Kling 2.1 Pro | kwaivgI/kling-2.1-pro | 5s | 1920×1080, 1080×1080, 1080×1920 | 24 | First, Last | - | |
| Kuaishou | Kling 2.0 Master | kwaivgI/kling-2.0-master | 5s | 1280×720, 720×720, 720×1280 | 24 | First | 2-2500 char | |
| Kuaishou | Kling 1.6 Standard | kwaivgI/kling-1.6-standard | 5s | 1920×1080, 1080×1080, 1080×1920 | 30, 24 | First | 2-2500 char | |
| Kuaishou | Kling 1.6 Pro | kwaivgI/kling-1.6-pro | 5s | 1920×1080, 1080×1080, 1080×1920 | 24 | First | - | |
| Wan-AI | Wan 2.2 I2V | Wan-AI/Wan2.2-I2V-A14B | - | - | - | - | - | |
| Wan-AI | Wan 2.2 T2V | Wan-AI/Wan2.2-T2V-A14B | - | - | - | - | - | |
| Wan-AI | Wan 2.7 T2V | Wan-AI/wan2.7-t2v | 2-15s | 720P, 1080P (16:9, 9:16, 1:1, 4:3, 3:4) | 30 | - | 2-5000 char | |
| Wan-AI | Wan 2.7 I2V | Wan-AI/wan2.7-i2v | 2-15s | 720P, 1080P (16:9, 9:16, 1:1, 4:3, 3:4) | 30 | First, Last | 2-5000 char | |
| Wan-AI | Wan 2.7 R2V | Wan-AI/wan2.7-r2v | 2-10s | 720P, 1080P (16:9, 9:16, 1:1, 4:3, 3:4) | 30 | - | 2-5000 char | |
| Wan-AI | Wan 2.7 Video Edit | Wan-AI/wan2.7-videoedit | 2-10s | 720P, 1080P (16:9, 9:16, 1:1, 4:3, 3:4) | 30 | - | 2-5000 char | |
| Vidu | Vidu 2.0 | vidu/vidu-2.0 | 8s | 1920×1080, 1080×1080, 1080×1920, 1280×720, 720×720, 720×1280, 640×360, 360×360, 360×640 | 24 | First, Last | 2-3000 char | |
| Vidu | Vidu Q1 | vidu/vidu-q1 | 5s | 1920×1080, 1080×1080, 1080×1920 | 24 | First, Last | 2-3000 char | |
| OpenAI | Sora 2 | openai/sora-2 | 8s | 1280×720, 720×1280 | - | First | 1-4000 char | |
| OpenAI | Sora 2 Pro | openai/sora-2-pro | 8s | 1280×720, 720×1280 | - | First | 1-4000 char | |

Troubleshooting

Video doesn’t match prompt well:
  • Increase guidance_scale to 8-10
  • Make prompt more descriptive and specific
  • Add negative_prompt to exclude unwanted elements
Video has artifacts:
  • Reduce guidance_scale (keep below 12)
  • Increase steps to 30-40
  • Adjust fps if motion looks unnatural
Generation is too slow:
  • Reduce steps (try 10-20 for testing)
  • Use shorter seconds during development
  • Lower fps for slower-paced scenes
URLs expire:
  • Download videos immediately after completion (see the sketch below)
  • Don’t rely on URLs for long-term storage
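As a minimal sketch, you can save the file to disk as soon as the job reports completed (using the requests library; the job ID and output filename are placeholders):
import requests
from together import Together

client = Together()

# Placeholder ID of a job that has already reported "completed"
job_id = "YOUR_JOB_ID"
status = client.videos.retrieve(job_id)

# Save the video locally before the URL expires
response = requests.get(status.outputs.video_url)
response.raise_for_status()
with open("generated_video.mp4", "wb") as f:
    f.write(response.content)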