
Wan 2.7 T2V

Wan 2.7 T2V generates video from text prompts with optional audio-driven synchronization. It outputs 720P or 1080P video at 30fps in MP4 format, up to 15 seconds long.
| Model | API String | Best For | Duration |
| --- | --- | --- | --- |
| Wan 2.7 T2V | Wan-AI/wan2.7-t2v | Text-to-video with audio | Up to 15s |

Text-to-Video

Generate a video from a text prompt. Video generation is asynchronous — you create a job, receive a job ID, and poll for the result.
import time
from together import Together

client = Together()

job = client.videos.create(
    prompt="A small cute cartoon kitten general in golden armor stands on a cliff, commanding an army of mice charging below. Epic ancient war atmosphere, dramatic clouds over snowy mountains.",
    model="Wan-AI/wan2.7-t2v",
    resolution="720P",
    ratio="16:9",
    seconds="10",
)

print(f"Job ID: {job.id}")

while True:
    status = client.videos.retrieve(job.id)
    print(f"Status: {status.status}")

    if status.status == "completed":
        print(f"Video URL: {status.outputs.video_url}")
        break
    elif status.status == "failed":
        print(f"Error: {status.error}")
        break

    time.sleep(5)
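
The loop above polls until the job reaches a terminal state, with no upper bound on the wait. A minimal sketch of the same pattern with a timeout follows. `wait_for_job` and its parameters are hypothetical names, not part of the Together SDK, and the only status values it relies on are the two shown above (`completed` and `failed`). It accepts any callable that returns the job object, so it can be tried without network access.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable


def wait_for_job(
    fetch: Callable[[], Any],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch() until the job reports "completed" or "failed".

    Raises TimeoutError if neither state is reached within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch()
        if job.status in ("completed", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job.status!r} after {timeout}s")
        sleep(interval)


# Offline demonstration: a fake job that completes on the third poll.
# "pending" is a placeholder for any non-terminal status, not a documented value.
_states = iter(["pending", "pending", "completed"])
demo = wait_for_job(
    lambda: SimpleNamespace(status=next(_states)),
    sleep=lambda _seconds: None,  # skip real sleeping in the demo
)
print(demo.status)
```

With the client from the example above, the call would look like `wait_for_job(lambda: client.videos.retrieve(job.id))`.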

Text-to-Video with Audio

Drive video generation with an audio file using media.audio_inputs. The model synchronizes the generated video to the audio — useful for lip sync, beat-matched motion, or narration-driven scenes. If no audio is provided, the model automatically generates matching background music or sound effects.
import time
from together import Together

client = Together()

job = client.videos.create(
    prompt="A graffiti character comes to life off a concrete wall, rapping energetically under an urban railway bridge at night, lit by a lone streetlamp.",
    model="Wan-AI/wan2.7-t2v",
    resolution="720P",
    ratio="16:9",
    seconds="10",
    media={
        "audio_inputs": [
            {"audio": "https://example.com/rap-audio.mp3"},
        ],
    },
)

print(f"Job ID: {job.id}")

while True:
    status = client.videos.retrieve(job.id)
    print(f"Status: {status.status}")

    if status.status == "completed":
        print(f"Video URL: {status.outputs.video_url}")
        break
    elif status.status == "failed":
        print(f"Error: {status.error}")
        break

    time.sleep(5)
Audio constraints: WAV or MP3 format, 3-30 seconds, up to 15 MB. If the audio is longer than the video duration, it will be truncated. If shorter, the remaining portion of the video will be silent.
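
The constraints above can be checked locally before a job is submitted. The sketch below is illustrative only: `check_audio` and `audio_coverage` are hypothetical helpers, the duration and file size are assumed to be known already (for example from your own media tooling), and "15 MB" is read as 15 × 1024 × 1024 bytes, which is an assumption.

```python
from pathlib import PurePosixPath
from typing import List, Tuple

MAX_AUDIO_BYTES = 15 * 1024 * 1024  # assumption: "15 MB" interpreted as 15 MiB


def check_audio(filename: str, duration_s: float, size_bytes: int) -> List[str]:
    """Return a list of constraint violations (empty if none are found)."""
    problems = []
    if PurePosixPath(filename).suffix.lower() not in {".wav", ".mp3"}:
        problems.append("format must be WAV or MP3")
    if not 3 <= duration_s <= 30:
        problems.append("duration must be 3-30 seconds")
    if size_bytes > MAX_AUDIO_BYTES:
        problems.append("file must be at most 15 MB")
    return problems


def audio_coverage(audio_s: float, video_s: float) -> Tuple[float, float]:
    """Return (seconds of audio used, seconds of video left silent)."""
    used = min(audio_s, video_s)  # audio longer than the video is truncated
    return used, video_s - used   # any remainder of the video is silent


print(check_audio("rap-audio.mp3", 12.0, 2_000_000))  # []
print(audio_coverage(12.0, 10.0))  # (10.0, 0.0): audio truncated to 10 s
print(audio_coverage(4.0, 10.0))   # (4.0, 6.0): last 6 s of video silent
```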

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| prompt | string | Text description of the video to generate (up to 5,000 characters) | Required |
| model | string | Model identifier (Wan-AI/wan2.7-t2v) | Required |
| resolution | string | Video resolution tier (720P, 1080P) | "1080P" |
| ratio | string | Aspect ratio (16:9, 9:16, 1:1, 4:3, 3:4) | "16:9" |
| seconds | string | Video duration in seconds (2-15) | "5" |
| seed | integer | Random seed for reproducibility (0-2,147,483,647) | Random |
| negative_prompt | string | Elements to exclude from generation (up to 500 characters) | - |
| media | object | Media inputs for the request (see below) | - |
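
The limits in the table can be checked on the client side before a request is sent. The following is a minimal sketch under that assumption: `validate_params` is a hypothetical helper, not an SDK function, and it only mirrors the ranges and defaults listed above. The API itself remains the authority on what it accepts.

```python
from typing import List

RESOLUTIONS = {"720P", "1080P"}
RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
MAX_SEED = 2_147_483_647


def validate_params(params: dict) -> List[str]:
    """Return a list of problems found in a request dict (empty if none)."""
    errors = []
    prompt = params.get("prompt")
    if not prompt:
        errors.append("prompt is required")
    elif len(prompt) > 5000:
        errors.append("prompt exceeds 5,000 characters")
    if not params.get("model"):
        errors.append("model is required")
    if params.get("resolution", "1080P") not in RESOLUTIONS:
        errors.append("resolution must be 720P or 1080P")
    if params.get("ratio", "16:9") not in RATIOS:
        errors.append("ratio must be one of 16:9, 9:16, 1:1, 4:3, 3:4")
    seconds = str(params.get("seconds", "5"))  # the table lists seconds as a string
    if not (seconds.isdecimal() and 2 <= int(seconds) <= 15):
        errors.append("seconds must be between 2 and 15")
    seed = params.get("seed")
    if seed is not None and not (isinstance(seed, int) and 0 <= seed <= MAX_SEED):
        errors.append("seed must be an integer from 0 to 2,147,483,647")
    if len(params.get("negative_prompt", "")) > 500:
        errors.append("negative_prompt exceeds 500 characters")
    return errors


request = {"prompt": "A cat", "model": "Wan-AI/wan2.7-t2v", "seconds": "20"}
print(validate_params(request))  # ['seconds must be between 2 and 15']
```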

Media Object

The media object is the unified way to pass images, videos, and audio into video generation requests. The full schema is:
{
  "prompt": "...",
  "model": "...",
  "media": {
    "frame_images": [],
    "frame_videos": [],
    "reference_images": [],
    "reference_videos": [],
    "source_videos": [],
    "audio_inputs": []
  }
}
| Field | Type | Description |
| --- | --- | --- |
| frame_images | array | Keyframe images for image-to-video. Each item: {input_image, frame}. |
| frame_videos | array | Input video clips for video continuation. |
| reference_images | array | Reference images for character/object consistency or visual guidance. |
| reference_videos | array | Reference videos for character/object consistency. |
| source_videos | array | Source video for editing workflows. |
| audio_inputs | array | Audio files to drive generation (lip sync, beat-matched motion, etc.). Each item: {audio: "url"}. WAV or MP3, 3-30s, up to 15 MB. |
Wan 2.7 T2V only supports audio_inputs. The other media fields are used by other Wan 2.7 models (I2V, R2V, Video Edit).
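
Because only audio_inputs applies to this model, a request's media object can be built and checked with a couple of small helpers. This is a sketch only: `build_t2v_media` and `unsupported_media_fields` are hypothetical names, and the item shape follows the {audio: "url"} form described above.

```python
from typing import Dict, Iterable, List


def build_t2v_media(audio_urls: Iterable[str]) -> Dict[str, list]:
    """Build a media object that contains only audio_inputs."""
    return {"audio_inputs": [{"audio": url} for url in audio_urls]}


def unsupported_media_fields(media: dict) -> List[str]:
    """List media fields that Wan 2.7 T2V does not use, in sorted order."""
    return sorted(set(media) - {"audio_inputs"})


media = build_t2v_media(["https://example.com/rap-audio.mp3"])
print(media)
# {'audio_inputs': [{'audio': 'https://example.com/rap-audio.mp3'}]}
print(unsupported_media_fields({"audio_inputs": [], "frame_images": []}))
# ['frame_images']
```

The resulting dict can be passed as the `media` argument in the text-to-video with audio example.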

Prompting Tips

Wan 2.7 supports both Chinese and English prompts. Detailed, descriptive prompts produce the best results — include subject, action, style, camera movement, and atmosphere.
Write descriptive prompts. Instead of “a cat walking,” try “A small black cat walks gracefully through a sunlit garden, soft bokeh background, gentle breeze rustling the flowers, cinematic slow motion.”
Use negative prompts to avoid common artifacts:
low resolution, errors, worst quality, low quality, incomplete, extra fingers, bad proportions, blurry, distorted
Control aspect ratio and resolution. Use resolution and ratio to set output dimensions:
| Aspect Ratio | 720P Dimensions | 1080P Dimensions |
| --- | --- | --- |
| 16:9 | 1280x720 | 1920x1080 |
| 9:16 | 720x1280 | 1080x1920 |
| 1:1 | 960x960 | 1440x1440 |
| 4:3 | 1104x832 | 1648x1248 |
| 3:4 | 832x1104 | 1248x1648 |
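
When downstream code needs the pixel size of the output ahead of time (for example, to size a player), the table can be encoded as a lookup. This is a sketch that copies the values listed above. `output_size` is a hypothetical helper, not part of the SDK.

```python
from typing import Dict, Tuple

# (ratio, resolution) -> (width, height), copied from the table above
DIMENSIONS: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("16:9", "720P"): (1280, 720),
    ("16:9", "1080P"): (1920, 1080),
    ("9:16", "720P"): (720, 1280),
    ("9:16", "1080P"): (1080, 1920),
    ("1:1", "720P"): (960, 960),
    ("1:1", "1080P"): (1440, 1440),
    ("4:3", "720P"): (1104, 832),
    ("4:3", "1080P"): (1648, 1248),
    ("3:4", "720P"): (832, 1104),
    ("3:4", "1080P"): (1248, 1648),
}


def output_size(ratio: str = "16:9", resolution: str = "1080P") -> Tuple[int, int]:
    """Return (width, height) for a ratio/resolution pair; raises KeyError if unknown."""
    return DIMENSIONS[(ratio, resolution)]


print(output_size("9:16", "1080P"))  # (1080, 1920)
print(output_size())                 # (1920, 1080), using the documented defaults
```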
