Wan 2.7 T2V
Wan 2.7 T2V generates video from text prompts with optional audio-driven synchronization. It outputs 720P or 1080P video at 30fps in MP4 format, up to 15 seconds long.
| Model | API String | Best For | Duration |
|---|---|---|---|
| Wan 2.7 T2V | Wan-AI/wan2.7-t2v | Text-to-video with audio | Up to 15s |
Text-to-Video
Generate a video from a text prompt. Video generation is asynchronous — you create a job, receive a job ID, and poll for the result.
import time

from together import Together

client = Together()

# Submit the generation job; the call returns right away with a job ID.
job = client.videos.create(
    prompt="A small cute cartoon kitten general in golden armor stands on a cliff, commanding an army of mice charging below. Epic ancient war atmosphere, dramatic clouds over snowy mountains.",
    model="Wan-AI/wan2.7-t2v",
    resolution="720P",
    ratio="16:9",
    seconds="10",
)
print(f"Job ID: {job.id}")

# Poll every 5 seconds until the job completes or fails.
while True:
    status = client.videos.retrieve(job.id)
    print(f"Status: {status.status}")
    if status.status == "completed":
        print(f"Video URL: {status.outputs.video_url}")
        break
    elif status.status == "failed":
        print(f"Error: {status.error}")
        break
    time.sleep(5)
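The loop above polls indefinitely. As a minimal sketch, the same logic can be wrapped in a helper that gives up after a timeout. `wait_for_video` and its parameters are hypothetical names, not part of the Together SDK; the helper assumes only what the example shows: a retrieve call that returns an object with a `status` attribute, and `"completed"` and `"failed"` as the terminal states.

```python
import time
from typing import Any, Callable


def wait_for_video(
    retrieve: Callable[[str], Any],
    job_id: str,
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll retrieve(job_id) until the job is completed or failed.

    Hypothetical helper (not an SDK function). Returns the final status
    object, or raises TimeoutError if the next poll would pass the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = retrieve(job_id)
        if status.status in ("completed", "failed"):
            return status
        if time.monotonic() + poll_interval > deadline:
            raise TimeoutError(
                f"Job {job_id} still {status.status!r} after {timeout}s"
            )
        sleep(poll_interval)
```

With the client from the example, this could be called as `final = wait_for_video(client.videos.retrieve, job.id)`, after which `final.status` tells the caller whether to read `final.outputs.video_url` or `final.error`.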
Text-to-Video with Audio
Drive video generation with an audio file using media.audio_inputs. The model synchronizes the generated video to the audio — useful for lip sync, beat-matched motion, or narration-driven scenes. If no audio is provided, the model automatically generates matching background music or sound effects.
import time

from together import Together

client = Together()

# Submit the job with an audio file that drives the generated motion.
job = client.videos.create(
    prompt="A graffiti character comes to life off a concrete wall, rapping energetically under an urban railway bridge at night, lit by a lone streetlamp.",
    model="Wan-AI/wan2.7-t2v",
    resolution="720P",
    ratio="16:9",
    seconds="10",
    media={
        "audio_inputs": [
            {"audio": "https://example.com/rap-audio.mp3"},
        ],
    },
)
print(f"Job ID: {job.id}")

# Poll every 5 seconds until the job completes or fails.
while True:
    status = client.videos.retrieve(job.id)
    print(f"Status: {status.status}")
    if status.status == "completed":
        print(f"Video URL: {status.outputs.video_url}")
        break
    elif status.status == "failed":
        print(f"Error: {status.error}")
        break
    time.sleep(5)
Audio constraints: WAV or MP3 format, 3-30 seconds, up to 15 MB. If the audio is longer than the video duration, it will be truncated. If shorter, the remaining portion of the video will be silent.
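The constraints above can be checked before a job is submitted. The following is a rough sketch, not an official validator: `audio_problems` and `audio_coverage` are hypothetical helpers, the caller is assumed to already know the clip's duration and size, "15 MB" is read as 15 MiB, and the format check looks only at the URL's file extension.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

MAX_AUDIO_BYTES = 15 * 1024 * 1024  # assumption: "15 MB" interpreted as 15 MiB
ALLOWED_EXTENSIONS = {".wav", ".mp3"}


def audio_problems(url: str, duration_s: float, size_bytes: int) -> list[str]:
    """Return a list of constraint violations (empty if none were found)."""
    problems = []
    # Heuristic only: the extension in the URL path, not the real file contents.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format {suffix!r}; expected WAV or MP3")
    if not 3 <= duration_s <= 30:
        problems.append(f"duration {duration_s}s is outside 3-30 seconds")
    if size_bytes > MAX_AUDIO_BYTES:
        problems.append(f"size {size_bytes} bytes exceeds 15 MB")
    return problems


def audio_coverage(audio_s: float, video_s: float) -> tuple[float, float]:
    """Return (seconds of audio used, seconds of silent video at the end)."""
    used = min(audio_s, video_s)            # longer audio is truncated
    silent_tail = max(video_s - audio_s, 0.0)  # shorter audio leaves silence
    return used, silent_tail
```

For a 10-second video, a 12-second clip gives `(10.0, 0.0)` (the last 2 seconds of audio are dropped), while a 4-second clip gives `(4.0, 6.0)` (the final 6 seconds of video are silent).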
Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| prompt | string | Text description of the video to generate (up to 5,000 characters) | Required |
| model | string | Model identifier (Wan-AI/wan2.7-t2v) | Required |
| resolution | string | Video resolution tier (720P, 1080P) | "1080P" |
| ratio | string | Aspect ratio (16:9, 9:16, 1:1, 4:3, 3:4) | "16:9" |
| seconds | string | Video duration in seconds (2-15) | "5" |
| seed | integer | Random seed for reproducibility (0-2,147,483,647) | Random |
| negative_prompt | string | Elements to exclude from generation (up to 500 characters) | - |
| media | object | Media inputs for the request (see below) | - |
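The limits in the table can be enforced client-side before a request is sent. This is a sketch under stated assumptions: `build_t2v_request` is a hypothetical helper, not an SDK function, and it assumes the SDK accepts `seed` and `negative_prompt` as keyword arguments in the same way the earlier examples pass `prompt`, `model`, `resolution`, `ratio`, and `seconds`.

```python
from typing import Optional

MODEL = "Wan-AI/wan2.7-t2v"
RESOLUTIONS = ("720P", "1080P")
RATIOS = ("16:9", "9:16", "1:1", "4:3", "3:4")
MAX_SEED = 2_147_483_647


def build_t2v_request(
    prompt: str,
    *,
    resolution: str = "1080P",
    ratio: str = "16:9",
    seconds: int = 5,
    seed: Optional[int] = None,
    negative_prompt: Optional[str] = None,
) -> dict:
    """Validate the documented limits and return keyword arguments as a dict."""
    if not prompt or len(prompt) > 5000:
        raise ValueError("prompt must be 1 to 5,000 characters")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {RESOLUTIONS}")
    if ratio not in RATIOS:
        raise ValueError(f"ratio must be one of {RATIOS}")
    if not 2 <= int(seconds) <= 15:
        raise ValueError("seconds must be between 2 and 15")
    request = {
        "prompt": prompt,
        "model": MODEL,
        "resolution": resolution,
        "ratio": ratio,
        "seconds": str(int(seconds)),  # the API expects a string
    }
    # Optional fields are left out entirely when unset.
    if seed is not None:
        if not 0 <= seed <= MAX_SEED:
            raise ValueError(f"seed must be between 0 and {MAX_SEED}")
        request["seed"] = seed
    if negative_prompt is not None:
        if len(negative_prompt) > 500:
            raise ValueError("negative_prompt must be at most 500 characters")
        request["negative_prompt"] = negative_prompt
    return request
```

Under those assumptions, the result could be passed along as `client.videos.create(**build_t2v_request("A small black cat walks through a sunlit garden", seconds=10, seed=42))`.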
The media object is the unified way to pass images, videos, and audio into video generation requests. The full schema is:
{
  "prompt": "...",
  "model": "...",
  "media": {
    "frame_images": [],
    "frame_videos": [],
    "reference_images": [],
    "reference_videos": [],
    "source_videos": [],
    "audio_inputs": []
  }
}
| Field | Type | Description |
|---|---|---|
| frame_images | array | Keyframe images for image-to-video. Each item: {input_image, frame}. |
| frame_videos | array | Input video clips for video continuation. |
| reference_images | array | Reference images for character/object consistency or visual guidance. |
| reference_videos | array | Reference videos for character/object consistency. |
| source_videos | array | Source video for editing workflows. |
| audio_inputs | array | Audio files to drive generation (lip sync, beat-matched motion, etc.). Each item: {audio: "url"}. WAV or MP3, 3-30s, up to 15 MB. |
Wan 2.7 T2V only supports audio_inputs. The other media fields are used by other Wan 2.7 models (I2V, R2V, Video Edit).
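Because the `media` schema is shared across models, a request built for another Wan 2.7 model can carry fields that T2V does not use. The source does not say how the API responds to such fields, so this sketch simply rejects them client-side. `t2v_media` is a hypothetical helper, not part of the SDK.

```python
MEDIA_FIELDS = (
    "frame_images",
    "frame_videos",
    "reference_images",
    "reference_videos",
    "source_videos",
    "audio_inputs",
)
T2V_SUPPORTED = {"audio_inputs"}


def t2v_media(media: dict) -> dict:
    """Return a media object containing only what Wan 2.7 T2V supports.

    Raises ValueError for field names outside the schema and for
    non-empty fields that belong to other Wan 2.7 models.
    """
    unknown = sorted(set(media) - set(MEDIA_FIELDS))
    if unknown:
        raise ValueError(f"unknown media fields: {unknown}")
    unsupported = sorted(
        name for name, items in media.items()
        if name not in T2V_SUPPORTED and items
    )
    if unsupported:
        raise ValueError(f"not supported by Wan 2.7 T2V: {unsupported}")
    # Empty placeholders for other models are dropped silently.
    return {"audio_inputs": list(media.get("audio_inputs", []))}
```

Empty lists for the other fields are treated as harmless placeholders and dropped, so a full-schema template with only `audio_inputs` filled in passes through cleanly.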
Prompting Tips
Wan 2.7 supports both Chinese and English prompts. Detailed, descriptive prompts produce the best results — include subject, action, style, camera movement, and atmosphere.
Write descriptive prompts. Instead of “a cat walking,” try “A small black cat walks gracefully through a sunlit garden, soft bokeh background, gentle breeze rustling the flowers, cinematic slow motion.”
Use negative prompts to avoid common artifacts:
low resolution, errors, worst quality, low quality, incomplete, extra fingers, bad proportions, blurry, distorted
Control aspect ratio and resolution. Use resolution and ratio to set output dimensions:
| Aspect Ratio | 720P Dimensions | 1080P Dimensions |
|---|---|---|
| 16:9 | 1280x720 | 1920x1080 |
| 9:16 | 720x1280 | 1080x1920 |
| 1:1 | 960x960 | 1440x1440 |
| 4:3 | 1104x832 | 1648x1248 |
| 3:4 | 832x1104 | 1248x1648 |
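When downstream code needs the output size in pixels (for example, to size a player or a canvas), the table above can be encoded as a lookup. This is a small sketch; `DIMENSIONS` and `output_dimensions` are hypothetical names, and the values are copied from the table.

```python
# (width, height) in pixels, keyed by resolution tier and aspect ratio.
DIMENSIONS = {
    "720P": {
        "16:9": (1280, 720),
        "9:16": (720, 1280),
        "1:1": (960, 960),
        "4:3": (1104, 832),
        "3:4": (832, 1104),
    },
    "1080P": {
        "16:9": (1920, 1080),
        "9:16": (1080, 1920),
        "1:1": (1440, 1440),
        "4:3": (1648, 1248),
        "3:4": (1248, 1648),
    },
}


def output_dimensions(resolution: str, ratio: str) -> tuple[int, int]:
    """Look up the output (width, height) for a resolution and ratio pair."""
    try:
        return DIMENSIONS[resolution][ratio]
    except KeyError:
        raise ValueError(
            f"unsupported combination: {resolution!r}, {ratio!r}"
        ) from None
```

For instance, `output_dimensions("720P", "4:3")` returns `(1104, 832)`.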
Next Steps