Sora 2
OpenAI's flagship video generation with synchronized audio
OpenAI's flagship video and audio generation model that creates richly detailed, dynamic video clips with synchronized audio from natural language prompts or images.
| Property | Value |
|---|---|
| Model ID | openai/sora-2 |
| Provider | Replicate |
| Type | Video generation (text-to-video, image-to-video, audio-sync) |
Basic Usage
Sora 2 is a versatile video model that works with both text prompts and optional start images:
import { compose, generateVideo, videoModel } from "@synthome/sdk";
const execution = await compose(
generateVideo({
model: videoModel("openai/sora-2", "replicate"),
prompt:
"A serene beach sunset with waves gently lapping at the shore, seagulls calling in the distance",
}),
).execute();
Options
| Option | Type | Default | Description |
|---|---|---|---|
| prompt | string | required | Text description of the video scene |
| aspectRatio | "16:9" \| "9:16" | - | Video aspect ratio (landscape or portrait) |
| duration | number (4-12) | - | Video duration in seconds |
| resolution | "720p" \| "1080p" | - | Output video resolution |
| image | string | - | Starting frame image URL (image-to-video) |
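The constraints in the table can be checked before a request is sent. The sketch below is illustrative only: `Sora2OptionsInput` and `validateSora2Options` are hypothetical names that are not part of `@synthome/sdk`, and the rules are taken solely from the table above.

```typescript
// Hypothetical input shape mirroring the options table; not exported by the SDK.
// Fields are loosely typed because the values may come from untyped input (forms, JSON).
type Sora2OptionsInput = {
  prompt: string;
  aspectRatio?: string;
  duration?: number;
  resolution?: string;
  image?: string;
};

// Returns a list of problems; an empty list means the options match the table.
function validateSora2Options(options: Sora2OptionsInput): string[] {
  const problems: string[] = [];

  // prompt is the only required option.
  if (options.prompt.trim().length === 0) {
    problems.push("prompt is required");
  }

  // duration must fall in the documented 4-12 second range (NaN is rejected too).
  if (options.duration !== undefined) {
    const inRange = options.duration >= 4 && options.duration <= 12;
    if (!inRange) {
      problems.push("duration must be between 4 and 12 seconds");
    }
  }

  // Only the two documented aspect ratios are accepted.
  if (options.aspectRatio !== undefined && !["16:9", "9:16"].includes(options.aspectRatio)) {
    problems.push('aspectRatio must be "16:9" or "9:16"');
  }

  // Only the two documented resolutions are accepted.
  if (options.resolution !== undefined && !["720p", "1080p"].includes(options.resolution)) {
    problems.push('resolution must be "720p" or "1080p"');
  }

  return problems;
}
```

A caller might run this check first and only pass the options on to `generateVideo` when the returned list is empty.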
Video Generation
Text-to-Video
Create videos from text descriptions:
generateVideo({
model: videoModel("openai/sora-2", "replicate"),
prompt:
"An orange tabby cat knocks over a ceramic mug on a wooden table, with the sound of ceramic breaking, in warm kitchen lighting",
duration: 8,
resolution: "1080p",
});
Image-to-Video
Generate videos starting from an image:
generateVideo({
model: videoModel("openai/sora-2", "replicate"),
prompt: "The tree sways gently in the wind, leaves rustling",
image: "https://example.com/tree.jpg",
duration: 6,
});
Audio Synchronization
Sora 2 automatically generates synchronized audio including:
- Background sounds (ambient noise, environment audio)
- Sound effects (matching on-screen actions)
- Dialogue (if specified in prompt)
- Music (when appropriate to the scene)
// Includes synchronized audio
generateVideo({
model: videoModel("openai/sora-2", "replicate"),
prompt:
"A coffee shop with barista making drinks, espresso machine hissing, gentle background chatter, jazz music playing",
duration: 10,
});
Aspect Ratios
Choose landscape or portrait formats for different platforms:
// Landscape (16:9) - YouTube, desktop
generateVideo({
model: videoModel("openai/sora-2", "replicate"),
prompt: "A panoramic mountain vista at sunrise",
aspectRatio: "16:9",
});
// Portrait (9:16) - TikTok, Instagram Reels, mobile
generateVideo({
model: videoModel("openai/sora-2", "replicate"),
prompt: "A close-up of blooming flower",
aspectRatio: "9:16",
});
Note: The SDK automatically converts standard aspect ratios (16:9, 9:16) to Replicate's format (landscape, portrait). Always use the standard notation in your code.
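To make the conversion described in the note concrete, here is a minimal sketch of such a mapping. It is an illustration, not the SDK's actual implementation: the type names and `toReplicateAspectRatio` are hypothetical, and only the two value pairs come from the note.

```typescript
// Standard notation used in application code.
type StandardAspectRatio = "16:9" | "9:16";
// Format the note says Replicate expects.
type ReplicateAspectRatio = "landscape" | "portrait";

// Lookup table for the two documented ratios.
const REPLICATE_ASPECT_RATIO: Record<StandardAspectRatio, ReplicateAspectRatio> = {
  "16:9": "landscape",
  "9:16": "portrait",
};

// Hypothetical helper: translate the standard notation to Replicate's format.
function toReplicateAspectRatio(ratio: StandardAspectRatio): ReplicateAspectRatio {
  return REPLICATE_ASPECT_RATIO[ratio];
}
```

Because the SDK handles this step, application code should never need a helper like this; it only shows what the translation amounts to.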
Resolution
Select output quality based on your needs:
// 720p - Faster generation, smaller files
generateVideo({
model: videoModel("openai/sora-2", "replicate"),
prompt: "Street scene with cars and pedestrians",
resolution: "720p",
});
// 1080p - Higher quality, larger files
generateVideo({
model: videoModel("openai/sora-2", "replicate"),
prompt: "Detailed architectural shot with intricate textures",
resolution: "1080p",
});
Duration
Generate videos from 4 to 12 seconds:
// Short video (4-6 seconds)
generateVideo({
model: videoModel("openai/sora-2", "replicate"),
prompt: "A quick magic sparkle effect",
duration: 5,
});
// Longer video (8-12 seconds)
generateVideo({
model: videoModel("openai/sora-2", "replicate"),
prompt:
"A complete story arc: character walks, encounters obstacle, overcomes it",
duration: 12,
});
Cinematic Control
Sora 2 understands cinematic terminology for professional-looking results:
generateVideo({
model: videoModel("openai/sora-2", "replicate"),
prompt:
"Cinematic IMAX-scale scene: wide establishing shot of futuristic city, dolly in towards main building, warm morning light, dramatic shadows",
aspectRatio: "16:9",
duration: 10,
resolution: "1080p",
});
Examples
Marketing Content
generateVideo({
model: videoModel("openai/sora-2", "replicate"),
prompt:
"Elegant product shot of luxury watch, close-up rotating on velvet surface, soft spotlight, ambient luxury showroom sounds",
resolution: "1080p",
duration: 8,
});
Social Media Content
generateVideo({
model: videoModel("openai/sora-2", "replicate"),
prompt:
"Upbeat cooking tutorial: chef chopping vegetables, sizzling sounds in pan, energetic background music, bright kitchen lighting",
aspectRatio: "9:16",
resolution: "720p",
duration: 6,
});
Educational Content
generateVideo({
model: videoModel("openai/sora-2", "replicate"),
prompt:
"Scientific animation: water cycle, clouds forming, rain falling, river flowing, calm narrator voice explaining process",
duration: 12,
resolution: "1080p",
});
Creative Animation
generateVideo({
model: videoModel("openai/sora-2", "replicate"),
prompt:
"Studio Ghibli style: young girl on flying bicycle over countryside, whimsical music, birds chirping, gentle wind sounds",
duration: 10,
});
Best Practices
- Be Specific: Include details about lighting, movement, and sounds in your prompt
- Audio Descriptions: Mention specific sounds ("espresso machine hissing", "footsteps crunching") for better audio sync
- Duration: Shorter videos (4-8 seconds) tend to have better temporal consistency
- Physics: Sora 2 has improved physics simulation - describe realistic movements for best results
- Multi-Shot: For sequences, clearly delineate each shot: "Shot 1 (0-4s): wide shot. Shot 2 (4-8s): close-up"
- Cinematic Terms: Use terms like "dolly in", "pan left", "handheld camera" for precise control
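The multi-shot tip above can be automated with a small prompt builder. The following is a sketch under stated assumptions: `Shot` and `buildMultiShotPrompt` are hypothetical helpers, not SDK exports, and the 4-12 second bound is taken from the options table.

```typescript
// Hypothetical description of one shot in a sequence.
type Shot = { seconds: number; description: string };

// Builds a prompt in the "Shot N (start-end s): description" style and
// returns the total duration so it can be passed alongside the prompt.
function buildMultiShotPrompt(shots: Shot[]): { prompt: string; duration: number } {
  let start = 0;
  const parts = shots.map((shot, index) => {
    const end = start + shot.seconds;
    const part = `Shot ${index + 1} (${start}-${end}s): ${shot.description}`;
    start = end; // the next shot begins where this one ends
    return part;
  });

  // After the loop, `start` holds the total length of the sequence.
  if (start < 4 || start > 12) {
    throw new RangeError("total duration must be between 4 and 12 seconds");
  }

  return { prompt: parts.join(". "), duration: start };
}

const sequence = buildMultiShotPrompt([
  { seconds: 4, description: "wide shot" },
  { seconds: 4, description: "close-up" },
]);
// sequence.prompt   → "Shot 1 (0-4s): wide shot. Shot 2 (4-8s): close-up"
// sequence.duration → 8
```

The returned `prompt` and `duration` could then be spread into a `generateVideo` call together with the model.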
Limitations
- Generation time can be several minutes depending on complexity
- Temporal consistency improves with shorter clips
- Very detailed text rendering may have artifacts
- Some highly complex scenarios may not render exactly as described
- Requires organization verification for OpenAI API keys
Audio Capabilities
Sora 2 generates sophisticated audio alongside video:
- Ambient sounds: Environment noises, weather, background activity
- Sound effects: Actions, impacts, movements synchronized with video
- Music: Background music matching the scene mood and style
- Dialogue: Character speech with lip synchronization
- Foley: Detailed sound effects for enhanced realism
The audio is automatically embedded in the MP4 output with proper synchronization.
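Since audio is driven entirely by the prompt, it can help to keep the visual description and the audio cues separate in code and join them just before the request. This is a minimal sketch, assuming a comma-separated prompt works as in the examples on this page; `AudioCues` and `describeScene` are hypothetical names, not part of the SDK.

```typescript
// Hypothetical grouping of audio cues, following the categories listed above.
type AudioCues = {
  effects?: string[];
  ambient?: string[];
  music?: string;
  dialogue?: string;
};

// Joins the visual description and any audio cues into one comma-separated prompt.
function describeScene(visual: string, audio: AudioCues): string {
  const cues = [
    ...(audio.effects ?? []),
    ...(audio.ambient ?? []),
    ...(audio.music ? [audio.music] : []),
    ...(audio.dialogue ? [audio.dialogue] : []),
  ];
  return [visual, ...cues].join(", ");
}

const coffeeShopPrompt = describeScene("A coffee shop with barista making drinks", {
  effects: ["espresso machine hissing"],
  ambient: ["gentle background chatter"],
  music: "jazz music playing",
});
// coffeeShopPrompt → "A coffee shop with barista making drinks, espresso machine hissing, gentle background chatter, jazz music playing"
```

The resulting string can be passed as the `prompt` option of `generateVideo`.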