Sora 2

OpenAI's flagship video generation with synchronized audio

Sora 2 is OpenAI's flagship video and audio generation model. It creates richly detailed, dynamic video clips with synchronized audio from natural language prompts or images.

Property   Value
Model ID   openai/sora-2
Provider   Replicate
Type       Video generation (text-to-video, image-to-video, audio-sync)

Basic Usage

Sora 2 is a versatile video model that works with both text prompts and optional start images:

import { compose, generateVideo, videoModel } from "@synthome/sdk";

const execution = await compose(
  generateVideo({
    model: videoModel("openai/sora-2", "replicate"),
    prompt:
      "A serene beach sunset with waves gently lapping at the shore, seagulls calling in the distance",
  }),
).execute();

Options

Option        Type                Default    Description
prompt        string              required   Text description of the video scene
aspectRatio   "16:9" | "9:16"     -          Video aspect ratio (landscape or portrait)
duration      number (4-12)       -          Video duration in seconds
resolution    "720p" | "1080p"    -          Output video resolution
image         string              -          Starting frame image URL (image-to-video)
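Because a generation job can take minutes, it can help to check option values against the table before submitting. The sketch below restates the table as a local TypeScript type; `Sora2Options` and `validateSora2Options` are hypothetical names, not exports of `@synthome/sdk`, and the SDK may already perform its own validation.

```typescript
// Hypothetical local mirror of the Options table above.
// These are NOT exports of @synthome/sdk.
type Sora2AspectRatio = "16:9" | "9:16";
type Sora2Resolution = "720p" | "1080p";

interface Sora2Options {
  prompt: string; // required
  aspectRatio?: Sora2AspectRatio;
  duration?: number; // seconds; the table documents a 4-12 range
  resolution?: Sora2Resolution;
  image?: string; // starting-frame image URL (image-to-video)
}

const ASPECT_RATIOS: readonly string[] = ["16:9", "9:16"];
const RESOLUTIONS: readonly string[] = ["720p", "1080p"];

// Returns human-readable problems; an empty array means the options
// match the documented ranges.
function validateSora2Options(opts: Sora2Options): string[] {
  const problems: string[] = [];
  if (opts.prompt.trim().length === 0) {
    problems.push("prompt must not be empty");
  }
  // The negated range check also rejects NaN.
  if (opts.duration !== undefined && !(opts.duration >= 4 && opts.duration <= 12)) {
    problems.push(`duration must be 4-12 seconds, got ${opts.duration}`);
  }
  // Runtime checks catch values from untyped sources (JSON, plain JS callers).
  if (opts.aspectRatio !== undefined && ASPECT_RATIOS.indexOf(opts.aspectRatio) === -1) {
    problems.push(`unsupported aspectRatio "${opts.aspectRatio}"`);
  }
  if (opts.resolution !== undefined && RESOLUTIONS.indexOf(opts.resolution) === -1) {
    problems.push(`unsupported resolution "${opts.resolution}"`);
  }
  return problems;
}

// Usage
console.log(validateSora2Options({ prompt: "A serene beach sunset", duration: 8 })); // []
console.log(validateSora2Options({ prompt: "A serene beach sunset", duration: 20 }));
```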

Video Generation

Text-to-Video

Create videos from text descriptions:

generateVideo({
  model: videoModel("openai/sora-2", "replicate"),
  prompt:
    "An orange tabby cat knocks over a ceramic mug on a wooden table, with the sound of ceramic breaking, in warm kitchen lighting",
  duration: 8,
  resolution: "1080p",
});

Image-to-Video

Generate videos starting from an image:

generateVideo({
  model: videoModel("openai/sora-2", "replicate"),
  prompt: "The tree sways gently in the wind, leaves rustling",
  image: "https://example.com/tree.jpg",
  duration: 6,
});
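The `image` option takes a URL for the starting frame. A small guard can fail fast on an obviously wrong value such as a bare filename. This is a sketch assuming only absolute http(s) URLs are acceptable (the page does not say whether data URIs or local paths work); `imageToVideoOptions` is a hypothetical helper, not an SDK function.

```typescript
// Hypothetical helper, not part of @synthome/sdk. Assumes the start frame
// must be an absolute http(s) URL; relax the check if the SDK accepts more.
interface ImageToVideoOptions {
  prompt: string;
  image: string;
  duration?: number;
}

const HTTP_URL = /^https?:\/\/\S+$/i;

function imageToVideoOptions(
  prompt: string,
  imageUrl: string,
  duration?: number,
): ImageToVideoOptions {
  if (!HTTP_URL.test(imageUrl)) {
    throw new Error(`image must be an absolute http(s) URL, got "${imageUrl}"`);
  }
  // Include duration only when given, leaving the choice to the SDK otherwise.
  return duration === undefined
    ? { prompt, image: imageUrl }
    : { prompt, image: imageUrl, duration };
}

// Usage
const treeOptions = imageToVideoOptions(
  "The tree sways gently in the wind, leaves rustling",
  "https://example.com/tree.jpg",
  6,
);
console.log(treeOptions.image); // https://example.com/tree.jpg
```

The returned object can then be spread into the call: generateVideo({ model: videoModel("openai/sora-2", "replicate"), ...treeOptions }).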

Audio Synchronization

Sora 2 automatically generates synchronized audio including:

  • Background sounds (ambient noise, environment audio)
  • Sound effects (matching on-screen actions)
  • Dialogue (if specified in prompt)
  • Music (when appropriate to the scene)

// Includes synchronized audio
generateVideo({
  model: videoModel("openai/sora-2", "replicate"),
  prompt:
    "A coffee shop with barista making drinks, espresso machine hissing, gentle background chatter, jazz music playing",
  duration: 10,
});
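Since the audio is driven entirely by the prompt text, it can be convenient to keep sound cues separate from the visual description and join them only when building the request. The helper below is a hypothetical sketch (not an SDK function) that groups cues by the four categories listed above and appends them to the scene as comma-separated phrases, the same shape as the coffee-shop prompt.

```typescript
// Hypothetical prompt helper, not part of @synthome/sdk.
// Groups sound cues by the audio categories Sora 2 produces.
interface SoundCues {
  ambient?: string[];  // background sounds
  effects?: string[];  // sound effects tied to on-screen actions
  dialogue?: string[]; // spoken lines, e.g. 'a barista says "Order up!"'
  music?: string;      // a single music description
}

function withSoundCues(scene: string, cues: SoundCues): string {
  return [scene]
    .concat(
      cues.ambient ?? [],
      cues.effects ?? [],
      cues.dialogue ?? [],
      cues.music ? [cues.music] : [],
    )
    .map((part) => part.trim())
    .filter((part) => part.length > 0) // drop blank cues
    .join(", ");
}

const coffeeShopPrompt = withSoundCues("A coffee shop with barista making drinks", {
  effects: ["espresso machine hissing"],
  ambient: ["gentle background chatter"],
  music: "jazz music playing",
});
console.log(coffeeShopPrompt);
// A coffee shop with barista making drinks, gentle background chatter, espresso machine hissing, jazz music playing
```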

Aspect Ratios

Choose landscape or portrait formats for different platforms:

// Landscape (16:9) - YouTube, desktop
generateVideo({
  model: videoModel("openai/sora-2", "replicate"),
  prompt: "A panoramic mountain vista at sunrise",
  aspectRatio: "16:9",
});

// Portrait (9:16) - TikTok, Instagram Reels, mobile
generateVideo({
  model: videoModel("openai/sora-2", "replicate"),
  prompt: "A close-up of a blooming flower",
  aspectRatio: "9:16",
});

Note: The SDK automatically converts standard aspect ratios (16:9, 9:16) to Replicate's format (landscape, portrait). You always use the standard notation in your code.
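Your code never needs to perform this conversion. The sketch below only illustrates the kind of lookup the note describes, assuming just the two ratios in the Options table; it is not the SDK's actual implementation. `toReplicateAspectRatio` and `PLATFORM_ASPECT_RATIO` are hypothetical names, and the platform presets simply restate the comments in the examples above.

```typescript
// Illustrative only: mirrors the conversion the note describes.
// Not the SDK's actual implementation.
type StandardAspectRatio = "16:9" | "9:16";
type ReplicateAspectRatio = "landscape" | "portrait";

const TO_REPLICATE: Record<StandardAspectRatio, ReplicateAspectRatio> = {
  "16:9": "landscape",
  "9:16": "portrait",
};

function toReplicateAspectRatio(ratio: StandardAspectRatio): ReplicateAspectRatio {
  return TO_REPLICATE[ratio];
}

// Platform presets taken from the comments in the examples above.
const PLATFORM_ASPECT_RATIO = {
  youtube: "16:9",
  desktop: "16:9",
  tiktok: "9:16",
  instagramReels: "9:16",
  mobile: "9:16",
} as const;

console.log(toReplicateAspectRatio(PLATFORM_ASPECT_RATIO.tiktok)); // portrait
```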

Resolution

Select output quality based on your needs:

// 720p - Faster generation, smaller files
generateVideo({
  model: videoModel("openai/sora-2", "replicate"),
  prompt: "Street scene with cars and pedestrians",
  resolution: "720p",
});

// 1080p - Higher quality, larger files
generateVideo({
  model: videoModel("openai/sora-2", "replicate"),
  prompt: "Detailed architectural shot with intricate textures",
  resolution: "1080p",
});
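The size difference between the two options follows from the pixel counts. Assuming the usual convention that "720p" and "1080p" name the shorter side of the frame (the page does not list Sora 2's exact output dimensions), a hypothetical helper can compute the nominal frame size for each aspect ratio:

```typescript
// Hypothetical helper, not part of @synthome/sdk. Assumes "720p"/"1080p"
// give the shorter side of a 16:9 or 9:16 frame; real output may differ.
type Resolution = "720p" | "1080p";
type AspectRatio = "16:9" | "9:16";

interface FrameSize {
  width: number;
  height: number;
}

function nominalFrameSize(resolution: Resolution, aspectRatio: AspectRatio): FrameSize {
  const shortSide = resolution === "720p" ? 720 : 1080;
  const longSide = (shortSide * 16) / 9; // 1280 or 1920
  return aspectRatio === "16:9"
    ? { width: longSide, height: shortSide }
    : { width: shortSide, height: longSide };
}

const hd = nominalFrameSize("720p", "16:9");      // 1280 x 720
const fullHd = nominalFrameSize("1080p", "16:9"); // 1920 x 1080
const pixelRatio = (fullHd.width * fullHd.height) / (hd.width * hd.height);
console.log(pixelRatio); // 2.25
```

Under this assumption each 1080p frame carries 2.25 times the pixels of a 720p frame, which is consistent with 1080p producing larger files; actual file size also depends on encoding.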

Duration

Generate videos from 4 to 12 seconds:

// Short video (4-6 seconds)
generateVideo({
  model: videoModel("openai/sora-2", "replicate"),
  prompt: "A quick magic sparkle effect",
  duration: 5,
});

// Longer video (8-12 seconds)
generateVideo({
  model: videoModel("openai/sora-2", "replicate"),
  prompt:
    "A complete story arc: character walks, encounters obstacle, overcomes it",
  duration: 12,
});

Cinematic Control

Sora 2 understands cinematic terminology for professional-looking results:

generateVideo({
  model: videoModel("openai/sora-2", "replicate"),
  prompt:
    "Cinematic IMAX-scale scene: wide establishing shot of futuristic city, dolly in towards main building, warm morning light, dramatic shadows",
  aspectRatio: "16:9",
  duration: 10,
  resolution: "1080p",
});
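Cinematic prompts like the one above tend to follow a fixed shape: an overall style, a shot type, a camera move, and lighting. A hypothetical builder (not an SDK function) can keep those parts as separate fields, which makes it easier to vary one element, such as the camera move, while holding the rest constant:

```typescript
// Hypothetical prompt builder, not part of @synthome/sdk.
interface CinematicSpec {
  style?: string;      // e.g. "Cinematic IMAX-scale scene"
  shot: string;        // shot type and subject
  camera?: string;     // camera move, e.g. "dolly in", "pan left"
  lighting?: string[]; // lighting and mood descriptors
}

function cinematicPrompt(spec: CinematicSpec): string {
  const body = [spec.shot]
    .concat(spec.camera ? [spec.camera] : [], spec.lighting ?? [])
    .join(", ");
  // A leading style label is separated by a colon, as in the example above.
  return spec.style ? `${spec.style}: ${body}` : body;
}

const cityPrompt = cinematicPrompt({
  style: "Cinematic IMAX-scale scene",
  shot: "wide establishing shot of futuristic city",
  camera: "dolly in towards main building",
  lighting: ["warm morning light", "dramatic shadows"],
});
console.log(cityPrompt);
```

This reproduces the prompt from the example; swapping `camera` for "pan left" or "handheld camera" yields a variant with everything else unchanged.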

Examples

Marketing Content

generateVideo({
  model: videoModel("openai/sora-2", "replicate"),
  prompt:
    "Elegant product shot of luxury watch, close-up rotating on velvet surface, soft spotlight, ambient luxury showroom sounds",
  resolution: "1080p",
  duration: 8,
});

Social Media Content

generateVideo({
  model: videoModel("openai/sora-2", "replicate"),
  prompt:
    "Upbeat cooking tutorial: chef chopping vegetables, sizzling sounds in pan, energetic background music, bright kitchen lighting",
  aspectRatio: "9:16",
  resolution: "720p",
  duration: 6,
});

Educational Content

generateVideo({
  model: videoModel("openai/sora-2", "replicate"),
  prompt:
    "Scientific animation: water cycle, clouds forming, rain falling, river flowing, calm narrator voice explaining process",
  duration: 12,
  resolution: "1080p",
});

Creative Animation

generateVideo({
  model: videoModel("openai/sora-2", "replicate"),
  prompt:
    "Studio Ghibli style: young girl on flying bicycle over countryside, whimsical music, birds chirping, gentle wind sounds",
  duration: 10,
});

Best Practices

  1. Be Specific: Include details about lighting, movement, and sounds in your prompt
  2. Audio Descriptions: Mention specific sounds ("espresso machine hissing", "footsteps crunching") for better audio sync
  3. Duration: Shorter videos (4-8 seconds) tend to have better temporal consistency
  4. Physics: Sora 2 simulates physics more accurately than the original Sora, so describe realistic movements for best results
  5. Multi-Shot: For sequences, clearly delineate each shot: "Shot 1 (0-4s): wide shot. Shot 2 (4-8s): close-up"
  6. Cinematic Terms: Use terms like "dolly in", "pan left", "handheld camera" for precise control
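The multi-shot format in tip 5 is easy to get wrong by hand, because changing one shot's length shifts every later timestamp. The hypothetical helper below (not an SDK function) computes the time ranges from per-shot lengths and checks the total against the 4-12 second range in the Options table; the returned `duration` can be passed to `generateVideo` so the clip length matches the shot timings.

```typescript
// Hypothetical helper, not part of @synthome/sdk. Formats shots in the
// "Shot N (start-ends): ..." style and checks the documented 4-12 s range.
interface Shot {
  seconds: number;
  description: string;
}

function multiShotPrompt(
  shots: Shot[],
  minSeconds = 4,
  maxSeconds = 12,
): { prompt: string; duration: number } {
  let start = 0;
  const segments = shots.map((shot, index) => {
    const end = start + shot.seconds;
    const segment = `Shot ${index + 1} (${start}-${end}s): ${shot.description}.`;
    start = end; // the next shot begins where this one ends
    return segment;
  });
  if (start < minSeconds || start > maxSeconds) {
    throw new Error(`shots total ${start}s; expected ${minSeconds}-${maxSeconds}s`);
  }
  return { prompt: segments.join(" "), duration: start };
}

const plan = multiShotPrompt([
  { seconds: 4, description: "wide shot of a harbor at dawn" },
  { seconds: 4, description: "close-up of a fisherman coiling rope" },
]);
console.log(plan.prompt);
// Shot 1 (0-4s): wide shot of a harbor at dawn. Shot 2 (4-8s): close-up of a fisherman coiling rope.
console.log(plan.duration); // 8
```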

Limitations

  • Generation time can be several minutes depending on complexity
  • Temporal consistency improves with shorter clips
  • Detailed on-screen text (signs, labels, captions) may render with artifacts
  • Some highly complex scenarios may not render exactly as described
  • Requires organization verification for OpenAI API keys

Audio Capabilities

Sora 2 generates several kinds of audio alongside the video:

  • Ambient sounds: Environment noises, weather, background activity
  • Sound effects: Actions, impacts, movements synchronized with video
  • Music: Background music matching the scene mood and style
  • Dialogue: Character speech with lip synchronization
  • Foley: Detailed sound effects for enhanced realism

The audio is automatically embedded in the MP4 output with proper synchronization.
