
OpenAI Whisper

Speech-to-text transcription model


OpenAI's speech recognition model for transcribing audio to text.

Property    Value
Model ID    openai/whisper
Provider    Replicate
Type        Speech-to-text

Basic Usage

Whisper is used primarily with the captions() function for automatic subtitle generation:

import { compose, captions, audioModel } from "@synthome/sdk";

const execution = await compose(
  captions({
    video: "https://example.com/video.mp4",
    model: audioModel("openai/whisper", "replicate"),
  }),
).execute();

How It Works

Whisper analyzes the audio track of a video and produces:

  • Full text transcription
  • Sentence-level timestamps
  • Language detection
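The sentence-level timestamps above are what subtitle renderers consume. As a minimal sketch, independent of the Synthome SDK (the `Segment` shape is an assumption about Whisper-style output, not the SDK's actual result type), here is how timestamped segments could be converted into SRT subtitle text:

```typescript
// Assumed shape of a Whisper-style transcription segment.
interface Segment {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

// Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTime(seconds: number): string {
  const ms = Math.round(seconds * 1000);
  const h = Math.floor(ms / 3_600_000);
  const m = Math.floor((ms % 3_600_000) / 60_000);
  const s = Math.floor((ms % 60_000) / 1000);
  const rem = ms % 1000;
  const pad = (n: number, w: number) => String(n).padStart(w, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(rem, 3)}`;
}

// Number each segment and join the cues into an SRT document.
function toSrt(segments: Segment[]): string {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${srtTime(seg.start)} --> ${srtTime(seg.end)}\n${seg.text}`,
    )
    .join("\n\n");
}
```

In practice the captions() function handles this rendering for you; the sketch only illustrates what "sentence-level timestamps" resolve to in a standard subtitle format.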

Use Cases

Auto-Captioning

captions({
  video: "https://example.com/video.mp4",
  model: audioModel("openai/whisper", "replicate"),
  style: { preset: "youtube" },
});

Transcription Pipeline

import { compose, captions, audioModel, merge } from "@synthome/sdk";

const execution = await compose(
  captions({
    video: merge([
      "https://example.com/clip1.mp4",
      "https://example.com/clip2.mp4",
    ]),
    model: audioModel("openai/whisper", "replicate"),
  }),
).execute();

Language Support

Whisper automatically detects and transcribes:

  • English
  • Spanish
  • French
  • German
  • Italian
  • Portuguese
  • And 90+ other languages
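Whisper-style detectors typically report the detected language as a two-letter ISO 639-1 code (e.g. "es" for Spanish). If you need a human-readable name for display, the standard Intl.DisplayNames API covers this without any extra dependency; a small sketch (independent of the SDK, and assuming the detected value is an ISO 639-1 code):

```typescript
// Map an ISO 639-1 language code, as a Whisper-style detector
// might report, to its English display name.
const languageNames = new Intl.DisplayNames(["en"], { type: "language" });

function languageName(code: string): string {
  // Fall back to the raw code if the runtime cannot resolve it.
  return languageNames.of(code) ?? code;
}
```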

Example

Subtitle Generation

import {
  compose,
  captions,
  audioModel,
  videoModel,
  generateVideo,
} from "@synthome/sdk";

const execution = await compose(
  captions({
    video: generateVideo({
      model: videoModel("bytedance/seedance-1-pro", "replicate"),
      prompt: "A person giving a presentation",
    }),
    model: audioModel("openai/whisper", "replicate"),
    style: {
      preset: "minimal",
      marginV: 30,
    },
  }),
).execute();
