Captions

Add auto-generated or custom subtitles to videos

captions()

Add captions to videos with automatic transcription or custom timing.

import { compose, captions, audioModel } from "@synthome/sdk";

const execution = await compose(
  captions({
    video: "https://example.com/video.mp4",
    model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
  }),
).execute();

Auto-Generated Captions

Use a transcription model to automatically generate captions:

captions({
  video: "https://example.com/video.mp4",
  model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
});

Available Transcription Models

| Model | Provider | Speed | Notes |
| --- | --- | --- | --- |
| vaibhavs10/incredibly-fast-whisper | replicate | Very fast | Recommended for most use cases |
| openai/whisper | replicate | Standard | Original Whisper model |

// Fast transcription
captions({
  video: "https://example.com/video.mp4",
  model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
});

// Standard Whisper
captions({
  video: "https://example.com/video.mp4",
  model: audioModel("openai/whisper", "replicate"),
});

Custom Captions

Provide your own word-level timing:

captions({
  video: "https://example.com/video.mp4",
  captions: [
    { word: "Hello", start: 0.0, end: 0.5 },
    { word: "world", start: 0.5, end: 1.0 },
    { word: "this", start: 1.2, end: 1.4 },
    { word: "is", start: 1.4, end: 1.6 },
    { word: "a", start: 1.6, end: 1.7 },
    { word: "video", start: 1.7, end: 2.2 },
  ],
});
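
If you have a transcript but no timestamps, you can derive rough word-level timing yourself. The helper below is a hypothetical sketch (not part of the SDK) that spreads a transcript's words evenly across a known clip duration; timings from a transcription model will be more accurate.

// Hypothetical helper (not part of the SDK): spread a transcript's words
// evenly across a known clip duration. The returned objects match the
// CaptionWord shape documented below ({ word, start, end }).
function toWordTimings(transcript: string, durationSeconds: number) {
  const words = transcript.trim().split(/\s+/);
  const slot = durationSeconds / words.length;
  return words.map((word, i) => ({
    word,
    start: Number((i * slot).toFixed(2)),
    end: Number(((i + 1) * slot).toFixed(2)),
  }));
}

captions({
  video: "https://example.com/video.mp4",
  captions: toWordTimings("Hello world this is a video", 2.2),
});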

Caption Styles

Style Presets

Use built-in presets for popular platforms:

captions({
  video: "https://example.com/video.mp4",
  model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
  style: { preset: "tiktok" },
});

| Preset | Description |
| --- | --- |
| tiktok | Bold, centered, mobile-optimized |
| youtube | Clean, bottom-positioned |
| story | Vertical video friendly |
| minimal | Subtle, unobtrusive |
| cinematic | Film-style subtitles |

Custom Font Styling

captions({
  video: "https://example.com/video.mp4",
  model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
  style: {
    fontFamily: "Arial",
    fontSize: 48,
    fontWeight: "bold",
    color: "#FFFFFF",
    outlineColor: "#000000",
    outlineWidth: 2,
  },
});

Positioning

captions({
  video: "https://example.com/video.mp4",
  model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
  style: {
    alignment: "center",
    marginV: 50, // Vertical margin from bottom
    marginL: 20, // Left margin
    marginR: 20, // Right margin
  },
});

Word Highlighting

Highlight the currently spoken word:

captions({
  video: "https://example.com/video.mp4",
  model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
  style: {
    highlightActiveWord: true,
    activeWordColor: "#FFFF00", // Yellow highlight
    inactiveWordColor: "#FFFFFF", // White for other words
  },
});

Animation Styles

style: {
  highlightActiveWord: true,
  animationStyle: "color",  // Options: "none", "color", "scale", "glow"
  activeWordScale: 1.2,     // Scale up active word
}
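
In context, this fragment sits inside the style object of a captions() call. For example, using the scale animation with the options already shown above:

captions({
  video: "https://example.com/video.mp4",
  model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
  style: {
    highlightActiveWord: true,
    animationStyle: "scale",
    activeWordScale: 1.2, // Scale up the active word by 20%
  },
});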

Caption Behavior

Control how captions are grouped and displayed:

captions({
  video: "https://example.com/video.mp4",
  model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
  style: {
    wordsPerCaption: 5, // Show 5 words at a time
    maxCaptionDuration: 3, // Max 3 seconds per caption
    maxCaptionChars: 40, // Max 40 characters per line
  },
});

With Generated Videos

Caption a Generated Video

import {
  compose,
  captions,
  generateVideo,
  videoModel,
  audioModel,
} from "@synthome/sdk";

const execution = await compose(
  captions({
    video: generateVideo({
      model: videoModel("bytedance/seedance-1-pro", "replicate"),
      prompt: "Person giving a presentation",
    }),
    model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
    style: { preset: "youtube" },
  }),
).execute();

Caption After Merge

import {
  compose,
  captions,
  generateVideo,
  merge,
  videoModel,
  audioModel,
} from "@synthome/sdk";

const execution = await compose(
  captions({
    video: merge([
      generateVideo({
        model: videoModel("bytedance/seedance-1-pro", "replicate"),
        prompt: "Scene 1",
      }),
      generateVideo({
        model: videoModel("bytedance/seedance-1-pro", "replicate"),
        prompt: "Scene 2",
      }),
    ]),
    model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
  }),
).execute();

This pipeline:

  1. Generates two videos in parallel
  2. Merges them into one
  3. Transcribes and adds captions

Full Style Reference

CaptionStyle

| Property | Type | Description |
| --- | --- | --- |
| preset | string | Style preset (tiktok, youtube, etc.) |
| fontFamily | string | Font name |
| fontSize | number | Font size in pixels |
| fontWeight | string \| number | Font weight (bold, 700, etc.) |
| color | string | Text color (hex) |
| outlineColor | string | Outline color (hex) |
| backColor | string | Background color (hex) |
| borderStyle | number | Border style |
| outlineWidth | number | Outline width in pixels |
| shadowDistance | number | Shadow offset |
| alignment | string | Text alignment |
| marginV | number | Vertical margin |
| marginL | number | Left margin |
| marginR | number | Right margin |
| wordsPerCaption | number | Words shown at once |
| maxCaptionDuration | number | Max seconds per caption |
| maxCaptionChars | number | Max characters per caption |
| highlightActiveWord | boolean | Enable word highlighting |
| activeWordColor | string | Color for active word |
| inactiveWordColor | string | Color for inactive words |
| activeWordScale | number | Scale multiplier for active word |
| animationStyle | string | Animation: none, color, scale, glow |
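
A few of these properties (backColor, borderStyle, shadowDistance) don't appear in the earlier examples. The sketch below shows where they fit in the style object; the specific values are illustrative only.

captions({
  video: "https://example.com/video.mp4",
  model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
  style: {
    fontFamily: "Arial",
    fontSize: 48,
    color: "#FFFFFF",
    backColor: "#000000", // Background color behind the text
    borderStyle: 1, // Border style (numeric; see table above)
    outlineWidth: 2,
    shadowDistance: 2, // Shadow offset (see table above)
  },
});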

API Reference

captions(options)

| Parameter | Type | Description |
| --- | --- | --- |
| options | CaptionsOptions | Caption configuration |

CaptionsOptions

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| video | string \| VideoOperation | Yes | Video URL or generated video |
| model | AudioModel | * | Transcription model |
| captions | CaptionWord[] | * | Custom word-level captions |
| style | CaptionStyle | No | Styling options |

* Either model or captions is required.

CaptionWord

| Property | Type | Description |
| --- | --- | --- |
| word | string | The word text |
| start | number | Start time in seconds |
| end | number | End time in seconds |
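
If the SDK exports these types (an assumption; check the package's type declarations), custom caption data can be typed explicitly:

// Assumes CaptionWord is exported from the SDK; if not, define a matching
// local interface with word, start, and end fields.
import { captions, type CaptionWord } from "@synthome/sdk";

const words: CaptionWord[] = [
  { word: "Hello", start: 0.0, end: 0.5 },
  { word: "world", start: 0.5, end: 1.0 },
];

captions({
  video: "https://example.com/video.mp4",
  captions: words,
});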
