# Captions

Add auto-generated or custom subtitles to videos.
## captions()

Add captions to videos with automatic transcription or custom timing.
```ts
import { compose, captions, audioModel } from "@synthome/sdk";

const execution = await compose(
  captions({
    video: "https://example.com/video.mp4",
    model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
  }),
).execute();
```

## Auto-Generated Captions
Use a transcription model to automatically generate captions:
```ts
captions({
  video: "https://example.com/video.mp4",
  model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
});
```

### Available Transcription Models
| Model | Provider | Speed | Notes |
|---|---|---|---|
| `vaibhavs10/incredibly-fast-whisper` | replicate | Very fast | Recommended for most use cases |
| `openai/whisper` | replicate | Standard | Original Whisper model |
```ts
// Standard Whisper
captions({
  video: "https://example.com/video.mp4",
  model: audioModel("openai/whisper", "replicate"),
});
```

## Transcription Correction
When using TTS-generated audio, transcription models like Whisper often misrecognize brand names, technical terms, or uncommon words. Use `originalText` to automatically correct these errors:
```ts
captions({
  video: "https://example.com/video.mp4",
  model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
  originalText: "Synthome makes video editing easy with AI-powered tools.",
});
```

### How It Works
1. Whisper transcribes the audio and returns word-level timestamps.
2. The transcription is compared against your original text using AI.
3. Misrecognized words are corrected while the original timestamps are preserved (sketched below).
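As a mental model for step 3, here is a minimal sketch of a timestamp-preserving word swap. It is illustrative only: `applyCorrections` is a hypothetical helper, not part of the SDK, and the real word alignment is done by the AI model.

```ts
interface CaptionWord {
  word: string;
  start: number; // seconds
  end: number; // seconds
}

// Once words have been matched against the original text, correction
// is just a text swap that keeps Whisper's timestamps intact.
function applyCorrections(
  transcribed: CaptionWord[],
  corrections: Map<number, string>, // word index -> corrected text
): CaptionWord[] {
  return transcribed.map((w, i) => ({
    ...w,
    word: corrections.get(i) ?? w.word,
  }));
}

// "Sintom" was misheard at index 0; its 0.00-0.42s timing is untouched.
const fixed = applyCorrections(
  [{ word: "Sintom", start: 0.0, end: 0.42 }],
  new Map([[0, "Synthome"]]),
);
```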
### Common Use Cases

**Brand name correction:**
```ts
// Without originalText: "Sintom" or "Sin Thome"
// With originalText: "Synthome" (correctly spelled)
captions({
  video: ttsGeneratedVideo,
  model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
  originalText: "Synthome is the best video platform.",
});
```

**Technical terms:**
```ts
captions({
  video: productDemo,
  model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
  originalText: "Configure your Kubernetes cluster with kubectl apply.",
});
```

### Multi-language Support
The correction works with any language since it uses AI to match the transcription against the original text:
```ts
captions({
  video: frenchVideo,
  model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
  originalText: "Bienvenue sur notre plateforme Synthome.",
});
```

The `originalText` parameter requires an OpenAI API key configured in your integrations. The correction uses GPT-4o-mini for fast, accurate results.
## Custom Captions
Provide your own word-level timing:
```ts
captions({
  video: "https://example.com/video.mp4",
  captions: [
    { word: "Hello", start: 0.0, end: 0.5 },
    { word: "world", start: 0.5, end: 1.0 },
    { word: "this", start: 1.2, end: 1.4 },
    { word: "is", start: 1.4, end: 1.6 },
    { word: "a", start: 1.6, end: 1.7 },
    { word: "video", start: 1.7, end: 2.2 },
  ],
});
```
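If you have the script but no per-word timings, you can generate rough, evenly spaced entries yourself. The `evenCaptions` helper below is hypothetical (not part of the SDK) and assumes evenly paced narration; a transcription model will produce far more accurate timing.

```ts
// Hypothetical helper: spread a script's words evenly across a known
// clip duration. Use only as a rough fallback for steady narration.
function evenCaptions(
  text: string,
  durationSec: number,
): { word: string; start: number; end: number }[] {
  const words = text.split(/\s+/).filter(Boolean);
  if (words.length === 0) return [];
  const slot = durationSec / words.length;
  return words.map((word, i) => ({
    word,
    start: +(i * slot).toFixed(2),
    end: +((i + 1) * slot).toFixed(2),
  }));
}

// evenCaptions("Hello world this is a video", 2.2) yields six entries
// spanning 0.00-2.20s, ready to pass as the `captions` array above.
```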
## Caption Styles

### Style Presets
Use built-in presets for popular platforms:
```ts
captions({
  video: "https://example.com/video.mp4",
  model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
  style: { preset: "tiktok" },
});
```

| Preset | Description |
|---|---|
| `tiktok` | Bold, centered, mobile-optimized |
| `youtube` | Clean, bottom-positioned |
| `story` | Vertical-video friendly |
| `minimal` | Subtle, unobtrusive |
| `cinematic` | Film-style subtitles |
### Custom Font Styling
```ts
captions({
  video: "https://example.com/video.mp4",
  model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
  style: {
    fontFamily: "Arial",
    fontSize: 48,
    fontWeight: "bold",
    letterSpacing: 2,
    color: "#FFFFFF",
    outlineColor: "#000000",
    outlineWidth: 2,
  },
});
```

### Background Styling
Add a background box behind your captions:
```ts
captions({
  video: "https://example.com/video.mp4",
  model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
  style: {
    fontFamily: "Arial",
    fontSize: 48,
    fontWeight: "normal",
    color: "#FFFFFF",
    backgroundColor: "#000000",
    padding: 20,
  },
});
```

When `backgroundColor` is set, an opaque box is automatically added behind the text. Use `padding` to control the space between the text and the box edges.
### Positioning
```ts
captions({
  video: "https://example.com/video.mp4",
  model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
  style: {
    alignment: "center",
    marginV: 50, // vertical margin from the bottom
    marginL: 20, // left margin
    marginR: 20, // right margin
  },
});
```

### Word Highlighting
Highlight the currently spoken word:
```ts
captions({
  video: "https://example.com/video.mp4",
  model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
  style: {
    highlightActiveWord: true,
    activeWordColor: "#FFFF00", // yellow highlight
    inactiveWordColor: "#FFFFFF", // white for other words
  },
});
```

### Animation Styles
```ts
captions({
  video: "https://example.com/video.mp4",
  model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
  style: {
    highlightActiveWord: true,
    animationStyle: "color", // options: "none", "color", "scale", "glow"
    activeWordScale: 1.2, // scale up the active word
  },
});
```

### Caption Behavior
Control how captions are grouped and displayed:
```ts
captions({
  video: "https://example.com/video.mp4",
  model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
  style: {
    wordsPerCaption: 5, // show 5 words at a time
    maxCaptionDuration: 3, // max 3 seconds per caption
    maxCaptionChars: 40, // max 40 characters per line
  },
});
```
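To build intuition for how these three limits interact, here is an illustrative grouping pass: a new caption starts whenever adding the next word would exceed any limit. This is hypothetical logic for intuition only, not the SDK's actual algorithm.

```ts
interface CaptionWord {
  word: string;
  start: number;
  end: number;
}

// Greedy grouping under the three limits above (hypothetical).
function groupWords(
  words: CaptionWord[],
  { wordsPerCaption = 5, maxCaptionDuration = 3, maxCaptionChars = 40 } = {},
): CaptionWord[][] {
  const groups: CaptionWord[][] = [];
  let current: CaptionWord[] = [];
  for (const w of words) {
    // Length and duration of the current caption if `w` were appended.
    const chars = [...current, w].map((x) => x.word).join(" ").length;
    const duration = current.length > 0 ? w.end - current[0].start : 0;
    const full =
      current.length >= wordsPerCaption ||
      duration > maxCaptionDuration ||
      chars > maxCaptionChars;
    if (current.length > 0 && full) {
      groups.push(current);
      current = [];
    }
    current.push(w);
  }
  if (current.length > 0) groups.push(current);
  return groups;
}
```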
## With Generated Videos

### Caption a Generated Video
```ts
import {
  compose,
  captions,
  generateVideo,
  videoModel,
  audioModel,
} from "@synthome/sdk";

const execution = await compose(
  captions({
    video: generateVideo({
      model: videoModel("bytedance/seedance-1-pro", "replicate"),
      prompt: "Person giving a presentation",
    }),
    model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
    style: { preset: "youtube" },
  }),
).execute();
```

### Caption After Merge
```ts
import {
  compose,
  captions,
  merge,
  generateVideo,
  videoModel,
  audioModel,
} from "@synthome/sdk";

const execution = await compose(
  captions({
    video: merge([
      generateVideo({
        model: videoModel("bytedance/seedance-1-pro", "replicate"),
        prompt: "Scene 1",
      }),
      generateVideo({
        model: videoModel("bytedance/seedance-1-pro", "replicate"),
        prompt: "Scene 2",
      }),
    ]),
    model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
  }),
).execute();
```

This pipeline:
- Generates two videos in parallel
- Merges them into one
- Transcribes and adds captions
## Reusing Transcripts

When you need the same transcript for multiple operations (e.g., captions and position keyframes), use the `transcribe()` function to create a reusable transcript:
```ts
import {
  compose,
  transcribe,
  captions,
  generatePositionKeyframes,
  layers,
  audioModel,
} from "@synthome/sdk";

// Create a reusable transcript
const transcript = transcribe({
  video: "https://example.com/speaking-head.mp4",
  model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
});

// Use the same transcript for captions
const captionedVideo = captions({
  video: "https://example.com/speaking-head.mp4",
  transcript: transcript, // reuse the transcript
  style: { preset: "tiktok" },
});

// And for position keyframes
const positions = generatePositionKeyframes({
  timestamps: transcript, // same transcript
  positions: ["w-2/3 bottom-left", "w-2/3 bottom", "w-2/3 bottom-right"],
});
```

### When to Use a Transcript
- **Multiple uses:** when you need timestamps for both captions and position keyframes
- **Audio-first workflow:** when transcribing generated audio before video creation
- **Pipeline optimization:** avoids duplicate transcription jobs (see the example below)
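For instance, the `captionedVideo` operation defined above runs with the same `compose` pattern used throughout this guide (combining it with `positions` via `layers` is beyond this example's scope):

```ts
// Execute the captioned video; the transcript was produced once and
// shared by both captions() and generatePositionKeyframes().
const execution = await compose(captionedVideo).execute();
```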
### Transcribing Audio Directly
You can transcribe audio files directly without a video:
```ts
import { transcribe, generateAudio, audioModel } from "@synthome/sdk";

// Transcribe generated TTS audio
const transcript = transcribe({
  audio: generateAudio({
    model: audioModel("elevenlabs/turbo-v2.5", "elevenlabs"),
    text: "Welcome to our video!",
    voiceId: "21m00Tcm4TlvDq8ikWAM",
  }),
  model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
  originalText: "Welcome to our video!", // correct the transcription
});
```

This is faster than transcribing from video since no audio extraction is needed.
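In an audio-first workflow, that same transcript can later be attached to the finished video via the `transcript` parameter. Here `finalVideo` is a placeholder for whatever video URL or operation your pipeline produces, and `captions` is assumed to be imported as well.

```ts
// Reuse the transcript created from the TTS audio; no second
// transcription job is needed.
const captioned = captions({
  video: finalVideo, // placeholder: your finished video URL or operation
  transcript: transcript,
  style: { preset: "tiktok" },
});
```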
## Full Style Reference

### CaptionStyle
| Property | Type | Description |
|---|---|---|
| `preset` | string | Style preset (`tiktok`, `youtube`, etc.) |
| `fontFamily` | string | Font name |
| `fontSize` | number | Font size in pixels |
| `fontWeight` | string \| number | Font weight (`"normal"`, `"bold"`, `400`, `700`) |
| `letterSpacing` | number | Letter spacing in pixels |
| `color` | string | Text color (hex) |
| `outlineColor` | string | Outline color (hex) |
| `backgroundColor` | string | Background box color (hex) |
| `padding` | number | Padding around text in the background box |
| `borderStyle` | number | `1` = outline only, `3` = opaque background box |
| `outlineWidth` | number | Outline width in pixels (when `borderStyle` is `1`) |
| `shadowDistance` | number | Shadow offset |
| `alignment` | string | Text alignment |
| `marginV` | number | Vertical margin |
| `marginL` | number | Left margin |
| `marginR` | number | Right margin |
| `wordsPerCaption` | number | Words shown at once |
| `maxCaptionDuration` | number | Max seconds per caption |
| `maxCaptionChars` | number | Max characters per caption |
| `highlightActiveWord` | boolean | Enable word highlighting |
| `activeWordColor` | string | Color of the active word |
| `inactiveWordColor` | string | Color of inactive words |
| `activeWordScale` | number | Scale multiplier for the active word |
| `animationStyle` | string | Animation: `none`, `color`, `scale`, `glow` |
## API Reference

### captions(options)
| Parameter | Type | Description |
|---|---|---|
| `options` | CaptionsOptions | Caption configuration |
### CaptionsOptions
| Property | Type | Required | Description |
|---|---|---|---|
| `video` | string \| VideoOperation | Yes | Video URL or generated video |
| `model` | AudioModel | \* | Transcription model |
| `captions` | CaptionWord[] | \* | Custom word-level captions |
| `transcript` | TranscribeOperation \| string | \* | Pre-created transcript (reusable) |
| `originalText` | string | No | Original text for transcription correction |
| `style` | CaptionStyle | No | Styling options |

\* One of `model`, `captions`, or `transcript` is required.
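Concretely, each of these calls satisfies the one-of requirement:

```ts
const video = "https://example.com/video.mp4";

// 1. Transcribe on the fly
captions({
  video,
  model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
});

// 2. Supply custom word timings
captions({
  video,
  captions: [{ word: "Hello", start: 0.0, end: 0.5 }],
});

// 3. Reuse a pre-created transcript
captions({
  video,
  transcript: transcribe({
    video,
    model: audioModel("vaibhavs10/incredibly-fast-whisper", "replicate"),
  }),
});
```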
### CaptionWord

| Property | Type | Description |
|---|---|---|
| `word` | string | The word text |
| `start` | number | Start time in seconds |
| `end` | number | End time in seconds |