Create Transcription
Generate accurate transcriptions from video and audio content
POST
Overview
Extract accurate transcriptions from your videos using AI-powered speech recognition. This endpoint generates timestamped transcriptions with support for multiple languages, translation, and script format options. Transcription output is available in multiple formats including SRT, VTT, CSV, and TXT.
Rate Limiting
This endpoint is rate limited to 10 requests per minute per API key.
You must provide either sourceUrl or uploadId, but not both.
Video Requirements
Duration
Minimum: 3 seconds
Maximum: 15 minutes
File Size
Maximum: 5 GB
Format
MP4 or MOV with valid audio streams
Audio Quality
Clear speech produces the best transcription results
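The requirements above can be checked locally before submitting a file. The following is a minimal sketch, not part of the API: the function name `check_video` is hypothetical, the caller is assumed to already know the duration and size, and "5 GB" is assumed to mean decimal gigabytes since the page does not say otherwise.

```python
from pathlib import PurePosixPath

MIN_DURATION_S = 3               # minimum duration: 3 seconds
MAX_DURATION_S = 15 * 60         # maximum duration: 15 minutes
MAX_SIZE_BYTES = 5 * 1000**3     # assumption: 5 GB read as decimal gigabytes
ALLOWED_EXTENSIONS = {".mp4", ".mov"}


def check_video(filename: str, duration_s: float, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    # The extension is only a rough proxy for the container format.
    if PurePosixPath(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("format must be MP4 or MOV")
    if duration_s < MIN_DURATION_S:
        problems.append("duration is under 3 seconds")
    elif duration_s > MAX_DURATION_S:
        problems.append("duration is over 15 minutes")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file is larger than 5 GB")
    return problems
```

This check cannot confirm that the file has a valid audio stream; that would need a media inspection tool.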
Plan Limits
| Plan | Concurrent Projects |
|---|---|
| Creator | 3 |
| Studio | 10 |
The Automation API requires an active subscription. View pricing to compare plans.
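Besides the plan's concurrency limit, a client also has to stay within the limit of 10 requests per minute per API key. The sketch below is one way to pace calls on the client side. It assumes a sliding 60-second window, which may not match how the server counts requests, and the class name `SlidingWindowLimiter` is hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side pacing: at most max_calls within any period_s window."""

    def __init__(self, max_calls=10, period_s=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period_s = period_s
        self.clock = clock          # injectable so the limiter can be tested
        self._stamps = deque()      # timestamps of recent calls, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget calls that have left the window.
        while self._stamps and now - self._stamps[0] >= self.period_s:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            return 0.0
        return self.period_s - (now - self._stamps[0])

    def record(self) -> None:
        """Note that a request was just sent."""
        self._stamps.append(self.clock())
```

A caller would sleep for `wait_time()` seconds, send the request, and then call `record()`.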
Response
Unique project identifier
Project title (usually the filename)
Thumbnail URL for the project
Duration in seconds that will be billed to your account
Current processing status
- processing - Audio is being transcribed
- completed - Transcription has been generated successfully
- failed - Processing failed due to an error
Type of project (always “transcription” for this endpoint)
Source of the video content
- Upload - Uploaded file
- Youtube - YouTube URL
- Generic - External URL
Primary language of the video content
Whether transcription will be translated
Array of languages for translation
Script format for transcription (“native” or “roman”)
Video file metadata including duration, resolution, format, etc.
Project URLs and assets (populated when processing completes). Includes transcription files in multiple formats: SRT, VTT, CSV, and TXT.
Unix timestamp when the project was created
Unix timestamp when the project was last updated
Example Request
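The original request sample is not preserved in this section. As a stand-in, here is a hedged sketch that builds the request with Python's standard library. The endpoint URL is a placeholder, since the real one is not shown here, and the helper name `build_transcription_request` is hypothetical. The `sourceUrl`, `uploadId`, and `translationLanguage` field names and the Bearer header come from this page; the format of the language value is an assumption.

```python
import json
import urllib.request

# Placeholder only: the real endpoint URL is not shown in this section.
API_URL = "https://api.example.com/transcription"


def build_transcription_request(token, *, source_url=None, upload_id=None,
                                translation_language=None, api_url=API_URL):
    """Build (but do not send) a POST request for a transcription project."""
    # Exactly one of sourceUrl / uploadId must be provided.
    if (source_url is None) == (upload_id is None):
        raise ValueError("provide either sourceUrl or uploadId, but not both")
    if source_url is not None:
        body = {"sourceUrl": source_url}
    else:
        body = {"uploadId": upload_id}
    if translation_language is not None:
        body["translationLanguage"] = translation_language
    return urllib.request.Request(
        api_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it once the placeholder URL is replaced with the real endpoint.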
Example Response
Processing Workflow
- Audio Extraction - Audio is extracted from the video file
- Speech Recognition - AI transcribes the speech with word-level timing
- Translation - If a translation language is specified, the transcription is translated
- Format Generation - Output is generated in multiple formats (SRT, VTT, CSV, TXT)
- Completion - Use Get Project Status to monitor progress
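Since processing is asynchronous, a client typically polls until the project reaches a final status. The sketch below assumes the caller supplies `fetch_status`, a hypothetical function that calls Get Project Status and returns the status string. It treats only `completed` and `failed` as final, and the interval and timeout defaults are arbitrary choices rather than documented values.

```python
import time
from typing import Callable


def wait_for_transcription(fetch_status: Callable[[], str], *,
                           interval_s: float = 5.0,
                           timeout_s: float = 1800.0,
                           sleep=time.sleep,
                           clock=time.monotonic) -> str:
    """Poll until the status is 'completed' or 'failed', then return it."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} seconds")
        # Any other status is treated as still in progress.
        sleep(interval_s)
```

Keeping the polling interval at several seconds also helps if status calls are themselves rate limited.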
Output Formats
When transcription completes, the urls object in Get Project Details includes:
| Format | Field | Description |
|---|---|---|
| SRT | transcription_srt | SubRip subtitle format |
| VTT | transcription_vtt | WebVTT subtitle format |
| CSV | transcription_csv | Comma-separated values |
| TXT | transcription_txt | Plain text transcript |
| Audio | audioFile | Extracted audio file |
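The table above maps directly to a lookup. The following sketch picks one URL out of the `urls` object by format name. The helper `output_url` and the short format keys are hypothetical, while the field names are the ones listed in the table.

```python
# Field names as listed in the Output Formats table.
FORMAT_FIELDS = {
    "srt": "transcription_srt",
    "vtt": "transcription_vtt",
    "csv": "transcription_csv",
    "txt": "transcription_txt",
    "audio": "audioFile",
}


def output_url(urls: dict, fmt: str):
    """Return the URL for the requested format, or None if it is not populated yet."""
    field = FORMAT_FIELDS.get(fmt.lower())
    if field is None:
        raise ValueError(f"unknown format: {fmt!r}")
    return urls.get(field)
```

Returning `None` for a missing field reflects that the `urls` object is only populated once processing completes.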
Best Practices
- Audio Quality: Clear audio with minimal background noise produces more accurate results
- Language Selection: Specify the language explicitly for better transcription accuracy
- Script Format: Use “roman” for romanized output of non-Latin script languages
- Translation: Combine with translationLanguage to get translated transcriptions
Use Cases
Content Indexing
Generate searchable text from video libraries at scale
Subtitle Generation
Create SRT/VTT files for video players and platforms
Meeting Notes
Transcribe recorded meetings and webinars
Accessibility
Make video content accessible with accurate transcriptions
Next Steps
After creating a transcription project:
- Monitor progress with Get Project Status
- Retrieve the full project with transcription URLs via Get Project Details
- Download transcription files in your preferred format
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Body
application/json
URL to a video or audio file (alternative to uploadId)
Upload ID from a previously uploaded file (alternative to sourceUrl)
Primary language of the video content (auto-detected if not provided)
Language to translate the transcription to
Script format for transcription output
Available options:
native, roman
Response
200 - application/json
Successful response
Available options:
queued, prepped, draft, processing, finalizing, completed, invalid, expired, failed, error
Available options:
clipping, captions, reframe, dubbing, transcription
Available options:
Upload, Youtube, Vimeo, TwitchVod, Twitter, RumbleEmbed, Generic
Available options:
talking, screenshare, gaming
Available options:
landscape, portrait, square
Available options:
native, roman