Music Video (MV) API
The MV API turns a song (Suno clip or any audio file) into a complete music
video. One endpoint, one task lifecycle, two creation modes — selected via
mode:
| `mode` | Best for |
|---|---|
| `"fast"` | Quick finished MV, lip-sync, social-ready output |
| `"studio"` | Editable B2B production: storyboard review, per-scene control, cost gates |
Both modes share the same input contract, the same output shape (MVView),
the same task envelope, and the same webhook events. The capability flags on
MVView.capabilities tell your UI which controls to render — you don’t
branch on mode beyond the initial create call.
Authentication uses the same x-api-key header as every other OmnAPI surface.
The endpoints
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/mv/quote | Unified pricing preview (Fast + Studio variants) |
| POST | /api/v1/mv | Create MV (returns generation taskId + canonical id) |
| GET | /api/v1/mv/:id | Read unified MVView |
| PATCH | /api/v1/mv/:id | Edit top-level metadata + Studio stage outputs |
| DELETE | /api/v1/mv/:id | Archive |
| POST | /api/v1/mv/:id/scenes/:idx/edit | Edit a single scene (Fast: re-render; Studio: in-place patch) |
| POST | /api/v1/mv/:id/scenes/:idx/render | Trigger i2v for one scene (Studio only) |
| POST | /api/v1/mv/:id/scenes/:idx/regenerate-image | Re-roll a scene’s reference image (Studio only) |
| PATCH | /api/v1/mv/:id/scenes/:idx/select-rendering | Pin canonical rendering (Studio only) |
| POST | /api/v1/mv/:id/finalize | Stitch final MP4 |
| POST | /api/v1/mv/:id/auto-pilot | Run all Studio stages inline (Studio only) |
| POST | /api/v1/mv/:id/{stage} | Trigger a single Studio stage by name |
| GET | /api/v1/mv/:id/final | Get a 15-minute presigned URL for the final MP4 |
| GET | /api/v1/mv/templates | Studio style templates catalog |
| GET | /api/v1/mv/models | Render-model catalog |
| GET | /api/v1/mv/pricing | Live MV render pricing rules by provider / model / resolution |
Studio stages ({stage} above): analyze-emotion, draft-concept,
draft-narrative, plan-scenes, lock-character, evaluate-scenes.
Create
```bash
curl -X POST https://api.omnapi.com/api/v1/mv \
  -H "x-api-key: om_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "fast",
    "source": { "type": "suno", "clipId": "484a67d4-..." },
    "title": "Sunny Morning",
    "aspectRatio": "9:16",
    "resolution": "540p",
    "lipSync": false,
    "subtitles": true,
    "language": "auto"
  }'
```

Response:

```json
{
  "id": "01H...",
  "mode": "fast",
  "taskId": "task_01H...",
  "status": "PENDING",
  "creditsReserved": 1724,
  "warningCodes": []
}
```

Poll the task until it reaches a terminal state, then call GET /api/v1/mv/:id to read the unified MVView.
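
The poll-then-read flow can be sketched as a small generic helper. This is only an illustration: `pollUntil` is a hypothetical name, and the reader and the "done" predicate are injected because the task endpoint and its terminal states are documented in Async Jobs, not here.

```typescript
// Hypothetical helper: repeatedly calls `read` until `isDone` accepts the result.
// Nothing here assumes a particular task endpoint or status vocabulary.
async function pollUntil<T>(
  read: () => Promise<T>,
  isDone: (value: T) => boolean,
  options: { intervalMs: number; maxAttempts: number },
): Promise<T> {
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    const value = await read();
    if (isDone(value)) return value;
    if (attempt < options.maxAttempts) {
      // Wait before the next read so the API is not hammered.
      await new Promise<void>((resolve) => setTimeout(resolve, options.intervalMs));
    }
  }
  throw new Error(`not done after ${options.maxAttempts} attempts`);
}
```

In practice `read` would fetch the task (or the MVView) and `isDone` would test for whatever terminal states the Async Jobs page defines. A webhook via callbackUrl avoids polling altogether.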
Input fields
| Field | Type | Notes |
|---|---|---|
| `mode` | `"fast" \| "studio"` | Required. Picks the engine. |
| `source` | object | Required. See Source variants. |
| `referenceImages` | string[] | 0–7 images (URL, data: base64, or R2 key). Fast: passed to the managed renderer; if omitted and the source is a Suno clip, the clip’s cover image is auto-injected and `MV_REFERENCE_FROM_SUNO_COVER` is emitted. Studio: `[0]` is the character anchor candidate when `characterImage` is unset. |
| `characterImage` | string | Studio-only. Single anchor portrait. 400 `MV_CHARACTER_IMAGE_FAST_NOT_ALLOWED` if used with mode=fast. |
| `prompt` | string | ≤3000 chars. Style hint + scene direction. |
| `aspectRatio` | enum | 16:9, 9:16 (default), 1:1, 4:3, 3:4. |
| `resolution` | enum | 540p (default), 720p, 1080p. |
| `lipSync` | bool | Fast-only. 400 `MV_LIPSYNC_UNSUPPORTED_IN_STUDIO` if used with mode=studio. Adds a dynamic Vidu surcharge, roughly 24 OmnAPI credits/sec at the default rate. |
| `subtitles` | bool | Burn subtitles into the final MP4. See Subtitle behavior. |
| `subtitleColor` | string | Hex, default #FFFFFF. |
| `language` | `"auto" \| "en" \| "zh"` | Default `"auto"`. |
| `srtUrl` | string | Explicit SRT override (URL or data: base64). |
| `title` | string | ≤200 chars. Studio: rendered as 3-second title card; Fast: stored on project. |
| `config` | object | Mode-specific. See Mode configs. |
| `callbackUrl` | string | Per-task webhook URL. |
| `metadata` | object | Free-form, echoed on read. |
| `tags` | string[] | Free-form labels for filtering. |
| `priority` | int | 1–10 task queue priority, default 5. |
| `expectedVersion` | int | Reserved for idempotency on retries. |
Source variants
```jsonc
// Suno clip: the backend resolves the audio URL, duration, and word-level timeline
{ "type": "suno", "clipId": "<suno-clip-id>" }

// External audio URL, caller-provided
{
  "type": "audio",
  "audioUrl": "https://example.com/song.mp3",
  "durationSec": 60,
  "lyrics": "optional plain text"
}

// Inline base64 upload (≤20MB decoded)
{
  "type": "audio-upload",
  "contentBase64": "...",
  "contentType": "audio/mpeg",
  "durationSec": 60
}
```

Allowed contentType values for audio-upload: audio/mpeg, audio/wav, audio/wave, audio/x-wav, audio/aac, audio/mp4, audio/x-m4a.

Duration must be 10–180 seconds regardless of variant.
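
The input rules above lend themselves to a client-side pre-check before spending a request. The following is a minimal sketch, not the server's validator: `precheckCreate` and the trimmed-down `MVCreateInput` type are hypothetical, the order of checks is arbitrary, and mapping a disallowed contentType to `MV_AUDIO_CODEC_UNSUPPORTED` is an assumption.

```typescript
type MVSource =
  | { type: "suno"; clipId: string }
  | { type: "audio"; audioUrl: string; durationSec: number; lyrics?: string }
  | { type: "audio-upload"; contentBase64: string; contentType: string; durationSec: number };

// Hypothetical subset of the create body, limited to the fields checked below.
interface MVCreateInput {
  mode: "fast" | "studio";
  source: MVSource;
  referenceImages?: string[];
  characterImage?: string;
  prompt?: string;
  lipSync?: boolean;
}

const ALLOWED_UPLOAD_TYPES: string[] = [
  "audio/mpeg", "audio/wav", "audio/wave", "audio/x-wav",
  "audio/aac", "audio/mp4", "audio/x-m4a",
];

// Returns the documented error code the request would probably hit, or null.
function precheckCreate(input: MVCreateInput): string | null {
  if (input.mode === "fast" && input.characterImage !== undefined) {
    return "MV_CHARACTER_IMAGE_FAST_NOT_ALLOWED";
  }
  if (input.mode === "studio" && input.lipSync === true) {
    return "MV_LIPSYNC_UNSUPPORTED_IN_STUDIO";
  }
  if ((input.referenceImages?.length ?? 0) > 7) return "MV_TOO_MANY_REFERENCES";
  if ((input.prompt?.length ?? 0) > 3000) return "MV_PROMPT_TOO_LONG";

  const source = input.source;
  if (source.type !== "suno") {
    // Suno durations are resolved by the backend, so only caller-supplied ones are checked.
    if (source.durationSec < 10 || source.durationSec > 180) return "MV_AUDIO_DURATION_INVALID";
  }
  if (source.type === "audio-upload" && !ALLOWED_UPLOAD_TYPES.includes(source.contentType)) {
    return "MV_AUDIO_CODEC_UNSUPPORTED"; // assumption: contentType drives this code
  }
  return null;
}
```

A `null` result only means none of these local checks failed; the server remains the source of truth.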
Mode configs
For mode: "studio":

```jsonc
{
  "draft": false,                              // true skips scene images; 100 credits instead of 250
  "preview": false,                            // true renders scenes at 720p/24fps/≤4s for cost control
  "templateId": "tmpl_...",                    // optional MVTemplate id
  "characterPrompt": "...",                    // text-only anchor (mutually exclusive with characterImage)
  "videoProvider": "<model-id-from-/models>",
  "maxLipsync": 1                              // max scenes marked lipsync framing
}
```

templateId has no effect in mode=fast; the request still succeeds and carries the warning code MV_TEMPLATE_IGNORED_IN_FAST.
```bash
curl -H "x-api-key: om_live_..." \
  https://api.omnapi.com/api/v1/mv/<id>
```

Returns the unified MVView:

```ts
type MVView = {
  id: string;
  mode: "fast" | "studio";
  status:
    | "PENDING" | "GENERATING" | "READY" | "RENDERING" | "FINALIZING"
    | "COMPLETE" | "EDITING" | "FAILED" | "ARCHIVED";
  version: number;

  source: { type, clipId?, audioUrl?, durationSec, lyrics? };
  prompt: string | null;
  title: string | null;
  config: { aspectRatio, resolution, lipSync, subtitles, ... };

  characterAnchor: { url, r2Key } | null;
  referenceImages: { url: string }[];
  scenes: MVSceneView[];
  finalMv: MVFinalView | null;
  stages: { emotionMap, creativeConcept, narrativeArc };

  warningCodes: string[];
  lastErrorCode: string | null;
  lastErrorMessage: string | null;
  failedAt: string | null;
  generationTaskId: string | null;
  createdAt: string;
  updatedAt: string;

  capabilities: {
    canEditScenePrompt: boolean;   // both modes
    canEditSceneImage: boolean;    // studio only
    canEditSceneFraming: boolean;  // studio only
    canTriggerRender: boolean;     // studio only
    canFinalize: boolean;
    canPatchStageOutput: boolean;  // studio only
  };
};

type MVSceneView = {
  index: number;
  startSec: number;
  endSec: number;
  lyricsWindow: string | null;
  framing: string | null;
  prompt: string;
  imageUrl: string | null;       // studio: scene still; fast: always null
  videoUrl: string | null;       // presigned
  providerJobId: string | null;  // fast only, opaque id
  status: "PLANNED" | "IMAGE_READY" | "RENDERING" | "READY" | "FAILED" | "STALE";
  renderingHistory: Array<{ id, videoUrl, durationSec, isSelected, createdAt }>;
};
```

Use MVView.capabilities to drive your UI. Don’t branch on mode directly: capability flags may later unlock, in one mode, controls that are exclusive to the other mode today.
All presigned URLs (scene images, scene videos, character anchor, final MP4)
have a 15-minute TTL. Refresh by reading MVView again, or call
GET /api/v1/mv/:id/final for just the final URL.
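
As an illustration of driving a UI from the capability flags instead of from mode, the sketch below maps each flag to a control identifier. The control names and `visibleControls` are hypothetical; only the flag names come from MVView.

```typescript
interface MVCapabilities {
  canEditScenePrompt: boolean;
  canEditSceneImage: boolean;
  canEditSceneFraming: boolean;
  canTriggerRender: boolean;
  canFinalize: boolean;
  canPatchStageOutput: boolean;
}

// Hypothetical UI control identifiers, one per capability flag.
const CONTROL_FOR_FLAG: Record<keyof MVCapabilities, string> = {
  canEditScenePrompt: "scene-prompt-editor",
  canEditSceneImage: "scene-image-editor",
  canEditSceneFraming: "scene-framing-picker",
  canTriggerRender: "render-button",
  canFinalize: "finalize-button",
  canPatchStageOutput: "stage-output-editor",
};

// Lists the controls to render, looking only at the flags and never at `mode`.
function visibleControls(capabilities: MVCapabilities): string[] {
  return (Object.keys(CONTROL_FOR_FLAG) as Array<keyof MVCapabilities>)
    .filter((flag) => capabilities[flag])
    .map((flag) => CONTROL_FOR_FLAG[flag]);
}
```

Because the function never inspects mode, a flag that flips in a future release changes the rendered controls without a client update.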
Edit a scene
```bash
curl -X POST https://api.omnapi.com/api/v1/mv/<id>/scenes/0/edit \
  -H "x-api-key: om_live_..." \
  -d '{ "prompt": "tighter close-up on the rapper'\''s face, golden hour" }'
```

- Fast: spawns an edit-scene task; the new MP4 replaces the scene’s current videoUrl. Returns { taskId, sceneIndex, version }.
- Studio: synchronous in-place update of the scene’s imagePrompt, videoPrompt, framing, or referenceImageUrl (call render separately to actually re-render i2v). Returns { taskId: null, sceneIndex, version }.
| Body field | Fast | Studio |
|---|---|---|
| `prompt` | required | optional; fans out to both imagePrompt + videoPrompt when alone |
| `imagePrompt` | 400 `MV_NOT_SUPPORTED_IN_FAST` | optional |
| `videoPrompt` | 400 `MV_NOT_SUPPORTED_IN_FAST` | optional |
| `framing` | 400 `MV_NOT_SUPPORTED_IN_FAST` | optional |
| `referenceImageUrl` | 400 `MV_NOT_SUPPORTED_IN_FAST` | optional |
| `expectedVersion` | optional | recommended |
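
The per-mode rules in the table can be checked before sending an edit. This is a hedged sketch: `precheckSceneEdit` is a hypothetical helper, and because the table does not name an error code for a missing prompt in Fast mode, that case returns a plain message instead.

```typescript
interface SceneEditBody {
  prompt?: string;
  imagePrompt?: string;
  videoPrompt?: string;
  framing?: string;
  referenceImageUrl?: string;
  expectedVersion?: number;
}

// Fields the table marks as rejected with MV_NOT_SUPPORTED_IN_FAST.
const STUDIO_ONLY_FIELDS = ["imagePrompt", "videoPrompt", "framing", "referenceImageUrl"] as const;

// Returns a documented error code, a descriptive message, or null when the body looks fine.
function precheckSceneEdit(mode: "fast" | "studio", body: SceneEditBody): string | null {
  if (mode === "fast") {
    for (const field of STUDIO_ONLY_FIELDS) {
      if (body[field] !== undefined) return "MV_NOT_SUPPORTED_IN_FAST";
    }
    // The table marks prompt as required in Fast; the exact server error is not documented.
    if (body.prompt === undefined) return "prompt is required in Fast mode";
  }
  return null;
}
```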
Studio per-scene workflow
Render

```bash
curl -X POST https://api.omnapi.com/api/v1/mv/<id>/scenes/0/render \
  -d '{
    "videoProvider": "<model-id-from-/models>",
    "resolution": "720p",
    "durationSec": 4,
    "draft": false
  }'
```

Returns { taskId, creditsReserved }. All body fields are optional; defaults are inherited from the storyboard’s config.
Select canonical rendering
```bash
curl -X PATCH https://api.omnapi.com/api/v1/mv/<id>/scenes/0/select-rendering \
  -d '{ "renderingId": "01H..." }'
```

Synchronous; no task, no credits. Required before finalize if you’ve re-rolled scenes.
Regenerate scene image
```bash
curl -X POST https://api.omnapi.com/api/v1/mv/<id>/scenes/0/regenerate-image \
  -d '{ "imagePromptOverride": "more dramatic lighting" }'
```

Returns { taskId, creditsReserved } (15 credits). After completion the scene’s imageUrl updates.
Studio stage-by-stage
Each LLM stage is a separately callable task. Useful for B2B power users who
want to inspect or PATCH intermediate outputs.
| Endpoint | Stage | Output column |
|---|---|---|
| POST /api/v1/mv/:id/analyze-emotion | Analyze song structure and mood | emotionMap |
| POST /api/v1/mv/:id/draft-concept | Draft the visual concept | creativeConcept |
| POST /api/v1/mv/:id/lock-character | Lock a character anchor | character anchor |
| POST /api/v1/mv/:id/draft-narrative | Draft the narrative arc | narrativeArc |
| POST /api/v1/mv/:id/plan-scenes | Plan scene prompts and timing | scenes[] |
| POST /api/v1/mv/:id/evaluate-scenes | Evaluate scene consistency | (read-only) |
| POST /api/v1/mv/:id/auto-pilot | All inline | all stages |
Body for all of them:
```json
{
  "maxLipsync": 1,
  "expectedVersion": 7
}
```

lock-character additionally accepts { refImageR2Key, description }.

Between stages, use PATCH /api/v1/mv/:id to overwrite a stage’s output:

```bash
curl -X PATCH https://api.omnapi.com/api/v1/mv/<id> \
  -d '{ "expectedVersion": 8, "emotionMap": { "...": "..." } }'
```

Patchable fields: title, prompt, characterImage, emotionMap, creativeConcept, narrativeArc, scenesArray. A 409 MV_VERSION_CONFLICT means another writer bumped the row; re-fetch and retry.
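
The re-fetch-and-retry advice for MV_VERSION_CONFLICT can be sketched as a small loop. Everything here is illustrative: `patchWithRetry` is a hypothetical helper, the read and patch calls are injected, and the assumption that a failed call surfaces as a thrown object with a `code` property depends entirely on your HTTP client.

```typescript
// Assumption: the HTTP layer throws an object carrying the API error code as `code`.
function isVersionConflict(error: unknown): boolean {
  return (
    typeof error === "object" &&
    error !== null &&
    (error as { code?: unknown }).code === "MV_VERSION_CONFLICT"
  );
}

// Reads the current version, attempts the patch with it, and retries on conflict.
async function patchWithRetry<P>(
  readVersion: () => Promise<number>,
  patch: (expectedVersion: number) => Promise<P>,
  maxAttempts = 3,
): Promise<P> {
  for (let attempt = 1; ; attempt++) {
    const version = await readVersion(); // e.g. MVView.version from a fresh GET
    try {
      return await patch(version);
    } catch (error) {
      // Anything other than a version conflict, or the last attempt, is rethrown.
      if (!isVersionConflict(error) || attempt >= maxAttempts) throw error;
    }
  }
}
```

Blindly replaying the same body can overwrite the other writer's change, so a real client may want to merge or re-confirm before the retry.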
Finalize
```bash
curl -X POST https://api.omnapi.com/api/v1/mv/<id>/finalize -d '{}'
```

Returns { taskId, creditsReserved }.

Finalize concatenates the selected scene MP4s, overlays the source audio, burns in subtitles and the title card when requested, and produces the final MP4.

When the task completes:

```bash
curl https://api.omnapi.com/api/v1/mv/<id>/final
```

Returns:

```json
{
  "id": "<mvId>",
  "videoUrl": "https://r2.../signed?...",
  "expiresInSec": 900
}
```

The presigned URL is valid for 15 minutes. The final MV file itself stays in R2 for 30 days before lifecycle expiry.
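
Since presigned URLs expire, a client that caches them needs some rule for when to re-read. A minimal sketch, assuming the expiry clock starts when the response was received; `needsRefresh` and the 60-second safety margin are hypothetical choices, not API behavior.

```typescript
// Returns true when a cached presigned URL should be fetched again.
// fetchedAtMs: when the response carrying the URL arrived (epoch milliseconds).
// expiresInSec: the value reported by the API, e.g. 900 for the final MP4.
function needsRefresh(
  fetchedAtMs: number,
  expiresInSec: number,
  nowMs: number,
  safetyMarginSec = 60,
): boolean {
  const expiresAtMs = fetchedAtMs + expiresInSec * 1000;
  // Refresh slightly early so a download never starts on a nearly expired URL.
  return nowMs >= expiresAtMs - safetyMarginSec * 1000;
}
```

For example, a URL fetched at `t = 0` with `expiresInSec: 900` is treated as stale from `t = 840` seconds onward.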
```bash
# Fast quote
curl -X POST https://api.omnapi.com/api/v1/mv/quote \
  -d '{ "mode": "fast", "durationSec": 60, "lipSync": true, "resolution": "540p" }'

# Studio storyboard quote
curl -X POST https://api.omnapi.com/api/v1/mv/quote \
  -d '{ "mode": "studio", "step": "storyboard", "draft": false }'

# Studio per-scene render quote
curl -X POST https://api.omnapi.com/api/v1/mv/quote \
  -d '{ "mode": "studio", "step": "render-scene", "videoProvider": "<model-id-from-/models>", "resolution": "720p", "durationSec": 4, "draft": false }'

# Studio total estimate
curl -X POST https://api.omnapi.com/api/v1/mv/quote \
  -d '{ "mode": "studio", "step": "total", "videoProvider": "<model-id-from-/models>", "resolution": "720p", "perSceneDurationSec": 4, "estimatedSceneCount": 8 }'
```

All return { credits, breakdown, warningCodes? }.
Pricing summary:
| Mode | Item | Credits |
|---|---|---|
| Fast | quote | 0 |
| Fast | Vidu Q2-pro-equivalent 540p | Dynamic; 60s ≈ 1,724 |
| Fast | Vidu Q2-pro-equivalent 720p | Dynamic; 60s ≈ 3,476 |
| Fast | Vidu Q2-pro-equivalent 1080p | Dynamic; 60s ≈ 5,400 |
| Fast | lip-sync | Dynamic, about +24 credits/sec |
| Fast | compose/final copy | 100 |
| Studio | storyboard draft / full | 100 / 250 |
| Studio | analyze emotion / draft concept / draft narrative | 10 each |
| Studio | plan scenes | 20 |
| Studio | lock character / evaluate scenes | 0 |
| Studio | scene image regenerate | 15 |
| Studio | select rendering / patch / reads | 0 |
| Studio | finalize | 50 |
Studio render pricing defaults are shown below. Active production rows may come
from mv_provider_pricing_rules; use GET /api/v1/pricing/catalog or
POST /api/v1/mv/quote for the live value.
| Provider / model | Resolution | FPS | Draft | Credits / billable second |
|---|---|---|---|---|
| Replicate prunaai/p-video | 720p | 24 | off | 30 |
| Replicate prunaai/p-video | 720p | 24 | on | 8 |
| Replicate prunaai/p-video | 1080p | 24 | off | 60 |
| Replicate prunaai/p-video | 1080p | 24 | on | 15 |
| Replicate prunaai/p-video | 720p | 48 | off | 45 |
| Replicate prunaai/p-video | 720p | 48 | on | 12 |
| Replicate prunaai/p-video | 1080p | 48 | off | 90 |
| Replicate prunaai/p-video | 1080p | 48 | on | 23 |
| MiniMax MiniMax-Hailuo-02 | 512P | 24 | off | 21 |
| MiniMax MiniMax-Hailuo-02 | 768P | 24 | off | 81 |
| MiniMax MiniMax-Hailuo-02 | 1080P | 24 | off | 134 |
| Vidu vidu2.0 | 360p | 24 | off | 38 |
| Vidu vidu2.0 | 540p | 24 | off | 60 |
| Vidu vidu2.0 | 720p | 24 | off | 90 |
| Vidu vidu2.0 | 1080p | 24 | off | 150 |
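
For rough budgeting, the default rates can be combined into a back-of-the-envelope Studio estimate. This sketch rests on several assumptions: billable seconds are taken to equal the requested scene duration, the total is taken to be full storyboard + scene renders + finalize, and per-stage or regeneration charges are ignored. `estimateStudioCredits` and the rate keys are hypothetical; the quote endpoint remains the authoritative number.

```typescript
// A few non-draft, 24 fps default rates copied from the table (credits per billable second).
const DEFAULT_RATES: Record<string, number> = {
  "prunaai/p-video@720p": 30,
  "prunaai/p-video@1080p": 60,
  "vidu2.0@720p": 90,
};

interface StudioEstimate {
  storyboard: number;
  render: number;
  finalize: number;
  total: number;
}

function estimateStudioCredits(
  rateKey: string,
  perSceneDurationSec: number,
  sceneCount: number,
): StudioEstimate {
  const rate = DEFAULT_RATES[rateKey];
  if (rate === undefined) throw new Error(`no default rate for ${rateKey}`);
  const storyboard = 250; // full (non-draft) storyboard, per the pricing summary
  const finalize = 50;    // finalize, per the pricing summary
  // Assumption: every scene is rendered once and billed for its full duration.
  const render = rate * perSceneDurationSec * sceneCount;
  return { storyboard, render, finalize, total: storyboard + render + finalize };
}
```

With eight 4-second scenes on prunaai/p-video at 720p this gives 30 × 4 × 8 = 960 render credits and 1,260 in total under the stated assumptions.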
Subtitle behavior
| Source | subtitles=true | srtUrl | Mode | Behavior |
|---|---|---|---|---|
| suno | yes | — | fast | OmnAPI converts the Suno timeline to subtitles. |
| suno | yes | yes | fast | Caller-supplied SRT wins. |
| suno | yes | — | studio | OmnAPI burns word-level subtitles from the Suno timeline. |
| audio / audio-upload | yes | yes | fast | Caller-supplied SRT is used. |
| audio / audio-upload | yes | yes | studio | OmnAPI burns subtitles from the parsed SRT. |
| audio / audio-upload | yes | — | fast | Subtitle timing is inferred when available. |
| audio / audio-upload | yes | — | studio | Silently disabled. warningCodes: ["MV_SUBTITLE_DISABLED_NO_TIMELINE"] attached. |
| any | false | any | any | No subtitles. |
Lifecycle states
| `MVView.status` | Meaning |
|---|---|
| `PENDING` | Generation task queued |
| `GENERATING` | MV generation is running |
| `READY` | Ready to render scenes (Studio) or compose (Fast) |
| `RENDERING` | At least one scene render in flight (Studio) |
| `FINALIZING` | Final stitch in progress |
| `COMPLETE` | finalMv.videoUrl is populated |
| `EDITING` | User has edits since last finalize |
| `FAILED` | Pipeline hard-failed |
| `ARCHIVED` | User deleted via DELETE /api/v1/mv/:id |

| `MVSceneView.status` | Meaning |
|---|---|
| `PLANNED` | Studio: scene plan exists, image not yet generated (draft mode) |
| `IMAGE_READY` | Studio: image generated, no video yet |
| `RENDERING` | Scene rendering is in flight |
| `READY` | Scene has a selected (Studio) / latest (Fast) video |
| `FAILED` | Generation failed |
| `STALE` | A newer edit invalidated this scene |
Error codes
| Code | HTTP | Meaning |
|---|---|---|
| `MV_MODE_REQUIRED` | 400 | mode field absent |
| `MV_MODE_INVALID` | 400 | Unknown mode value |
| `MV_SOURCE_REQUIRED` | 400 | source field absent |
| `MV_SOURCE_INVALID` | 400 | Unknown source variant |
| `MV_SUNO_CLIP_NOT_READY` | 409 | Suno clip exists but not COMPLETED |
| `MV_AUDIO_DURATION_INVALID` | 400 | Outside 10–180s |
| `MV_AUDIO_CODEC_UNSUPPORTED` | 400 | Not MP3/WAV/AAC/M4A |
| `MV_TOO_MANY_REFERENCES` | 400 | More than 7 reference images |
| `MV_REFERENCE_IMAGE_INVALID` | 400 | Malformed image input |
| `MV_IMAGE_PAYLOAD_TOO_LARGE` | 413 | Image exceeds 50MB single or 20MB base64 total |
| `MV_CHARACTER_IMAGE_FAST_NOT_ALLOWED` | 400 | characterImage with mode=fast |
| `MV_LIPSYNC_UNSUPPORTED_IN_STUDIO` | 400 | lipSync=true with mode=studio |
| `MV_ASPECT_RATIO_INVALID` | 400 | Unknown aspect ratio |
| `MV_RESOLUTION_INVALID` | 400 | Unknown resolution |
| `MV_PROMPT_TOO_LONG` | 400 | prompt.length > 3000 |
| `MV_NOT_SUPPORTED_IN_FAST` | 400 | Studio-only operation called on Fast MV |
| `MV_NOT_SUPPORTED_IN_STUDIO` | 400 | Reserved |
| `MV_NOT_FOUND` | 404 | id unknown |
| `MV_VERSION_CONFLICT` | 409 | expectedVersion stale |
| `INSUFFICIENT_CREDITS` | 402 | Quote exceeds balance |
| `RATE_LIMITED` | 429 | Per-account concurrency cap hit |
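
One way a client might act on these codes is to group them by recovery strategy. The grouping below is an assumption drawn from the meanings in the table, not documented retry guidance; `recoveryFor` and the `Recovery` labels are hypothetical.

```typescript
type Recovery =
  | "fix-request"        // caller error: change the input before retrying
  | "refetch-and-retry"  // stale version: read MVView again, then retry
  | "wait-and-retry"     // transient: back off and try later
  | "top-up-credits"
  | "give-up";

function recoveryFor(code: string, httpStatus: number): Recovery {
  switch (code) {
    case "MV_VERSION_CONFLICT":
      return "refetch-and-retry";
    case "MV_SUNO_CLIP_NOT_READY": // clip may complete later
    case "RATE_LIMITED":           // concurrency cap may free up
      return "wait-and-retry";
    case "INSUFFICIENT_CREDITS":
      return "top-up-credits";
    case "MV_NOT_FOUND":
      return "give-up";
  }
  // Remaining 4xx codes in the table describe invalid input.
  // Treating anything else as transient is an assumption.
  return httpStatus >= 400 && httpStatus < 500 ? "fix-request" : "wait-and-retry";
}
```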
Warning codes (non-fatal, surfaced on MVView.warningCodes)
| Code | Trigger |
|---|---|
| `MV_SUBTITLE_DISABLED_NO_TIMELINE` | Studio + audio-source + subtitles=true + no SRT |
| `MV_LIBASS_UNAVAILABLE` | Worker lacks libass; subtitles skipped |
| `MV_CJK_FONT_UNAVAILABLE` | Worker lacks CJK font; subtitles skipped for zh |
| `MV_PHOTOMAKER_BYPASSED_FRAMING` | Studio: ≥1 scene skipped face-lock due to framing |
| `MV_TEMPLATE_IGNORED_IN_FAST` | Fast: templateId provided but has no effect |
| `MV_REFERENCE_FROM_SUNO_COVER` | Fast: referenceImages was empty; Suno clip cover auto-injected |
Webhook events
```jsonc
{
  "event": "mv.created"
         | "mv.generation.completed"
         | "mv.generation.failed"
         | "mv.scene.rendering.completed"
         | "mv.scene.rendering.failed"
         | "mv.scene.edited"
         | "mv.finalize.completed"
         | "mv.finalize.failed"
         | "mv.stage.<name>.completed"
         | "mv.archived",
  "mode": "fast" | "studio",
  "mvId": "<id>",
  "taskId": "task_...",
  "sceneIndex": 0,               // when applicable
  "timestamp": "2026-06-01T...",
  "data": { /* MVView snapshot */ }
}
```

See Webhook Events for delivery semantics.
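
A webhook receiver usually needs to tell the fixed event names apart from the parameterized `mv.stage.<name>.completed` form. The sketch below shows one way to do that; `parseMvEvent` is hypothetical, and the assumption that `<name>` consists of lowercase letters and hyphens (like the stage endpoint names) is not confirmed by this page.

```typescript
type ParsedEvent =
  | { kind: "stage"; stage: string }
  | { kind: "lifecycle"; name: string };

// Fixed event names listed in the payload shape above.
const FIXED_EVENTS = new Set<string>([
  "mv.created",
  "mv.generation.completed",
  "mv.generation.failed",
  "mv.scene.rendering.completed",
  "mv.scene.rendering.failed",
  "mv.scene.edited",
  "mv.finalize.completed",
  "mv.finalize.failed",
  "mv.archived",
]);

// Returns null for anything unrecognized so the caller can log and skip it.
function parseMvEvent(event: string): ParsedEvent | null {
  // Assumption: stage names use lowercase letters and hyphens, e.g. "plan-scenes".
  const stage = /^mv\.stage\.([a-z-]+)\.completed$/.exec(event)?.[1];
  if (stage !== undefined) return { kind: "stage", stage };
  if (FIXED_EVENTS.has(event)) return { kind: "lifecycle", name: event };
  return null;
}
```

Returning `null` instead of throwing keeps the receiver tolerant of event names added later.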
Common pitfalls
See also

- Async Jobs: task polling cadence + lifecycle
- Webhook Events: delivery semantics
- Credits & Billing: refund semantics on failure
- Errors: full MV_* error catalog