TimedSubs
Workflow guide

Forced alignment for subtitles explained

Forced alignment is how TimedSubs converts a finished script and voiceover into a timed subtitle file. Instead of transcribing audio to guess the words, it takes the words you already approved and finds where each one appears in the audio.

Input example

Approved script plus the final voiceover audio for this publishing workflow

Output asset example

SRT/VTT subtitle assets plus quality notes for downstream upload or editor handoff

Common review point

Late narration edits can shift subtitle timing out of sync with the approved script.

Decision points

What forced alignment does

Forced alignment takes a text input and an audio file, then locates each word in the audio stream to assign a precise timestamp. The output is a timed subtitle file where every line comes from your script, not from a transcription guess.
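
As a rough illustration of that idea, the sketch below turns word-level timings into SRT cues whose text is copied from the script. It is not TimedSubs code: the `WordTiming` structure, the `build_srt` helper, and the one-timing-per-script-word assumption are all hypothetical, and the acoustic step that actually finds each word in the audio is left out. Only the SRT timestamp layout (`HH:MM:SS,mmm`) is standard.

```python
from dataclasses import dataclass


@dataclass
class WordTiming:
    """One aligned word, times in seconds (hypothetical structure)."""
    text: str
    start: float
    end: float


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(script_lines: list[str], words: list[WordTiming]) -> str:
    """Give each script line the time span of the words it contains.

    Assumes `words` holds one timing per whitespace-separated script word,
    in script order. Cue text always comes from the script, never the audio.
    """
    cues: list[str] = []
    cursor = 0
    for line in script_lines:
        count = len(line.split())
        if count == 0:
            continue  # skip blank script lines
        span = words[cursor:cursor + count]
        if len(span) < count:
            raise ValueError("fewer word timings than script words")
        cursor += count
        cues.append(
            f"{len(cues) + 1}\n"
            f"{srt_timestamp(span[0].start)} --> {srt_timestamp(span[-1].end)}\n"
            f"{line}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    timings = [
        WordTiming("Welcome", 0.0, 0.4),
        WordTiming("back", 0.45, 0.8),
        WordTiming("Open", 1.2, 1.5),
        WordTiming("Settings", 1.55, 2.1),
    ]
    print(build_srt(["Welcome back", "Open Settings"], timings))
```

The point of the design is that timing data only ever supplies the start and end of a cue; the cue text is the script line, unchanged.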

Why it preserves approved wording

Generic auto captions start from the audio and work backward to text, so speech recognition errors, misspelled names, and altered product terms end up in your subtitle file. Forced alignment starts from your text and works forward to timing, so the wording is locked from the start.

When it fits your workflow

Forced alignment is the right approach when the exact wording has already been signed off: an approved script, a TTS-generated voiceover, product demo narration, or course content. If you are still editing the script, wait until it is final, then use the Script + Audio workflow.

Practical workflow

  1. Finalize your approved script text (TXT, MD, or plain text).

  2. Upload the script and matching voiceover audio to TimedSubs.

  3. Review alignment results, resolve any quality issues, and export SRT, VTT, or other supported formats.
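
If it helps to see how the two main export formats differ, here is a minimal sketch that converts SRT text to WebVTT. It does not describe how TimedSubs produces its exports; the `srt_to_vtt` name is hypothetical. The format facts it relies on are standard: a WebVTT file starts with a `WEBVTT` header line, and its timestamps use a period rather than a comma before the milliseconds.

```python
import re

# Matches an SRT timestamp such as 00:01:02,345 and captures both halves.
_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT, leaving cue text untouched."""
    converted = []
    for line in srt_text.splitlines():
        if "-->" in line:
            # Only timing lines are rewritten, so commas in the
            # subtitle wording itself are preserved.
            line = _SRT_TIME.sub(r"\1.\2", line)
        converted.append(line)
    return "WEBVTT\n\n" + "\n".join(converted).strip() + "\n"


if __name__ == "__main__":
    sample = "1\n00:00:00,000 --> 00:00:00,800\nWelcome back, everyone\n"
    print(srt_to_vtt(sample))
```

The numeric cue numbers from SRT are kept as they are, since WebVTT allows an optional identifier line before each cue's timing line.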

Product boundary

Forced alignment requires both a script and matching audio. If you only have audio, TimedSubs is not the right tool; use a transcription service first.

FAQ

Is forced alignment different from transcription?

Yes. Transcription starts from audio and generates text using speech recognition, which can silently change the wording. Forced alignment starts from your approved text and uses audio only for timing. The words in your subtitle file are the words you submitted, not what a model guessed from the recording.

What happens when the audio does not match the script?

TimedSubs flags the mismatch as a review note rather than silently correcting the script. You can see which lines have timing confidence issues, check the audio at that point, and decide whether to re-record, adjust the script, or accept the deviation. The original script text stays intact unless you explicitly change it.
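
The flag-rather-than-fix behavior can be pictured with a small sketch. Everything here is an assumption made for illustration: the `AlignedLine` fields, the 0.0 to 1.0 confidence scale, the 0.6 threshold, and the wording of the notes are hypothetical and not taken from TimedSubs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedLine:
    """A script line with its timing result (hypothetical structure)."""
    index: int
    text: str
    start: float       # seconds
    confidence: float  # assumed scale: 0.0 (poor match) to 1.0 (strong match)


def review_notes(lines: list[AlignedLine], threshold: float = 0.6) -> list[str]:
    """Return a note for each low-confidence line without editing any text."""
    notes = []
    for line in lines:
        if line.confidence < threshold:
            notes.append(
                f"Line {line.index} at {line.start:.2f}s: "
                f"low timing confidence ({line.confidence:.2f}), check the audio"
            )
    return notes


if __name__ == "__main__":
    aligned = [
        AlignedLine(1, "Welcome back", 0.0, 0.95),
        AlignedLine(2, "Open Settings", 1.2, 0.41),
    ]
    for note in review_notes(aligned):
        print(note)
```

Because the lines are frozen and the function only reads them, the script text cannot change as a side effect of review; any correction has to be an explicit edit by the reviewer.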