Cerlancism/chatgpt-subtitle-translator

ChatGPT API SRT Subtitle Translator

Large language models (LLMs) such as ChatGPT have demonstrated that they can be robust translators, handling common natural languages with contextual understanding as well as unconventional forms of writing such as word scrambling. However, they may not always produce deterministic output or preserve line-to-line correlation, potentially disrupting the timing of subtitles, even when given precise instructions and with the model temperature parameter set to 0.

This utility uses the OpenAI ChatGPT API to translate text, with a specific focus on line-based translation, especially for SRT subtitles. The translator optimizes token usage by removing SRT overhead and grouping text into batches, enabling translations of arbitrary length without excessive token consumption while ensuring a one-to-one match between input and output lines.

Upgrading from v2? See the v2 -> v3 Migration Guide for breaking changes.

Features

  • Web User Interface (Web UI) and Command Line Interface (CLI)
  • Supports Structured Output: for more concise results, enabled by default in the Web UI and CLI
  • Supports Prompt Caching: the system instruction and accumulated translation context are packaged so that request prefixes stay stable across batches, working well with prompt caching; controlled with -c, --context (CLI only)
  • Supports any OpenAI API compatible providers such as running Ollama locally
  • Line-based batching: avoids token limit per request, reduces overhead token wastage, and maintains translation context to a certain extent
  • Optional OpenAI Moderation tool check: prevents token wastage if the model is highly likely to refuse to translate, enabled with --use-moderator (CLI only)
  • Streaming process output
  • Request per minute (RPM) rate limits
  • Progress resumption (CLI only)

Setup

Reference: https://github.com/openai/openai-quickstart-node#setup

  • Node.js version >= 20 is required. This README assumes a bash shell environment
  • Clone this repository
    git clone https://github.com/Cerlancism/chatgpt-subtitle-translator
  • Navigate into the directory
    cd chatgpt-subtitle-translator
  • Install the requirements
    npm install
  • Give executable permission
    chmod +x cli/translator.mjs
  • Copy .env.example to .env
    cp .env.example .env
  • Add your API key to the newly created .env file

CLI

cli/translator.mjs --help

Usage: translator [options]

Translation tool based on ChatGPT API

Options:

  • --from <language>
    Source language (default: "")

  • --to <language>
    Target language (default: "English")

  • -s, --system-instruction <instruction>
    Override the prompt system instruction template (Translate ${from} to ${to}) with this text, ignoring the --from and --to options

  • -i, --input <file>
    Input source text with the content of this file, in .srt format or plain text

  • -o, --output <file>
    Output file name; defaults to a name derived from the input file name

  • -r, --structured <mode> Structured response format mode. (default: array, choices: array, object, timestamp, agent, none)

    • array Structures the input and output into an array format.
    • object Structures the input and output as a keyed object.
    • timestamp Provides the model with start/end timestamps alongside each entry's text, allowing it to merge adjacent entries into one. A batch is only retried when the output time span boundaries don't match the input - unlike other modes which retry on any line count mismatch - significantly reducing token wastage from retries. Uses more tokens per batch due to timestamps in input and a merge remarks field in output. Output entry count may differ from input, so progress file resumption is not supported.
    • agent Alias for the agent subcommand with default options. Use the dedicated subcommand for full configuration. See Agent Mode.
    • none Legacy compatibility mode, disables structured output.
  • -c, --context <tokens> Include translation history up to a token budget to work well with prompt caching. Default: 2000. Set to 0 to include history without a token limit check.

    The token budget is tracked from actual model response token counts. The history is chunked into user/assistant message pairs using the last value in --batch-sizes.

    Recommended value: set <tokens> up to ~30% less than the model's max context length to leave room for the current batch and system prompts. For example, for a 128K context model: --context 90000.
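The budget check can be sketched roughly as follows (a minimal illustration only; trimHistory and the shape of pairs are hypothetical names, not the tool's actual code):

```javascript
// Sketch: keep the most recent user/assistant history pairs whose
// combined token counts fit within the --context budget.
// `pairs` is ordered oldest-first; each entry carries the token count
// reported by the model for that exchange.
function trimHistory(pairs, budgetTokens) {
  if (budgetTokens === 0) return pairs // 0 disables the limit check
  const kept = []
  let total = 0
  // Walk backwards so the newest context survives first.
  for (let i = pairs.length - 1; i >= 0; i--) {
    if (total + pairs[i].tokens > budgetTokens) break
    total += pairs[i].tokens
    kept.unshift(pairs[i])
  }
  return kept
}
```

With the default budget of 2000, a history of exchanges costing 900, 800, and 700 tokens keeps only the last two (1500 tokens), since adding the oldest would exceed the budget.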

  • -b, --batch-sizes <sizes> Batch sizes of increasing order for translation prompt slices in JSON Array

    The number of lines to include in each translation prompt, provided that they are estimated to be within the token limit.
    In case of mismatched output line quantities, this number will be decreased step by step according to the values in the array, ultimately reaching one.

    Larger batch sizes generally lead to more efficient token utilization and potentially better contextual translation.
    However, mismatched output line quantities or exceeding the token limit will cause token wastage, requiring resubmission of the batch with a smaller batch size.

    When omitted, batch size is determined automatically per batch based on the --context token budget. On failure, the size is reduced and retried down to a minimum, then resets on the next successful batch.
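The step-down retry can be sketched like this (illustrative only; translateWithFallback and translateBatch are hypothetical names standing in for the real per-batch call):

```javascript
// Sketch: on a mismatched batch, retry with the next smaller size from
// --batch-sizes, ultimately reaching 1. `translateBatch` stands in for
// the real API call and returns true when line counts match.
function translateWithFallback(lines, batchSizes, translateBatch) {
  // --batch-sizes is given in increasing order (e.g. [10, 100]);
  // attempts start at the largest size and step down, ending at 1.
  const sizes = [...batchSizes].sort((a, b) => b - a)
  if (sizes[sizes.length - 1] !== 1) sizes.push(1)
  for (const size of sizes) {
    if (translateBatch(lines.slice(0, size))) return size
  }
  throw new Error("translation failed even at batch size 1")
}
```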

  • -g, --guard-repetition <threshold> Minimum number of pattern repeats before aborting a streaming response (default: 10). When the model falls into a repetition loop during streaming, the response is aborted and retried with a smaller batch. Set to 0 to disable repetition detection.
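A minimal sketch of such a repetition check (hypothetical names and a simplified heuristic, not the actual detector):

```javascript
// Sketch: detect a repetition loop at the tail of a streaming buffer.
// Returns true when the buffer ends with one short pattern repeated at
// least `threshold` times (threshold 0 disables the check).
function isRepetitionLoop(buffer, threshold, maxPatternLen = 20) {
  if (threshold === 0) return false
  for (let len = 1; len <= maxPatternLen; len++) {
    const pattern = buffer.slice(-len)
    if (pattern.length < len) break // buffer shorter than candidate
    const tail = buffer.slice(-len * threshold)
    if (tail.length < len * threshold) continue
    if (tail === pattern.repeat(threshold)) return true
  }
  return false
}
```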

  • --initial-prompts <prompts> Initial prompts for the translation in JSON (default: "[]")

  • --use-moderator Use the OpenAI Moderation tool

  • --moderation-model <model> (default: "omni-moderation-latest") https://developers.openai.com/api/docs/models

  • --no-prefix-number Don't prefix lines with numerical indices. Ignored in -r, --structured array|object|timestamp - prefix numbers are always disabled there.

  • --no-line-matching Don't enforce one-to-one line quantity input output matching. Ignored in -r, --structured timestamp - line matching is always disabled there since entries may be merged.

  • -p, --plain-text <text> Input source text with this plain text argument. Not supported in -r, --structured timestamp mode, or when using the agent subcommand with -r timestamp.

  • --no-stream Disable stream progress output to terminal (streaming is on by default)

  • --log-level <level>
    Log level (default: debug, choices: trace, debug, info, warn, error, silent)

  • --silent
    Same as --log-level silent

  • --quiet Same as --log-level silent

Additional Options for GPT: https://developers.openai.com/api/reference/resources/chat/subresources/completions/methods/create

  • -m, --model <model> (default: "gpt-4o-mini") https://developers.openai.com/api/docs/models
  • --reasoning_effort <reasoning_effort> Constrains effort on reasoning for reasoning models. Accepted values depend on the model (e.g. "low", "medium", "high"), follows the model's default when not set. "none" disables reasoning/thinking entirely (supported by OpenAI o-series/GPT-5+ and open models via Ollama such as Qwen3).
  • -t, --temperature <temperature> Sampling temperature to use, should set a low value such as 0 to be more deterministic for translation (default: 0)
  • --top_p <top_p> Nucleus sampling parameter, top_p probability mass
  • --presence_penalty <presence_penalty> Penalty for new tokens based on their presence in the text so far
  • --frequency_penalty <frequency_penalty> Penalty for new tokens based on their frequency in the text so far
  • --logit_bias <logit_bias> Modify the likelihood of specified tokens appearing in the completion

Agent Mode

Subcommand for multi-pass agentic translation. Accepts all standard translation options.

cli/translator.mjs agent --help

Agent mode runs multiple passes before translating:

  • Overview - Samples the file to produce a content overview (file identity, duration, genre/tone, character names) and detects the source language.
  • Planning - Scans the file in token-bounded windows. Each window produces a batch summary (characters, locations, events, tone). Summaries are consolidated and used to generate a refined translation instruction.
  • Translation - Translates using the enriched instruction. After the first batch, a sample of the output is checked to confirm the target language before proceeding.

Structured mode defaults to array; pass --structured timestamp to use timestamp mode instead.

# Default (array delegate)
cli/translator.mjs agent --input subtitles.srt --from Japanese --to English

# Timestamp delegate
cli/translator.mjs agent --input subtitles.srt --structured timestamp --from Japanese --to English
  • --skip-refine
    Skip the final instruction refinement step at the end of the planning pass and use the original system instruction directly.
  • --no-fitting
    Skip LLM-based token-range fitting for planning summaries and consolidation. Summaries are used as-is regardless of token range.
  • --context-summary <summary>
    Provide a context summary directly, bypassing the planning pass entirely and proceeding straight to translation.

Examples

Plain text

cli/translator.mjs --plain-text "你好"

Standard Output

Hello.

Emojis

cli/translator.mjs --to "Emojis" --plain-text "Chuck Norris can walk with the animals, talk with the animals; grunt and squeak and squawk with the animals... and the animals, without fail, always say 'yessir Mr. Norris'."

Standard Output

👨‍🦰💪🚶‍♂️🦜🐒🐘🐅🐆🐎🐖🐄🐑🦏🐊🐢🐍🐿️🐇🐿️❗️🌳💬😲👉🤵👨‍🦰👊=🐕🐑🐐🦌🐘🦏🦍🦧🦓🐅🦌🦌🦌🐆🦍🐘🐘🐗🦓=👍🤵.

Scrambling

cli/translator.mjs --system-instruction "Scramble characters of words while only keeping the start and end letter" --no-prefix-number --no-line-matching --plain-text "Chuck Norris can walk with the animals, talk with the animals;"

Standard Output

Cuhck Nroris can wakl wtih the aiamnls, talk wtih the aiamnls;

Unscrambling

cli/translator.mjs --system-instruction "Unscramble characters back to English" --no-prefix-number --no-line-matching --plain-text "Cuhck Nroris can wakl wtih the aiamnls, talk wtih the aiamnls;"

Standard Output

Chuck Norris can walk with the animals, talk with the animals;

Plain text file

cli/translator.mjs --input test/data/test_cn.txt

Input file: test/data/test_cn.txt

你好。
拜拜!

Standard Output

Hello.  
Goodbye!

SRT file

cli/translator.mjs --input test/data/test_ja_small.srt

Input file: test/data/test_ja_small.srt

1
00:00:00,000 --> 00:00:02,000
おはようございます。

2
00:00:02,000 --> 00:00:05,000
お元気ですか?

3
00:00:05,000 --> 00:00:07,000
はい、元気です。

4
00:00:08,000 --> 00:00:12,000
今日は天気がいいですね。

5
00:00:12,000 --> 00:00:16,000
はい、とてもいい天気です。

Output file: test/data/test_ja_small.srt.out_English.srt

1
00:00:00,000 --> 00:00:02,000
Good morning.

2
00:00:02,000 --> 00:00:05,000
How are you?

3
00:00:05,000 --> 00:00:07,000
Yes, I'm doing well.

4
00:00:08,000 --> 00:00:12,000
The weather is nice today, isn't it?

5
00:00:12,000 --> 00:00:16,000
Yes, it's very nice weather.

How it works

SRT indices and timestamps are stripped or simplified before sending to the model, reducing tokens. Lines are batched together into a single prompt, removing repeated per-entry overhead. The default system instruction is a minimal Translate to <language> (3 tokens). Structured output modes enforce a schema so the model returns only the translated text.
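The stripping and reassembly can be sketched as follows (a minimal illustration assuming one text line per entry; parseSrt and rebuildSrt are hypothetical names, not the tool's actual parser):

```javascript
// Sketch: strip SRT overhead before prompting, restore it afterwards.
// Entries are separated by blank lines; each holds an index, a timing
// line, and the subtitle text.
function parseSrt(srt) {
  return srt.trim().split(/\n\s*\n/).map((block) => {
    const [index, timing, ...text] = block.split("\n")
    return { index, timing, text: text.join("\n") }
  })
}

// Only the translated text changes; indices and timings are reused.
function rebuildSrt(entries, translations) {
  return entries
    .map((e, i) => `${e.index}\n${e.timing}\n${translations[i]}`)
    .join("\n\n")
}
```

Only the bare text lines are sent to the model; the one-to-one line match lets the original indices and timings be reattached unchanged.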

Five modes are available via --structured. The agent mode is covered in the Agent Mode section above; the remaining four are illustrated below:

array (default)

Lines are sent as a JSON array. The model returns a matching array.

Input (SRT) - Tokens: 139

1
00:00:00,000 --> 00:00:02,000
おはようございます。

2
00:00:02,000 --> 00:00:05,000
お元気ですか?

3
00:00:05,000 --> 00:00:07,000
はい、元気です。

4
00:00:08,000 --> 00:00:12,000
今日は天気がいいですね。

5
00:00:12,000 --> 00:00:16,000
はい、とてもいい天気です。

Prompt (User Message) - Tokens: 52

(compact JSON, formatted here for readability)

{
  "inputs": [
    "おはようございます。",
    "お元気ですか?",
    "はい、元気です。",
    "今日は天気がいいですね。",
    "はい、とてもいい天気です。"
  ]
}

Transform (Model Response) - Tokens: 38

(compact JSON, formatted here for readability)

{
  "outputs": [
    "Good morning.",
    "How are you?",
    "Yes, I'm doing well.",
    "The weather is nice today, isn't it?",
    "Yes, it's very nice weather."
  ]
}

Output (SRT) - Tokens: 127

1
00:00:00,000 --> 00:00:02,000
Good morning.

2
00:00:02,000 --> 00:00:05,000
How are you?

3
00:00:05,000 --> 00:00:07,000
Yes, I'm doing well.

4
00:00:08,000 --> 00:00:12,000
The weather is nice today, isn't it?

5
00:00:12,000 --> 00:00:16,000
Yes, it's very nice weather.

object

Source lines are used as keys in the response schema. The model maps each source key to its translation. No explicit user message is sent - the schema itself conveys the input.

Input (SRT) - Tokens: 139

1
00:00:00,000 --> 00:00:02,000
おはようございます。

2
00:00:02,000 --> 00:00:05,000
お元気ですか?

3
00:00:05,000 --> 00:00:07,000
はい、元気です。

4
00:00:08,000 --> 00:00:12,000
今日は天気がいいですね。

5
00:00:12,000 --> 00:00:16,000
はい、とてもいい天気です。

Prompt (Schema Keys) - Tokens: ~60

Source lines are encoded as response schema keys (no user message)

{
  "おはようございます。": "string",
  "お元気ですか?": "string",
  "はい、元気です。": "string",
  "今日は天気がいいですね。": "string",
  "はい、とてもいい天気です。": "string"
}

Transform (Model Response) - Tokens: 85

(compact JSON, formatted here for readability)

{
  "おはようございます。": "Good morning.",
  "お元気ですか?": "How are you?",
  "はい、元気です。": "Yes, I'm doing well.",
  "今日は天気がいいですね。": "The weather is nice today, isn't it?",
  "はい、とてもいい天気です。": "Yes, it's very nice weather."
}

Output (SRT) - Tokens: 127

1
00:00:00,000 --> 00:00:02,000
Good morning.

2
00:00:02,000 --> 00:00:05,000
How are you?

3
00:00:05,000 --> 00:00:07,000
Yes, I'm doing well.

4
00:00:08,000 --> 00:00:12,000
The weather is nice today, isn't it?

5
00:00:12,000 --> 00:00:16,000
Yes, it's very nice weather.

timestamp

Timestamps are preserved alongside the text. Lines are sent using the compact Toon format, with times in milliseconds. The model may merge subtitle entries when contextually appropriate, reporting any merges in the remarks field of its response.

Input (SRT) - Tokens: 139

1
00:00:00,000 --> 00:00:02,000
おはようございます。

2
00:00:02,000 --> 00:00:05,000
お元気ですか?

3
00:00:05,000 --> 00:00:07,000
はい、元気です。

4
00:00:08,000 --> 00:00:12,000
今日は天気がいいですね。

5
00:00:12,000 --> 00:00:16,000
はい、とてもいい天気です。

Prompt (User Message) - Tokens: 92

(Toon format - compact, not JSON)

inputs[5]{start,end,text}:
  0,2000,おはようございます。
  2000,5000,お元気ですか?
  5000,7000,はい、元気です。
  8000,12000,今日は天気がいいですね。
  12000,16000,はい、とてもいい天気です。

Transform (Model Response) - Tokens: 104

(compact JSON, formatted here for readability)

{
  "outputs": [
    { "start": 0, "end": 2000, "text": "Good morning." },
    { "start": 2000, "end": 5000, "text": "How are you?" },
    { "start": 5000, "end": 7000, "text": "Yes, I'm doing well." },
    { "start": 8000, "end": 12000, "text": "The weather is nice today, isn't it?" },
    { "start": 12000, "end": 16000, "text": "Yes, it's very nice weather." }
  ],
  "remarksIfContainedMergers": ""
}

Output (SRT) - Tokens: 127

1
00:00:00,000 --> 00:00:02,000
Good morning.

2
00:00:02,000 --> 00:00:05,000
How are you?

3
00:00:05,000 --> 00:00:07,000
Yes, I'm doing well.

4
00:00:08,000 --> 00:00:12,000
The weather is nice today, isn't it?

5
00:00:12,000 --> 00:00:16,000
Yes, it's very nice weather.
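For reference, the conversion between SRT timestamps and the millisecond integers used in the timestamp-mode prompt can be sketched as (hypothetical helpers, not the tool's code):

```javascript
// Sketch: "HH:MM:SS,mmm" SRT timestamp to total milliseconds.
function srtTimeToMs(t) {
  const [, h, m, s, ms] = t.match(/(\d+):(\d+):(\d+),(\d+)/)
  return ((+h * 60 + +m) * 60 + +s) * 1000 + +ms
}

// Sketch: total milliseconds back to an SRT timestamp for output.
function msToSrtTime(total) {
  const pad = (n, w) => String(n).padStart(w, "0")
  const ms = total % 1000
  const s = Math.floor(total / 1000) % 60
  const m = Math.floor(total / 60000) % 60
  const h = Math.floor(total / 3600000)
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`
}
```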

none

No structured output. Lines are sent as text and the model returns text.

Input (SRT) - Tokens: 139

1
00:00:00,000 --> 00:00:02,000
おはようございます。

2
00:00:02,000 --> 00:00:05,000
お元気ですか?

3
00:00:05,000 --> 00:00:07,000
はい、元気です。

4
00:00:08,000 --> 00:00:12,000
今日は天気がいいですね。

5
00:00:12,000 --> 00:00:16,000
はい、とてもいい天気です。

Prompt (User Message) - Tokens: 59

1. おはようございます。
2. お元気ですか?
3. はい、元気です。
4. 今日は天気がいいですね。
5. はい、とてもいい天気です。

Transform (Model Response) - Tokens: 42

1. Good morning.
2. How are you?
3. Yes, I'm doing well.
4. The weather is nice today, isn't it?
5. Yes, it's very nice weather.

Output (SRT) - Tokens: 127

1
00:00:00,000 --> 00:00:02,000
Good morning.

2
00:00:02,000 --> 00:00:05,000
How are you?

3
00:00:05,000 --> 00:00:07,000
Yes, I'm doing well.

4
00:00:08,000 --> 00:00:12,000
The weather is nice today, isn't it?

5
00:00:12,000 --> 00:00:16,000
Yes, it's very nice weather.
