Large language models (LLMs), such as ChatGPT, have demonstrated their capabilities as robust translators, able to handle common natural languages with contextual understanding, as well as unconventional forms of writing such as word scrambling. However, they may not always produce deterministic output or preserve line-to-line correlation, potentially disrupting the timing of subtitles, even when given precise instructions and with the model temperature parameter set to 0.
This utility uses the OpenAI ChatGPT API to translate text, with a specific focus on line-based translation, especially for SRT subtitles. The translator optimizes token usage by removing SRT overhead and grouping text into batches, allowing translations of arbitrary length without excessive token consumption while ensuring a one-to-one match between input and output lines.
Upgrading from v2? See the v2 -> v3 Migration Guide for breaking changes.
Web Interface: https://cerlancism.github.io/chatgpt-subtitle-translator
- Web User Interface (Web UI) and Command Line Interface (CLI)
- Supports Structured Output: for more concise results, enabled by default in the Web UI and CLI
- Supports Prompt Caching: by including the full context of translated data, the system instruction and translation context are packaged to work well with prompt caching, controlled with `-c, --context` (CLI only)
- Supports any OpenAI API compatible provider, such as running Ollama locally
- Line-based batching: avoids token limit per request, reduces overhead token wastage, and maintains translation context to a certain extent
- Optional OpenAI Moderation tool check: prevents token wastage if the model is highly likely to refuse to translate, enabled with `--use-moderator` (CLI only)
- Streaming process output
- Request per minute (RPM) rate limits
- Progress resumption (CLI only)
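The line-based batching above can be sketched roughly as follows. This is an illustrative simplification, not the tool's actual implementation; the 4-characters-per-token estimate in particular is an assumption (real tokenizers count differently):

```javascript
// Group subtitle lines into batches, capped by a line count and a rough
// token estimate, so each batch fits comfortably into one request.
const estimateTokens = (text) => Math.ceil(text.length / 4);

function batchLines(lines, maxLines, maxTokens) {
  const batches = [];
  let current = [];
  let tokens = 0;
  for (const line of lines) {
    const cost = estimateTokens(line);
    if (current.length >= maxLines || (current.length > 0 && tokens + cost > maxTokens)) {
      batches.push(current);
      current = [];
      tokens = 0;
    }
    current.push(line);
    tokens += cost;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Batching this way keeps neighbouring lines in the same prompt, which is what preserves translation context "to a certain extent" between adjacent subtitles.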
Reference: https://github.com/openai/openai-quickstart-node#setup
- Node.js version `>= 20` required. This README assumes a `bash` shell environment
- Clone this repository
git clone https://github.com/Cerlancism/chatgpt-subtitle-translator
- Navigate into the directory
cd chatgpt-subtitle-translator
- Install the requirements
npm install
- Give executable permission
chmod +x cli/translator.mjs
- Copy `.env.example` to `.env`
cp .env.example .env
- Add your API key to the newly created `.env` file
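For reference, a minimal `.env` might look like the following. The `OPENAI_API_KEY` variable name is assumed from the OpenAI quickstart referenced above; check `.env.example` in the repository for the authoritative names:

```shell
# .env (key shown as a placeholder)
OPENAI_API_KEY=sk-...
```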
cli/translator.mjs --help
Usage: translator [options]
Translation tool based on ChatGPT API
Options:
- `--from <language>` Source language (default: `""`)
- `--to <language>` Target language (default: `"English"`)
- `-s, --system-instruction <instruction>` Override the prompt system instruction template `Translate ${from} to ${to}` with this text, ignoring the `--from` and `--to` options
- `-i, --input <file>` Input source text with the content of this file, in `.srt` format or plain text
- `-o, --output <file>` Output file name, defaults to be based on the input file name
- `-r, --structured <mode>` Structured response format mode (default: `array`, choices: `array`, `object`, `timestamp`, `agent`, `none`)
  - `array` Structures the input and output into an array format.
  - `object` Structures the input and output as a keyed object.
  - `timestamp` Provides the model with start/end timestamps alongside each entry's text, allowing it to merge adjacent entries into one. A batch is only retried when the output time span boundaries don't match the input (unlike other modes, which retry on any line count mismatch), significantly reducing token wastage from retries. Uses more tokens per batch due to timestamps in the input and a merge remarks field in the output. The output entry count may differ from the input, so progress file resumption is not supported.
  - `agent` Alias for the `agent` subcommand with default options. Use the dedicated subcommand for full configuration. See Agent Mode.
  - `none` Legacy compatibility mode; disables structured output.
- `-c, --context <tokens>` Include translation history up to a token budget to work well with prompt caching (default: `2000`). Set to `0` to include history without a token limit check. The token budget is tracked from actual model response token counts. The history is chunked into user/assistant message pairs using the last value in `--batch-sizes`. Recommended value: set `<tokens>` up to ~30% less than the model's max context length to leave room for the current batch and system prompts. For example, for a `128K` context model: `--context 90000`.
- `-b, --batch-sizes <sizes>` Batch sizes, in increasing order, for translation prompt slices, as a JSON array. This is the number of lines to include in each translation prompt, provided they are estimated to be within the token limit. In case of mismatched output line quantities, this number is decreased step by step according to the values in the array, ultimately reaching one. Larger batch sizes generally lead to more efficient token utilization and potentially better contextual translation; however, mismatched output line quantities or exceeding the token limit cause token wastage, requiring resubmission of the batch with a smaller batch size. When omitted, the batch size is determined automatically per batch based on the `--context` token budget. On failure, the size is reduced and retried down to a minimum, then resets on the next successful batch.
- `-g, --guard-repetition <threshold>` Minimum number of pattern repeats before aborting a streaming response (default: `10`). When the model falls into a repetition loop during streaming, the response is aborted and retried with a smaller batch. Set to `0` to disable repetition detection.
- `--initial-prompts <prompts>` Initial prompts for the translation in JSON (default: `"[]"`)
- `--use-moderator` Use the OpenAI Moderation tool
- `--moderation-model <model>` (default: `"omni-moderation-latest"`) https://developers.openai.com/api/docs/models
- `--no-prefix-number` Don't prefix lines with numerical indices. Ignored in `-r, --structured` modes `array|object|timestamp`, where prefix numbers are always disabled.
- `--no-line-matching` Don't enforce one-to-one input/output line quantity matching. Ignored in `-r, --structured` mode `timestamp`, where line matching is always disabled since entries may be merged.
- `-p, --plain-text <text>` Input source text with this plain text argument. Not supported in `-r, --structured` mode `timestamp`, or when using the `agent` subcommand with `-r timestamp`.
- `--no-stream` Disable stream progress output to the terminal (streaming is on by default)
- `--log-level <level>` Log level (default: `debug`, choices: `trace`, `debug`, `info`, `warn`, `error`, `silent`)
- `--silent` Same as `--log-level silent`
- `--quiet` Same as `--log-level silent`
Additional Options for GPT: https://developers.openai.com/api/reference/resources/chat/subresources/completions/methods/create
- `-m, --model <model>` (default: `"gpt-4o-mini"`) https://developers.openai.com/api/docs/models
- `--reasoning_effort <reasoning_effort>` Constrains effort on reasoning for reasoning models. Accepted values depend on the model (e.g. `"low"`, `"medium"`, `"high"`); follows the model's default when not set. `"none"` disables reasoning/thinking entirely (supported by OpenAI o-series/GPT-5+ and open models via Ollama such as Qwen3).
- `-t, --temperature <temperature>` Sampling temperature to use; set a low value such as `0` to be more deterministic for translation (default: `0`)
- `--top_p <top_p>` Nucleus sampling parameter, top_p probability mass
- `--presence_penalty <presence_penalty>` Penalty for new tokens based on their presence in the text so far
- `--frequency_penalty <frequency_penalty>` Penalty for new tokens based on their frequency in the text so far
- `--logit_bias <logit_bias>` Modify the likelihood of specified tokens appearing in the completion
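The `--batch-sizes` fallback ladder described above can be sketched as follows. This is an illustrative simplification, not the actual source; `translateBatch` stands in for the real API call and simply reports whether the output line count matched:

```javascript
// Step down the batch-size ladder after each mismatched response,
// bottoming out at a batch size of 1.
function nextBatchSize(sizes, current) {
  // sizes is in increasing order, e.g. [10, 100]; take the next rung down.
  const smaller = sizes.filter((s) => s < current);
  return smaller.length > 0 ? smaller[smaller.length - 1] : 1;
}

function translateWithFallback(lines, sizes, translateBatch) {
  let size = sizes[sizes.length - 1]; // start optimistic: largest size
  const attempts = [];
  while (true) {
    attempts.push(size);
    if (translateBatch(lines.slice(0, size))) return { size, attempts };
    if (size === 1) throw new Error("failed even at batch size 1");
    size = nextBatchSize(sizes, size);
  }
}
```

Each failed attempt wastes the tokens of that request, which is why larger ladders trade potential efficiency against retry cost.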
Subcommand for multi-pass agentic translation. Accepts all standard translation options.
cli/translator.mjs agent --help
Agent mode runs multiple passes before translating:
1. Overview - Samples the file to produce a content overview (file identity, duration, genre/tone, character names) and detects the source language.
2. Planning - Scans the file in token-bounded windows. Each window produces a batch summary (characters, locations, events, tone). Summaries are consolidated and used to generate a refined translation instruction.
3. Translation - Translates using the enriched instruction. After the first batch, a sample of the output is checked to confirm the target language before proceeding.
Structured mode defaults to `array`; pass `--structured timestamp` to use timestamp mode instead.
# Default (array delegate)
cli/translator.mjs agent --input subtitles.srt --from Japanese --to English
# Timestamp delegate
cli/translator.mjs agent --input subtitles.srt --structured timestamp --from Japanese --to English
- `--skip-refine` Skip the final instruction refinement step at the end of the planning pass and use the original system instruction directly.
- `--no-fitting` Skip LLM-based token-range fitting for planning summaries and consolidation. Summaries are used as-is regardless of token range.
- `--context-summary <summary>` Provide a context summary directly, bypassing the planning pass entirely and proceeding straight to translation.
cli/translator.mjs --plain-text "你好"
Standard Output
Hello.
cli/translator.mjs --to "Emojis" --plain-text "Chuck Norris can walk with the animals, talk with the animals; grunt and squeak and squawk with the animals... and the animals, without fail, always say 'yessir Mr. Norris'."
Standard Output
👨🦰💪🚶♂️🦜🐒🐘🐅🐆🐎🐖🐄🐑🦏🐊🐢🐍🐿️🐇🐿️❗️🌳💬😲👉🤵👨🦰👊=🐕🐑🐐🦌🐘🦏🦍🦧🦓🐅🦌🦌🦌🐆🦍🐘🐘🐗🦓=👍🤵.
cli/translator.mjs --system-instruction "Scramble characters of words while only keeping the start and end letter" --no-prefix-number --no-line-matching --plain-text "Chuck Norris can walk with the animals, talk with the animals;"
Standard Output
Cuhck Nroris can wakl wtih the aiamnls, talk wtih the aiamnls;
cli/translator.mjs --system-instruction "Unscramble characters back to English" --no-prefix-number --no-line-matching --plain-text "Cuhck Nroris can wakl wtih the aiamnls, talk wtih the aiamnls;"
Standard Output
Chuck Norris can walk with the animals, talk with the animals;
cli/translator.mjs --input test/data/test_cn.txt
Input file: test/data/test_cn.txt
你好。
拜拜!
Standard Output
Hello.
Goodbye!
cli/translator.mjs --input test/data/test_ja_small.srt
Input file: test/data/test_ja_small.srt
1
00:00:00,000 --> 00:00:02,000
おはようございます。
2
00:00:02,000 --> 00:00:05,000
お元気ですか?
3
00:00:05,000 --> 00:00:07,000
はい、元気です。
4
00:00:08,000 --> 00:00:12,000
今日は天気がいいですね。
5
00:00:12,000 --> 00:00:16,000
はい、とてもいい天気です。
Output file: test/data/test_ja_small.srt.out_English.srt
1
00:00:00,000 --> 00:00:02,000
Good morning.
2
00:00:02,000 --> 00:00:05,000
How are you?
3
00:00:05,000 --> 00:00:07,000
Yes, I'm doing well.
4
00:00:08,000 --> 00:00:12,000
The weather is nice today, isn't it?
5
00:00:12,000 --> 00:00:16,000
Yes, it's very nice weather. |

SRT indices and timestamps are stripped or simplified before sending to the model, reducing tokens. Lines are batched together into a single prompt, removing repeated per-entry overhead. The default system instruction is a minimal `Translate to <language>` (3 tokens). Structured output modes enforce a schema so the model returns only the translated text.
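The SRT-overhead stripping can be sketched like this. It is a deliberately minimal illustration (the real parser must handle multi-line cues, BOMs, and malformed blocks):

```javascript
// Parse a minimal SRT string into { index, timing, text } entries,
// send only the text lines to the model, then reassemble the SRT
// structure around the translated lines.
function parseSrt(srt) {
  return srt.trim().split(/\r?\n\r?\n/).map((block) => {
    const [index, timing, ...text] = block.split(/\r?\n/);
    return { index, timing, text: text.join("\n") };
  });
}

function stripForPrompt(entries) {
  return entries.map((e) => e.text); // indices/timestamps never reach the model
}

function rebuildSrt(entries, translations) {
  return entries
    .map((e, i) => `${e.index}\n${e.timing}\n${translations[i]}`)
    .join("\n\n");
}
```

Because the indices and timings are restored locally, the model's token budget is spent almost entirely on translatable text.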
Five modes are available via `--structured`:
Lines are sent as a JSON array. The model returns a matching array.
| Input (SRT) | Prompt (User Message) | Transform (Model Response) | Output (SRT) |
|---|---|---|---|
|
Tokens: |
Tokens: |
Tokens: |
Tokens: |
1
00:00:00,000 --> 00:00:02,000
おはようございます。
2
00:00:02,000 --> 00:00:05,000
お元気ですか?
3
00:00:05,000 --> 00:00:07,000
はい、元気です。
4
00:00:08,000 --> 00:00:12,000
今日は天気がいいですね。
5
00:00:12,000 --> 00:00:16,000
はい、とてもいい天気です。 |
(compact JSON, formatted here for readability) {
"inputs": [
"おはようございます。",
"お元気ですか?",
"はい、元気です。",
"今日は天気がいいですね。",
"はい、とてもいい天気です。"
]
} |
(compact JSON, formatted here for readability) {
"outputs": [
"Good morning.",
"How are you?",
"Yes, I'm doing well.",
"The weather is nice today, isn't it?",
"Yes, it's very nice weather."
]
} |
1
00:00:00,000 --> 00:00:02,000
Good morning.
2
00:00:02,000 --> 00:00:05,000
How are you?
3
00:00:05,000 --> 00:00:07,000
Yes, I'm doing well.
4
00:00:08,000 --> 00:00:12,000
The weather is nice today, isn't it?
5
00:00:12,000 --> 00:00:16,000
Yes, it's very nice weather. |
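Array mode's one-to-one line check can be sketched as follows (illustrative; function names are not from the actual source):

```javascript
// Build the array-mode request body and validate the structured response:
// an output length that differs from the input is the signal to retry
// the batch at a smaller size.
function buildArrayPrompt(lines) {
  return JSON.stringify({ inputs: lines });
}

function checkArrayResponse(lines, responseJson) {
  const { outputs } = JSON.parse(responseJson);
  if (!Array.isArray(outputs) || outputs.length !== lines.length) {
    return { ok: false, outputs: null }; // mismatch: batch gets resubmitted smaller
  }
  return { ok: true, outputs };
}
```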
Source lines are used as keys in the response schema. The model maps each source key to its translation. No explicit user message is sent - the schema itself conveys the input.
| Input (SRT) | Prompt (Schema Keys) | Transform (Model Response) | Output (SRT) |
|---|---|---|---|
|
Tokens: |
Tokens: |
Tokens: |
Tokens: |
1
00:00:00,000 --> 00:00:02,000
おはようございます。
2
00:00:02,000 --> 00:00:05,000
お元気ですか?
3
00:00:05,000 --> 00:00:07,000
はい、元気です。
4
00:00:08,000 --> 00:00:12,000
今日は天気がいいですね。
5
00:00:12,000 --> 00:00:16,000
はい、とてもいい天気です。 |
Source lines are encoded as response schema keys (no user message) {
"おはようございます。": "string",
"お元気ですか?": "string",
"はい、元気です。": "string",
"今日は天気がいいですね。": "string",
"はい、とてもいい天気です。": "string"
} |
(compact JSON, formatted here for readability) {
"おはようございます。": "Good morning.",
"お元気ですか?": "How are you?",
"はい、元気です。": "Yes, I'm doing well.",
"今日は天気がいいですね。": "The weather is nice today, isn't it?",
"はい、とてもいい天気です。": "Yes, it's very nice weather."
} |
1
00:00:00,000 --> 00:00:02,000
Good morning.
2
00:00:02,000 --> 00:00:05,000
How are you?
3
00:00:05,000 --> 00:00:07,000
Yes, I'm doing well.
4
00:00:08,000 --> 00:00:12,000
The weather is nice today, isn't it?
5
00:00:12,000 --> 00:00:16,000
Yes, it's very nice weather. |
Timestamps are preserved alongside the text. Lines are sent using the compact Toon format (milliseconds). The model may merge subtitle entries when contextually appropriate, which it reports via a merge remarks field in the response.
| Input (SRT) | Prompt (User Message) | Transform (Model Response) | Output (SRT) |
|---|---|---|---|
|
Tokens: |
Tokens: |
Tokens: |
Tokens: |
1
00:00:00,000 --> 00:00:02,000
おはようございます。
2
00:00:02,000 --> 00:00:05,000
お元気ですか?
3
00:00:05,000 --> 00:00:07,000
はい、元気です。
4
00:00:08,000 --> 00:00:12,000
今日は天気がいいですね。
5
00:00:12,000 --> 00:00:16,000
はい、とてもいい天気です。 |
(Toon format - compact, not JSON) inputs[5]{start,end,text}:
0,2000,おはようございます。
2000,5000,お元気ですか?
5000,7000,はい、元気です。
8000,12000,今日は天気がいいですね。
12000,16000,はい、とてもいい天気です。 |
(compact JSON, formatted here for readability) {
"outputs": [
{ "start": 0, "end": 2000, "text": "Good morning." },
{ "start": 2000, "end": 5000, "text": "How are you?" },
{ "start": 5000, "end": 7000, "text": "Yes, I'm doing well." },
{ "start": 8000, "end": 12000, "text": "The weather is nice today, isn't it?" },
{ "start": 12000, "end": 16000, "text": "Yes, it's very nice weather." }
],
"remarksIfContainedMergers": ""
} |
1
00:00:00,000 --> 00:00:02,000
Good morning.
2
00:00:02,000 --> 00:00:05,000
How are you?
3
00:00:05,000 --> 00:00:07,000
Yes, I'm doing well.
4
00:00:08,000 --> 00:00:12,000
The weather is nice today, isn't it?
5
00:00:12,000 --> 00:00:16,000
Yes, it's very nice weather. |
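The SRT-timestamp-to-milliseconds conversion behind the Toon-style rows above can be sketched as (illustrative helper names, not the actual source):

```javascript
// Convert an SRT timestamp ("HH:MM:SS,mmm") to milliseconds and format
// one Toon-style input row ("start,end,text").
function srtTimeToMs(t) {
  const [, h, m, s, ms] = t.match(/(\d{2}):(\d{2}):(\d{2}),(\d{3})/);
  return ((+h * 60 + +m) * 60 + +s) * 1000 + +ms;
}

function toToonRow(start, end, text) {
  return `${srtTimeToMs(start)},${srtTimeToMs(end)},${text}`;
}
```

Plain integers like `2000` tokenize far more compactly than `00:00:02,000`, which is why the timestamp mode still stays reasonably cheap despite carrying timing data.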
No structured output. Lines are sent as text and the model returns text.
| Input (SRT) | Prompt (User Message) | Transform (Model Response) | Output (SRT) |
|---|---|---|---|
|
Tokens: |
Tokens: |
Tokens: |
Tokens: |
1
00:00:00,000 --> 00:00:02,000
おはようございます。
2
00:00:02,000 --> 00:00:05,000
お元気ですか?
3
00:00:05,000 --> 00:00:07,000
はい、元気です。
4
00:00:08,000 --> 00:00:12,000
今日は天気がいいですね。
5
00:00:12,000 --> 00:00:16,000
はい、とてもいい天気です。 |
|
|
1
00:00:00,000 --> 00:00:02,000
Good morning.
2
00:00:02,000 --> 00:00:05,000
How are you?
3
00:00:05,000 --> 00:00:07,000
Yes, I'm doing well.
4
00:00:08,000 --> 00:00:12,000
The weather is nice today, isn't it?
5
00:00:12,000 --> 00:00:16,000
Yes, it's very nice weather. |