Subtitle: A Python Library for Subtitle Processing


Open-source subtitle generation for seamless content translation.

Key Features:

  • Open-source: Freely available for use, modification, and distribution.
  • Self-hosted: Run the tool on your own servers for enhanced control and privacy.
  • AI-powered: Leverage advanced machine learning for accurate and natural-sounding subtitles.
  • Multilingual support: Generate subtitles for videos in a wide range of languages.
  • Easy integration: Seamlessly integrates into your existing workflow.

I made this project for fun, but I think it could also be useful for other people.



First, you need to install FFmpeg. Here's how you can do it:

# On Linux
sudo apt install ffmpeg
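
Before running the script, it can help to confirm that FFmpeg is actually on your PATH. The snippet below is a minimal sketch using only the Python standard library; the function name ffmpeg_available is made up for illustration and is not part of this project.

```python
import shutil


def ffmpeg_available(binary: str = "ffmpeg") -> bool:
    """Return True if the given executable can be found on PATH."""
    # shutil.which returns the full path to the executable, or None.
    return shutil.which(binary) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("FFmpeg found")
    else:
        print("FFmpeg not found; install it first (e.g. sudo apt install ffmpeg)")
```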


You can run the script from the command line using the following command:

python <filepath | video_url> [--model <modelname>]

Replace <filepath | video_url> with the path to your video file or with a video URL. The --model argument is optional; if it is not provided, the script uses 'base' as the default model.

For example:

python /path/to/your/video.mp4 --model base

This will run the script on the video at /path/to/your/video.mp4 using the base model. Please replace /path/to/your/video.mp4 with the actual path to your video file.
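
To make the command-line shape concrete, here is a rough sketch of how an interface like the one described could be parsed: one positional argument that is either a file path or a URL, plus an optional --model that defaults to 'base'. This is an assumption about the structure, not the project's actual code; build_parser and is_video_url are hypothetical names.

```python
import argparse
from urllib.parse import urlparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented usage: <filepath | video_url> [--model <modelname>]."""
    parser = argparse.ArgumentParser(description="Generate subtitles for a video.")
    parser.add_argument("source", help="path to a video file or a video URL")
    parser.add_argument("--model", default="base", help="model name (default: base)")
    return parser


def is_video_url(source: str) -> bool:
    """Treat the input as a URL only if it has an http(s) scheme and a host."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


if __name__ == "__main__":
    args = build_parser().parse_args()
    kind = "URL" if is_video_url(args.source) else "file"
    print(f"Input {kind}: {args.source}, model: {args.model}")
```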


Here are the models you can use. Note: use a .en model only when the video is in English.

  • tiny.en
  • tiny
  • tiny-q5_1
  • tiny.en-q5_1
  • base.en
  • base
  • base-q5_1
  • base.en-q5_1
  • small.en
  • small.en-tdrz
  • small
  • small-q5_1
  • small.en-q5_1
  • medium
  • medium.en
  • medium-q5_0
  • medium.en-q5_0
  • large-v1
  • large-v2
  • large
  • large-q5_0
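
The rule about .en models can be checked before any work starts. The sketch below validates a model name against the list above and rejects an English-only model for a non-English video. The function check_model and the convention of passing a language code such as "en" are assumptions made for illustration, not part of this project.

```python
# Model names copied from the list above.
MODELS = (
    "tiny.en", "tiny", "tiny-q5_1", "tiny.en-q5_1",
    "base.en", "base", "base-q5_1", "base.en-q5_1",
    "small.en", "small.en-tdrz", "small", "small-q5_1", "small.en-q5_1",
    "medium", "medium.en", "medium-q5_0", "medium.en-q5_0",
    "large-v1", "large-v2", "large", "large-q5_0",
)


def check_model(name: str, language: str) -> str:
    """Return the model name if it is known and suits the spoken language."""
    if name not in MODELS:
        raise ValueError(f"unknown model: {name!r}")
    # English-only models carry ".en" in their name.
    if ".en" in name and language != "en":
        raise ValueError(f"{name!r} is English-only, but language is {language!r}")
    return name
```

For example, check_model("base", "fr") returns "base", while check_model("base.en", "fr") raises ValueError.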


You can change the behaviour by passing options to the whisper binary as follows:

./whisper [options] file0.wav file1.wav ...


Here are the options you can use with the whisper binary. Default values, where one exists, are shown in parentheses:

-h, --help                      Show help message and exit
-t N, --threads N               Number of threads to use during computation (default: 4)
-p N, --processors N            Number of processors to use during computation (default: 1)
-ot N, --offset-t N             Time offset in milliseconds (default: 0)
-on N, --offset-n N             Segment index offset (default: 0)
-d N, --duration N              Duration of audio to process in milliseconds (default: 0)
-mc N, --max-context N          Maximum number of text context tokens to store (default: -1)
-ml N, --max-len N              Maximum segment length in characters (default: 0)
-sow, --split-on-word           Split on word rather than on token (default: false)
-bo N, --best-of N              Number of best candidates to keep (default: 2)
-bs N, --beam-size N            Beam size for beam search (default: -1)
-wt N, --word-thold N           Word timestamp probability threshold (default: 0.01)
-et N, --entropy-thold N        Entropy threshold for decoder fail (default: 2.40)
-lpt N, --logprob-thold N       Log probability threshold for decoder fail (default: -1.00)
-debug, --debug-mode            Enable debug mode, e.g. dump log_mel (default: false)
-tr, --translate                Translate from source language to English (default: false)
-di, --diarize                  Stereo audio diarization (default: false)
-tdrz, --tinydiarize            Enable tinydiarize, requires a tdrz model (default: false)
-nf, --no-fallback              Do not use temperature fallback while decoding (default: false)
-otxt, --output-txt             Output result in a text file (default: true)
-ovtt, --output-vtt             Output result in a vtt file (default: false)
-osrt, --output-srt             Output result in a srt file (default: false)
-olrc, --output-lrc             Output result in a lrc file (default: false)
-owts, --output-words           Output script for generating karaoke video (default: false)
-fp, --font-path                Path to a monospace font for karaoke video (default: /System/Library/Fonts/Supplemental/Courier New Bold.ttf)
-ocsv, --output-csv             Output result in a CSV file (default: false)
-oj, --output-json              Output result in a JSON file (default: false)
-ojf, --output-json-full        Include more information in the JSON file (default: false)
-of FNAME, --output-file FNAME  Output file path (without file extension)
-ps, --print-special            Print special tokens (default: false)
-pc, --print-colors             Print colors (default: false)
-pp, --print-progress           Print progress (default: false)
-nt, --no-timestamps            Do not print timestamps (default: false)
-l LANG, --language LANG        Spoken language, 'auto' for auto-detect (default: en)
-dl, --detect-language          Exit after automatically detecting language (default: false)
--prompt PROMPT                 Initial prompt
-m FNAME, --model FNAME         Model path (default: models/ggml-base.en.bin)
-f FNAME, --file FNAME          Input WAV file path
-oved D, --ov-e-device DNAME    The OpenVINO device used for encode inference (default: CPU)
-ls, --log-score                Log best decoder scores of tokens (default: false)
-ng, --no-gpu                   Disable GPU (default: false)
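
If you want to call the binary from Python, the options above can be assembled into an argument list. The sketch below only builds the command and uses flags taken from the list (-m, -t, -l, -osrt, -f). The function name build_whisper_command is hypothetical, and the ./whisper path assumes the binary sits in the current directory as in the usage line above.

```python
from typing import List


def build_whisper_command(
    wav_path: str,
    model_path: str = "models/ggml-base.en.bin",  # default from the option list
    language: str = "en",                         # default from the option list
    threads: int = 4,                             # default from the option list
    output_srt: bool = True,
) -> List[str]:
    """Assemble an argument list for the whisper binary (does not run it)."""
    command = ["./whisper", "-m", model_path, "-t", str(threads), "-l", language]
    if output_srt:
        command.append("-osrt")  # write the result as an .srt file
    command += ["-f", wav_path]
    return command


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run(...) to execute it.
    print(" ".join(build_whisper_command("out.wav")))
```

Building a list rather than a single string avoids shell quoting problems when paths contain spaces.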

Example: Running the Binary

Here's an example of how to use the whisper binary:

./whisper -m models/ggml-tiny.en.bin -f Rev.mp3 out.wav -nt --output-vtt
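
The -f option is described as an input WAV file path, so other formats usually need converting first. The sketch below builds an FFmpeg command that converts any input to 16 kHz, mono, 16-bit PCM WAV, which is the format whisper.cpp documents for its input. Treating this binary as having the same requirement is an assumption, and wav_conversion_command is a hypothetical helper name.

```python
from typing import List


def wav_conversion_command(source: str, destination: str) -> List[str]:
    """Build an FFmpeg command that converts audio/video to 16 kHz mono 16-bit WAV."""
    return [
        "ffmpeg",
        "-i", source,          # input file (e.g. an .mp3 or .mp4)
        "-ar", "16000",        # resample to 16 kHz
        "-ac", "1",            # downmix to a single channel
        "-c:a", "pcm_s16le",   # 16-bit little-endian PCM
        destination,
    ]


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run(...) to execute it.
    print(" ".join(wav_conversion_command("Rev.mp3", "out.wav")))
```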

🚀 About Me

Just trying to be a developer!


For support, email

Download Details:

Author: innovatorved
Source Code: 
License: MIT license
