How to Transcribe & Translate Audio with Whisper


After following this guide, you will be able to use OpenAI's Whisper to transcribe audio, translate the transcription into English, and generate subtitles from the result.

My interest in this technology is to generate subtitles for lesser-known media, and to generate English subtitles for media that has previously only been available in its original language.

This guide assumes that you are using Linux, specifically Ubuntu in my case, and that you have enough knowledge to follow along with any linked pages and examples.

Requirements

Whisper Installation


# Create and enter a new directory.
mkdir subtitle_generation
cd subtitle_generation

# Clone all required Git repositories.
git clone https://github.com/ggerganov/whisper.cpp.git
git clone https://github.com/openai/whisper.git
git clone https://huggingface.co/openai/whisper-large-v3.git

# Compile whisper.cpp. You may need to install "make" and
# other tools first.
cd whisper.cpp
make
cd ../

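# Convert the whisper-large-v3 model from ".h5" to ".ggml". The arguments
# are the model folder, the OpenAI whisper repository, and the output
# folder; the result is written to "./ggml-model.bin".
python3 ./whisper.cpp/models/convert-h5-to-ggml.py ./whisper-large-v3 ./whisper .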
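
Before running the installation commands, it can help to check which of the required tools are missing. The following is a minimal sketch, not part of any of the projects above: missing_tools is a hypothetical helper name, and the tool list in the usage note is my assumption about what this guide needs (git-lfs is included because Hugging Face repositories store model weights with Git LFS).

```shell
# Print the name of every given tool that is not found on the PATH.
# "missing_tools" is a hypothetical helper, not part of whisper.cpp.
missing_tools() {
    for tool in "$@"; do
        # "command -v" succeeds only if the shell can resolve the tool.
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, missing_tools git git-lfs make g++ python3 ffmpeg prints nothing when everything is installed. As far as I know, the conversion script also imports the torch, numpy, and transformers Python packages, which this check does not cover.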

Audio Preparation

The whisper.cpp build used here only accepts 16-bit .wav files sampled at 16 kHz, so you may need to extract or convert your audio with FFmpeg. The following commands are examples of how to do this. If your situation is more complicated than extracting a single audio track from a video or converting one audio file, expect to spend some time learning more about FFmpeg.


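# This will extract the first audio track of a video file, convert the
# audio to 16kHz, and save it as a ".wav" audio file. The "-map 0:a:0"
# option selects the first audio track explicitly; without it, FFmpeg
# picks whichever audio track it considers best.
ffmpeg -i "input_file.mp4" -map 0:a:0 -c:a pcm_s16le -ar 16000 "output_file.wav"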


# This will convert an audio file to 16kHz and save it as a ".wav" file.
ffmpeg -i "input_file.mp3" -c:a pcm_s16le -ar 16000 "output_file.wav"
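
If you have several files to prepare, the conversion can be wrapped in a small loop. This is only a sketch under my own assumptions: wav_name and to_wav16k are hypothetical helper names, and the "_16k" suffix is an arbitrary choice that keeps the output from colliding with an input that is already a ".wav" file.

```shell
# Derive the output path: same folder, extension replaced by "_16k.wav".
# Both function names here are hypothetical helpers for this guide.
wav_name() {
    dir=$(dirname -- "$1")
    base=$(basename -- "$1")
    printf '%s/%s_16k.wav\n' "$dir" "${base%.*}"
}

# Convert every given file to a 16-bit, 16kHz ".wav" file.
to_wav16k() {
    for src in "$@"; do
        out=$(wav_name "$src")
        # -n refuses to overwrite existing files, -nostdin keeps FFmpeg
        # from reading the terminal, and -map 0:a:0 selects the first
        # audio track of the input.
        ffmpeg -n -nostdin -loglevel error -i "$src" \
            -map 0:a:0 -c:a pcm_s16le -ar 16000 "$out" || return 1
    done
}
```

Calling to_wav16k "input_file.mp4" "input_file.mp3" would then write "./input_file_16k.wav" for the first file and stop at the second, because both inputs map to the same output name and -n refuses to overwrite it. Giving the inputs distinct base names avoids this.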

Running Whisper

There are many parameters that you can use with Whisper. To view them, cd into the whisper.cpp folder and then run ./main --help.

As an example, assume you have a Dutch movie and that you want to generate English subtitles for it. You could use the -l nl option to tell Whisper that the file is in Dutch, the -tr option to translate the Dutch transcription into English, and the -osrt option to create an .srt subtitle file with the translated transcription.


# Place your ".wav" file in the same folder as "ggml-model.bin" and run
# this command from that folder.
./whisper.cpp/main -l nl -tr -osrt --model ./ggml-model.bin -f ./input_file.wav
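
If you run this command often, it can be wrapped in a function so that only the language and the file name change. The sketch below assumes the folder layout used in this guide; translate_to_srt, WHISPER_BIN, and WHISPER_MODEL are hypothetical names of my own. It also passes the -of option, which sets the output path without its extension, so the subtitle file name is predictable.

```shell
# Default paths assume the layout used in this guide. Override them
# through the environment if your build or model lives elsewhere.
WHISPER_BIN="${WHISPER_BIN:-./whisper.cpp/main}"
WHISPER_MODEL="${WHISPER_MODEL:-./ggml-model.bin}"

# Usage: translate_to_srt LANGUAGE_CODE WAV_FILE
# Writes English subtitles next to the input, e.g. "film.wav" -> "film.srt".
translate_to_srt() {
    lang=$1
    wav=$2
    "$WHISPER_BIN" -l "$lang" -tr -osrt \
        --model "$WHISPER_MODEL" \
        -of "${wav%.wav}" \
        -f "$wav"
}
```

For the Dutch example above, translate_to_srt nl ./input_file.wav should produce ./input_file.srt. Removing -tr from the function keeps the subtitles in the original language instead of translating them.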

Notes