Local Whispers
Tuesday, May 7th 2024, 15:15 BST
London, UK
For most of the videos that I make, I also like to have subtitles, because sometimes it's easier to just read along.
I used to make these subtitles with an online service called Otter.ai, but it stopped allowing uploads of video files.
And then I found Whisper, which allows me to upload audio files to create subtitles. Whisper is an API from OpenAI, the company mainly known for ChatGPT.
I didn't like having to upload everything to them either, as that means they could train their model with my original video audio.
Whisper never really worked that well, because it broke up the sentences in weird places, and I had to make lots of edits. It took a long time to make subtitles.
I recently found out that it's actually possible to run Whisper locally, with an open source project on GitHub. I started looking into this to see whether I could use this to create subtitles instead.
The first thing that their documentation tells you to do is to run pip install openai-whisper.
But I am on a Debian machine, where Python is installed through distribution packages, and I don't really want to mess that up. apt-get actually suggests creating a virtual environment for Python instead.
In a virtual environment, you can install packages without affecting your system setup. Once you've made this virtual environment, there are Python binaries symlinked inside it, which you can then use for installing things.
You create the virtual environment with:
python3 -m venv `pwd`/whisper-local
cd whisper-local
In the bin directory you then have python and pip. That pip is the one you then use for installing packages.
Now let me run pip again, with the same options as before to install Whisper:
bin/pip install -U openai-whisper
It takes quite some time to download. Once it is done, there is a new whisper binary in our bin directory.
You also need to install ffmpeg:
sudo apt-get install ffmpeg
Now we can run Whisper on a video I had made earlier:
./bin/whisper ~/media/movie/xdebug33-from-exception.webm
The first time I ran this, I had some errors.
My video card does not have enough memory (only 2GB). I don't actually have a very good video card at all, and was better off disabling it by instructing PyTorch ("Torch") that I do not have one:
export CUDA_VISIBLE_DEVICES=""
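To confirm that this works, you can ask PyTorch (which the openai-whisper install pulls in) whether it still sees a CUDA device. A small sketch, setting the variable from Python instead of the shell:

```python
import os

# Hide any CUDA device before torch initialises; this has the same
# effect as the export above, done here so the snippet is self-contained
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch

print(torch.cuda.is_available())  # prints False with the GPU hidden
```

With the GPU hidden, Whisper falls back to running on the CPU, which is slower but avoids the out-of-memory errors.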
And then run Whisper again:
./bin/whisper ~/media/movie/xdebug33-from-exception.webm
It first detects the language, which you can pre-empt by using --language English.
While it runs, it starts showing information in the console. I quickly noticed it was misspelling lots of things, such as my name Derick as Derek, and Xdebug as XDbook.
I also noticed that it starts breaking up sentences in an odd way after a while, just like the online version was doing.
I did not get a good result this first time.
It did create a JSON file, xdebug33-from-exception.json, but it is all on a single line.
I reformatted it by installing the yajl-tools package with apt-get, and piping the data through json_reformat:
sudo apt-get install yajl-tools
cat xdebug33-from-exception.json | json_reformat >xdebug33-from-exception_reformat.json
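If you would rather not install an extra package, Python's standard library can do the same pretty-printing. A minimal sketch (the helper name reformat is my own, not part of any tool):

```python
import json

def reformat(src, dst, indent=4):
    """Pretty-print the JSON file at src into dst."""
    with open(src) as f:
        data = json.load(f)
    with open(dst, "w") as f:
        json.dump(data, f, indent=indent)

# Usage, with the file names from above:
# reformat("xdebug33-from-exception.json",
#          "xdebug33-from-exception_reformat.json")
```

The same thing is available on the command line as python3 -m json.tool.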
The reformatted file still has our full text on a single line, but then a segments section follows, which looks like:
"segments": [
{
"id": 0,
"seek": 0,
"start": 3.6400000000000006,
"end": 11.8,
"text": " Hi, I'm Derick. For most of the videos that I make, I also like to have subtitles, because",
"tokens": [
50363, 15902, 11, 314, 1101, 9626, 624, 13, 1114, 749, 286,
262, 5861, 326, 314, 787, 11, 314, 635, 588, 284, 423, 44344,
11, 780, 50960
],
"temperature": 0.0,
"avg_logpro
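The start, end, and text fields are all you need for subtitles. Whisper's command-line tool also writes subtitle files directly, but as an illustration of what is in these segments, here is a minimal sketch that converts them to SRT by hand (the helper names are my own):

```python
import json

def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def segments_to_srt(segments):
    """Turn Whisper's 'segments' list into SRT subtitle text."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Usage, with the reformatted file from above:
# with open("xdebug33-from-exception_reformat.json") as f:
#     print(segments_to_srt(json.load(f)["segments"]))
```

The first segment above would come out as subtitle number 1, running from 00:00:03,640 to 00:00:11,800.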