Extracting an embedded subtitle from a video file using FFMPEG
Have you ever used SuperSonic, the SubSonic fork? It is a video and music streaming platform written in Java. You can install and configure it to manage your media collection. You should check it out and try it (it’s open source): http://sourceforge.net/projects/supersonic/. There are lots of nice features that I’ll let you discover by yourself.
However, it lacks a few things. One of them is selecting the subtitle track you want when subtitles are embedded in the video container. I mean subtitles stored as a separate track from the video, the goal being to have them played along with the video. The embedded subtitles in my files are very good (no mistranslation, no horrible mistakes, …), so it would be good to: extract them, name them like the video file (except for the extension) and have the JWPlayer “caption” extension display them.
OK, I want to extract the subtitles and write them to an SRT-formatted file. I used the “IT guy survival reflex”: I googled. As SuperSonic uses FFMPEG (http://ffmpeg.org) for all the streaming stuff, I wanted to do the same. I must admit that the results were quite disappointing. They sounded to me like “Oh My God, don’t do this. Do not use FFMPEG, use <insert random crappy software here> instead”, which makes no sense when someone wants to use ffmpeg.
Fine! The InternetZ guys won’t help me this time, so let’s have a look at the FFMPEG documentation. *Ouch, my head*. OK, now let’s try to understand some basics.
There are multiple streams inside a video file: video, audio and subtitles, all “packed” into a container such as MKV. Each stream has a codec (used to encode and decode it), for example h264 for high definition video or ac3 for audio. Subtitles have one too!
Let’s use the latest FFMPEG version for our tests. I disabled yasm support at build time because I did not have it on the system I used and did not need it either. As it’s just for testing, I will not install FFMPEG and will just use the compiled binary:
git clone git://source.ffmpeg.org/ffmpeg.git ffmpeg
So here we are with our freshly compiled ffmpeg ($FFMPEG in the commands below stands for the path to that binary). As we want to write subtitles to a file, we need a suitable encoder.
> $FFMPEG -codecs | grep "^...S.."
..S... = Subtitle codec
DES... dvd_subtitle DVD subtitles (decoders: dvdsub ) (encoders: dvdsub )
DES... dvb_subtitle DVB subtitles (decoders: dvbsub ) (encoders: dvbsub )
DES... xsub XSUB
DES... ssa SSA (SubStation Alpha) / ASS (Advanced SSA) subtitle (decoders: ass ) (encoders: ass )
DES... mov_text MOV text
DES... srt SubRip subtitle with embedded timing
DES... subrip SubRip subtitle
It seems the srt codec is available for both encoding and decoding. That’s perfect. Note: “D” stands for decoding, “E” for encoding and “S” for subtitle codec.
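If you only care about the subtitle codecs that can encode, the flag column can be filtered a little more tightly. Here is a small sketch, assuming the -codecs listing keeps the layout shown above (flags in the first column, codec name in the second). The name subtitle_encoders is made up for this example.

```shell
# Hypothetical helper: reads a "-codecs" listing on stdin and prints the names
# of the subtitle codecs that have encoding support.
subtitle_encoders() {
    # $1 is the flag column: 2nd character "E" = encoding, 3rd character "S" = subtitle.
    # awk splits on whitespace, so a leading space before the flags does not matter.
    awk '$1 ~ /^.ES/ { print $2 }'
}
```

It could be used as: "$FFMPEG" -codecs 2>/dev/null | subtitle_encoders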
Listing the streams in the files
As said above, there are several streams in a video file. I’ll take a file named mytestmovie.mkv and list its streams. It seems that the only way to do this with the ffmpeg binary is to give it the input file and no other parameters. This is not really clean, because we get an error for not providing an output file, and the listing is written to the error output, which does not make it easy to automate retrieving the stream list. The ffmpeg binary itself lacks a dedicated option for this (the ffprobe tool that ships with FFMPEG is meant for this kind of inspection). No big deal! We are just testing. I’ve stripped the useless stuff from the following output:
> $FFMPEG -i mytestmovie.mkv
Stream #0:0(eng): Audio: ac3, 48000 Hz, stereo, s16, 384 kb/s (default)
Stream #0:1(fre): Subtitle: subrip (default)
Stream #0:2(eng): Subtitle: subrip
Stream #0:3(eng): Video: h264 (High), yuv420p, 1920x1080 [SAR 1:1 DAR 16:9], 29.97 fps, 29.97 tbr, 1k tbn, 59.94 tbc (default)
We can observe 1 audio stream, 1 video stream and 2 subtitle tracks. The numbers right after “Stream” are the stream identifiers: 0:0, 0:1, 0:2, 0:3 (input file index, then stream index). These identifiers are required for several ffmpeg operations, including ours.
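Since the listing goes to the error output, a script has to redirect it before it can pick out the subtitle tracks. Here is a rough sketch that does so with sed, assuming the lines look exactly like the ones above (identifier, language tag in parentheses, then the stream type and codec). The name list_subtitle_streams is made up for this example, and other FFMPEG versions may format these lines differently.

```shell
# Hypothetical helper: reads ffmpeg's stream listing on stdin and prints one line
# per subtitle stream, in the form "<identifier> <language> <codec>".
list_subtitle_streams() {
    # Group 1: stream identifier (e.g. 0:1), group 2: language tag, group 3: codec name.
    # Lines that are not subtitle streams do not match and are not printed.
    sed -n -E 's/^ *Stream #([0-9]+:[0-9]+)\(([a-z]+)\): Subtitle: ([a-z0-9_]+).*/\1 \2 \3/p'
}
```

It could be used as: "$FFMPEG" -i mytestmovie.mkv 2>&1 | list_subtitle_streams
With the listing above, this prints "0:1 fre subrip" and "0:2 eng subrip".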
Extracting the subtitles, at last!
For the impatient, I’ll give it right away:
$FFMPEG -i mytestmovie.mkv -map 0:1 -vn -an -codec:s srt sub.srt
Now here are some explanations:
- -i : as before, the input file.
- -map 0:1 : selects the stream to extract, using the stream identifier we retrieved with the previous command (here, the French subtitle track). Without it, ffmpeg picks a subtitle stream by itself, which may not be the one you want.
- -vn : disables the video stream.
- -an : disables the audio stream.
- -codec:s srt : this is the interesting part. -codec:<stream_type> <codec> specifies the codec to use for a given type of stream. ‘s’ stands for subtitle and ‘srt’ is the codec we want to use. Easy, isn’t it?
- sub.srt is the output file.
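The goal stated at the beginning was to name the subtitle file like the video file, except for the extension. Here is a minimal sketch of that step, using ffmpeg’s -map option to pick the stream explicitly by its identifier. The function names srt_name_for and extract_subtitle are made up for this example, and the fallback to a plain "ffmpeg" on the PATH is an assumption.

```shell
# Hypothetical helper: "movie.mkv" -> "movie.srt" (only the last extension is replaced).
srt_name_for() {
    printf '%s\n' "${1%.*}.srt"
}

# Hypothetical helper: extract one subtitle stream next to the video file.
#   $1 = video file, $2 = stream identifier as listed by ffmpeg (e.g. 0:1)
# Assumes $FFMPEG holds the path to the ffmpeg binary; falls back to "ffmpeg".
extract_subtitle() {
    "${FFMPEG:-ffmpeg}" -i "$1" -map "$2" -codec:s srt "$(srt_name_for "$1")"
}
```

For instance, extract_subtitle mytestmovie.mkv 0:1 would write the French track to mytestmovie.srt.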
And there we go: have a look at sub.srt and you will find your subtitles just waiting to be used in SuperSonic or VLC!
Hope this helped. Leave a comment if you are in trouble :-)