Today I successfully extracted and transcribed audio from our Sonic Foundry Mediasite recorder. The recorder produces Windows Media Video (WMV) files, so I converted the audio to MP3 before continuing with our transcription workflow.

I used the open-source ffmpeg software to convert between formats:

ffmpeg -i video.wmv -vn -acodec mp3 output.mp3

"-i" marks the input filename

"-vn" is to say that the output should not include a video stream

"-acodec mp3" is to say that the output audio should be encoded as mp3

"output.mp3" is the output filename

This produced an mp3 file which I uploaded to our transcription machine. Unfortunately our recording level was extremely low, which produced extremely poor transcription results on my first attempt. For my second attempt I applied a big volume boost during transcoding to mp3 format.

ffmpeg -i video.wmv -vn -acodec mp3 -vol 2048 output.mp3

With the boosted audio, the transcription results were much better, but still not good (there is a lot of microphone amplifier hiss on the recording).
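
For readers on a newer ffmpeg build: the ffmpeg documentation describes "-vol" as deprecated, with 256 meaning normal volume, so "-vol 2048" should correspond to roughly an 8x gain. The sketch below shows what an equivalent boost might look like with the "volume" audio filter. The vol_to_gain helper is a hypothetical name introduced here for illustration, and the 256 = unity scaling is an assumption taken from the ffmpeg option description rather than something measured on this recording.

```shell
#!/bin/sh
# Hypothetical helper: convert a legacy "-vol" value to a linear gain factor,
# assuming 256 means unity gain (no change in volume).
vol_to_gain() {
    awk -v v="$1" 'BEGIN { printf "%g\n", v / 256 }'
}

gain=$(vol_to_gain 2048)

# Only run the conversion when the example input file is present.
if [ -f video.wmv ]; then
    # -af applies an audio filter; "volume" multiplies the signal by the gain.
    ffmpeg -i video.wmv -vn -acodec mp3 -af "volume=${gain}" output.mp3
fi
```

A linear boost like this raises the hiss along with the speech, so it would not be expected to help with the noise itself.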

I spent between 1 and 2 minutes per minute of input audio, following an approach for rapid transcription turnaround described in this earlier post.

I did not do two-pass correction here. Instead I mostly corrected misrecognised words, removed badly misrecognised phrases entirely, added linking words to connect the dots, and inserted punctuation. A few sentences, where the speaker is mumbling very quietly down in the hiss, had to be typed from scratch, but this was rare.

I used our transcription tools to produce an SRT-format subtitle file, and uploaded the video to YouTube to share with Dan Buzzo, my colleague and collaborator on this transcription project. Dan is also the person speaking in the video.
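
For anyone unfamiliar with SRT: each cue is an index number, a time range written as HH:MM:SS,mmm --> HH:MM:SS,mmm, the subtitle text, and then a blank line. The sketch below only illustrates that layout. It is not how our transcription tools generate the file, and the srt_time and srt_cue helpers and the cue text are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical helper: format a time in milliseconds as an SRT timestamp.
srt_time() {
    ms=$1
    printf '%02d:%02d:%02d,%03d\n' \
        $(( ms / 3600000 )) \
        $(( ms / 60000 % 60 )) \
        $(( ms / 1000 % 60 )) \
        $(( ms % 1000 ))
}

# Hypothetical helper: print one cue (index, time range, text, blank line).
# Arguments: index, start in ms, end in ms, subtitle text.
srt_cue() {
    printf '%s\n%s --> %s\n%s\n\n' \
        "$1" "$(srt_time "$2")" "$(srt_time "$3")" "$4"
}

# Placeholder cue, not taken from the actual transcript.
srt_cue 1 0 2500 "Placeholder text for the first cue"
```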

The first few minutes of the video are up on YouTube here (this is the same video as linked at the beginning of this post):


