Automatic Transcription of the audio (AI transcription)

Podigee can run an automatic (AI) transcription algorithm while an episode is being encoded. We save the resulting transcript on our servers twice, once as a JSON file and once as a VTT file. The URLs of both files are published in your RSS feed via <podcast:transcript> tags.
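If you want to check which transcript files a feed announces, the tags can be read with a few lines of Python. This is only a minimal sketch: the feed excerpt and its URLs are made-up placeholders, and it assumes the tags use the Podcasting 2.0 namespace with `url` and `type` attributes.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Podcasting 2.0 "podcast" namespace (assumed here).
PODCAST_NS = "https://podcastindex.org/namespace/1.0"

# Hypothetical feed excerpt; the URLs are placeholders, not real Podigee paths.
SAMPLE_FEED = """<rss version="2.0" xmlns:podcast="https://podcastindex.org/namespace/1.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Episode 1</title>
      <podcast:transcript url="https://example.com/ep1/transcript.json" type="application/json"/>
      <podcast:transcript url="https://example.com/ep1/transcript.vtt" type="text/vtt"/>
    </item>
  </channel>
</rss>"""


def transcript_links(feed_xml):
    """Return {episode title: [(type, url), ...]} for every item in the feed."""
    root = ET.fromstring(feed_xml)
    result = {}
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        tags = item.findall(f"{{{PODCAST_NS}}}transcript")
        result[title] = [(tag.get("type"), tag.get("url")) for tag in tags]
    return result


if __name__ == "__main__":
    for episode, links in transcript_links(SAMPLE_FEED).items():
        for mime_type, url in links:
            print(episode, mime_type, url)
```

For a real feed you would download the RSS document first and pass its text to `transcript_links`.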

What can the transcription algorithm do?

The algorithm attempts to convert spoken words into text and to group the recognized passages into timed segments.
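To illustrate what such timed segments look like in practice, here is a small sketch that reads cues from a VTT (WebVTT) file into start time, end time and text. The sample transcript is invented, and the parser only covers the basic cue layout of the WebVTT format, not every feature of it.

```python
import re

# Hypothetical transcript excerpt in WebVTT format; the wording is invented.
SAMPLE_VTT = """WEBVTT

00:00:00.000 --> 00:00:04.200
Welcome to the show.

00:00:04.200 --> 00:00:09.750
Today we talk about transcripts.
"""

# WebVTT timestamps: optional hours, then mm:ss.ttt
TIMESTAMP = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})")


def to_seconds(stamp):
    """Convert a WebVTT timestamp such as 00:01:02.500 into seconds."""
    match = TIMESTAMP.fullmatch(stamp)
    if match is None:
        raise ValueError(f"not a WebVTT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_cues(vtt_text):
    """Return a list of (start_seconds, end_seconds, text) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.strip().splitlines()
        timing_index = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing_index is None:
            continue  # header or note block without timing information
        # Take only the timestamp itself; cue settings after it are ignored.
        start, end = (part.split()[0] for part in lines[timing_index].split("-->"))
        text = " ".join(lines[timing_index + 1:])
        cues.append((to_seconds(start), to_seconds(end), text))
    return cues


if __name__ == "__main__":
    for start, end, text in parse_cues(SAMPLE_VTT):
        print(f"{start:8.3f} to {end:8.3f}  {text}")
```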

Once the transcript has been created, it can be called up in the "Transcript" tab in the episode settings and edited in the transcript editor. This is what the created text document looks like in the transcript editor, without manual editing:

In the editor, you can listen to short passages again and correct the text directly at that point. Without this review, confusing passages can remain in the transcript.

You also have the option of downloading the transcript directly as a text file. A downloaded transcript can look like the following image, for example.

What can the transcription algorithm not do?

When using automatic transcription, it is important to keep your expectations of the result realistic. Speech recognition keeps improving, but the algorithm still struggles, especially with technical terms, unclear pronunciation and poor sound quality. Even in a very good studio production in which all speakers speak pure High German, recognition errors can occur.

This is why the text always needs post-editing. Fully automated transcription is not possible at the moment. However, editing the automatically generated text still takes less time than transcribing completely by hand.

Does transcription cost extra and how do I activate it?

With the Advanced Plus plan: yes. With the Business Pro plan: no, the feature is free of charge.
We charge €0.10 / $0.12 per transcribed minute. A separate invoice will be created for this.

The feature is switched on or off in the episode settings. Under the "Media File" tab, check the Automatic transcript checkbox in the pop-up window under "Advanced options". If you want to use this setting for each subsequent episode, also check the box Save as default settings.

For episodes that have already been published, click on "Update audio file" and then select Automatic transcript in the pop-up window under "Advanced options". Then click on Save. The audio file will be re-encoded and a transcript will be created.
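The per-minute price can be turned into a rough cost estimate before you activate the feature. The sketch below uses the two rates quoted above. It assumes that the charge scales linearly with the audio length and is rounded to whole cents, which is an assumption for illustration only: how partial minutes are actually billed is not stated here.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rates as quoted in this article.
RATE_EUR_PER_MINUTE = Decimal("0.10")
RATE_USD_PER_MINUTE = Decimal("0.12")


def estimate_cost(duration_seconds, rate_per_minute):
    """Estimate the transcription charge for one episode.

    Assumption: linear billing by audio length, rounded to whole cents.
    """
    minutes = Decimal(duration_seconds) / Decimal(60)
    return (minutes * rate_per_minute).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    episode_seconds = 45 * 60  # a 45-minute episode
    print("EUR", estimate_cost(episode_seconds, RATE_EUR_PER_MINUTE))
    print("USD", estimate_cost(episode_seconds, RATE_USD_PER_MINUTE))
```

Under these assumptions, a 45-minute episode comes to an estimated €4.50 or $5.40.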

Please note: If the transcription option is activated, it will be applied to every newly encoded episode. The free re-encoding within 7 days does not cover transcription. If you want to re-encode an episode that has already been transcribed, you should disable this feature first.

Before publication and as the default setting:

After publication:

What does the transcript look like in the web player and the Podigee blog?

Podigee Web-Player:

Podigee Blog of the episode, with speaker:

Podigee Blog of the episode, without speaker:

Why do I need transcripts?

A major disadvantage of podcasts compared to text is that they are hard to search. You can only search for keywords once the podcast has been converted to text. One use case for transcripts is therefore a text search across all podcast content. Ideally, all episodes are transcribed and corrected so that a specific term can be searched for across all of them.
A second use case is for people who do not understand 100% of the language in the podcast and can use the transcript to read what is being said. This is helpful for people who are learning a language or whose hearing is temporarily or permanently impaired, but who still want to follow your podcast.
A third use case is interesting for friends of statistics: How much did the different participants in your podcast speak? Ideally, you have a transcript that assigns each speaking part to a person (see Best Practice). The length of the individual speaking parts then makes it easy to see who said how much in an episode or over an entire podcast.
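The search and statistics use cases can be sketched in a few lines of Python. Note that the segment layout below (`speaker`, `start`, `end`, `text`) is a hypothetical structure chosen for illustration. The actual layout of the JSON transcript may differ, so you would need to adapt the field names.

```python
from collections import defaultdict

# Hypothetical transcript segments; field names and content are assumptions.
SEGMENTS = [
    {"speaker": "Anna", "start": 0.0, "end": 12.5, "text": "Welcome to our podcast about audio."},
    {"speaker": "Ben", "start": 12.5, "end": 20.0, "text": "Today the topic is microphones."},
    {"speaker": "Anna", "start": 20.0, "end": 40.0, "text": "Good microphones improve every podcast."},
]


def search(segments, keyword):
    """Return (start, speaker, text) for every segment containing the keyword."""
    needle = keyword.lower()
    return [
        (seg["start"], seg["speaker"], seg["text"])
        for seg in segments
        if needle in seg["text"].lower()
    ]


def speaking_time(segments):
    """Sum the length of all speaking parts per speaker, in seconds."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


def speaking_share(segments):
    """Return each speaker's share of the total speaking time (0.0 to 1.0)."""
    totals = speaking_time(segments)
    overall = sum(totals.values())
    if overall == 0:
        return {}
    return {speaker: seconds / overall for speaker, seconds in totals.items()}


if __name__ == "__main__":
    print(search(SEGMENTS, "microphones"))
    print(speaking_time(SEGMENTS))
    print(speaking_share(SEGMENTS))
```

To search or compare across several episodes, the same functions can be applied to the combined segments of all transcripts.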

There are certainly many more use cases that are not mentioned here. For those, we are counting on the creativity of our users.

Best practice

There are a few things you can keep in mind when recording and editing: Audio quality should be as good as possible, and speakers should avoid talking over each other and speak as clearly as possible. Basically, these are all aspects that make a good podcast in general, but here in particular they make it easier for the algorithm to recognize what is being said.