Testing AI Transcription Tools

Automatic transcription of audio and video recordings is becoming increasingly popular among journalists, podcasters, and bloggers who want to convert their voice recordings into text quickly. Just a few years ago, this task had to be done manually, requiring significant time and effort.
Artificial intelligence has now come to the rescue, enabling quick and efficient transcription of audio files.

We tested several popular AI transcription applications, evaluating them based on accuracy, processing speed, and ease of use. We also looked at additional features and value for money.

For the test, we used a one-minute trailer for the movie The Batman, which includes voice-over narration, a vocal track, and various sound effects (mechanical noises, explosions, gunshots). Our goal was to see if the AI could accurately identify and separate these elements into different channels.

Trint

Trint was founded by Jeff Kofman, a veteran war correspondent with over 30 years of experience at leading American and European publications. He was awarded an Emmy for his coverage of the Libyan revolution and the final days of Gaddafi.  
Kofman began using automatic transcription for interviews in 2014 and later decided to create his own AI service to address the challenges journalists face when transcribing manually.
Trint is designed for creative teams working on professional content creation. The application supports over 40 languages, making it a versatile tool for users worldwide.

Trint is straightforward to use: upload a video file, select the desired language, and the AI takes over. In our test, it took 90 seconds to create a text version of a 58-second clip.
The application accurately separated the narration from the song lyrics, which suggests that Trint can handle recordings with multiple speakers.
Trint can recognize voices

To the right of the text, there's a verification field. Checking this box indicates that the transcription is accurate. This feature is handy for collaborative work, as it tracks who made changes and when.

For collaborative work, Trint offers a basic "Share" function.
Trint can be used for teamwork

The generated text file can be imported as subtitles into the uploaded video.
The Create captions section allows further text adjustments and setting of timestamps for when the text should appear on screen.
Creating subtitles with Trint

We tested all these features using the demo version available on Trint's website.

Our Conclusion: Trint is an excellent and intuitive application for media editors or translation agencies.
It can also be beneficial for freelancers who frequently need transcription services. However, for occasional use, Trint may not be cost-effective, as the subscription is quite expensive.
The monthly starter package for one user costs $80.
The annual subscription, paid upfront, reduces the cost to $52 per month ($624 for 12 months).
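
To put the two billing options side by side, here is a minimal sketch of the arithmetic using only the prices quoted above. The function and variable names are ours, chosen for illustration; this is not part of any Trint tooling.

```python
# Prices quoted in this review (USD, one user, starter package).
MONTHLY_PRICE = 80            # billed month by month
ANNUAL_PRICE_PER_MONTH = 52   # effective monthly price when paid upfront for a year


def yearly_total(price_per_month: int, months: int = 12) -> int:
    """Total paid over `months` months at a fixed monthly price."""
    return price_per_month * months


monthly_total = yearly_total(MONTHLY_PRICE)            # twelve monthly payments
annual_total = yearly_total(ANNUAL_PRICE_PER_MONTH)    # one upfront payment
savings = monthly_total - annual_total
savings_pct = savings * 100 / monthly_total

print(f"Billed monthly for a year: ${monthly_total}")
print(f"Billed annually:           ${annual_total}")
print(f"Saved by paying upfront:   ${savings} ({savings_pct:.0f}%)")
```

On these figures, a full year billed monthly comes to $960, so the upfront plan saves $336, or 35%.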

Otter

Otter.ai is an AI-powered transcription application that works with audio and video recordings in real time. It is primarily marketed as a tool for transcribing business meetings and online negotiations, converting conversations into text as they happen.
After testing, we found that the service can also be effectively used in several other areas:
  1. Video Interviews: Recruiters can use Otter.ai to focus on the interviewee without worrying about taking notes.
  2. Lectures and Seminars: The app records speaker presentations, so students no longer need to take notes by hand.
  3. Court Proceedings and Hearings: This tool is useful for all participants, from lawyers to jurors.

We used the basic free plan for our test.
The application performed reasonably well in transcribing a YouTube video, though there were minor errors (e.g., missing the word "violence" in the first sentence and transcribing "Just me and you" as "just mean" in the last one). These errors can be corrected manually.
Transcribing a video file with Otter
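
We judged accuracy by eye, but errors like these can also be counted. A common metric is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of words in the reference. The sketch below is our own illustration, not something Otter provides, and it assumes simple whitespace tokenization and case-insensitive matching. It is applied to the one phrase for which we quote both versions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the transcript
                curr[j - 1] + 1,     # extra word in the transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("Just me and you", "just mean")
print(f"WER for the last line: {wer:.2f}")
```

For this phrase the transcript needs one substitution and two missing words restored, so three of the four reference words are wrong. A score over the whole trailer would need the complete reference text.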

However, we realized that transcribing a prerecorded clip is not the best fit for Otter, which was designed for live meetings.
Summary of the trailer transcription in Otter

The app integrates with Google Meet, Microsoft Teams, Zoom, and Slack. It can:
  • track your meeting calendar;
  • identify speakers from your contacts in the transcription;
  • send notifications when you are mentioned or assigned tasks;
  • generate short meeting summaries and send them to all participants;
  • analyze meetings by keywords;
  • publish meeting summaries (date, number of participants, main speakers, topics discussed).
Otter can record your meetings in Google Meet

Additionally, the corporate version of the app includes a dedicated chat where you can ask the AI if you were mentioned in a meeting you didn’t attend and what decisions were made. This feature is invaluable for employees working remotely and in different time zones.
Individual chat with AI in Otter

Our Conclusion: Otter is a convenient application for companies with many freelancers. However, it has limited practical use for individual users: it's like using a microscope to hammer a nail.

Otter offers four pricing plans:

  • Basic: Free, with 30 minutes of transcription per month;
  • Pro: $9.17 per user per month;
  • Business: $20 per user per month;
  • Enterprise (for large companies): Pricing is determined by the developers based on the required features and customization needs.

Beey

Beey is an affordable and highly user-friendly transcription application, making it an excellent choice for budget-conscious users like students, YouTubers, and budding journalists. The service offers 30 free minutes of audio transcription for those who want to test the capabilities of the AI.
We took advantage of this trial and can highlight the following features of the service.

  1. High Accuracy: We checked English and German voice files and found no errors in the transcription. We can't comment on the accuracy for less common languages (Beey supports 30 languages in total).
  2. Quick Transcription: It took 85 seconds to transcribe our file.
  3. Machine Translation: The transcribed document can be translated into 20 languages.
  4. User-Friendly Interface: Files are easy to upload, edit, and share.
  5. Flexible Pricing: Beey doesn't require a monthly subscription. Users pay based on usage, with one minute of AI transcription costing €0.13 (+ VAT).
  6. Convenient Settings: Users can specify the number of speakers and indicate background noise (like music with lyrics) before transcription.
Beey Settings
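
Because Beey charges per minute rather than per month, the cost of a project is easy to estimate in advance. The sketch below uses the €0.13 rate quoted above. The helper name and the `vat_rate` parameter are our own, and the VAT rate is left for the reader to supply, since the applicable rate depends on the country.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-minute rate quoted in this review, excluding VAT.
PRICE_PER_MINUTE_EUR = Decimal("0.13")


def beey_cost(minutes: int, vat_rate: Decimal = Decimal("0")) -> Decimal:
    """Estimated cost in euros for `minutes` of AI transcription.

    `vat_rate` is an assumption supplied by the caller, e.g. Decimal("0.20")
    for a hypothetical 20% rate; the default of 0 gives the net price.
    """
    if minutes < 0:
        raise ValueError("minutes must not be negative")
    net = PRICE_PER_MINUTE_EUR * minutes
    gross = net * (Decimal("1") + vat_rate)
    return gross.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for minutes in (10, 60, 300):
    print(f"{minutes:>3} min: EUR {beey_cost(minutes)} before VAT")
```

An hour of audio therefore costs €7.80 before VAT, which makes occasional use far cheaper than a monthly subscription.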

Beey provided the best text version of the trailer we tested. The AI missed only one word, “Who.” In other challenging spots where the vocalist's voice overlapped with the narrator's, the transcription was accurate.
Among other features, users also have the option to add subtitles to their videos.
Subtitle Settings in Beey

Beey offers numerous settings, but even without customization, the AI transcribes quickly and accurately by default. For beginners, this is the most suitable option among the services we tested.
Our Conclusion: We highly recommend Beey to anyone looking to automate routine tasks and free up time for creative work.