Module 2.d: Voice and Audio AI for GovCon

Text-to-speech (TTS) and speech-to-text (transcription) have been around for quite a while, with TTS commonly embodied by the overly robotic voice that we all know from '80s sci-fi movies. Artificial Intelligence (AI) technology is making inroads in this space, making the voices much less robotic and enabling developers to integrate the capabilities into their own applications. There are many good use cases for GovCon, and contractors have a wealth of resources at their fingertips to streamline their operations, create engaging content, and improve accessibility.

I. Practical Use Cases for AI Voice and Audio Generation

Do you have a product or service?

Do you have a walkthrough video posted on your website?

Do you post videos about your products and services on social media?

If the answer is "no," then you should think about it. More and more customers and contracting officers are using search engines to find company websites and social media to do their market research. Ricky Howard has a good intro to leveraging social media for engaging with customers.

The challenge is building content to put out there, which is why we're introducing you to each capability in turn. You saw how to use generative models to create text, images, and even video in previous modules. Now you can pair them with the human voice that brings a promo video or piece of social media content to life. But recording that voice yourself can be intimidating: maybe you don't like the sound of your voice or don't have professional recording equipment. That is where AI-generated voice comes in.

Another great use for these capabilities is having your content read back to you, or transcribing your dictation, using the features built into tools like Microsoft Word or Adobe Acrobat. Some of you may have been using these features for years; if so, skip to the next section. If you haven't, you should check them out.

It can be annoying to read what you just wrote, and oftentimes your ears pick up on things your eyes miss. This is a pro tip that a lot of copywriters suggest, because the more ways your brain takes in the information, the better it can process it and find better ways to phrase what you are trying to say. Not sure if your proposal or blog post reads well? Drop it into MS Word and give it a listen. You'll be surprised at what you can pick up by listening to your words read back to you.

The same goes in reverse. If you're a little fatigued from writing out a document by hand, the built-in transcription tools in modern word processor suites are actually quite good. A pro tip is to use a dedicated microphone so that the computer picks up all of the words you're speaking. It may seem somewhat artificial at first to dictate your thoughts and speak your punctuation marks aloud, but depending on how you think and write, it may be rather convenient.

II. Open Source AI Voice and Audio Tools

Let's jump into some of the emerging tools that are coming to the fore with the rise of AI. We'll once again turn to Hugging Face to check out the newest of the new in this realm.

The SpeechT5 model, originally created by Microsoft, does a pretty decent job at generating humanlike voices from input text. It is a little stilted, which seems pretty par for the course, but it gives you a number of voice options to choose from.

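MMS-TTS is pretty similar in form and function, though the intonation seems a little more natural. MMS stands for Massively Multilingual Speech, a project from Meta (Facebook), and it offers text-to-speech models for a very large number of languages, which could be pretty useful if you're making multilingual content. The TTS models speak text in a given language rather than translating it, so your script needs to be written in the target language first.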
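For those comfortable with a little Python, here is a minimal sketch of how an MMS-TTS model might be called. It assumes the Hugging Face `transformers` and `torch` packages are installed and that checkpoints follow the `facebook/mms-tts-<language code>` naming used by `facebook/mms-tts-eng`. The helper names and the small language table are hypothetical and only for illustration.

```python
# Hypothetical helpers for illustration; assumes `pip install transformers torch`.
# The language table is a small sample, not a complete list of MMS-TTS languages.
MMS_LANGUAGE_CODES = {"english": "eng", "spanish": "spa", "french": "fra"}


def mms_checkpoint(language: str) -> str:
    """Build the Hugging Face model id for a language (assumed naming scheme)."""
    code = MMS_LANGUAGE_CODES[language.lower()]
    return f"facebook/mms-tts-{code}"


def synthesize(text: str, language: str = "english"):
    """Return (waveform, sampling_rate) for text already written in `language`."""
    # Imported lazily so mms_checkpoint() works without the heavy dependencies.
    import torch
    from transformers import AutoTokenizer, VitsModel

    model_id = mms_checkpoint(language)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = VitsModel.from_pretrained(model_id)  # downloads weights on first use

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        waveform = model(**inputs).waveform  # tensor of shape (1, num_samples)
    return waveform.squeeze().numpy(), model.config.sampling_rate
```

Calling `synthesize("Welcome to our product walkthrough.")` would download the English model the first time and return the audio samples along with their sampling rate.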

So, why would you bother using one of these open source models to convert your text script to audio rather than just doing it with MS Word? Because you can download the result. If, for instance, you are making a promo video in Canva for your product or service, you can pop your text into the model, make an audio clip, download it, and upload it to your production project. You can't do that with MS Word.
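If you run a model yourself rather than through a web demo, the "download" step amounts to writing the generated samples to an audio file. The sketch below uses only Python's standard library and assumes the model hands back mono samples as floats between -1.0 and 1.0, which is an assumption about the output format. The `save_wav` name and the test tone standing in for model output are hypothetical.

```python
import math
import struct
import wave


def save_wav(samples, sampling_rate: int, path: str) -> int:
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Returns the number of audio frames written.
    """
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
        frames += struct.pack("<h", int(clipped * 32767))  # little-endian int16

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(bytes(frames))
    return len(frames) // 2


if __name__ == "__main__":
    # Stand-in for model output: one second of a 440 Hz tone at 16 kHz.
    rate = 16000
    tone = [0.3 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
    save_wav(tone, rate, "voiceover_demo.wav")
```

The resulting WAV file is an ordinary audio clip that can be uploaded to a video editor like any other recording.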

Now, voice is not the only audio that you can make from text. You can also make music, which could come in handy for punchy product promo videos or even standardized training videos. Using an open source model allows you to make your own custom music without the licensing worries of stock tracks, as long as the model's own license permits your use.

III. Paid AI Voice and Audio Tools

Paid AI tools often come with additional features and support that may not be available with open source alternatives. This makes sense: there are plenty of open source tools that developers can build into purpose-built products.

One of the most impressive platforms is Coqui, which comes with multiple voices, attitudes, and very fine-grained control over voice tones.

Similarly, Murf has very professional voice models (pun intended) that are designed for voiceover tasks.

The nice thing is that the platforms are relatively inexpensive and you can use them on month-to-month subscriptions. So, for the small businesses that need to up their advertising and presence game but don't have thousands of dollars to pay voice actors, these tools are incredibly potent.

IV. Practical Exercise

Now that you've learned about these tools, it's time to put your knowledge into practice! Create your own content using an AI voice and audio tool of your choice and share it in the group. We're excited to see what you come up with!