There’s a wave of advancements in both text-to-speech and AI-generated text that will impact podcasting as we know it. But rather than replace humans, this might be a boon for creative podcasters.
By now you’ve probably heard of both GPT-3 and Descript’s Overdub feature. Briefly, the first is “automatic writing”, but done by deep-learning, predictive language techno-wizardry instead of mediums pretending to talk to the dead. The second is a text-to-speech generator using your own voice that’s being hailed as the “deep fake” of podcasting.
Because the output generated by these systems is eerily good, leaps and bounds better than what we’ve seen (and heard) before, podcasters are taking note. Yesterday, Brian McCullough used Overdub to generate the last half of the Techmeme Ride Home episode for that day. It was an excellent side-by-side (or first-then-second) comparison of Brian’s actual voice and the Overdub-recreated voice. And as I said, it was eerily good.
But it wasn’t perfect. For advancements in this area, perfection is approached on an asymptotic curve, and the uncanny valley is quite deep.
But is it good enough for podcasters to make use of today?
Will it be good enough in our immediate future, rather than looking 10–20 years ahead?
I think it rather unlikely that either technology will completely replace the role of human creativity in podcasting. If you’re holding out for a future where your computer generates a script in your own style, then generates narration in your own voice, and then automatically assembles and publishes the episode… keep hope alive, I suppose.
But these technologies might enable more human creativity for podcasters.
A New Podcast Ad Format
Podcast ads are problematic for a number of reasons. Host-read ads are the gold standard, but some high-profile hosts won’t do them because of the implied endorsement. Pre-packaged (or re-packaged) “radio” spots feel odd to listeners, whether or not they get results. Producer-read ads are common with large podcast networks, but I think they provide more branding for the network than they do for the podcast running the ad.
Given these text-to-speech systems’ existing ability to be seeded (trained, really) by anyone’s voice, any podcast could easily and inexpensively have a custom voice-over “person” ready to kick out ads at any time. Or a slew of “persons” to perfectly narrate ads instantly and in one take.
Customized Canned Content
Quite often, podcasts will have narration bits inside their episodes that change little from episode to episode. You’ve heard shows that always open the exact same way, with a little (or a lot of) introduction to the overall podcast. The same goes for the final seconds or minutes of an episode, where the production crew is credited, social sites are mentioned, and more. These chunks of audio are often “canned” so that they can be added during the assembly of the episode, saving time during the production process.
There’s a tradeoff for the convenience: they can’t be customized. Not without going back to the voice talent with a new script, which rather defeats the purpose.
Unless your voice talent is made of silicon, that is. Tweak the text a bit, feed it to the system, and you have episode-by-episode customizations to your intro, your outro, and any relevant parts in between. Best of all, they sound human.
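If you like to script things, the “tweak the text a bit” step could look something like this minimal Python sketch. Everything in it is an assumption for illustration: the intro wording, the field names, and the `build_intro` helper are all made up, and the step of actually handing the text to a text-to-speech tool is left out because it depends entirely on the tool you use.

```python
from string import Template

# Hypothetical canned intro script. The wording and the field names
# ($show_name, $episode_number, $topic, $sponsor) are invented for this example.
INTRO_TEMPLATE = Template(
    "Welcome to $show_name, episode $episode_number. "
    "Today: $topic. This episode is brought to you by $sponsor."
)


def build_intro(show_name: str, episode_number: int, topic: str, sponsor: str) -> str:
    """Fill the canned intro with the details for one episode."""
    return INTRO_TEMPLATE.substitute(
        show_name=show_name,
        episode_number=episode_number,
        topic=topic,
        sponsor=sponsor,
    )


if __name__ == "__main__":
    # Placeholder values; swap in your own show and episode details.
    print(build_intro("My Show", 7, "podcast ads", "Acme"))
```

The text that comes out is what you would then feed to the synthetic voice, so the canned part stays canned and only the details change from episode to episode.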
Localizing Podcast Episodes
How many more people could your podcast reach if it were available in a different language? So far, it’s taken a lot of money and time to localize podcast content. The original episode has to be transcribed and translated first, then given to local actors to modify or ad-lib so it sounds like it really comes from their “voice”. That’s even harder to do with the interview portion, especially if the context is unfamiliar to the voice actor, which it almost always is.
But you don’t have to worry about getting convincingly-human speech out of your silicon-based talent (or talent team). You can spend all of your time getting the script perfect in that language, and leave it to the system to generate the audio. No, the narration won’t be perfect, so you’d want to let the second-language audience in on the trick. Or the third-language. Or the 30th. This scales nicely.
Non-Critical Narrative Bits
Narrative podcasts are a huge amount of work, even if they are voiced by one person. But sometimes, a solo-effort narrative episode would be made better if there were other voices featured. Not as a co-host (we’re a long way from that), but to add color to portions of the episode that would otherwise sound “flat”. Maybe it allows you to give a voice (or voices) to text-based content from a third party you’d normally just narrate on your own. Now you won’t have to narrate it. And there will be a new voice on your show.
Giving Voice To NPCs In Podcast Fiction
In case you had a less-geeky childhood than me, NPCs are non-player characters. Immersive video games make heavy use of NPCs to add flair and flesh to the worlds occupied by players. Similarly, podcast fiction shows (especially those that are dialog-heavy) often have bit roles of only a single line or a handful of lines. At the risk of being accused of taking food out of the mouths of hard-working voice actors, some of these parts could be synthesized. Again, I love that voice actors are finding jobs in the podcasting space. But for smaller productions that lack the budget to pay for a narrator on every single part, this could be an avenue to let them bring their project to life.
C-3PO Won’t Replace Me Anytime Soon
I have no plans to turn Podcast Pontifications over to an android anytime soon. I’d lose a big part of my daily therapy if I did that! Nor do I think you should worry (or dream) about this tech getting so good that your own voice is no longer required. But if you decide to use an AI as your co-host, I totally want to listen.
I think these advancements will allow smart and creative podcasters to use this technology to not only streamline some of their processes but also invent new formats and uses that will give listeners even more reasons to listen to podcasts.
I know this is a contentious topic. I know that some people see warning signs and are concerned about the ramifications for all of podcasting. But the future has a habit of showing up whether we’re ready for it or not.
Do you have some resistant friends in the podcasting space who have a less-than-flattering view of the role AI-generated text and voices will play in podcasting? Send them this episode to give them a fresh perspective. Or just to get them fired up, I suppose.
If you enjoyed my perspective on this (and other) matters, remember that I am not a robot, and I still need to both eat and drink. You can buy me a virtual coffee to show your appreciation and keep the show going over at BuyMeACoffee.com/EvoTerra.
I’ll be back tomorrow for yet another Podcast Pontification.
Originally published at https://podcastpontifications.com, where it started life as an episode of my four-times-a-week short-form podcast called, oddly enough, Podcast Pontifications. It’s a podcast for working podcasters that’s focused on trends in our growing industry and ideas on ways to make podcasting not just easier, but better. Yes, you should listen.
Evo Terra (hey, that’s me!) has been podcasting since 2004, is the author of Podcasting For Dummies and Expert Podcasting Practices for Dummies, and is the CEO and founder of Simpler Media Productions, a strategic podcast consultancy working with businesses, brands, and professional service providers all around the world.