Legally-mandated closed captioning — not transcripts — may soon be coming to podcasting. The technology to enable this already exists. And it might usher in a re-imagining of what we used to call enhanced podcasts.
Jones v. Gimlet could be a landmark case and possibly a turning point in the disability community’s long-standing struggle for acceptance in podcasting. In short, the class action lawsuit claims that Gimlet Media is violating provisions of the Americans with Disabilities Act by not providing closed captioning for the content the podcasting company produces.
Yes, you read that right: This lawsuit is about closed captions. And closed captions are not transcripts.
A transcript is typically a single document that can be read on its own instead of listening to the audio of a podcast episode.
Closed captions are the little snippets of text that appear on your television screen, changing scene-by-scene, with rarely more than one or two lines of dialog or narration at a time.
A legal requirement (something-something enforcement, international, exceptions, etc.) for podcasts to include a closed caption option would be of great interest to the community that has hearing loss. I am among their number, in case you didn’t know. I’m fortunate that my hearing loss is correctable. Just like glasses can correct vision, hearing aids make some of our ears function at whatever the equivalent of 20/20 is for hearing.
But the future triggered if this class action suit prevails is of interest to everyone who listens to podcasts. Not just the 10–13% of the population with hearing loss.
Closed Captions Provide A Different Experience
In our house, we always have closed captioning on when we’re watching anything on the television. Subtitles appear when we’re watching TV shows, movies, documentaries, the Hamilton musical, even live sports. Back when we used to watch live sports.
But let’s talk about the elephant in the room: Podcasting is audio, not video. So exactly where would these closed captions for podcasts appear?
In a podcast listening app, of course. Perhaps not Apple Podcasts, Spotify, Pandora, or Google Podcasts, though they might quickly follow. I think it will take someone creating a podcast listening app designed for people with hearing loss — even those who are completely deaf. That app would, by default, display closed captions on screen and “to the beat” of the audio playing at that moment.
Overcoming The Technical Hurdle Of Closed Captions For Podcasts
But how is that done on the fly? What about dynamically inserted audio? Isn’t this just one more burden to place on the shoulders of already overworked podcasters and podcast production teams?
All are good questions. But before you get too twisted up in them, I invite you to think back to when you used to go to a noisy bar with a dozen TVs playing various content. It’s quite possible, assuming the bar owner was respectful of their clientele, that closed captions were playing on one or all of those TVs. The audio from all of those programs playing at volume at the same time would make for a rather unpleasant din, not conducive to enjoying a plate of loaded nachos and a pitcher of beer.
Those captions are added in real-time, or at least near-real-time.
And the same goes for your local news programs or a national broadcast from the Rose Garden: Broadcast television already has the technology and processes to handle real-time closed captioning.
Is it perfect? Not at all, and there’s much room for improvement. But the process mostly works. And it should be straightforward to adapt those same processes and technologies to work in a dedicated podcast app on your phone.
Maybe it has to have a connection to the cloud to do the processing. Maybe it means the media needs to be cached. Maybe it means a delay of a few seconds. Not to diminish the size of the ask, but none of those “maybes” are insurmountable issues. We can work around them.
Does a solution for closed captioning podcasts already exist?
Spoiler: It does. Descript has already implemented the technology to do this. I use their AI engine to make the transcripts of my episodes, but their software does a lot more than that. Descript already “times” the transcription to the audio file. So when you hit “play” in a Descript transcript, the words highlight along with the audio, as you can see in this video:
I didn’t have to do anything other than upload the .mp3 file of this episode to Descript. That’s rather the point!
So the technology exists to do real-time captioning of podcast audio files today. This technology could allow us to have on-screen, real-time captions as the audio is playing. Very much like a karaoke machine, oddly enough. Whatever words are said by the podcaster, their guests, the actors, the narrator… whomever. We can have their actual words display on the screens of our mobile devices in time with the audio as it is playing.
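To make the karaoke comparison concrete, here’s a minimal sketch of the timing logic, assuming the transcription engine supplies a start time for every word (as Descript-style tools do). The data structure and function names are illustrative, not any real product’s API:

```python
# Minimal sketch: karaoke-style captions from word-level timestamps.
# Assumes each transcribed word carries a start time in seconds;
# the field names here are hypothetical, not Descript's actual format.

def caption_at(words, playback_time, window=3.0):
    """Return the words to show on screen at a given playback time.

    Displays the words within a few seconds of the current position,
    the way a karaoke line scrolls along with the music.
    """
    return [
        w["text"]
        for w in words
        if playback_time - window <= w["start"] <= playback_time + window
    ]

# A tiny made-up transcript for illustration.
transcript = [
    {"text": "Closed",       "start": 0.0},
    {"text": "captions",     "start": 0.4},
    {"text": "are",          "start": 0.9},
    {"text": "not",          "start": 1.2},
    {"text": "transcripts.", "start": 1.5},
]

print(" ".join(caption_at(transcript, playback_time=1.0)))
```

A real app would re-run this lookup on every playback tick and highlight the single current word, but the core idea is just this time-window filter.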
That’s great for people like me who are often lazy about wearing our hearing aids. It’s even better for deaf people who today can’t experience the content of the show as it was actually delivered. Not just reading a big document of the words that were said, but a timed delivery of those words as-text as they are presented in the actual audio.
Now that dramatic pause you put in for effect in your audio delivery is effective in text. And it doesn’t take a lot of imagination to figure out how text treatments like bold, italics, or emoji 💩 could be used to better communicate emphasis, subtlety, or tone. Though I recognize a lot of work would be needed on that front. Baby steps, Evo.
Not All Closed Captioning Is Created Equal
Not all podcasts require “on the fly” captioning. Some of the most popular podcasts have a months-long development cycle per episode. For those, it’s not terribly arduous to imagine the development of an “official” subtitle track as part of the post-production process.
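One sketch of what that “official” track could look like: WebVTT is the timed-caption format the web already uses, and a post-production step could emit an episode’s caption track in it. The helper below is a hypothetical illustration under that assumption, not any existing tool’s output:

```python
# Sketch: baking an "official" caption track during post-production.
# Emits WebVTT, the standard timed-caption format browsers already read.
# Cue text and timings below are made up for illustration.

def vtt_timestamp(seconds):
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

def to_webvtt(cues):
    """cues: list of (start_seconds, end_seconds, text) tuples."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)

episode_cues = [
    (0.0, 2.5, "Closed captions are not transcripts."),
    (2.5, 5.0, "A transcript is a document you read on its own."),
]

print(to_webvtt(episode_cues))
```

Because the cues are just timed text, a show’s team could hand-edit them in post to nail line breaks, emphasis, and pacing, exactly the kind of polish a months-long production cycle allows.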
For those shows, they can layer in the text treatments I mentioned to make sure they nail the tone they were looking for. But why stop there?
Since someone is already designing a visual interface layered on top of the audio for consumption in a dedicated app, why not add more than just rich text? Designers could still fulfill the intent — closed captioning for those with hearing loss — by adding images and other content that enhances the audio experience. And not just for those with hearing loss. (But that enhancement must primarily benefit the target audience — those with hearing loss!)
If that reminds you of enhanced podcasting, it should. And if you’re remembering all the times enhanced podcasting has been re-invented and failed, you are smart to do so.
But keep this in mind: One of the many reasons most of those attempts failed is that the end-user — the everyday podcast listener — didn’t find the “enhanced” experience compelling enough to change their behavior.
None of those failed efforts targeted a motivated and underserved audience: those with hearing loss. Instead, they targeted everyone. Or an imagined minority of everyone who wanted to watch their screens while a podcast plays. This new effort is aimed at an actual, real audience who is terribly underserved with today’s listening apps, big and small.
Yes, People With Hearing Loss Consume Podcasts
Even though the pieces exist, I know that building a podcast app that seamlessly handles closed captions — either generated on-the-fly or baked into the metadata of an episode — takes work.
But unlike every other podcast app, this podcast app is designed for a specific user base. And 10–13% of the population sure sounds like an addressable market to me.
Plus, I think an app like this would have appeal broader than its target base. I put “karaoke” in the title of this episode on purpose. Call me crazy, but an on-screen “follow the text” experience could be kinda fun for hard-core fans who like memorizing dialog and delivery. Ask me to quote you anything from Snatch. You’ll be amazed. Or horrified at my terrible British accents.
Again, much of the inventing necessary to enable this has already been done. Though I’m not the guy who’s going to do the work, it seems a straightforward process to assemble those prior inventions in a way that makes closed captions for podcasts a very real thing.
Regardless of the outcome of the current lawsuit, it’s good to have this conversation. Anything we can do to make podcasts more accessible is a Very Good Thing, every rational person would agree. You don’t have to be an activist in the disability community to think so.
But maybe you know an activist already? Chances are, they know about the Jones v. Gimlet lawsuit and are rooting for the plaintiff. But they might not have considered the app-based solution. I’d really appreciate it if you sent them a link to this episode. Do it via email, a direct message, or even a text. One-to-one outreach really helps the show grow.
Since you got this far (and going against what I just said), how about mashing that 👏 button a few dozen times to let me know you dig the written-word version of my thoughts on these podcasting topics? I’d sure appreciate it!
This article started life as an episode of my four-times-a-week short-form podcast called, oddly enough, Podcast Pontifications. It’s a podcast for working podcasters that’s focused on trends in our growing industry and ideas on ways to make podcasting not just easier, but better. Yes, you should listen. Here’s an easy way: 👇
Evo Terra (hey, that’s me!) has been podcasting since 2004, is the author of Podcasting For Dummies and Expert Podcasting Practices for Dummies, and is the CEO and founder of Simpler Media Productions, a strategic podcast consultancy working with businesses, brands, and professional service providers all around the world.