A scene from Netflix's Korean-language survival drama “Squid Game.” Hits like this will become a lot more common as technology allows dubbing to happen more efficiently and effectively, according to those in the auto-dubbing industry. (Youngkyu Park/AFP/Getty Images)
Few cultural experiences are as surreal as stumbling upon a dubbed Hollywood movie. There are the cartoonish lip movements. The mistimed action. The strange baritone that doesn’t sound at all like you remember Tom Hanks sounding.
It turns out that after all these decades a new class of start-ups is hoping to address this unintentional performance art. Using artificial intelligence and machine learning, they aim to make the dubbing process more efficient and more natural, part of an emerging movement known as “auto-dubbing.”
The implications could be sweeping. In the grand vision of auto-dubbers, any video content, from “Squid Game” on down, will one day be available in a custom language at the flick of a button. Even more important, it would fully feel like the original. The resulting world would be one of seamless interchangeability: a piece of entertainment would emanate not from a particular place, but pop up unexpectedly as the seeming creation of whatever language its viewer wants to watch it in. As Forrest Gump famously said, “das Leben war wie eine Schachtel Schokoladen.”
“The potential here is so big,” said Scott Mann, a Hollywood director who co-founded one of these start-ups, Flawless. “Most of us are not even aware how much great content is in the world. Now we can watch it.”
Yet for all its shimmering cross-culturalism, hidden and often dark social implications abound. Entertainment in an easy-dub time could lose all of its local flavor. Consumers around the world, meanwhile, may never be exposed to the sounds of a foreign language.
Dubbing has long been a painstaking exercise, and that’s before anyone even watches.
Traditional dubbing often works like this. A studio or local distributor, having decided it wants a local-language release, pays to translate a script, hire a set of voice actors to play the characters, rent out engineering equipment, put the actors through numerous voice takes, record them and then splice their readings back into the original video, a mighty grapple to achieve a smooth final product. The whole process can take months.
Auto-dubbing can work like this. The original actor records five minutes of random text in their own language. Then the machines take over: A neural network learns the actor’s voice, a program digests that vocal information and applies it to a digital translation of the script, then the AI spits out perfectly timed lines from the film in the foreign language and drops them into the action. The whole process could take weeks.
“We have the technology to fill a big gap,” said Oz Krakowski, chief marketing officer of deepdub, a Dallas- and Tel Aviv-based start-up that employs essentially this process. “We can give studios what they want and give consumers a totally unique experience.”
The company is about to put its claim to the test. It is releasing “Every Time I Die,” a 2019 thriller whose English-language version is on Netflix, in Spanish and Portuguese versions dubbed entirely by AI.
Auto-dubbing companies take a range of approaches. Deepdub focuses on the audio, digitally redeploying the original actor’s voice off a machine translation, but leaving the video unchanged. Another firm, London-based Papercup, doubles down on this tack, using so-called synthetic voices. Flawless goes the other way, relying on live (and labor-intensive) voice actors but editing on-screen lips and faces so they look like they’re actually speaking the language. All three companies bring humans into the process at various points for quality control.
(There is some research from tech giants like Amazon, but no commercial product yet. Several other firms, such as the video-focused Synthesia and the voice-centered Respeecher, are also working on related technologies. Amazon founder Jeff Bezos owns The Washington Post.)
Though all services use some form of manipulation, most say they’re not engaging in deepfakes, wary of either making their material vulnerable to political manipulation or, at least, of the controversy faced by a recent CNN documentary about the late Anthony Bourdain that used his AI voice. Of course, one man’s deepfake is another man’s digital enhancement.
Venture capital firms have started to bet on auto-dubbing. Papercup’s executives said that in December they raised $10.5 million from a group of investors that included Arlington, Va.’s Sands Capital Ventures and media companies like The Guardian on top of several million raised previously. Flawless recently concluded its undisclosed Series A financing; deepdub is in the middle of such a round.
It’s easy to understand their interest. Foreign-language content is a vast unmined frontier for Hollywood. Netflix’s “Squid Game” has become the service’s No. 1 show in many countries, including the United States. If the Korean survival drama can do this largely with subtitles, the auto-dubbers say, imagine what would happen with on-demand dialogue? An endless parade of foreign-language Stateside smashes is not hard to conceive. (There are already some dubbed versions of “Squid Game,” but they’re...not so well-received.)
As a kind of new spin on the Tower of Babel, in which everyone speaks different languages but still understands each other, auto-dubbing also means non-English speakers won’t need to learn English to grasp the dialogue in a Hollywood movie. (Subtitles, presumably, would fade away.)
But these rich possibilities also come with concerns.
A world without foreign dialogue in its entertainment means a world in which millions of viewers, in both the U.S. and abroad, may never be exposed to a language outside their own.
“If everything you listen to is dubbed, you lose all the phonetics, all the information, all the empathy,” said Siva Reddy, an assistant professor of linguistics and computer science at McGill University. “You can have a monolithic way of looking at everyone.”
The idea of quotable dialogue could also be thrown into question. Some of history’s most famous movie lines developed because they were heard, and eventually endlessly repeated, in that singular way. A perfect piece of on-screen speech that hits in 30 different languages from the outset may never turn into a classic.
Papercup co-founder Jesse Shemen, whose company has worked with Discovery and Sky, says that the benefits would outweigh these worries. “I believe hearing thoughts and philosophies that we never would have heard is far more accretive and additive than being limited by your own language,” he said, citing everything from a financial expert in Nigeria to an NBA commentator in South America.
Watching auto-dubbing can be unsettling. It’s almost like the little rips and wrinkles of normal dubbing serve the purpose of reminding us that a movie came from somewhere else. Slightly jarring, for instance, is seeing Jack Nicholson erupt on the witness stand in “A Few Good Men” while his mouth enunciates perfect French.
Not that all the technical air bubbles have been smoothed out. Getting digital voices to sound human, with the many required lilts and inflections, is something that AI can still struggle with. And automatic translations tend to be literal, missing key context. “We have to be honest about where the tech is,” Shemen noted. “Matching the performance level of humans is not a simple task. And forget emulating a high-quality voice artist.”
Others in the auto-dubbing movement, however, say that this could lead to new opportunities, like synthetic voices with signature styles. “We don’t have to replicate what Hollywood has done until now,” said Flawless co-founder Mann. “I think we can create a lot of new rules for what foreign movies will sound like,” he added, of course alluding to Doc Brown’s iconic line to Marty McFly that “where we’re going, nous n’avons pas besoin de routes.”