Putting words in your mouth – Photoshop for sound

Say what? New Adobe software changes what you say. . .

On 2nd November at the Adobe MAX conference, Adobe developer Zeyu Jin demonstrated a new piece of software that, using an audio clip, replicated his own voice almost exactly. The multinational company, working in collaboration with Princeton University, named its new venture Project VoCo. As well as constructing speech, the application includes standard audio editing features such as voice editing and noise cancellation. According to Jin, Adobe developers have created a ‘Photoshop for audio’ that will change the way engineers work with sound. What are the implications of creating words that nobody ever said?

Creepy as the creation of unspoken words might sound, Adobe have set out a number of positive applications for their new project. These include cleaning up recordings and podcasts, correcting information and adding clarification to narration. You can see how this might be a valuable tool for production teams in the media, allowing them to improve the quality of the information that is communicated to the public. So there are some benevolent uses for Project VoCo.

However, it’s also easy to see why this sort of technology makes people uneasy. If somebody can get hold of a recording of your voice, they can make you say anything they like. As the recent phone hacking scandal showed, getting hold of one doesn’t seem to be much of a challenge. As far as mass adoption goes, it’s uncertain when – or even if – Project VoCo will enter the consumer market. If it does, it’s likely to be in a much simpler format, probably a smartphone app, and would theoretically be used for harmless entertainment. That doesn’t mean caution isn’t warranted. Ethically, it’s wrong to steal someone’s identity, and our voices, particularly in relation to public statements, are very much part of that. Any company or app that presents these ethical challenges is going to have a hard time getting people to accept its product. It’s not clear whether Adobe want to release this as a consumer product, but if they do, the consequences will be far-reaching.

What disruption will audio reconstruction cause?
At the moment, Adobe’s algorithm needs a 20-minute recording in order to accurately reconstruct someone’s voice. This means that anybody who speaks publicly for an extended period of time is at risk of having their words twisted. Imagine this tool being used in the political sphere to claim that a politician said something they didn’t. Public figures might even claim that their speech was edited (when it wasn’t) to avoid accountability.

The software will undoubtedly bring disruption to the entertainment industry, specifically to production teams. Producers and editors would have far more power to check and edit content, changing speech almost in real time. . . Say goodbye to amusing, accidental swearwords aired before the watershed. There would also be less pressure on presenters and commentators, as any mistakes could be cleared up by the software.

Although the program looks like an asset to media companies, it’s not such a positive story for other sectors. Take banking, for example. A number of banks now use ‘voiceprint checks’ to verify the identity of customers over the phone. Imagine if somebody could replicate your voice and then access your savings via a phone call. That’s frightening, to say the least. If Project VoCo goes mainstream, voice ID may become obsolete. On top of this, there’s an argument that it could further undermine trust in journalism, because journalists rely on digital media to construct narratives. It would also massively complicate legal trials that use audio clips as evidence. Basically, Project VoCo has the potential to disrupt any sector that relies, at least in part, on sound, and that’s, well, all of them.

The effect on business. . .
Businesses working within the media are likely to want to get their hands on Project VoCo, or software like it. This may lead to regulations being imposed to make sure that audio editing is ethical and minimal. Speech reconstruction could be used against companies by their competitors to damage the victim’s image, or by cybercriminals as blackmail. This could happen to individuals too, but big companies (particularly their CEOs) are an easier and more profitable target. Of course, it’s not all about Project VoCo. . . Other companies will want to offer rival software, which will open up a new market full of fresh competition, and no doubt we’ll see freeware and open-source versions too. The more companies that begin to develop voice creation applications, the harder it will be to identify an original clip.

Whilst there are some serious security concerns associated with voice replication, as long as regulations are put in place, the ability to create speech isn’t necessarily a bad thing. Presumably, the quality of the audio clips we hear will improve and higher-quality information will be communicated. If Project VoCo is readily adopted, it will be increasingly difficult to tell the difference between a genuine recording and one that’s been edited. Adobe’s project has been compared to Photoshop, which fundamentally changed advertising. Ironically, the wider adoption of image editing led consumers to scoff at ad campaigns, dismissing them as fake. Perhaps the same thing will happen with VoCo, and recordings will be taken with a pinch of salt. However, it will be a completely different story when the application is able to string entire sentences together. For now, Adobe will have to work to guarantee security at a time when protecting digital information is paramount.

Could your business be enhanced by voice replication software? Could it also be challenged by it? Do you think that applications such as Project VoCo will be widely adopted? Comment below with your thoughts.