In the same amount of time it would take to toast a slice of bread, you could clone the voice of President Joe Biden and share it on social media. You could have him mutter in his slow and gravelly voice: “I’ve always known Covid-19 was a hoax, it’s just useful to pretend it’s real,” then superimpose the audio on a photo of the president grinning, upload it to TikTok, YouTube and Facebook, and wait.
Then a funny thing would happen. The first two sites would take your clip down. But the biggest platform – the one with three billion users – would not. Facebook would slap a warning label on the clip but leave it up for people to click through, listen to and share with others. That antiquated policy could prove disastrous in a divisive election year.
Several examples already show how possible that scenario is. In September last year, faked audio of a Slovak political leader “discussing” ways to buy votes was shared on Facebook within days of a closely fought national election. Parent company Meta Platforms Inc doesn’t ban fake audio clips in the same way it takes down fake videos, so Facebook let it remain with a label saying it had been manipulated. Two days later, that same party leader lost the election. It’s impossible to know whether the clip swayed votes, but the country also had a 48-hour media blackout before the election, which meant there was no one else around to debunk the forgery.
In the world of misinformation, fake audio can have a more sinister effect than video. While fake “photos” of Donald Trump have a glossy, plastic look that belies the AI machinery behind them, fake versions of his voice are harder to scrutinise and distinguish. And AI-generated voices can sound hyper-realistic now thanks to a passel of new tools originally designed to help podcasters and marketers.
Companies like Eleven Labs, Voice AI and Respeecher sell services that can synthesise voices of actors so they can, for instance, read audio books in different languages, and some only require a couple of minutes of a voice recording to clone it. Voice AI startups raised about US$1.6bil (RM7.44bil) from venture capital investors in 2023, according to market-research firm Pitchbook. (Overall investment growth in these companies has gone down in the last two years, though, in part because larger companies like Amazon.com Inc and OpenAI are taking more business.)
Some companies like Respeecher have features in place to prevent misuse, or they require permission from people to have their voice cloned. But that doesn’t stop others from exploiting them anyway. Someone recently cloned the voice of the Mayor of London, for instance, and posted the faked audio clip to TikTok. In it, the voice of Sadiq Khan could be heard saying that Armistice Day should be cancelled in favour of a protest to support Palestinians. “Why don’t they have Remembrance Weekend next weekend?” his voice says. The audio caused outrage among Brits who believed the country’s veterans’ day celebrations should be respected, but Khan’s office said that the clip was being “circulated and amplified by a far-right group.” To their likely dismay, it was reposted on Facebook and remains on the site, in at least one case without a warning label.
Another person generated a fake clip of UK Labour Party leader Keir Starmer supposedly calling one of his team members a “bloody moron,” while a second forged clip had Starmer saying that he “hated Liverpool.” The posts were seen thousands of times on TikTok before being taken down. A rival Conservative politician encouraged the public to “ignore it.”
TikTok removed the London mayor’s clip, and a spokeswoman said similar deceptive audio involving politicians would normally be taken down as it violates policy. YouTube also removed postings of the faked mayor’s voice; a spokeswoman said the site takes off “technically manipulated” content that could cause harm. Twitter has a similar rule, though doesn’t seem to enforce it (it has kept the Mayor’s forgery up, for instance).
But the stakes are higher with Facebook given that it has eight times more monthly active users than Twitter, which makes its leniency toward forged audio all the more bizarre.
A spokesman for Facebook said it labeled and left fake audio of politicians “so people have accurate information when they encounter similar content across the Internet.” It’s better to leave a clip up with a warning label, Facebook argues, so that when people see it on other sites like Twitter or Telegram, they’ll be educated on its inauthenticity.
But Facebook relies on stretched teams of fact checkers to do such labeling. “These things are spreading in real time over the Internet,” says Steve Nowottny, editor of the independent fact-checking charity Full Fact, which worked with Facebook to debunk the Khan and Starmer audio clips. It took them two days to check the Labour Party leader’s clip, he says.
One problem is there are still no reliable technical tools for detecting fake AI audio, so Full Fact uses old-fashioned investigative techniques, and its fact-checking team is made up of just 13 people. More broadly, there’s been a decline in the number of people at social media companies working on misinformation too. Alphabet Inc, Meta and Twitter have all pared back their trust and safety teams in the last two years to cut costs, and Meta also recently shuttered a project to build a fact-checking tool, according to CNBC.
“I talked to a large group of fact-checkers and journalists from across Asia in November, and almost everyone was seeing manipulated audio and wasn’t sure how to detect it,” says Sam Gregory, executive director of Witness, a human rights group focused on technology.
Even labeled misinformation can spread rapidly before the warning is properly understood. In moments of fast-paced information sharing, when emotions are running high, not every Facebook or Instagram user will fully comprehend the meaning of a label, or believe it.
Facebook’s policy of only taking down faked videos is out of date. As we head into what could be tumultuous national elections in the UK, India, the US and elsewhere, made all the more messy by AI tools generating all kinds of media and information, the platform should start taking down deceptive audio too. – Bloomberg Opinion/Tribune News Service