This AI can spoof your voice after just three seconds

Artificial intelligence (AI) is having a moment, and it has picked up more momentum with the news that Microsoft is working on an AI that can imitate anyone’s voice after being fed a three-second sample.

The new tool, dubbed VALL-E, has been trained on roughly 60,000 hours of English-language voice data, which Microsoft says is “hundreds of times larger than existing systems”. Drawing on that training, its creators claim, VALL-E needs only a brief vocal sample to learn how to replicate a user’s voice.

More impressive, VALL-E can reproduce the emotions, vocal tones, and acoustic environment found in each sample, something other voice AI programs have struggled with. That gives it a more realistic aura and brings its results closer to something that could pass as genuine human speech.

When compared to other text-to-speech (TTS) competitors, Microsoft says VALL-E “significantly outperforms the state-of-the-art zero-shot TTS system in terms of speech naturalness and speaker similarity.” In other words, when given a voice it has not been trained on, VALL-E sounds much more like a real human than rival AIs do.

On GitHub, Microsoft has published a small library of samples generated with VALL-E. The results are mostly very impressive, with many samples that reproduce the lilt and accent of the speakers’ voices. Some of the examples are less persuasive, indicating VALL-E is probably not a finished product, but overall the output is convincing.

Huge potential — and risks

In a paper introducing VALL-E, Microsoft explains that VALL-E “may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker.” Such a capable tool for generating realistic-sounding speech raises the specter of ever-more convincing deepfakes, which could be used to mimic anything from a former romantic partner to a prominent international personality.

To mitigate that threat, Microsoft says “it is possible to build a detection model to discriminate whether an audio clip was synthesized by VALL-E.” The company says it will also use its own AI principles when developing its work. Those principles cover areas such as fairness, safety, privacy, and accountability.

VALL-E is just the latest example of Microsoft’s experimentation with AI. Recently, the company has been working on integrating ChatGPT into Bing, using AI to recap your Teams meetings, and grafting advanced tools into apps like Outlook, Word, and PowerPoint. And according to Semafor, Microsoft is looking to invest $10 billion into ChatGPT maker OpenAI, a company it has already plowed significant funds into.

Despite the apparent risks, tools like VALL-E could be especially useful in medicine, for instance by helping people regain their voice after an accident. Being able to replicate speech from such a small input could be immensely promising in these situations, provided it is done right. And with all the money being spent on AI, both by Microsoft and others, it’s clear the technology is not going away any time soon.
