Adobe’s Project VoCo Lets You Edit Spoken Audio Like Text

At the Adobe MAX 2016 conference, an event for users of the company’s creative products, Adobe’s Zeyu Jin introduced Project VoCo, a new application that lets you edit recorded speech as text.

VoCo works as an audio editor: you edit a recording by cutting and pasting words in its transcript. But it can also be used as a creative tool. Once VoCo has analyzed about 20 minutes of a person’s speech, it can synthesize that person’s voice saying new text.

24 thoughts on “Adobe’s Project VoCo Lets You Edit Spoken Audio Like Text”

UglyKidMoe

November 7, 2016 at 11:18 am

Creepy.
Goudron

November 7, 2016 at 11:32 am

The next election cycle will be full of bogus audio clips of candidates saying things that would disqualify them from getting elected. Ha ha, I just realized how ridiculous that premise sounds now. But I won’t be surprised if or when edited sound bites are commonly used to bring someone down a few pegs.
1. Joe
  
  November 9, 2016 at 1:49 am
  
  What are you talking about? Trump is already saying the most outrageous things – and he’s about to be president.
2. whormongr
  
  November 14, 2016 at 1:14 pm
  
  I actually have this record, so it has been done before: https://www.youtube.com/watch?v=AG-f0BpG_cY
Oberlin

November 7, 2016 at 1:00 pm

It’s remarkable the lengths some people will go to, just to avoid developing communication skills…
Darren Mittermeier

November 7, 2016 at 2:07 pm

Wow… this is really revolutionary.
The Brain

November 7, 2016 at 3:41 pm

I see nothing but dangerous applications for this. Yeah, he mentions watermarking audio. But that’s bullshit: audio watermarks would be dead easy to get past. Plus, even if someone didn’t have the savvy to do it, the reputation damage someone could suffer from faked audio (even if it’s proven fake later) could very quickly become irreparable.

This is a politician’s wet dream. It’s a backstabbing coworker/ex-spouse/insert-personal-enemy-here wet dream.

And on top of that, it’s another way to screw people out of paying work. There’s nothing cool about this other than the technological achievement.
1. SynthManDestiny
  
  November 8, 2016 at 11:47 am
  
  And Photoshop and CGI can’t be used to defame and libel people? Once everyone knows about it, they will just become more skeptical.
herasdalizzard

November 7, 2016 at 3:41 pm

Fuck me G_G
stub

November 7, 2016 at 4:11 pm

Could be very useful for audio repair applications. However, as The Brain says, in the wrong hands this could be abused in a way that is a little creepy. It’s already bad enough what is done by editing things people actually say: taking things out of context, etc. There are some shady folks out there who wouldn’t hesitate to use this in a bad way. But honestly, people could already edit recordings to change what someone said. This just makes it easier and perhaps harder to detect.
Xtopher

November 7, 2016 at 8:40 pm

Fairly impressive tech on the analysis and re-synthesis side, but the edit sounded really bad and hacky. No way that would pass professional standards for voice editing. Yet. Also, the synthesized voice part was clearly not the same speaker. Right now, pretty chunky, but give it a couple years.
max neutra

November 7, 2016 at 11:08 pm

Yeah, spooky.
Matthew Stanbro

November 8, 2016 at 12:29 am

Welp, there goes the credibility of audio recordings in the courtroom.
ja

November 8, 2016 at 12:33 am

It might be spooky, but all new things are spooky in the beginning. Once there’s nothing you can trust then the laws will become obsolete.
JP

November 8, 2016 at 12:50 am

Impressive but no way is this cool. As if we needed another tool to help undermine our trust in reality!
alacazam

November 8, 2016 at 2:10 am

Edit was pretty glitchy and obvious to me. Also the demo seemed kind of juvenile. “Tee hee giggle giggle, so and so kissed so and so,” yikes, it reminded me of a goofy premise/plot from some bad sitcom lol. JMO of course.
Zhorro

November 8, 2016 at 4:32 am

Good, but too late. The next election is going to be hilarious. Already waiting for it.
Rob

November 8, 2016 at 7:52 am

“[It] may or may not be released as a product or product feature,”

Some are concerned.

http://www.bbc.co.uk/news/technology-37899902
directionless

November 8, 2016 at 9:10 am

It’s for laymen. The edit sounds very amateur. I guess I can see it being popular like a T-Pain voice changer app, but it’s useless for anything real. What’s weird is that you would think the speech-to-text part would be harder to code, yet it’s their algorithm for a smooth edit that isn’t there.
akiz

November 8, 2016 at 9:55 am

I think that anybody’s speech can already be changed (by cutting or swapping words, for example).
But thanks to VoCo, maybe everybody will learn about this possibility and people will be skeptical about it.
It is like with Photoshop. Faked photos had a much bigger impact about 25 years ago because people didn’t know how easy it was to fake them…
And what about text? Everybody knows that it can be faked, so everybody is skeptical about written text.

But it will take time. On the other hand, direct human interaction will be the last thing that still can’t be faked 🙂
Tank2000

November 8, 2016 at 11:49 am

I remember that Was (Not Was) track featuring Ronald Reagan saying “Can we deny, the ship of state is out of control?” I guess they did that with tape splicing.
trash80

November 8, 2016 at 1:31 pm

This in combination with video editing tied to facial recognition and pixel tracking could produce some rather interesting and disturbing results.
myself

November 8, 2016 at 3:57 pm

Pretty sure they have been using this same technology for 20+ years to make Ozzy records.
free agent

November 9, 2016 at 3:44 am

It might not be there yet but I can see lots of potential here.

I mean, this technology could open so many doors in audio production. If it can recreate someone’s voice, why not recreate the voice of a Minimoog? Or a violin, or maybe create reverbs and everything? I think this might be a whole new way to sample. Imagine a combination of VoCo and Melodyne. In a couple of years we will have virtual Elvises and MJs and Beatles and all. There will be a major fight over copyright, I guess. I might be overreacting, but I think this is a friggin’ revolution in audio.

Synthtopia
