San Francisco research lab OpenAI has introduced Jukebox, a neural net that synthesizes music in a variety of artists’ styles, including rudimentary singing.
They’ve trained the AI with a wide range of artists’ music, including Elvis, Frank Sinatra, Katy Perry and others. The AI directly synthesizes audio, with instruments and faux-vocals.
The results are terrible and awesome, sounding like music from another universe, captured via a vintage tube radio.
Some of the tracks seem like plausible bad songwriting, like faux-Katy Perry singing:
“I count every moment, every hour since I said goodbye.
I count every minute every hour, since your lips were touching mine.
I count every minute, every hour, hoping I’m the one you want.”
Pop, in the style of Katy Perry:
Other tracks seem like something out of a David Lynch dream sequence, like a drunken Frank Sinatra singing “It’s Christmas time, and you know what that means. It’s hot tub time!”
All this needs to be a warped holiday classic is a deep-fake video of Sinatra crooning “It’s hot tub time!” with a martini in his hand.
In addition to synthesizing tracks with original lyrics, the AI can create disturbing new music based on classic lyrics.
The results are nowhere near uncanny-valley realism. But listening to ‘faux Katy Perry’ or ‘drunk Frank Sinatra’, it’s clear that the AI is capturing some characteristic aspects of their performance styles.
In addition to doing imitative synthesis of artists’ performances, Jukebox also features recordings that take the AI’s training and apply it in unusual ways.
For example, you may know Marni Nixon’s version of I Whistle a Happy Tune, a feel-good song from the Rodgers & Hammerstein musical The King & I.
The upbeat lyrics sound completely different, though, when paired with AI-generated music and vocals.
Here’s faux drunk Sinatra, with his take on I Whistle a Happy Tune:
The researchers are also exploring creating tracks that combine models, so 0.25 Ella Fitzgerald + 0.75 Frank Sinatra can sing an original song, based on the lyrics to La La Land’s City Of Stars:
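For readers curious what a "0.25 Ella + 0.75 Sinatra" mix could mean numerically, here is a minimal sketch. It assumes each artist's style is represented as a list of numbers (a style vector) and that a blend is simply their weighted average. That is one plausible reading of the description above, not a confirmed account of how Jukebox works. The function name `blend_styles` and the toy vectors are hypothetical, made up for illustration.

```python
def blend_styles(styles, weights):
    """Weighted average of style vectors (hypothetical helper, not Jukebox's API).

    styles  maps an artist name to a list of floats (toy style vector).
    weights maps an artist name to a non-negative mixing weight.
    """
    total = sum(weights.values())
    if total <= 0 or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative and sum to more than zero")

    dim = len(next(iter(styles.values())))
    mix = [0.0] * dim
    for name, weight in weights.items():
        vector = styles[name]
        if len(vector) != dim:
            raise ValueError("all style vectors must have the same length")
        # Normalize so the weights always act as proportions of the whole.
        for i, value in enumerate(vector):
            mix[i] += (weight / total) * value
    return mix


# Toy three-number "styles"; real models would use far larger learned vectors.
styles = {
    "ella_fitzgerald": [1.0, 0.0, 2.0],
    "frank_sinatra": [0.0, 4.0, 2.0],
}
mix = blend_styles(styles, {"ella_fitzgerald": 0.25, "frank_sinatra": 0.75})
print(mix)  # [0.25, 3.0, 2.0]
```

In this toy version, the blended vector sits three quarters of the way from the Ella vector to the Sinatra vector, which is the intuition behind describing the result as a mix of the two singers.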
Jukebox is a bizarre rabbit hole to fall down – but also a preview of the insane possibilities of a new type of synthesis.
Check out the examples and share your thoughts on AI song synthesis in the comments!
LOL. Brilliantly incoherent!
Frank Sinatra! Elvis! Katy Perry!?
I suspect they are feeding it the words, at the very least…
exciting and frightening at the same time
My brain secretes both pleasure and fear chemicals in response to this. And I like it.
“My brain secretes both pleasure and fear chemicals in response to this. And I like it.”
That’s it in a nutshell!
Laughing in terror. A hitchhiker’s reaction. Put on your peril-sensitive sunglasses.
“To train this model, we crawled the web to curate a new dataset of 1.2 million songs…”
OK, but did they have to include the warped records from the back seat of the car??
There is nothing truly innovative here. Pop music has been “composed” by algorithmic processes for many years now. That is why it is all so hollow, empty, meaningless, soulless and just plain CRAPPY!
creepy, music for zombies
This just in: Pop music so formulaic and repetitive it can be made by math equations! lol.
Tell that to the neural network inside your own skull.
Yeah, it ain’t just pop music. It’s ALL formulaic. And, one could argue, it is ALL math. Humans have the capacity for much complexity, assimilation, derivation, and allowing of happy accidents.
Human intelligence is analog, and it has a great warm sound. AI is digital (follows some hidden rules, and observations), and it has quite a bit of variety in terms of its quirky stupidity.
unfortunately music is mathematical: every modular synthesiser is an analogue computer with uncanny resemblance to patch-programmed computers from 50s and 60s, and every string instrument is basically a pythagorean ratio calculator. and music is repetitive on every level, from oscillations of the string to rhythm to song structure to cultural baggage. unlike modern narrative and visual arts music will never work without some kind of repetition, quotation and feeding of the baggage of past generations. total unrepetitiveness and newness ad absurdum is just white noise, which, in turn, is so predictable, that our brain filters it out given some time to adapt.
this is why copyright is fundamentally flawed, and Britney Spears is the greatest auteur of the 20th century.
I didn’t think they would create a new tool for synthwave so soon, but here it is.
Great for a warped video game soundtrack a la Fallout.
Does anyone remember Clubbo Records from the early 2000s? Their Lazarus Project has finally been realized!
https://www.clubbo.com/2004-lazarus-project/