Nearly a year ago, developers Seth Forsgren and Hayk Martiros released a hobby project called Riffusion that could generate music using not audio but images of audio. It sounds counterintuitive (no pun intended), but it worked, as my colleague Devin Coldewey reported at the time.
While their approach had its limitations, Riffusion netted Forsgren and Martiros a lot of attention — not exactly surprising given the curiosity (and controversy) surrounding AI-generated music tech. Millions of people tried Riffusion, according to Forsgren, and the platform was cited in research papers published out of Big Tech companies including Meta, Google and TikTok parent ByteDance.
Some of the attention came from investors as well, it seems.
This year, Forsgren and Martiros decided to commercialize Riffusion, which is now being advised by the musical duo The Chainsmokers and has closed a $4 million seed round led by Greycroft with participation from South Park Commons and Sky9.
Riffusion is also launching a new, free-to-use app — an improved version of last year’s Riffusion — that allows users to describe lyrics and a musical style to generate “riffs” that can be shared publicly or with friends.
“[The new Riffusion] empowers anyone to create original music via short, shareable audio clips,” Forsgren told TechCrunch in an email interview. “Users simply describe the lyrics and a musical style, and our model generates riffs complete with singing and custom artwork in a few seconds. From inspiring musicians, to wishing your mom ‘good morning!,’ riffs are a new form of expression and communication that dramatically reduce the barrier to music creation.”
Martiros and Forsgren met at Princeton as undergraduates and have spent the last decade playing music together in an amateur band. Forsgren previously founded two venture-backed tech companies, Hardline and Yodel, while Martiros joined drone startup Skydio as one of its first employees.
Forsgren says that he and Martiros were inspired to scale Riffusion by the potential they see in generative AI tools to connect people through creativity.
“The pandemic gave us all a lot more time at home — and led me to learn to play the piano,” Forsgren said. “Music has a great power to connect us in times of isolation. Generative AI is a new and rapidly changing space, and Riffusion aims to harness this technology to deliver a fun new instrument — one that empowers everyone to actively create music throughout their lives.”
The upgraded Riffusion is powered by an audio model that the Riffusion team (six people strong, including Forsgren and Martiros) trained from scratch. Like the model behind the original Riffusion, the new model learns from spectrograms, visual representations of audio that show the amplitude of different frequencies over time.
Forsgren and Martiros made spectrograms of music and tagged the resulting images with the relevant terms, like “blues guitar,” “jazz piano” and so on. Feeding the model this collection “taught” it what certain sounds “look like” and how it might re-create or combine them given a text prompt (e.g. “lo-fi beat for the holidays,” “mambo but from Kenya,” “a folksy blues song from the Mississippi Delta,” etc.).
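For readers curious what "an image of audio" means in practice, here is a minimal Python sketch of how a spectrogram can be computed. It is illustrative only and is not Riffusion's code: the function name, frame size, hop size and sample rate are all assumptions made for the example, and it uses a naive Fourier transform for readability where real tools would use an optimized FFT library.

```python
import cmath
import math

def spectrogram(samples, frame_size=256, hop=128):
    """Return a list of time frames, each a list of magnitudes per frequency bin."""
    # A Hann window tapers each frame's edges to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    n_bins = frame_size // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        # Naive discrete Fourier transform of the windowed frame.
        frames.append([
            abs(sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                    for n, x in enumerate(frame)))
            for k in range(n_bins)
        ])
    return frames

SAMPLE_RATE = 8000  # Hz; a hypothetical value chosen for the example
# A pure 1,000 Hz test tone standing in for real music.
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(2048)]
spec = spectrogram(tone)

# Find the loudest frequency bin in the first frame and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / 256
print(peak_hz)  # 1000.0 for this test tone
```

Each inner list is one column of the "image": time runs along the outer list, frequency along the inner one, and the magnitude is the brightness of a pixel. A labeled collection of such grids is the kind of data the article describes the model learning from.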
“Users describe musical qualities through natural language or even recording their own voice, as a method of prompting the model to generate unique outputs,” Forsgren explained. “We think the product will empower music producers and audio engineers to explore new ideas and get inspiration in a totally new way.”
Here’s a sample made using Riffusion’s ability to record a voice with the prompt “punk rock anthem, male vocals, energetic guitar and drums”:
But what, you might ask, about the potential for copyright infringement?
Increasingly, homemade tracks that use generative AI to conjure familiar sounds that can be passed off as authentic, or at least close enough, have been going viral. Just last month, a Discord community dedicated to generative audio released an entire album using an AI-generated copy of Travis Scott’s voice — attracting the wrath of the label representing him.
Music labels have been quick to flag AI-generated tracks to streaming partners like Spotify and SoundCloud, citing intellectual property concerns — and they’ve generally been victorious. But there’s still a lack of clarity on whether “deepfake” music violates the copyright of artists, labels and other rights holders.
Forsgren was quick to note that the new and improved Riffusion wasn’t trained to recognize famous artist names or songs — and, he says, can’t replicate them.
“The product isn’t built to produce deepfakes and doesn’t recognize famous artist names in its prompts,” he said. “Instead, it lets users craft personal messages and catchy hooks using the app. It’s not uncommon to have a riff you create get stuck in your head and find yourself singing along to it all day.”
There’s no clear monetization strategy — yet. For now, Forsgren and Martiros say that they’re focusing on growing Riffusion’s team and developing complementary new generative AI products.
But Forsgren also hinted at working more closely with artists like The Chainsmokers to see how the tech could be used in their creative processes.
“It’s very early days for generative music. Models such as Google’s MusicLM, Facebook’s MusicGen, and Stability’s Stable Audio are exciting tools in the space,” Forsgren said. “But Riffusion stands out as one of the first to enable users to generate lyrics in their music via a fun and accessible website.”