Stability AI has introduced an improved version of its music generation platform, Stable Audio 2.0. The system lets users create up to three minutes of audio from a text prompt, which is roughly the length of a full song. The tool can generate an intro, a full chord progression and an outro.
Three minutes is a notable improvement, considering that the previous version was limited to 90 seconds. Another plus is that the tool is free and publicly available through the company’s website.
The generator is driven by text prompts, but you can also upload your own audio clip. The system will analyze it and create something similar. All uploaded clips must be copyright-free, so this is not intended for imitating existing tracks. Rather, it might come in handy for, say, humming a drum line or turning a 20-second snippet into something longer.
It’s worth keeping in mind that this is AI-generated music, so the results are not yet at a human level. But there is progress.
Created this with the new Stable Audio 2.0 from @StabilityAI! pic.twitter.com/kmN0eubJSK
— Chris McKay (@cmcky) April 3, 2024
One of the generator’s problems is that Stable Audio 2.0 tends to add strange vocals. Sometimes the vocals sound like real people, and at other times they sound like Gregorian chants. In short, it’s the uncanny valley of audio. Some call this music “soulless and strange,” comparing it to the sounds of whales.
Stable Audio 2.0 makes the same odd mistakes as all similar systems, regardless of the type of output. Parts of a track may go missing and be replaced by something else. Sometimes melodic elements are doubled.
The main problem is that the results are not catchy in the way that music created by people is, especially when it comes to tracks that carry meaning rather than just a set of sounds. Still, this is enough for experiments.