Meta has announced AudioCraft, a new tool that can generate high-quality, realistic audio and music from text prompts.
AudioCraft comprises three AI models — MusicGen, AudioGen, and EnCodec — that generate audio from scratch. The MusicGen model was trained exclusively on Meta-owned and specifically licensed music, while AudioGen was trained on public sound effects. The third component, the EnCodec decoder, allows high-quality audio generation with fewer artifacts. Meta says it is releasing its pre-trained AudioGen models and sharing all of the AudioCraft model weights and code.
The AudioCraft family of AI models can produce high-quality audio with long-term consistency — something current AI music generation models lack. Meta says it has simplified the overall design of generative models for audio compared to prior work in the field, while giving people the full recipe to play with its existing models.
Meta says generating high-fidelity audio of any kind requires modeling complex signals and patterns at varying scales. “Music is arguably the most challenging type of audio to generate as it’s composed of local and long-range patterns, from a suite of notes to a global musical structure with multiple instruments,” the blog notes.
AudioCraft works for music, sound, compression, and generation—all in the same place. The aim was to build a tool that is easy to reuse, so people who want to build better sound generators or algorithms can do so.
“We see the AudioCraft family of models as tools for musicians and sound designers to provide inspiration, help people quickly brainstorm and iterate on their compositions in new ways,” the Meta blog shares. “We can’t wait to see what people create with AudioCraft.”
“Having a solid open source foundation will foster innovation and complement the way we produce and listen to audio and music in the future. With even more controls, we think MusicGen can turn into a new type of instrument—just like synthesizers when they first appeared.”