
Meta introduces ‘AudioCraft’: An open-source AI tool for text-to-music generation

Meta, the company behind Facebook and Instagram, has unveiled its latest artificial intelligence release: an open-source AI music generation tool called ‘AudioCraft’. The tool lets users generate high-quality, realistic audio and music from simple text prompts, and Meta is pitching it at both music lovers and creators.

AudioCraft promises to let musicians compose new tunes without playing a single note on an instrument, and to let small business owners add a soundtrack to their latest Instagram video advertisement with little effort.

At the core of AudioCraft are three distinct models: MusicGen, AudioGen, and EnCodec. MusicGen, trained on Meta-owned and specially licensed music, generates music from text inputs. AudioGen, trained on publicly available sound effects, generates audio from text prompts. Meta is releasing an enhanced version of the EnCodec decoder, which produces higher-quality music with fewer artifacts, along with pre-trained AudioGen models that generate ambient sounds and various sound effects.

According to Meta, the AudioCraft suite of models produces high-quality audio that stays consistent over extended durations. Meta also says AudioCraft simplifies the design of generative audio models.

AudioCraft isn’t limited to music and sound generation; it also covers audio compression, all within a single platform. The release gives users access to Meta’s years of research and development, encouraging them to explore the models’ limits and even develop their own models.

Simplifying text-to-audio generation through innovative techniques

Generating audio directly from raw audio signals is a formidable challenge because it requires modeling exceedingly long sequences. To tackle this complexity, AudioCraft uses the EnCodec neural audio codec, which learns discrete audio tokens from the raw signal. This approach establishes a fixed “vocabulary” for music samples. Autoregressive language models are then trained on these discrete audio tokens; at generation time they produce new tokens, which the codec decodes into new sounds and music.
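The tokenize-then-predict idea described above can be illustrated with a deliberately tiny sketch. This is not AudioCraft’s code or API: a uniform quantizer stands in for the learned EnCodec codec, and a bigram count table stands in for the autoregressive language model. All function names here (tokenize, detokenize, fit_bigram, generate) are hypothetical and chosen only for this illustration.

```python
import math
import random
from collections import Counter, defaultdict


def tokenize(signal, vocab_size=16):
    """Map each sample in [-1, 1] to an integer token (uniform quantizer).

    Assumption: a stand-in for a learned neural codec such as EnCodec.
    """
    tokens = []
    for x in signal:
        x = max(-1.0, min(1.0, x))                # clamp to the valid range
        idx = int((x + 1.0) / 2.0 * vocab_size)   # choose a quantization bin
        tokens.append(min(idx, vocab_size - 1))   # x == 1.0 goes in the top bin
    return tokens


def detokenize(tokens, vocab_size=16):
    """Map each token back to the centre of its quantization bin."""
    return [(t + 0.5) / vocab_size * 2.0 - 1.0 for t in tokens]


def fit_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, start, length, rng):
    """Sample tokens one at a time, each conditioned on the previous token."""
    out = [start]
    while len(out) < length:
        options = counts.get(out[-1])
        if not options:                           # no observed successor: stop
            break
        choices, weights = zip(*options.items())
        out.append(rng.choices(choices, weights=weights, k=1)[0])
    return out


# Toy "recording": five cycles of a sine wave over 200 samples.
signal = [math.sin(2 * math.pi * 5 * n / 200) for n in range(200)]

tokens = tokenize(signal)                 # raw signal -> discrete tokens
model = fit_bigram(tokens)                # "train" on the token sequence
new_tokens = generate(model, start=tokens[0], length=50, rng=random.Random(0))
new_signal = detokenize(new_tokens)       # tokens -> waveform samples
```

The real system differs in every component (a neural codec, a large transformer, text conditioning), but the data flow is the same: signal to tokens, a model that predicts the next token, then tokens back to signal.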

The models in AudioCraft are trained for text-to-audio generation. Given a textual description of an acoustic scene, AudioGen generates the corresponding environmental sounds, reproducing realistic recording conditions and complex scene context.

MusicGen is an audio generation model designed specifically for music. Music is more intricate than environmental sound, so the model must produce coherent samples that follow long-term musical structure. MusicGen was trained on a dataset of approximately 400,000 recordings, complete with accompanying textual descriptions and metadata.

As part of its responsible AI practices, Meta is making these models available to the research community in several sizes. The release also includes model cards for AudioGen and MusicGen that detail how the models were built.

CM3leon for text-to-image generation

Meta also recently unveiled CM3leon, a generative AI model that combines text-to-image and image-to-text generation. CM3leon is a causal masked mixed-modal (CM3) model that can generate both text and images while being conditioned on existing image and text content. Its training has two phases: an initial retrieval-augmented pre-training stage followed by a multitask supervised fine-tuning (SFT) stage.


As technology and creativity continue to converge, AudioCraft and CM3leon point to new possibilities, helping individuals move beyond traditional artistic limitations.
