We introduce a simple and efficient text-controllable high-fidelity music generation and editing model. It operates on sequences of continuous latent representations from a low-frame-rate 48 kHz stereo variational autoencoder codec, which avoids the information loss inherent in discrete representations. Based on a diffusion transformer architecture trained on a flow-matching objective, the model can generate and edit diverse, high-quality stereo samples of variable duration from simple text descriptions. We also explore a new regularized latent inversion method for zero-shot test-time text-guided editing and demonstrate its superior performance over naive denoising diffusion implicit model (DDIM) inversion for a variety of music editing prompts. Evaluations are conducted on both objective and subjective metrics and demonstrate that the proposed model is not only competitive with the evaluated baselines on a standard text-to-music benchmark, in both quality and efficiency, but also outperforms the previous state of the art for music editing when combined with our proposed latent inversion.
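To make the flow-matching objective and the idea of latent inversion more concrete, the following is a minimal sketch, not the MelodyFlow implementation. It assumes the common straight-path convention (noise at t = 0, data latent at t = 1), a plain Euler solver, and toy list-of-floats latents; the function names are hypothetical. Running the solver backward from t = 1 to t = 0 corresponds to a naive inversion only; the regularized inversion proposed here is not reproduced.

```python
import random


def flow_matching_pair(x1, t, rng):
    """Build one training example under the assumed straight-path convention.

    x1 is a data latent (list of floats), t is a time in [0, 1].
    Returns the interpolated point x_t and the velocity target x1 - x0.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]                  # Gaussian noise sample
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]    # point on the straight path
    target = [b - a for a, b in zip(x0, x1)]                # constant velocity along the path
    return xt, target


def fm_loss(pred, target):
    """Mean squared error between predicted and target velocity."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def euler_integrate(velocity, x, t_start, t_end, steps):
    """Integrate dx/dt = velocity(x, t) with fixed-step Euler.

    t_start=0, t_end=1 generates a sample from noise;
    t_start=1, t_end=0 is a naive inversion back to a noise latent.
    """
    dt = (t_end - t_start) / steps
    t = t_start
    for _ in range(steps):
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        t += dt
    return x
```

In a real model, `velocity` would be the text-conditioned network; editing then amounts to inverting the source latent under the source description and integrating forward again under the editing prompt.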
In the following, we present music samples edited by MelodyFlow, MusicGen-melody, and AudioLDM2 with DDPM inversion.
Source description | Editing prompt | Source sample | MelodyFlow (our inv.) | MelodyFlow (DDIM inv.) | MusicGen-melody | AudioLDM2 (DDPM inv.) |
Reflective, nostalgic, synth bass electronic alternative track with vibrant, energetic tone. | Reflective, nostalgic, oud bass Middle Eastern alternative track with vibrant, energetic tone. | |||||
Alternative rock song with an upbeat dance beat. | Fun kids' tune with an upbeat dance beat. | |||||
A soulful genre bending Latin inspired pop track with good vibes. | A high-energy genre bending Latin inspired rock anthem with good vibes. | |||||
Up-tempo rock song with sparkling guitars, punchy bass, grooving rhythm. About feeling young and reckless, in search of the feeling of being invincible with no reservations. | Epic cinematic score with soaring orchestral strings, pulsing percussion, and a driving rhythm. About feeling young and reckless, in search of the feeling of being invincible with no reservations. | |||||
A happy, snappy light rock tune with lead melody on gently played electric guitar, builds to an epic break, than settles back into happy snappy :) | A happy, snappy synth-pop tune with lead melody on pulsing synthesizer, builds to an epic drop, than settles back into happy snappy :) | |||||
An uptempo country rock song with lead female vocals, guitars, upbeat drums and a sassy, fun tone. | An uptempo Afrobeat song with lead female vocals, guitars, upbeat djembe drums and a sassy, fun tone. | |||||
Hipster mid-tempo rock. Organic instruments and tones; emotion and purpose. Moods: Uplifting, hopeful, longing. | Acoustic-driven folk ballad. Organic instruments and tones; emotion and purpose. Moods: Uplifting, hopeful, longing. | |||||
Stripped-down golden era hip hop with a heavy bounce and a strong message about personal freedom. | Stripped-down classical tabla rhythms with a heavy bounce and a strong message about personal freedom, evoking the spirit of Indian independence. | |||||
Indie rock, OR chill with energy :) Primarily synth sound, with electric guitar, vocal accents. Moods: Energetic, upbeat, atmospheric post-rock undertones. Key words: Indie rock, synth, electronic, guitar. | Reggae vibes, OR chill with energy :) Primarily organ sound, with wah-wah guitar, vocal accents. Moods: Energetic, upbeat, atmospheric one-drop undertones. Key words: Reggae, roots, electronic, guitar. |
In the following, we present music samples generated by MelodyFlow, Stable Audio, MusicGen and AudioLDM2.
Text description | MelodyFlow | Stable Audio | MusicGen | AudioLDM2 |
This song comes from a music box. The backing melody is repetitive which is played in a set of three notes followed by a set of two notes. The main melody is high pitched and sounds like bells. There are no other instruments in this song. | ||||
The instrumental music features a piano playing a romantic song. A group of strings accompany the pianist with warm harmonies. The overall atmosphere is romantic and touching. | ||||
This song is a percussion instrumental. The tempo is fast with intense and rapid drum rhythm along with sound rhythmic clashing of cymbals with sound of clapping in the background and a man grunting. The music is animated,vigorous ,energetic, and enthusiastic. | ||||
This song is an instrumental. The tempo is slow with vigorous drumming, keyboard and electric guitar harmony,beatboxing and the sound of a car horn .The song is vibrant, punchy, vigorous and upbeat with a dance groove. This song is a Pop Hit. | ||||
This is a recording of two didgeridoos. They are playing low notes and create a very low vibrational tone. There is a melody, but it is not easily recognizable. | ||||
This is a contemporary classical music performance. The piece is being played on the grand piano with an accentuated playing style. There is a lot of emphasis on the notes. The atmosphere is dramatic. Parts of this piece could be included in the soundtrack of a documentary. It could also be used in the soundtrack of a mystery/horror video game. | ||||
This instrumental is a Heavy Metal instrumental. The tempo is fast with hard hitting drumming, furious and vigorous amplified keyboard playing a harmony, electric bass guitar and electric guitar accompaniment. The music is intense, grim, compelling, passionate, powerful and harmonious. The vibe of the music is serious, sinister, grim and steely. This is used in Hard Rock/Heavy Metal. | ||||
This is an 80s electronic music piece. The rhythmic background consists of a disco electronic drum beat with frequent tom fills. There is a keyboard playing the main tune while a bass and an evolving synth are in the background. The atmosphere of this piece is groovy. This piece could be used at retro-themed nightclubs and parties. It could also be used in the soundtrack of an 80s movie or a TV show. | ||||
This music is a flamenco piece. There are two acoustic guitars, one playing the leading arpeggio melody and the other creating the rhythmic background by strumming the chords. The characteristics of the song makes it clear that it is influenced by Spanish music. It could be used in a thriller movie soundtrack or as a flamenco dance accompaniment piece for dancing courses. |
MelodyFlow can perform full-song edits with a sliding window.
Text description | MelodyFlow |
Original Song (no description provided). | |
A meditative and relaxing ambience featuring a piano melody along with a cello. The instrumental song features a meditative and calming vibes. The track would be suitable as music for relaxation at a spa. | |
A light warmup song featuring strings such as violin and cello, a synth and a bass guitar. It's the perfect kind of music to get ready for a sports session. The instrumental song features a blend of pop, dance, and hip hop elements. | |
The song is an upbeat and energetic dance track that features strings such as violin and cello, a synth and a bass guitar. The instrumental song features a blend of pop, dance, and hip hop elements. Overall, the song is an uplifting and catchy party anthem. | |
The song is an upbeat and energetic dance track that features a catchy melody. The instrumental song features a blend of pop, dance, and hip hop elements. Overall, the song is an uplifting and catchy party anthem. |
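As an illustration of how a sliding window could extend a fixed-length editor to a full song, here is a minimal sketch. The page does not specify the window length, hop size, or how overlapping regions are merged, so all of these are assumptions: the sketch uses overlapping windows over a 1-D sequence of latent frames and uniform averaging in the overlaps, and `edit_fn` is a hypothetical stand-in for one windowed edit.

```python
def sliding_windows(n_frames, window, hop):
    """Return (start, end) index pairs that cover [0, n_frames).

    Assumes 0 < hop <= window so that consecutive windows leave no gap.
    """
    if n_frames <= window:
        return [(0, n_frames)]
    starts = list(range(0, n_frames - window, hop))
    starts.append(n_frames - window)  # final window sits flush with the end
    return [(s, s + window) for s in starts]


def edit_long_sequence(frames, edit_fn, window, hop):
    """Apply edit_fn to each window and average the results where windows overlap.

    edit_fn is a hypothetical callable that maps a list of frames to an
    edited list of the same length.
    """
    n = len(frames)
    total = [0.0] * n
    weight = [0.0] * n
    for start, end in sliding_windows(n, window, hop):
        edited = edit_fn(frames[start:end])
        for i, value in enumerate(edited):
            total[start + i] += value
            weight[start + i] += 1.0
    return [s / w for s, w in zip(total, weight)]
```

For example, `edit_long_sequence(frames, edit_fn, window=4, hop=2)` processes a 10-frame sequence in four windows that each overlap their neighbor by two frames. A smoother crossfade could replace the uniform average if window boundaries are audible.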