ElevenLabs Music v2 adds genre switching and section-level regeneration
ElevenLabs Music v2 pushes AI songs toward editable productions, not disposable clips
ElevenLabs has released Music v2, a new version of its AI music-generation model that can change genres inside a single track, regenerate selected song sections from prompts, and build longer compositions from parts such as intros, verses, and choruses.
The useful part is control. AI music systems can already produce decent short clips. Production teams need revision loops, structure, localization, consistency, and rights clarity. Music v2 is ElevenLabs’ latest attempt to move generated music closer to an editable workflow.
The model is available through ElevenCreative, ElevenLabs’ tool for marketing and branding teams, and through ElevenMusic, its AI song-generation platform. API access through ElevenAPI is coming later.
Genre switching gets the demo, editing gets the work
The headline capability is genre switching. ElevenLabs says Music v2 can move from opera to heavy metal and back in the same track, handle fast rap without losing coherence, and add non-musical sound effects alongside vocals and instrumentation.
Those demos do test something real. A clean transition from operatic vocals to distorted guitars requires the model to preserve timing, phrasing, harmonic context, and some sense of song identity while changing timbre, rhythm, vocal style, arrangement density, and mix.
Still, most production use will probably come from the less flashy editing tools.
Music v2 lets users select a portion of a song and regenerate that section using prompts while leaving the rest intact. That’s much closer to how creative work happens. A client likes the chorus but hates the second verse. A game team needs the same cue with a softer intro. A localization team wants a Spanish vocal pass without rebuilding the whole track. Full-track regeneration wastes time when only eight bars are broken.
Section-level editing also makes AI music less brittle. Earlier generators often produced impressive first drafts but gave users little room to steer the result afterward. If the bridge went wrong, the fix was usually to reroll and hope. That’s a poor fit for production work.
Section-based songs fit real workflows better
ElevenLabs says Music v2 can generate songs section by section, including intro, verse, and chorus, then stitch those pieces together.
That matters because long-form audio generation tends to drift. Vocals change character. Percussion loses the groove. Melodies fail to resolve. The mix can start sounding like it came from several unrelated sessions. Breaking a song into structured segments gives the model and the user a better handle on musical form.
For developers and technical teams, section-based composition also maps better to software. A music tool can represent a generated track as a sequence of editable regions instead of one opaque audio file. That creates room for timeline editing, prompt history per section, versioning, and eventually controls for intensity, instrumentation, vocal presence, tempo, or language.
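One way to picture that representation is a small data model in which each section carries its own time range and prompt history. This is a minimal sketch, not ElevenLabs' schema: the `Section` and `Track` classes, their field names, and the section labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    """One editable region of a track (all names here are hypothetical)."""
    name: str        # e.g. "intro", "verse_1", "chorus_1"
    start_s: float   # start time in seconds
    end_s: float     # end time in seconds
    prompt_history: List[str] = field(default_factory=list)

    @property
    def version(self) -> int:
        # Each recorded prompt counts as one revision of this section.
        return len(self.prompt_history)


@dataclass
class Track:
    sections: List[Section]

    def find(self, name: str) -> Section:
        for section in self.sections:
            if section.name == name:
                return section
        raise KeyError(name)

    def revise(self, name: str, prompt: str) -> Section:
        """Record a new prompt for one section; other sections are untouched."""
        section = self.find(name)
        section.prompt_history.append(prompt)
        return section

    def duration_s(self) -> float:
        return sum(s.end_s - s.start_s for s in self.sections)


track = Track([
    Section("intro", 0.0, 8.0, ["soft piano intro"]),
    Section("verse_1", 8.0, 32.0, ["operatic verse"]),
    Section("chorus_1", 32.0, 48.0, ["heavy metal chorus"]),
])
track.revise("verse_1", "darker verse, same vocal energy")
```

Keeping history per section, instead of per track, is what makes it possible to compare or roll back a single verse without touching the rest of the song.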
The model’s ability to add non-musical sound effects is worth watching too. For advertising, short-form video, podcasts, game prototypes, and interactive apps, the line between music bed and sound design is thin. A single system that can generate a branded jingle with a whoosh, crowd hit, or cinematic riser embedded in context could reduce asset handoffs.
It also creates moderation and licensing problems. Prompts can ask for sounds that resemble recognizable samples, brands, voices, or film scores. That won’t be a small edge case for commercial users.
Competition is tightening
ElevenLabs is no longer working in a quiet corner of AI audio.
Google recently introduced Lyria 3, its professional-grade music generation model, and has been folding music creation into tools like Flow Music, including cover generation, section editing, and music video creation. Stability AI released a new audio model capable of producing songs up to six minutes long. Suno has also been iterating on its music models, with longer and more complex tracks.
The direction is obvious: music generation is moving toward controllable media systems. Length, vocal quality, arrangement complexity, multilingual support, and editing are now competitive axes.
ElevenLabs comes at this from a different starting point than some rivals. Its core reputation is voice AI: synthetic speech, voice cloning, dubbing, and multilingual audio. That background matters because vocals are usually where AI music breaks first. Bad drums can be buried. Weird guitar tone can pass as taste. Garbled lyrics and unstable vocal identity are harder to ignore.
Music v2’s claimed improvements across languages, lyrics, vocals, and arrangements fit that lineage. If ElevenLabs can pair strong vocal modeling with structured music generation, it has a credible angle. The company doesn’t need to beat every music model on every instrumental benchmark. It needs to make vocal-heavy, commercially usable tracks that survive revision.
What to watch in the API
The upcoming ElevenAPI support may matter most for technical teams, even though it isn’t live yet.
A usable music API needs more than a prompt -> audio endpoint. Serious integrations need controls and metadata around structure, timing, outputs, and rights. The difference between a toy integration and a production feature often comes down to dull, necessary details.
When ElevenLabs exposes Music v2 through its API, developers should look for:
- Section addressing: Can developers regenerate only a selected time range or named region such as chorus_2?
- Determinism controls: Are seeds or version IDs available so teams can reproduce or slightly vary a result?
- Stem access: Can the system output vocals, drums, bass, and other stems separately, or only a final stereo mix?
- Lyric alignment: Does the API return timestamps for words, lines, or vocal phrases?
- Format support: WAV, MP3, stems, sample rates, loudness normalization, and export settings matter in real pipelines.
- Latency and queue behavior: Music generation is heavier than text or short speech synthesis. Teams need predictable job status, retries, and timeouts.
- Usage rights and audit trails: Commercial teams need records tying generated assets to prompts, model versions, licenses, and user accounts.
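The latency and queue item is the easiest one to make concrete. The sketch below polls a job until it finishes, with capped exponential backoff and an overall time budget. The `get_status` callable and the state names ("queued", "running", "done", "failed") are assumptions made for illustration, since the Music v2 API is not yet available and its job contract is unknown.

```python
import time
from typing import Callable, Dict


def wait_for_job(
    get_status: Callable[[], Dict[str, str]],
    timeout_s: float = 300.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll get_status until the job finishes, fails, or the budget runs out.

    get_status is supplied by the caller and returns a dict with a "state"
    key. The state names used here are assumptions for this sketch, not a
    documented ElevenLabs contract.
    """
    waited = 0.0
    delay = initial_delay_s
    while True:
        status = get_status()
        if status["state"] == "done":
            return status
        if status["state"] == "failed":
            raise RuntimeError(status.get("error", "generation failed"))
        if waited + delay > timeout_s:
            raise TimeoutError(f"job still {status['state']} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay_s)  # exponential backoff, capped
```

Passing `sleep` in as a parameter keeps the loop testable without real waiting. A webhook-based flow would replace the polling entirely, if the API ends up offering one.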
The rights piece needs special attention. ElevenLabs previously positioned its music generation as cleared for commercial use, but “commercially usable” does not mean risk-free. Enterprise buyers will ask how training data was sourced, what indemnity exists, how similarity detection works, and whether generated songs can accidentally resemble copyrighted material too closely. Music models sit in legally sensitive territory. The lawsuits and licensing fights around AI music are part of the product surface.
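Whatever ElevenLabs ends up providing, a team can keep its own audit trail on the client side. The following is a rough sketch of one record per generated asset. Every field name is hypothetical, and the model version and license values are placeholders supplied by the caller, not real identifiers.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditRecord:
    asset_sha256: str    # hash of the generated audio bytes
    prompt: str
    model_version: str   # placeholder string, not a real identifier
    license_id: str      # placeholder for whatever license reference applies
    user_id: str
    created_at: str      # ISO 8601 timestamp supplied by the caller


def make_record(audio: bytes, prompt: str, model_version: str,
                license_id: str, user_id: str, created_at: str) -> AuditRecord:
    """Tie the exact audio bytes to the prompt and account that produced them."""
    digest = hashlib.sha256(audio).hexdigest()
    return AuditRecord(digest, prompt, model_version, license_id, user_id, created_at)


def to_log_line(record: AuditRecord) -> str:
    """Serialize with sorted keys so identical records produce identical lines."""
    return json.dumps(asdict(record), sort_keys=True)
```

Hashing the audio bytes means the record still identifies the asset after it has been renamed or moved between systems.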
Performance and scale are not minor details
Music generation is computationally expensive, especially when a model handles vocals, arrangements, genre changes, and longer structures. Section editing can reduce waste for users, but it can make the backend harder to run.
If Music v2 supports regeneration inside an existing track, the system likely has to preserve context around the edit boundaries. Otherwise users get audible seams, mismatched ambience, tempo shifts, or vocal tone changes at transition points. The model or pipeline needs enough surrounding context to blend the regenerated section back into the original audio.
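To see why the boundaries matter, consider the simplest possible splice: an equal-power crossfade over a short overlap on each side of the edit. This is a classic signal-processing technique shown purely as an illustration. It is not a description of how Music v2 works, and a generative model would more likely condition on the surrounding audio than blend samples after the fact. The function names and the list-of-floats sample format are assumptions.

```python
import math
from typing import List


def crossfade(tail: List[float], head: List[float]) -> List[float]:
    """Blend two equal-length overlaps with an equal-power crossfade.

    tail is the audio being faded out and head is the audio being faded in.
    Samples are floats in [-1.0, 1.0].
    """
    if len(tail) != len(head):
        raise ValueError("overlap regions must have the same length")
    n = len(tail)
    out = []
    for i in range(n):
        t = (i + 0.5) / n                     # position in the overlap, 0..1
        fade_out = math.cos(t * math.pi / 2)  # gain for the outgoing audio
        fade_in = math.sin(t * math.pi / 2)   # gain for the incoming audio
        out.append(tail[i] * fade_out + head[i] * fade_in)
    return out


def splice(original: List[float], regenerated: List[float],
           start: int, overlap: int) -> List[float]:
    """Replace original[start:start + len(regenerated)], crossfading both seams.

    Assumes the first and last `overlap` samples of `regenerated` line up in
    time with the original so the two can be blended.
    """
    end = start + len(regenerated)
    if overlap < 1 or overlap * 2 > len(regenerated):
        raise ValueError("overlap must be at least 1 and fit twice in the edit")
    if start < 0 or end > len(original):
        raise ValueError("edit region falls outside the original audio")
    lead = crossfade(original[start:start + overlap], regenerated[:overlap])
    trail = crossfade(regenerated[-overlap:], original[end - overlap:end])
    middle = regenerated[overlap:len(regenerated) - overlap]
    return original[:start] + lead + middle + trail + original[end:]
```

A crossfade only hides a discontinuity in level. It does nothing about a tempo shift or a change in vocal tone, which is why the surrounding context has to inform the regeneration itself.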
For API customers, that affects cost and latency. Regenerating a 10-second region may not cost the same as generating 10 seconds from scratch if the system must condition on neighboring audio, lyrics, and arrangement state. Batch workflows will need clear pricing and predictable throughput.
Scalability also depends on how ElevenLabs packages the feature. Marketing teams using a web app can tolerate a wait. A consumer video editor, game tool, or automated ad system may need lower latency and high concurrency. Developers building user-facing experiences around Music v2 will need queue APIs, webhooks, failure modes, and usage caps that don’t create a support mess.
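On the client side, the usual way to stay under a concurrency or usage cap is to bound the number of in-flight requests. Here is a minimal sketch using Python's `asyncio.Semaphore`. The `generate` coroutine stands in for whatever call the eventual API exposes, and the default cap of four is an arbitrary assumption.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_bounded(
    prompts: List[str],
    generate: Callable[[str], Awaitable[str]],
    max_in_flight: int = 4,
) -> List[str]:
    """Run generate() for every prompt with at most max_in_flight at once."""
    gate = asyncio.Semaphore(max_in_flight)

    async def one(prompt: str) -> str:
        async with gate:  # wait here while the cap is reached
            return await generate(prompt)

    # gather returns results in the same order as the input prompts
    return list(await asyncio.gather(*(one(p) for p in prompts)))
```

A batch job would call it as `asyncio.run(run_bounded(prompts, generate))`. If any single call raises, `gather` propagates that exception, so a production version would likely wrap `generate` with retry and error handling.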
Security is another practical issue. Prompt-based media systems can leak sensitive campaign plans, unreleased slogans, lyrics, or brand concepts if teams treat them casually. Enterprise deployments will want data retention controls, workspace isolation, permissions, and logging. For agencies, client separation matters. For media companies, unreleased creative assets are confidential material.
Section editing matters more than longer tracks
A lot of AI music announcements emphasize duration. Six-minute tracks. Full songs. Extended arrangements. Longer output is useful, but length alone doesn’t solve the production problem.
A four-minute song with one bad chorus is still a bad deliverable. A 90-second ad cue that can be revised in 15 seconds is often more valuable than a six-minute track that requires a full restart for every change.
That’s why Music v2’s sectional workflow is the more important capability. It fits how creative teams iterate: select, revise, compare, approve. If ElevenLabs gets that interaction right, Music v2 becomes easier to place inside tools for ads, social video, podcasts, games, and internal brand systems.
There’s a catch. Prompt-based editing can still be imprecise. “Make the verse darker but keep the vocal energy” is a human-friendly instruction, but models may interpret it inconsistently. Professional users will eventually want sliders, constraints, references, stems, automation lanes, and DAW-friendly exports. Prompts are good for intent. They’re weaker as precision controls.
Generated audio as software material
For technical teams, the interesting part is the move toward programmable music.
A product could generate localized jingles on demand. A game prototype could create adaptive music loops before hiring a composer. A marketing platform could produce dozens of brand-safe variations for A/B testing. A video tool could let users regenerate only the chorus or outro to match a cut. Those uses need repeatability, editing, metadata, and governance.
ElevenLabs is aiming at that direction. Music v2 looks like a step from one-shot generation toward controllable audio assembly, with vocals as a likely strength. The open questions are the ones engineers will care about: how consistent the outputs are, how good the API controls become, how ElevenLabs handles rights and auditability, and whether the model can survive production feedback without forcing teams back into reroll mode.
The demo feature will get attention. The workflow will decide whether people keep using it.