The gap between how AI music tools are marketed and how production professionals actually want to use them is still wide. Most tools are positioned as consumer-facing creativity products: "generate a track in seconds," "describe your mood and get music," "no musical knowledge required." This is a reasonable market to go after. It's not the market that decides whether AI music becomes infrastructure for professional production.
We've spent the past year speaking with post-production houses, advertising agencies, indie game studios, editorial teams, and podcast production companies. Not at conferences — in their actual rooms, watching their actual workflows. What they want is not what most AI music demos show. This is what they actually ask for.
Stems, not tracks
This comes up in every professional conversation. Without exception. Stems — isolated layers — are the deliverable that fits professional workflows. A stereo mixed master is a finished product; it's useful only at the final delivery stage. Everything before final delivery requires control over individual layers.
What's interesting is how specific professionals are about which stems. An advertising post producer doesn't need twelve isolated tracks. They need four to six: something rhythmic, something harmonic/ambient, something melodic, bass, and maybe one or two textural elements. A game audio implementer working in Wwise needs stems that align with the state logic they're building: they want to know "which stem do I fade in for the tension state?", not "here are twelve tracks, figure it out."
The stem requirement is also a quality requirement. A stem with bleed from adjacent sources — melody content in the harmony stem, bass in the percussion stem — is not a usable stem for professional post. It's just a bad mix at a different level. The quality bar for stems is "I can EQ this layer independently without hearing content from other layers." That's not easily achieved, and it's what distinguishes tools built for professional delivery from tools built for consumer use.
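To make the "which stem do I fade in for the tension state" question concrete, here is a minimal sketch of a stem manifest that maps named stems to game states. Every name in it (Stem, StemManifest, the state names, the file paths) is a hypothetical illustration, not a Wwise structure or any tool's actual format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Stem:
    name: str  # e.g. "percussion"
    role: str  # e.g. "rhythmic", "harmonic", "melodic", "bass", "textural"
    path: str  # hypothetical file location of the rendered stem


@dataclass
class StemManifest:
    """Hypothetical manifest: which stems are audible in which state."""
    stems: dict[str, Stem] = field(default_factory=dict)
    states: dict[str, list[str]] = field(default_factory=dict)

    def add_stem(self, stem: Stem) -> None:
        self.stems[stem.name] = stem

    def map_state(self, state: str, stem_names: list[str]) -> None:
        # Refuse to map a state to a stem that was never delivered.
        unknown = [n for n in stem_names if n not in self.stems]
        if unknown:
            raise KeyError(f"unknown stems for state {state!r}: {unknown}")
        self.states[state] = list(stem_names)

    def stems_for(self, state: str) -> list[Stem]:
        return [self.stems[n] for n in self.states[state]]

    def transition(self, old: str, new: str) -> tuple[list[str], list[str]]:
        """Return (stems to fade in, stems to fade out) for a state change."""
        before, after = set(self.states[old]), set(self.states[new])
        return sorted(after - before), sorted(before - after)
```

With a manifest like this, an implementer can ask for the transition from an "explore" state to a "tension" state and get back the names of the stems to fade in and out, rather than a folder of unlabeled tracks.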
Clearance they can actually act on
Music licensing for professional use is complicated by history. Stock libraries carry Content ID registrations, PRO claims, and sync licensing terms that interact in complex ways with broadcast, streaming, and digital platforms. We wrote about this specifically in the Content ID piece earlier this year — the short version is that paying for a license doesn't always prevent a copyright claim from landing on your content, because the rights infrastructure wasn't built to handle automated content matching at scale.
What production professionals want from AI music is clearance that is unambiguous and platform-comprehensive. Not "royalty-free for online use" with caveats about broadcast. Not "you own the sync rights but the master might have third-party samples." Clean, original generation where the copyright chain is short and clear — and where the terms are stated in plain language that a contracts person can read and approve without a specialist music lawyer.
One thing we hear from advertising agencies in particular: they need something they can include in a deliverables pack. When a campaign asset is delivered to a brand, the music rights documentation goes with it. "AI-generated, original, cleared" needs to be a document with a format their legal team can file. This sounds administrative, but it's a genuine production requirement that most AI music tools don't address.
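As an illustration of what a fileable rights document could look like in machine-readable form, here is a minimal sketch. The field names and values are assumptions made up for this example, not a legal standard or any tool's actual output, and a record like this would sit alongside the plain-language terms rather than replace them.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ClearanceRecord:
    """Hypothetical rights record delivered alongside a music asset."""
    asset_id: str                  # identifier of the delivered track
    origin: str                    # e.g. "ai_generated"
    third_party_samples: bool      # True if any sampled material is present
    covered_uses: tuple[str, ...]  # e.g. ("broadcast", "streaming")
    terms_reference: str           # pointer to the plain-language terms

    def to_json(self) -> str:
        # Sorted keys keep the file stable when it is diffed or archived.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


def missing_uses(record: ClearanceRecord, required: list[str]) -> list[str]:
    """Return the required uses that the record does not cover."""
    return sorted(set(required) - set(record.covered_uses))
```

A deliverables check can then call missing_uses with the campaign's planned placements and flag any gap before the asset goes to the brand.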
Revision granularity: fix this part, not the whole thing
The current generation of AI music tools mostly operates on whole-track generation. You describe a track, you get a track, you describe adjustments, you get a new track. This is useful for ideation. It's frustrating for late-stage refinement.
When a production team is on the third revision of a 60-second spot and the client says "everything is right except the energy drops too much at the 45-second mark," generating a whole new track is not what they need. They need the 45–60 second section regenerated with adjusted parameters while the first 45 seconds stay exactly as approved. Full regeneration risks the parts that were already working.
Professionals ask for section-level regeneration, parameter adjustment per stem, and the ability to lock elements while varying others. This is a harder interface design problem than "describe your mood and generate." It requires the model to maintain consistency across a partial regeneration — matching the harmonic key, the timbral character, the tempo and groove feel of the locked section, while producing meaningfully different output in the unlocked section.
We're working on this. It's not fully solved, and we're not going to claim otherwise. But it's the right problem to be working on, because it's what separates a generation tool from a production tool.
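One way to picture the request side of section-level regeneration is the sketch below. It only shows how a lock/regenerate plan and its consistency constraints might be expressed. The names (Section, MusicalContext, plan_regeneration, build_request) and the request shape are hypothetical, and the hard part, the model honoring those constraints, is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    start_s: float
    end_s: float
    locked: bool  # locked sections must come back exactly as approved


@dataclass(frozen=True)
class MusicalContext:
    key: str          # harmonic key of the approved material, e.g. "D minor"
    tempo_bpm: float  # tempo the regenerated section has to match


def plan_regeneration(duration_s: float, regen_start_s: float,
                      regen_end_s: float) -> list[Section]:
    """Split a track into locked sections around one regenerated window."""
    if not (0 <= regen_start_s < regen_end_s <= duration_s):
        raise ValueError("regeneration window must lie inside the track")
    sections = []
    if regen_start_s > 0:
        sections.append(Section(0.0, regen_start_s, locked=True))
    sections.append(Section(regen_start_s, regen_end_s, locked=False))
    if regen_end_s < duration_s:
        sections.append(Section(regen_end_s, duration_s, locked=True))
    return sections


def build_request(sections: list[Section], context: MusicalContext,
                  stem_params: dict[str, dict[str, float]]) -> dict:
    """Assemble a hypothetical partial-regeneration request payload."""
    return {
        "constraints": {"key": context.key, "tempo_bpm": context.tempo_bpm},
        "locked": [(s.start_s, s.end_s) for s in sections if s.locked],
        "regenerate": [(s.start_s, s.end_s) for s in sections if not s.locked],
        "stem_params": stem_params,  # per-stem adjustments, e.g. energy
    }
```

For the 60-second spot above, plan_regeneration(60, 45, 60) yields a locked 0–45 second section and an unlocked 45–60 second section, and stem_params can carry an adjustment such as a higher energy value for the percussion stem only.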
API access and pipeline integration
Among more technically oriented teams — particularly game studios and editorial agencies with established post-production pipelines — API access is a recurring request. Not because they want to consume music programmatically at high volume, but because they want music generation to be one step in an automated or semi-automated workflow.
An editorial house producing daily content needs music assets to be available without a manual generation step per episode. A game studio doing procedural level generation wants background music to respond to level parameters. An agency with a creative automation platform wants to trigger music generation as part of a broader asset production run.
These use cases don't require a consumer-facing UI. They require a well-documented API with predictable output quality, consistent stem structure, and reasonable latency. The latency point matters more than it sounds: an API call that returns results in 60–90 seconds fits into a manual workflow. One that takes 8–12 minutes is not usable in a pipeline context.
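The latency point can be expressed as an explicit budget in the calling pipeline. The sketch below polls a job until it finishes or the budget runs out. The poll callable stands in for whatever status call a real API would expose, which we are not assuming anything about, and the 90-second default simply mirrors the figure above.

```python
import time
from typing import Callable, Optional


class LatencyBudgetExceeded(Exception):
    """Raised when generation does not finish within the pipeline's budget."""


def wait_for_result(poll: Callable[[], Optional[dict]],
                    budget_s: float = 90.0,
                    interval_s: float = 2.0,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until `poll` returns a result or the latency budget is spent.

    `poll` is a hypothetical status check: it returns None while the job is
    still running and a result dict once the stems are ready.
    """
    deadline = clock() + budget_s
    while True:
        result = poll()
        if result is not None:
            return result
        # Stop if another wait would take us past the deadline.
        if clock() + interval_s > deadline:
            raise LatencyBudgetExceeded(f"no result within {budget_s} seconds")
        sleep(interval_s)
```

The clock and sleep parameters are injectable so the budget logic can be tested without real waiting. In a pipeline, catching LatencyBudgetExceeded is the point where a step would fall back to a previously approved asset or fail the run.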
What professionals are not asking for
Almost nobody asks for music that "writes itself" or for AI to replace creative direction. The composition of specific pieces for specific editorial contexts remains a creative direction task — someone has to decide what a scene needs musically. What professionals want is for the execution of that creative direction to be faster, cheaper, and more controllable than the current commissioning model allows.
They're also not asking for AI to understand their brand in some holistic way. They want to write a specific brief and get a specific result that matches it. Brand-level consistency over time is a human-managed thing — maintained through brief quality and iteration, not through a model that "knows your brand."
And most professionals are not asking for cost to be the primary driver. The composer-versus-generation cost comparison is interesting academically, but experienced post-producers don't want cheap music; they want fast, controllable music with clean rights. If AI generation achieves that, cost follows as a consequence, but it's not the starting point for the decision.
What this shapes for us as a product
The pattern in what professionals want is consistent: they want production infrastructure, not a creation toy. Stems, clearance, iteration granularity, and pipeline integration are the infrastructure requirements. They're harder to build than a consumer-facing generation demo, and they're less visually impressive as a product pitch. But they're the requirements that determine whether AI music becomes a real part of professional production workflows.
We're a small team building in this direction. Not everything on the list is done. But the list is right, and the sequence matters — stems and clearance are the foundation, iteration granularity builds on top, API access is the layer after that. You can't skip to pipeline integration before the stems are clean.
What we've noticed over the past year is that production professionals who try an AI music tool once, get consumer-grade output, and see mixed stereo files without stems don't come back. They conclude the category isn't ready for professional use. The tools that will prove them wrong have to clear the production infrastructure bar first; everything else is secondary.