Game Audio Budget Breakdown: Where Indie Studios Actually Spend Their Audio Money

Sundar Arvind

The indie game audio budget conversation usually goes one of two ways. Either it's treated as residual (whatever's left after art and code) or it's a carefully considered slice that still can't accommodate everything the team wanted. We've spoken with teams across a range of project scales over the past year, from solo developers with £5K total budgets to AA studios working with £200K–£400K production costs. The patterns in how they allocate and how they compromise are consistent enough to be worth writing down.

This isn't a prescriptive guide for how you should spend your audio budget. It's an honest account of how studios at different scales actually do spend it, what they cut, and where the ambition-reality gap most often appears.

The indie audio budget structure (£5K–£30K total audio)

For a true indie — a team of two to five people self-funding or operating on a small publisher advance — the total audio budget typically sits between £5,000 and £30,000 for a project with a 12–18-month development cycle. Here is roughly how that breaks down in practice:

  • Sound design and SFX (40–55%): This almost always takes the largest share. Sound effects touch every system in a game: UI, footsteps, environmental ambience, weapon hits, ability cues. SFX designers typically charge £350–£700 per day, and a mid-complexity indie project typically needs 15–30 days of work.
  • Music composition (25–40%): The single biggest variable. A composer who charges £800–£1,500 per finished minute of music will produce a very different budget outcome from one who charges £2,500–£4,000 per minute. The rate difference is often less about talent than about revision terms, stem delivery, and exclusivity.
  • Audio implementation (10–20%): The work of getting assets into Wwise or FMOD, building state machines, setting up adaptive music layers, and configuring parameter-driven attenuation is often underestimated. If the lead developer isn't handling implementation, budget 5–10 days of a dedicated audio implementer's time.
  • Middleware licensing and tools (5%): Both Wwise and FMOD offer free licensing for small projects, but eligibility depends on thresholds (production budget, revenue, or both, depending on the vendor and its current terms), and paid tiers apply above them. Budget for licenses even if you expect to qualify for a free tier; projects sometimes exceed those thresholds.
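To make the percentages concrete, here is a minimal Python sketch that turns a total audio budget into per-line ranges using the shares listed above, and checks how many sound design days a given SFX budget covers at a given day rate. The function names and the integer rounding are our own assumptions for illustration; the shares and day rates are the ones quoted above.

```python
# Share ranges (in percent) taken from the list above.
# Names and structure are illustrative, not a standard budgeting tool.
SHARES_PCT = {
    "sound_design_sfx": (40, 55),
    "music_composition": (25, 40),
    "implementation": (10, 20),
    "middleware_tools": (5, 5),
}


def allocate(total_gbp):
    """Return {line_item: (low, high)} in whole pounds for a total audio budget."""
    return {
        name: (total_gbp * low // 100, total_gbp * high // 100)
        for name, (low, high) in SHARES_PCT.items()
    }


def days_affordable(budget_gbp, day_rate_gbp):
    """Whole days of freelance work a budget covers at a given day rate."""
    return budget_gbp // day_rate_gbp


if __name__ == "__main__":
    for total in (5_000, 15_000, 30_000):
        print(f"Total audio budget: £{total:,}")
        for name, (low, high) in allocate(total).items():
            print(f"  {name}: £{low:,} to £{high:,}")
```

For a £15,000 total, the SFX share comes to £6,000–£8,250, which covers about 8 days at £700 or 23 days at £350. That sits at or below the 15–30 days quoted above, which is one reason scope gets trimmed. Note also that the low ends of the four shares sum to 80% and the high ends to 120%, so not every line can sit at the top of its range.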

Where the budget collapses: adaptive music

Here is the most common gap between what indie teams plan and what they ship. The design document calls for adaptive music: an exploration state with a soft, looping ambient layer; a tension state that introduces rhythmic elements when enemies are nearby; a combat state with full intensity. This is a reasonable and achievable design. It requires three to five distinct music layers that can crossfade or layer on state transitions.

The problem is that adaptive music requires the composer to deliver stem-separated assets. Not one mixed track per state, but individual stems (rhythm, bass, melody, harmony, textural elements) that the middleware can bring in and out independently. Many indie-tier composers don't work this way. They deliver mixed masters. The result is a design that planned for adaptive layering but ships with a series of abrupt track switches because nobody in the chain had the time or budget to renegotiate the deliverable format.
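The dependency on stems is easier to see in code. The sketch below is plain Python, not the Wwise or FMOD API: it models three game states as sets of active stems and computes per-stem gains partway through a transition. The stem names, the state table, and the linear fade are all assumptions chosen for illustration.

```python
# Hypothetical stem names and state table; real projects define these in
# the middleware authoring tool, not in game code like this.
STEMS = ("texture", "harmony", "bass", "rhythm", "melody")

STATE_LAYERS = {
    "exploration": {"texture", "harmony"},
    "tension": {"texture", "harmony", "bass", "rhythm"},
    "combat": set(STEMS),
}


def target_gains(state):
    """Target gain per stem for a game state: 1.0 if active, else 0.0."""
    active = STATE_LAYERS[state]
    return {stem: 1.0 if stem in active else 0.0 for stem in STEMS}


def crossfade(from_state, to_state, progress):
    """Linear per-stem gains at `progress` (0.0 to 1.0) through a transition."""
    progress = max(0.0, min(1.0, progress))
    start = target_gains(from_state)
    end = target_gains(to_state)
    return {s: start[s] + (end[s] - start[s]) * progress for s in STEMS}


if __name__ == "__main__":
    # Halfway from exploration to combat: shared layers hold, new ones fade in.
    print(crossfade("exploration", "combat", 0.5))
```

Every entry in that gain table corresponds to an audio file the composer has to deliver separately. With one stereo mix per state there is only a single gain to move, so the only available transition is swapping or fading the whole track.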

A specific scenario: a five-person team in Stockholm developing a top-down RPG planned a combat music system with four intensity layers in Wwise. They budgeted £6,000 for music — reasonable for the scope. The composer they hired delivered 14 tracks as stereo mixes. The Wwise implementation engineer quoted £1,800 to build the state machine but told them up front that without stems, the adaptive system would be reduced to full-track swaps. They shipped with track swaps. The adaptive music design they'd planned for 18 months never made it into the game.

This is not an indictment of the composer. It's a communication and deliverable-specification problem. The studio didn't know to ask for stems. The composer didn't know they needed them. The gap only appeared at implementation.

AA audio budgets (£40K–£120K total audio)

At AA scale, the structure shifts. Audio is no longer a single line item subcontracted to one or two freelancers. It splits into distinct budget lines:

  • Composer: Often a named composer or a small studio. Budget £15K–£40K for original score, depending on duration and exclusivity. Stem delivery is more commonly spec'd at this level, though not universal.
  • SFX studio or lead sound designer: £20K–£50K, covering design, Foley, recording sessions, and library licensing.
  • Audio director or lead: Increasingly a fixed-term contract role at AA, separate from implementation work. £400–£700/day for 20–40 days of direction and QA.
  • Implementation: Larger Wwise or FMOD builds at AA scale require 20–40 days of dedicated implementation. This is often done by the audio director or a specialist contractor.
  • Voice direction and VO: Once you have voiced dialogue, this can easily consume 20–30% of total audio budget on its own. AA projects with significant narrative content often discover that VO expenses crowd out music and ambience budgets in ways that weren't planned for.
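A quick worked sum shows why VO crowds out the other lines. The Python sketch below adds up only the ranges stated in the list above; implementation is left out because no day rate is given for it, and the function names are ours.

```python
# Ranges copied from the list above, in pounds. Implementation is omitted
# because the text gives days but no day rate for it.
COMPOSER = (15_000, 40_000)
SFX = (20_000, 50_000)
DIRECTOR_DAY_RATE = (400, 700)
DIRECTOR_DAYS = (20, 40)
VO_SHARE_PCT = (20, 30)


def director_cost():
    """Audio director cost (low, high): lowest rate x fewest days, and so on."""
    return (
        DIRECTOR_DAY_RATE[0] * DIRECTOR_DAYS[0],
        DIRECTOR_DAY_RATE[1] * DIRECTOR_DAYS[1],
    )


def non_vo_subtotal():
    """Composer + SFX + audio director, before implementation and VO."""
    director = director_cost()
    return (
        COMPOSER[0] + SFX[0] + director[0],
        COMPOSER[1] + SFX[1] + director[1],
    )


def left_after_vo(total_gbp):
    """Pounds left for everything else (low, high) once VO takes its share."""
    return (
        total_gbp * (100 - VO_SHARE_PCT[1]) // 100,
        total_gbp * (100 - VO_SHARE_PCT[0]) // 100,
    )


if __name__ == "__main__":
    print("Non-VO subtotal:", non_vo_subtotal())
    print("Left after VO on a 120K total:", left_after_vo(120_000))
```

On these figures, composer, SFX, and audio direction alone total £43,000–£118,000, which already spans nearly the whole £40K–£120K range. On a £120,000 total, VO at 20–30% leaves £84,000–£96,000 for everything else, so a project near the top of the other ranges has no room for voiced dialogue without cutting something.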

What gets cut first, and why

At both indie and AA scale, the things that get cut first share a common characteristic: they're invisible until they're missing. In order of typical cut priority:

  1. Adaptive music depth. Goes from four-state to two-state to single-loop as budget pressure increases. Nobody on the marketing side will notice, and the developers are tired.
  2. Environmental ambience variety. The plan was eight distinct ambient soundscapes. Four ship. Sometimes two.
  3. Re-recording and polish sessions. The first recording pass is good enough. Additional days to re-record the three SFX that never sat right get cancelled.
  4. Music stem delivery. If the composer contract didn't explicitly include stems, requesting them late in production becomes a renegotiation. Often the budget isn't there to cover the additional work.

We're not saying cutting these things is wrong. Budget constraints are real and every project makes difficult choices. The point is that these cuts have downstream consequences — particularly the stem delivery cut, which affects how much flexibility the implementation team has during the final weeks before submission.

The music generation economics argument

AI music generation changes the economics at indie scale in one specific way: stem delivery is not an afterthought or an additional line item; it is the default output format. When we generate a track through Mozrat AI, the output is the separated stems. There's no mixed-master-then-request-stems negotiation, no revision cycle to get the format right.

For the Stockholm team in the example above: if the music had been generated rather than commissioned, the Wwise implementer would have had the stems from day one. The adaptive system they designed could have shipped as planned. Much of the £6,000 composer budget, less the cost of generation, would have been available for SFX polish instead.

This doesn't mean AI generation is the right choice for every game. Projects where the composer's personal artistic voice is part of the game's identity — where the music is inseparable from the authorship of the work — are not good candidates. Projects where the audio brief is functional ("tense stealth music, 120 BPM, minimal percussion, strings-led") and stems are a technical requirement are exactly the use case.

The practical implication for studios planning audio budgets

Two things to explicitly budget for that teams most often forget:

First, implementation time is not free. If you're hiring a composer and planning to do your own Wwise integration, budget two to three days of your own time per hour of finished music — more if the adaptive design is complex. Implementation complexity scales with design ambition, not just music duration.

Second, specify stem delivery in the composer contract before signing. The time to negotiate stem format is before any music is written, not after the final mix is delivered. The clause to include: "All delivered music shall include separated stems at minimum: rhythm/percussion, bass, melody, harmony/pads, and any additional textural layers. Stems shall be delivered as 24-bit/48kHz WAV, gain-staged such that the mix of all stems reproduces the approved master."

That one clause prevents the most common adaptive music failure mode in indie game development. It costs nothing to include and can save several thousand pounds in implementation renegotiations.
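The clause is also easy to check on delivery. Below is a minimal sketch, using only Python's standard `wave` module, that verifies each file is 24-bit/48kHz and measures how far the sum of the stems is from the master (a "null test"). The function names are ours. It assumes plain PCM WAV files with the same channel count and length; some DAW exports use extended WAV headers that older Python versions may refuse to open.

```python
import wave

REQUIRED_SAMPLE_WIDTH = 3  # bytes per sample, i.e. 24-bit
REQUIRED_RATE = 48_000     # Hz


def check_format(path):
    """Return a list of format problems for one WAV file (empty means OK)."""
    problems = []
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != REQUIRED_SAMPLE_WIDTH:
            problems.append(f"{path}: expected 24-bit, got {w.getsampwidth() * 8}-bit")
        if w.getframerate() != REQUIRED_RATE:
            problems.append(f"{path}: expected 48000 Hz, got {w.getframerate()} Hz")
    return problems


def read_samples(path):
    """Decode 24-bit PCM into a flat list of signed ints (channels interleaved)."""
    with wave.open(path, "rb") as w:
        raw = w.readframes(w.getnframes())
    return [
        int.from_bytes(raw[i:i + 3], "little", signed=True)
        for i in range(0, len(raw), 3)
    ]


def max_null_residual(master_path, stem_paths):
    """Largest absolute sample difference between the master and the summed stems."""
    master = read_samples(master_path)
    stems = [read_samples(p) for p in stem_paths]
    if any(len(s) != len(master) for s in stems):
        raise ValueError("stems and master must have the same length and channel count")
    return max(
        (abs(m - sum(parts)) for m, *parts in zip(master, *stems)),
        default=0,
    )
```

A residual of zero means the stems reproduce the master exactly. If the approved master went through bus processing such as limiting, the residual will not be zero, so any acceptance tolerance is something to agree with the composer rather than a fixed number.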