The science of loudness

Aug 22, 2025 13 min #audio · #maths · #physics


My watch has a “Noise” app: it shows $\mathrm{dB}$, for decibels.

My amp has a volume knob, which also shows decibels, although… negative ones, this time.

And finally, my video editing software has a ton of meters — which are all in decibel or decibel-adjacent units.

How do all these decibels fit together?

Are the decibels from my watch and the decibels from my amp related? And if so, how? I’ve decided to spend twenty minutes of your time answering that question.

I’ve also spent about a month of my time making FAM: the fasterthanlime audio meter, in Rust of course, with egui. If you’re a patron of any tier, you can clone and run it right now, and if you’re not, well you can’t.

A screenshot of fam, the fasterthanlime audio meter

What even is sound?

Sound, like wind, is more of a concept than a thing, since it’s the name we’ve given to a specific behavior of particles.

When you strum a guitar, the strings vibrate:

…transferring energy to the body of the guitar, which amplifies it and projects it into the air as a pressure wave!

This wave eventually makes its way to the ear, where some processing is already done via its unique mechanical design: after being collected by the outer ear, the wave is ferried through the ear canal to the eardrum, where three tiny bones amplify it. Then it’s destination: inner ear, where hair cells travel up and down some fluid.

A schematic of the entire hearing system — NIH/NIDCD

Eventually, it’s converted by chemicals into electrical signals and interpreted by the brain as sound.

People with hearing loss who use a cochlear implant bypass the mechanical parts of that pipeline, relying instead on microphones and speech processors.

Cochlear implant at the museum Kulturama in Zürich, Switzerland.

Tiia Monto

Although those implants do not replicate natural hearing, they give the brain enough information to recognize and process human speech and environmental sounds.

For the rest of us, our ears detect tiny changes in pressure.

Under pressure

You have to realize that pressure is constantly being applied to our bodies, on the order of one atmosphere, or about one hundred thousand pascals (the pascal being the SI unit for pressure).

But you’ll notice that my watch’s “Noise” app doesn’t use pascals. In fact, no audio equipment I’ve ever looked at uses pascals. Instead, they show sound pressure level, defined as follows:

$L_p = 20 \log_{10}\left(\frac{p}{p_0}\right)\ \mathrm{dB_{SPL}}$

Decibels are a logarithmic unit expressing a ratio — in this case the ratio between $p$ , a pressure we measured, and $p_{0}$ , a reference pressure.

In air (because sound can also travel through water and other media), we usually pick 20 micropascals, which is about the quietest sound the human ear can detect.

$p_0 = 20\ \mu\mathrm{Pa}$

Instead of having a linear scale that spans nearly 10 orders of magnitude (from micropascals to about a hundred thousand pascals), decibels give us a nice human-friendly scale going from $0\ \mathrm{dB}$ to $194\ \mathrm{dB}$:

Decibels	Example
0	Faintest sound heard by human ear
30	Whisper, quiet library
60	Normal conversation, sewing machine, or typewriter
90	Lawnmower, shop tools, or truck traffic (90 dB for 8 hours per day is the maximum exposure without protection*)
100	Chainsaw, pneumatic drill, or snowmobile (2 hours per day is the maximum exposure without protection)
115	Sandblasting, loud music concert, or automobile horn (15 minutes per day is the maximum exposure without protection)
140	Gun muzzle blast or jet engine (noise causes pain, and even brief exposure injures unprotected ears, and injury may occur even with hearing protectors)
180	Rocket launching pad

Source: Merck Manual

Graphing decibels vs pascals — Own work, graphed via Desmos

Above $194\ \mathrm{dB}$, we don’t get sound waves, we get shock waves: the pressure amplitude would need to be more than one atmosphere, resulting in negative absolute pressure, which is impossible. Once you reach a vacuum… that’s it. There’s no going any more vacuumy.
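To make the formula concrete, here is a minimal Python sketch of the conversion in both directions. The function names and the `ONE_ATMOSPHERE` constant are made up for this illustration; only $p_0$ and the formula itself come from the text above.

```python
import math

P0 = 20e-6  # reference pressure in pascals (20 µPa), as defined above
ONE_ATMOSPHERE = 101_325.0  # pascals (standard atmosphere, added for the example)


def spl_db(pressure_pa):
    """Sound pressure level in dB SPL for a pressure given in pascals."""
    return 20 * math.log10(pressure_pa / P0)


def pressure_from_spl(level_db):
    """Inverse conversion: pressure in pascals for a level in dB SPL."""
    return P0 * 10 ** (level_db / 20)


print(round(spl_db(P0), 1))              # 0.0
print(round(spl_db(ONE_ATMOSPHERE), 1))  # 194.1
print(round(pressure_from_spl(60), 3))   # 0.02
```

A pressure of one atmosphere lands at about 194 dB, which is where the ceiling discussed above comes from.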

Signal processing

We haven’t yet elucidated what the decibels on my amp mean. I would call those $\mathrm{dB_{FS}}$, for decibels “relative to full scale”, because at $0\ \mathrm{dB}$, we would get the absolute maximum power the amp can output (which would be damaging to my ears and to my relationship with the neighbors).

The formula is the same, except that we don’t pick a reference $p_{0}$ like with sound pressure levels. Here, we consider an input signal and an output signal:

$L = 20 \log_{10}\left(\frac{x_2}{x_1}\right)\ \mathrm{dB}$

Say the solid curve here is our input signal, with amplitude $1$, and the dotted curve is the signal after it comes out of… some system, with amplitude $0.2$:

Based on those amplitudes, our system has a gain of:

$L = 20 \log_{10}\left(\frac{0.2}{1}\right) \approx -14\ \mathrm{dB}$
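The same calculation as a small Python sketch, with the inverse thrown in since going from decibels back to an amplitude ratio comes up just as often. The function names are mine, chosen for the example.

```python
import math


def gain_db(x_in, x_out):
    """Gain of a system in dB, from input and output amplitudes."""
    return 20 * math.log10(x_out / x_in)


def amplitude_ratio(db):
    """Amplitude ratio corresponding to a gain in dB."""
    return 10 ** (db / 20)


print(round(gain_db(1.0, 0.2), 2))     # -13.98
print(round(amplitude_ratio(-6), 3))   # 0.501
print(round(amplitude_ratio(-20), 3))  # 0.1
```

Two handy landmarks fall out of this: roughly -6 dB halves the amplitude, and -20 dB divides it by ten.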

Because of the way human hearing works, it makes more sense to design a volume control around decibels. Such a control is, in fact, logarithmic in amplitude, but it feels more linear to the ear.

If you don’t do that, you end up with a very frustrating volume control where the upper 80% are way too loud, and the value you want is between two ticks on the low end of the slider. I’m sure you’ve seen those before, I know I have.
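As a rough sketch of what "designing around decibels" can look like: map the slider position linearly onto a decibel range, then convert to a linear gain. The -60 dB floor and the function name are assumptions picked for the example, not taken from any particular product.

```python
def slider_to_gain(position, min_db=-60.0):
    """Map a slider position in [0, 1] to a linear gain.

    The position moves linearly through decibels (min_db at the bottom,
    0 dB at the top), so equal slider steps feel like equal loudness steps.
    """
    if position <= 0.0:
        return 0.0  # bottom of the slider is a hard mute
    db = min_db * (1.0 - min(position, 1.0))
    return 10 ** (db / 20)


for position in (0.25, 0.5, 0.75, 1.0):
    print(position, round(slider_to_gain(position), 4))
```

Halfway up the slider the gain is only about 0.03, which is exactly the region a naive linear slider crams into its bottom few pixels.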

Those $\mathrm{dB_{FS}}$ are the ones we’re interested in as broadcast engineers, since we’re dealing with a signal directly.

Of course, most of the time, a signal will be transformed back into a sound wave, and then we’ll have to worry about $\mathrm{dB_{SPL}}$…

But as long as it’s within an audio system, we have to worry about exceeding levels. With an analog signal, that typically results in distortion, which… can be done on purpose for style. In a digital system, exceeding levels typically results in clipping, a very harsh form of distortion.

Which sounds quite awful: you may recognize it from public address systems in parks or maybe trains.
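Digital clipping is easy to reproduce. The sketch below (all names and numbers are illustrative) pushes one cycle of a sine 6 dB over full scale and clamps every sample to the [-1, 1] range, which is what a fixed-point output stage effectively does.

```python
import math


def hard_clip(sample, limit=1.0):
    """Clamp a sample to the range [-limit, +limit]."""
    return max(-limit, min(limit, sample))


gain = 10 ** (6 / 20)  # +6 dB, roughly a factor of 2
n = 16
sine = [gain * math.sin(2 * math.pi * i / n) for i in range(n)]
clipped = [hard_clip(s) for s in sine]

# Count how many samples got flattened against the ceiling or the floor.
flattened = sum(1 for s in clipped if abs(s) == 1.0)
print(f"{flattened} of {n} samples are stuck at full scale")
```

The rounded top of the sine becomes a flat plateau, and those sharp corners are where the harsh extra harmonics come from.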

To avoid that, we have to watch our levels. And over the past hundred years, we’ve come up with a bunch of solutions to do that, all of which are flawed in some way.

In the 1930s, the BBC came up with meters that look like this:

A typical British quasi-PPM. Each division between '1' and '7' is exactly four decibels and '6' is the intended maximum level.

Harumphy on Wikimedia Commons

Well, that one isn’t from the 1930s, but the basic idea hasn’t changed.

Independently and around the same time, the Germans also developed level meters, putting actual decibels on the scale…

A German PPM from Siemens & Halske

Max Koschuh on Wikimedia Commons

…and giving them the cute little nickname “Lichtzeigerinstrument” (light pointer instruments).

Today we would call them both “quasi-PPMs”, PPM for “peak programme meter”, and quasi because… they don’t actually report peaks accurately.

Type II PPMs, for example, have an integration time of 10ms — any peak shorter than that gets under-reported. This succession of notes, which all have the exact same volume but are getting longer and longer, shows how the quasi-PPM under-reports at first:

I don't own one, so the best I can do is show you a plug-in that simulates it!

mvMeter2 plug-in
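To get a feel for why short notes read low, here is a toy model in Python. It is not the ballistics of any real PPM standard: it assumes a simple one-pole smoother whose time constant equals the integration time, fed with the envelope of a full-scale burst. The function name and parameters are made up for the illustration.

```python
import math


def peak_reading(burst_ms, attack_ms=10.0, sample_rate=48_000):
    """Level (0..1) a toy meter reaches during a full-scale burst.

    Assumption: the meter is a one-pole smoother with time constant
    attack_ms, and the rectified burst is a constant 1.0 while it lasts.
    """
    coeff = 1.0 - math.exp(-1.0 / (attack_ms / 1000 * sample_rate))
    level = 0.0
    for _ in range(round(burst_ms / 1000 * sample_rate)):
        level += coeff * (1.0 - level)
    return level


for burst_ms in (1, 5, 10, 100):
    reading = peak_reading(burst_ms)
    print(f"{burst_ms:>3} ms burst reads {20 * math.log10(reading):6.1f} dB")
```

In this model a 1 ms burst reads far below its actual level, while a 100 ms burst reads essentially at full scale, which is the same shape of behaviour as in the plug-in above.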

Root Mean Square

But quasi-PPMs are still pretty good at showing peaks. A lot better than, say, VU meters (for Volume Unit), which were invented in the US in the 1940s, and which get us a lot closer to “loudness”.

What VU meters measure is similar to the Root Mean Square, which gives us the average level of a signal over a period of time:

$\mathrm{RMS} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^2}$
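In code, the formula is a one-liner. The sketch below (helper names are mine) computes it for one second of a full-scale 440 Hz sine sampled at 48 kHz, and compares it with the peak.

```python
import math


def rms(samples):
    """Root mean square of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def to_db(value):
    """Amplitude relative to full scale (1.0), in dB."""
    return 20 * math.log10(value)


sample_rate = 48_000
sine = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(sample_rate)]

peak = max(abs(x) for x in sine)
print(round(peak, 4))               # 1.0
print(round(rms(sine), 4))          # 0.7071
print(round(to_db(rms(sine)), 2))   # -3.01
```

For a pure sine the RMS sits about 3 dB under the peak; for real music with sharp transients the gap is usually much larger.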

A typical VU meter has an integration time of 300ms (vs 10ms for a Type II PPM if you remember), giving us even more severe under-reporting of peaks:

mvMeter2 plug-in

Which is fine!

Here’s another side-by-side example — using an RMS meter this time, which is close enough to an actual VU meter:

A prototype of the fasterthanlime audio meter playing one of my tracks: the sample peak is consistently higher than the root mean square.

A VU meter is not meant to show short peaks: it’s meant to let a radio operator know roughly how loud a song is, so they can adjust the volume, and the listeners don’t have to.

However, I’m happy to report that, since the 1940s, both our understanding of human hearing and technology have improved.

Sample Peak, True Peak

First off, most audio processing is now done in the digital domain. Which is both a blessing and a curse.

Here’s Ableton showing a piece of audio:

An Ableton screenshot cropped to show just the wave.

If we zoom in, we can see the wave:

The wave is now made up of a faster wave.

We can zoom in some more:

We're now seeing what looks like a graph almost

And eventually, Ableton will show us individual audio samples:

Today’s PPMs are not “quasi” anymore. It’s really easy to make a sample peak monitor, because you just look at a window of samples, say, a thousand of them, and keep whichever value is the furthest from zero! That’s your peak!

Your sample peak! Not your true peak!

Because the actual sound wave is reconstructed from a limited number of discrete samples, it’s possible for all samples to be below the maximum desired level, and yet for the reconstructed wave to be above it!


To actually measure the true peak, one can use a sinc filter to upsample the signal, which fills in additional samples between the original ones — letting us know how high that sound wave truly goes.
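Here is a small, self-contained Python sketch of both measurements. The true-peak estimate uses a Hann-windowed sinc interpolator at 4x oversampling; the window, the tap count and the function names are my own choices for the illustration, not the filter specified by any standard. The test signal is a sine at a quarter of the sample rate, phased so that every sample lands at about 0.707 while the underlying wave reaches 1.0.

```python
import math


def sample_peak(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)


def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def true_peak(samples, oversample=4, half_width=16):
    """Estimate the inter-sample peak with a Hann-windowed sinc interpolator."""
    peak = 0.0
    # Skip the edges so every interpolated point has a full set of neighbours.
    for n in range(half_width, len(samples) - half_width):
        for k in range(oversample):
            t = n + k / oversample  # position between sample n and n + 1
            value = 0.0
            for m in range(n - half_width + 1, n + half_width + 1):
                d = t - m
                window = 0.5 * (1 + math.cos(math.pi * d / half_width))
                value += samples[m] * sinc(d) * window
            peak = max(peak, abs(value))
    return peak


signal = [math.sin(math.pi * n / 2 + math.pi / 4) for n in range(64)]
print(f"sample peak: {sample_peak(signal):.3f}")
print(f"true peak:   {true_peak(signal):.3f}")
```

The sample peak comes out around 0.707 (about -3 dBFS), while the interpolated peak is close to 1.0: a meter that only looks at samples would miss 3 dB here.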


The Loudness Wars

That takes care of peaks. What about loudness then? Well, we made progress there too. First, in the wrong direction.

The idea of compression is to get rid of dynamic range by taking anything above a certain level and progressively making it smaller:

A demonstration of compression in DaVinci Resolve 20 using the very text for this article.

We can apply gain to the resulting signal without clipping, making the whole thing louder. That gain is called “make-up” gain, because it makes up for the loud bits that were made quieter by the compression. I can’t believe that just clicked for me now.
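A minimal sketch of that idea, working directly on levels in decibels. The threshold (-20 dB) and ratio (4:1) are arbitrary values picked for the example, and real compressors also have attack and release times that this static curve ignores.

```python
def compress_db(level_db, threshold_db=-20.0, ratio=4.0, makeup_db=0.0):
    """Static compressor curve: input level in dB, output level in dB."""
    if level_db > threshold_db:
        # Anything above the threshold only rises 1 dB for every `ratio` dB.
        level_db = threshold_db + (level_db - threshold_db) / ratio
    return level_db + makeup_db


loud, quiet = 0.0, -30.0
print(compress_db(loud), compress_db(quiet))  # -15.0 -30.0

# The loudest part now peaks at -15 dB, so 15 dB of make-up gain fits.
print(compress_db(loud, makeup_db=15.0), compress_db(quiet, makeup_db=15.0))  # 0.0 -15.0
```

The peak is back at 0 dB, but the quiet part went from -30 dB to -15 dB: the dynamic range shrank from 30 dB to 15 dB and the whole thing sounds louder.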

During the 2000s, sound engineers started abusing compression to make their albums sound louder and louder, based on the theory that people preferred louder things.

The song "Super Trouper" as shown on the major issues of the album, the 1980 Super Trouper LP, 2001 Jon Astley remaster, 2005 The Complete Studio Recordings box set disc 7, and 2011 Super Trouper Deluxe Edition remaster disc 1.

Kosmosi

These “loudness wars” lasted until the mid-2010s, when the music industry finally tackled the problem, by inventing a proper loudness unit: LKFS.

The first interesting thing about LKFS is that it takes into account multiple channels and does a downmix into one value:

Simplified block diagram of multichannel loudness algorithm

Which raises the question: what did the BBC do, with their PPMs?

Well, they didn’t have to worry about stereo until the late 50s, when they started experimenting with stereo themselves, with two separate AM transmitters.

So, two separate PPMs was one option:

Screenshot of BBC-type Peak programme meter in AB (left/right) mode

Harumphy

…and then they had a different variant that showed the sum and the difference of both channels, which came in two versions, M3:

Screenshot of BBC-type Peak programme meter in M3 (sum/difference) mode

Harumphy

And M6:

Screenshot of BBC-type Peak programme meter in M6 (sum/difference) mode

Harumphy

This is important because if you have two waves of opposite phase, they cancel each other out!
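A quick way to see the cancellation, sketched in Python. The division by two in the sum and difference signals is just one possible scaling, chosen here so that nothing exceeds full scale; the variable names are mine.

```python
import math

n = 480
left = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
right = [-s for s in left]  # the same wave with opposite phase

total = [(l + r) / 2 for l, r in zip(left, right)]       # "sum" signal
difference = [(l - r) / 2 for l, r in zip(left, right)]  # "difference" signal

print(max(abs(s) for s in left))        # each channel peaks at full scale
print(max(abs(s) for s in total))       # 0.0: the sum is pure silence
print(max(abs(s) for s in difference))  # the difference carries everything
```

Two per-channel meters would both show a healthy signal here, while anyone listening to the summed (mono) version would hear nothing at all.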

But the first stage of LKFS computation is filtering, to model how humans perceive sound.

The first filter boosts everything above 1000Hz:

Response of stage 1 of the pre-filter used to account for the acoustic effects of the head

Graphed with Desmos

And the second is a high-pass filter, which attenuates anything under 100Hz.

Second stage weighting curve

Graphed with Desmos

Next, we integrate over some interval $T$ to calculate the power of the filtered signal:

$z_i = \frac{1}{T} \int_0^T y_i^2 \, dt$

And finally, this should look familiar: it’s very close to the $\mathrm{dB}$ formula from earlier:

$L_{K} = - 0.691 + 10 \cdot \log_{10} \sum_{i} G_{i} \cdot z_{i}$

But because this time we’re measuring a power, not an amplitude, we use a factor of 10 instead of 20.

$G_{i}$ are the weighting coefficients for individual channels, given in table 3 of BS.1770-5 as:

$\begin{matrix} G_{L} & = 1.0 & (0\ \mathrm{dB}) \\ G_{R} & = 1.0 & (0\ \mathrm{dB}) \\ G_{C} & = 1.0 & (0\ \mathrm{dB}) \\ G_{Ls} & = 1.41 & (\sim +1.5\ \mathrm{dB}) \\ G_{Rs} & = 1.41 & (\sim +1.5\ \mathrm{dB}) \end{matrix}$
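Put together, the last two formulas fit in a few lines of Python. This sketch assumes the samples it receives have already gone through the two filters described above (it does not implement them), and the function names and example values are made up for the illustration.

```python
import math

# Channel weights from the table above.
CHANNEL_WEIGHTS = {"L": 1.0, "R": 1.0, "C": 1.0, "Ls": 1.41, "Rs": 1.41}


def mean_square(filtered_samples):
    """Discrete version of z_i: average power of one filtered channel."""
    return sum(y * y for y in filtered_samples) / len(filtered_samples)


def loudness_lkfs(mean_squares):
    """Loudness from a dict mapping channel name to its mean square z_i."""
    weighted = sum(CHANNEL_WEIGHTS[ch] * z for ch, z in mean_squares.items())
    return -0.691 + 10 * math.log10(weighted)


# Hypothetical values: a mean square of 0.5 is what a full-scale sine gives.
print(round(loudness_lkfs({"L": 0.5}), 3))            # -3.701
print(round(loudness_lkfs({"L": 0.5, "R": 0.5}), 3))  # -0.691
```

Note how playing the same signal on two channels instead of one adds 3 dB, as you would expect when doubling a power.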

Depending on the interval chosen to calculate loudness, we call the result different things:

$M$ for momentary (400 milliseconds)
$S$ for short-term (3 seconds)

As for $I$, it’s the integrated loudness, and it takes into account an entire piece of media, minus the quiet parts, using a standard gating mechanism.
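The gating described in BS.1770 works in two passes over short measurement blocks: an absolute gate at -70 LKFS, then a relative gate 10 LU below the loudness of whatever survived the first pass. The Python sketch below is a simplification that starts from already-computed block loudness values (the numbers are hypothetical) rather than from audio, and the function names are mine.

```python
import math


def block_power(loudness):
    """Undo the loudness formula to get back the weighted power of a block."""
    return 10 ** ((loudness + 0.691) / 10)


def power_to_loudness(power):
    return -0.691 + 10 * math.log10(power)


def mean_loudness(blocks):
    """Loudness of the average power of a list of block loudness values."""
    return power_to_loudness(sum(block_power(l) for l in blocks) / len(blocks))


def integrated_loudness(block_loudness, absolute_gate=-70.0, relative_gate=-10.0):
    # Pass 1: drop blocks that are essentially silence.
    kept = [l for l in block_loudness if l > absolute_gate]
    if not kept:
        raise ValueError("every block is below the absolute gate")
    # Pass 2: drop blocks that are quiet compared to the rest of the programme.
    threshold = mean_loudness(kept) + relative_gate
    kept = [l for l in kept if l > threshold]
    return mean_loudness(kept)


blocks = [-120.7, -20.0, -20.0, -50.0]
print(round(integrated_loudness(blocks), 1))  # -20.0
```

The silent block and the very quiet one are both ignored, so a long fade-out or a pause does not drag the integrated value down.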

This prevents any “cheating” done by audio engineers to make their songs louder than the others, because we finally have one number that is relatively good at predicting how loud something will sound to the human ear.

When mastering for YouTube, we target an integrated loudness level of $-14\ \mathrm{LUFS}$.

In the “Stats for nerds” section of a YouTube video (that you can find in the context menu), there is a content loudness section:

On that video of George Michael’s “Careless Whisper” they left a bit of headroom: their integrated loudness is $-14 - 2.9 = -16.9\ \mathrm{LUFS}$.

I checked with ffmpeg’s -af ebur128 filter:



~/Downloads
❯ ffmpeg -i careless-whisper.webm -af ebur128 -f null -
ffmpeg version 7.1 Copyright (c) 2000-2024 the FFmpeg developers
✂️
[Parsed_ebur128_0 @ 0x6000002640b0] t: 0.106979   TARGET:-23 LUFS    M:-120.7 S:-120.7     I: -70.0 LUFS       LRA:   0.0 LU
[Parsed_ebur128_0 @ 0x6000002640b0] t: 0.206979   TARGET:-23 LUFS    M:-120.7 S:-120.7     I: -70.0 LUFS       LRA:   0.0 LU
[Parsed_ebur128_0 @ 0x6000002640b0] t: 0.306979   TARGET:-23 LUFS    M:-120.7 S:-120.7     I: -70.0 LUFS       LRA:   0.0 LU
[Parsed_ebur128_0 @ 0x6000002640b0] t: 0.406979   TARGET:-23 LUFS    M: -20.6 S:-120.7     I: -20.6 LUFS       LRA:   0.0 LU
[Parsed_ebur128_0 @ 0x6000002640b0] t: 0.506979   TARGET:-23 LUFS    M: -20.5 S:-120.7     I: -20.6 LUFS       LRA:   0.0 LU
[Parsed_ebur128_0 @ 0x6000002640b0] t: 0.606979   TARGET:-23 LUFS    M: -21.4 S:-120.7     I: -20.8 LUFS       LRA:   0.0 LU
[Parsed_ebur128_0 @ 0x6000002640b0] t: 0.706979   TARGET:-23 LUFS    M: -25.0 S:-120.7     I: -21.6 LUFS       LRA:   0.0 LU
[Parsed_ebur128_0 @ 0x6000002640b0] t: 0.806979   TARGET:-23 LUFS    M: -33.5 S:-120.7     I: -21.6 LUFS       LRA:   0.0 LU
✂️
[Parsed_ebur128_0 @ 0x6000002640b0] t: 300.606979 TARGET:-23 LUFS    M: -60.3 S: -60.3     I: -16.9 LUFS       LRA:   8.2 LU
[Parsed_ebur128_0 @ 0x6000002640b0] Summary:

  Integrated loudness:
    I:         -16.9 LUFS
    Threshold: -27.1 LUFS

  Loudness range:
    LRA:         8.2 LU
    Threshold: -37.1 LUFS
    LRA low:   -22.0 LUFS
    LRA high:  -13.8 LUFS
✂️

Because -16.9 is below the target of -14, YouTube does not apply any change to this video.

By contrast, Rihanna’s Umbrella is mastered to $-8.8\ \mathrm{LUFS}$, so YouTube turns the volume down:

Going from 100% to 55% is a change of… hey, we can calculate that!

$20 \log_{10}\left(\frac{0.55}{1}\right) \approx -5.2\ \mathrm{dB}$

And that’s what the “stats for nerds” overlay is showing: the content is 5.2 or 5.3 dB above their target loudness, so they’re turning the volume down.
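The round trip between the percentage and the loudness numbers can be checked in a few lines of Python. The function names are invented for the example, and the "never turn anything up" rule is an assumption based on the two videos above, not on any documentation.

```python
import math


def percent_to_db(percent):
    """Volume percentage (as an amplitude factor) expressed in dB."""
    return 20 * math.log10(percent / 100)


def normalization_gain_db(content_lufs, target_lufs=-14.0):
    """Gain needed to bring content down to the target, never turning it up."""
    return min(0.0, target_lufs - content_lufs)


def db_to_percent(db):
    return 100 * 10 ** (db / 20)


print(round(percent_to_db(55), 2))                         # -5.19
print(round(normalization_gain_db(-8.8), 1))               # -5.2
print(round(db_to_percent(normalization_gain_db(-8.8))))   # 55
print(normalization_gain_db(-16.9))                        # 0.0
```

Starting from the loudness alone, we land on the same 55% that the overlay displays.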

Personally, I find it strange that they show the difference between their loudness target and the content’s integrated loudness in decibels.

If they want to display the delta, they should use LU, loudness units. And honestly, they could just say the target is $-14$ and give us the actual loudness of a track in LUFS. This is stats for nerds! Not stats for normies!

Even more recently, YouTube has introduced “DRC” (for dynamic range compression):

As seen here on this Scott the Woz video, which is mastered way below YouTube’s target loudness level, at around $-23\ \mathrm{LUFS}$, the target for “broadcast” rather than “streaming”.

YouTube’s user-facing name for it is “Stable Volume”. It’s not supposed to kick in for music, because it ruins music, and you can turn it off in the settings:

A-weighting

LKFS and LUFS (same thing, different name) aren’t the only units that try to take psychoacoustics into account: the Apple Watch’s Noise app also does filtering.

The first experiments regarding “how loud humans think sound is” date back to 1927:

A DIRECT COMPARISON OF THE LOUDNESS OF PURE TONES BY B. A. KINGSBURY — Note: "T.U." stands for telephone units, and "cycle" for Hertz.

A few years later, Fletcher & Munson published this equal-loudness contour graph:

Equal-loudness contours from Fletcher & Munson, 1933

Loudness, Its Definition, Measurement and Calculation

Which takes a minute to figure out. Each of the lines represents one level of loudness. Their test subjects reported that, for example, a 1000 Hz sound blasted at 40 decibels felt as loud as a 100 Hz sound at 62 decibels.

In other words: we are much, much more sensitive to sounds at 1000Hz than to those at 100Hz.

That dip around 3 to 4 kHz is where our hearing is most sensitive: we made our smoke detectors beep at that frequency for maximum alert, and our babies cry at that frequency for similar reasons.

From that graph, an ISO standard was derived, specifying the A-weighting curve, which predates LKFS’s K-weighting by almost 50 years:

A-weighting, B, C and D-weighting curves. There is a peak around 2-5 kHz for the A curve, as expected. — Lindosland on Wikimedia Commons

Although more basic and somewhat outdated, A-weighting is used in a bunch of places.
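For reference, the A-weighting curve has a closed-form expression. The Python sketch below uses the constants as they are commonly quoted from IEC 61672; treat them as an assumption and check them against the standard before relying on them.

```python
import math


def a_weighting_db(frequency_hz):
    """Approximate A-weighting gain in dB at a given frequency."""
    f2 = frequency_hz ** 2
    response = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.00 dB offset normalizes the curve to 0 dB at 1000 Hz.
    return 20 * math.log10(response) + 2.00


for frequency in (100, 1000, 3000, 10000):
    print(f"{frequency:>6} Hz: {a_weighting_db(frequency):+.1f} dB")
```

A 100 Hz tone gets knocked down by about 19 dB relative to 1000 Hz, in the same ballpark as the 22 dB gap read off the Fletcher & Munson graph above.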

French law requires sound level meters like these in every music venue:

A display showing 100 dBA, 84 dBA Leq(10min) and 88 dbC Leq(10min). — Amix AFF17-3

As of 2023, the levels to respect are $102\ \mathrm{dB_A}$ and $118\ \mathrm{dB_C}$ Level Equivalent (LEQ), or, “average sound energy” over 15 minutes.
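"Average sound energy" means the averaging happens on powers, not on decibel values. A minimal Python sketch, assuming one level reading per minute (the readings are invented for the example):

```python
import math


def leq(levels_db):
    """Equivalent continuous level: the dB value of the average power."""
    mean_power = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_power)


# Fourteen minutes at 90 dB followed by one very loud minute at 110 dB.
readings = [90.0] * 14 + [110.0]
print(round(leq(readings), 1))                    # 98.8
print(round(sum(readings) / len(readings), 1))    # 91.3, the naive average
```

A single loud minute pulls the Leq up by almost 9 dB, where a plain average of the decibel values would barely notice it.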

American work safety organizations give different recommendations when it comes to maximum sound exposure:

OSHA:

Duration per day	Sound level (dBA)
8 hours	90
4 hours	95
2 hours	100
1 hour	105
30 minutes	110
15 minutes	115

NIOSH:

Duration per day	Sound level (dBA)
8 hours	85
4 hours	88
2 hours	91
1 hour	94
30 minutes	97
15 minutes	100
7.5 minutes	103
3.75 minutes	106
1.88 minutes	109
0.94 minutes	112

These tables use $\mathrm{dB_A}$, and so does the Apple Watch Noise app.
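Both tables follow the same rule with different parameters: the OSHA one starts at 90 dBA for 8 hours and halves the allowed time every 5 dB, while the NIOSH one starts at 85 dBA and halves it every 3 dB. A small Python sketch (function and parameter names are mine) reproduces the rows:

```python
def allowed_hours(level_dba, criterion_dba, exchange_rate_db):
    """Allowed daily exposure, halving for every `exchange_rate_db` above the criterion."""
    return 8.0 / 2 ** ((level_dba - criterion_dba) / exchange_rate_db)


def osha_hours(level_dba):
    return allowed_hours(level_dba, criterion_dba=90, exchange_rate_db=5)


def niosh_hours(level_dba):
    return allowed_hours(level_dba, criterion_dba=85, exchange_rate_db=3)


print(osha_hours(100))                   # 2.0 hours
print(niosh_hours(100) * 60)             # 15.0 minutes
print(round(niosh_hours(112) * 60, 2))   # 0.94 minutes
```

At 100 dBA the two recommendations differ by a factor of eight: two hours versus fifteen minutes.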

Now that we know how all these units fit together, we can all be fun at the next party.
