Box-filters without accumulating errors

Geraint Luff
Signalsmith Audio Ltd.

Exploring some ways to efficiently calculate box filters without accumulating floating-point errors, including varying-width filters.

The challenge

In audio/DSP, it's fairly common to need a rectangular moving-average (a.k.a "box filter"). This is the same as rolling sum of recent values divided by its own length:

This smooths out the signal a bit:

A moving-average of 100 samples applied to an example input. The result is smoother, but also a bit delayed.

This is actually a form of lowpass, implemented by convolution. It has a finite-length impulse response, no overshoot, and can be computed efficiently (as seen below), which is a useful combination of properties in certain situations.

Here's the impulse-response of an un-normalised 100-sample box sum (i.e. box-filter without the 1/N factor):

A 100-sample box filter applied to an impulse, making it obvious where the name comes from.

The most basic way to do it is direct convolution (i.e. adding together the most recent 100 samples) but that's slow - and gets worse for longer N.

Better performance: cumulative sums

If you hold on to a previous result, you can calculate a new sum from it using only the samples that are added/removed:

If your moving-average window is progressing forward one sample at a time, maintaining the box-sum is very cheap:

The oldValue argument should be the delayed sample.

If you're using fixed-point (integer) values, this works perfectly (as long as your Sum is a large enough type that nothing overflows).

But if you're using floating-point, additions and subtractions are slightly approximate - so your sum will drift over time.

Avoiding floating-point errors

To avoid accumulating these floating-point errors, we occasionally reset our sum, to a more exact value which isn't based on our previous results (so it won't include previously-accumulated errors).

A straightforward approach is to start a separate "reset sum" from some convenient starting-point:

When our "reset" period and our "current" period match (N samples later) we replace our "running" sum with the "reset" sum.

This implementation doesn't store any history itself, so you have to pass in the oldValue from N samples ago.

As well as having to find your own oldValue, resetStart() and resetEnd() need to be called at specific times (N samples apart).

Fixed-length period

If our box-sum has a fixed period, we can use a circular buffer that's exactly the length of our box-sum. This makes two things nice and simple:

If you already have a circular buffer of the right length, you can use the code above, calling resetEnd() and resetStart() whenever the index wraps around.

If not, here's some example code which includes a circular buffer:

We've used the "index wrap-around" check to perform a reset at the correct interval.

Variable-length box-sum

The previous approach works if your box-sum is always the same length - but if you want to stretch/shrink your box-sum period while it's running, you'll need something else.

In this case, we'll often have a write-head (inserting the latest values) and a read-head (returning delayed values) chasing each other around the circular buffer, and we're interested in the sum of the values which lie between them:

We start a new reset sum when the write-head wraps back around to 0:

When the read-head wraps around, our reset-sum will have the correct period to replace our running-sum:

If you're already handling your own circular buffer, you can just use the code above, calling resetStart() and resetEnd() as the appropriate heads wrap around.

Example code

It's a bit unwieldy, but here's some code for a variable-length box-sum with built-in buffer. It has .push() and .pop() methods (similar to a FIFO queue):

Long code is long

Evan Balster's Bumpy Wheel

Evan Balster came up with an interesting solution to this, which he called the "Bumpy Wheel" (and is included in his rolling_stats C++ library).

Rather than maintain a value for the rectangular sum, this approach stores some "reference" sum, along with cumulative sums for the added/removed values:

The actual result is then calculated when needed:

When things progress far enough that entire reference period has been removed, we reset - using the "added" sum as our new "reference" sum, and resetting the added/removed sums to 0:

The state of the Bumpy Wheel when it is ready for a reset - or a "bump" in Evan's code

Instead of being told when to do this, we can compare the "removed" and "reference" sums. Floating-point errors generally make comparing floating-point numbers slightly iffy, but because they are constructed from an identical sequence of additions, they will produce an identical result (including the exact same errors).

This includes FIFO operations, but no built-in buffer - I'm sure you can imagine how that would look.

Comparison

The biggest advantage of this (compared to our previous variable-length approach) is that it can determine itself when to perform resets, without the user having to call .resetStart()/.resetEnd().

Another advantage is that if you don't actually need the sum every time, you perform fewer calculations.

A slight disadvantage is that when you feed it a stream of 0s, it does a bit more calculation - but in my tests, the difference was pretty marginal (10%), and will depend on architecture/compiler/optimisation.

Conclusion

If you already have an existing buffer (or need variable-sized box-filters), the "Bumpy Wheel" approach is flexible and just a fun design. The other "reset" approaches are competitive in terms of speed, but need some help knowing when to reset. If your averages stay the same length during processing, the fixed-length approach is simple and fast.

But as ever: if you care about performance, trying a bunch of options and profiling is the only way to go. Evan himself ended up using a fixed-point version of a simple box-sum (which doesn't need resets with integers, but you do need know what range of values you put in, to protect against overflows) instead of the Bumpy Wheel, because he didn't need floating-point accuracy.

Anyway - I thought this was interesting, particularly the Bumpy Wheel stuff, and hopefully you did too. 😛