December 17, 2018

Getting a handle on Reduce

Yes, again.

Although traditional algol-inspired languages remain dominant, over the past decade or so, functional programming techniques utilizing immutable data structures have seen widespread adoption in a variety of domains. From React and friends on the frontend to Map-Reduce and the vast ecosystem of thus-inspired data warehousing systems that power some of the world’s largest software applications, the ability to navigate such applications is now a must-have skill for many programmers.

The subject of today’s article is a higher-order function called reduce, also known as fold, aggregate, and others. Reduce is a fundamentally useful tool, but I find that many developers have trouble grasping and using it. My hope here is to make an approachable introduction to reduce.

Redux is one prominent example of the reduce paradigm. Redux is centered around a global application state, which is updated via distinct actions with the application of a reducer, of the form: (state, action) => state. This is an application of the exact same concept that we’re going to be using for more trivial tasks for the rest of the piece, but the lessons are transferable.

Living in an immutable world

The first basic unit of iteration that most programmers encounter is a for loop:

// Old-school Javascript
var items = [1, 2, 3, 4];
for (var ii = 0; ii < items.length; ii++) {
    var item = items[ii];
    // do something
}

// Or for the bootcamp crowd:
items.forEach((item) => {
    // do something
})
# Python
items = [1, 2, 3, 4]
for item in items:
    # Do something
# Ruby
items = [1, 2, 3, 4]
items.each do |item|
    # Do something
end
// Java
int[] items = new int[]{1, 2, 3, 4};
for(int item: items) {
  // Do something
}

You probably saw your first for loop shortly after your first Hello World, and use it all the time. But a for loop is an iteration construct designed for imperative languages.

However, if you’re just getting started in a functional, and you want to be a full-fledged immutable girl (or boy or whatever you like), one of the problems that comes up immediately is navigating immutable collections. Since the body of a loop lacks a return value, it necessarily operates via side-effects, and immutable collections are designed to be resistent to side effects. You can still use for loops in functional paradigms, but it’s not quite the same, and developers often struggle when working with e.g. immutable data structures. Consider this example:

// Back in javascript, using the immutable library
import { List } from 'immutable';

const myIds = List(["1", "2", "3", "4"]);
const myObjects = Map({
  1: someObject1,
  2: someObject2,
  3: someObject3,
  4: someObject4,
  // ...
})

// Given a list of ids, produce a list of objects
const myListOfObjects = // ...?

You may have seen this sort of thing before. Perhaps you’re lucky enough to be working in a language with list comprehensions, like python:

my_ids = [1, 2, 3, 4]
my_objects = {1: obj1, 2: obj2, 3: obj3, 4: obj4}

my_list_of_objects = [my_objects[id] for id in my_ids]

Or perhaps you have access to some equivalent via lodash or whatever. But that’s not the point I’m trying to make. The point of this article is to convince you that anything that you might use a loop for, you can express in terms of reduce instead:

// Javascript again
const myListOfObjects = myIds.reduce(
  (objs, obj) => objs.push(myObjects.get(obj)),
  List()
)

But let’s back up a bit.

What is this “reduce” thing?

The wikipedia article actually calls it “fold”, but I’ve found “reduce” to be more common in more popular languages. It might also be called “aggregate”, which may be clearer for some.

Whatever you call it, reduce accepts a function of two arguments (call it f), an initial value, and a collection. It calls f with the initial value and the first item of the collection, then calls f again with the result of that and the second item, and so on, returning the value of the final call. The “Stairway to Heaven” of reduce examples is summing a list:

const plus = (result, num) => result + num;

const ten = [1, 2, 3, 4].reduce(plus, 0)

/* Equivalent to:

  plus(plus(plus(plus(0, 1), 2), 3), 4)
= plus(plus(plus(1, 2), 3), 4)
= plus(plus(3, 3), 4)
= plus(6, 4)
= 10
*/

To reiterate, there are three components to reduce:

  • A collection, usually a list or array or some other iterable.
  • An initial value, into which each item of the collection will be “folded”.
  • A combining function, which takes two arguments and combines them somehow.

We’ll assume that you have a collection to start with – let’s examine the other two components in turn:

Picking an initial value

You’ll typically want the initial value to be an identity of the type that you want to end up with. An identity is defined with respect to a binary operator (in this case, our combining function). Given an operator, an identity is a value that can be combined with any value to return that value unchanged.

To use a numeric example, 0 is the identity of +, since 0 + <any number> is that number. Many languages overload + to have other meanings on other data types, too:

# Because python makes decent pseudocode

# Strings
"Hello World" + "" == "Hello World"

# Lists
[1, 2, 3, 4] + [] == [1, 2, 3, 4]

# Booleans

# `true` is the identity for `and`
true and true == true
false and true == false

# `false` is the identity for `or`
true or false == true
false or false == false

Since reduce is so often used with addition and addition-like operations, the initial value is sometimes called a “zero value”. However, remember to consider the operator. If you’re multiplying a list of numbers, you’ll need to use 1 as your identity.

Your initial value need not be a scalar, by the way! You might have a complex data structure as your initial value, which will be updated with a list of other data structures.

Defining a combining function

The combining function takes two arguments. The first is always the aggregate, while the second is the next item in the list. The returned value is fed into the next iteration, and so it should be of the same type as the first argument.

So why do I say that reduce is such a fundamental operation? It’s because reduce is a tool that exists to solve the problem:

Given a collection of X, produce Y.

To further this point, it turns out that you can express other, higher-level operators in terms of reduce:

// Javascript

const map = (f, coll) => coll.reduce(
  (out, val) => [...out, f(val)],  // Combining function
  []  // Initial value
);

Here’s filter:

const filter = (f, coll) => coll.reduce(
  (out, val) => (f(val) ? [...out, val] : out),
  []
);

I could go on and on, and have before.

Weaknesses of reduce

Of course, no tool is without tradeoffs. Reduce is a tool ripped from mathematics, so like many functional constructs, its strength is its conceptual simplicity, while its weakness lies on the border between concept and machine.

The main weakness of reduce is that it expects to work with immutable data, which is the slower of the two sorts of data. For example, using the “splat” operator on a plain-old-javascript array as above ([...out, val]) will create a whole new output array for each step, which could be a performance bottleneck. Using a library such as immutable or mori, which utilize some novel structural-sharing algorithms to reduce memory usage, helps this substantially; both uses persistent collection implementations that, as a rule of thumb, are about 2-4x slower on write than an equivalent mutable structure. On the upside, using immutable data structures gives you the advantages of immutable data structures, most notably that they are immutable and therefore won’t change unexpectedly out from under you.

The other performance concern of reduce is that it’s impossible to exit early. My favorite-ever BFF language Clojure overcomes this via a tag-team function pair of reduced, which marks a given value as reduced, and reduced?, which checks said mark. However, most languages don’t afford you the variable metadata features necessary to do this easily, and even if they did, they don’t include it in their standard library and so would probably have a bunch of idiosyncratic and divergent implementations. You could get around this limitation by abusing exceptions or some such thing, but I recommend that you don’t do this.

Conclusion

I mentioned previously that one of reduce’s advantages is its conceptual clarity. What I mean by that is that it’s a helpful framework for breaking tasks down into smaller parts. In particular, the “combining function” is often a sufficiently general operation that it’s something you’d want handy anyhow. Consider a function like:


const addLineToInvoice = (invoice: Invoice, line: InvoiceLine) => {
    return {
        ...invoice,
        totalHours: invoice.hours + line.hours,
        total: invoice.total + line.total,
        lines: [...invoice.lines, line],
    }
}

You might use this with reduce, but you might also use it e.g. right after a user adds a new line to an invoice.

I guess my point is, even if you don’t use reduce every day, having the ability to understand it will likely benefit you all by itself. If I’m correct about that, well, you’re welcome. Thanks for reading!