June 4, 2013

Structured Clojure: Protocols and multimethods (oh my!)

You can use Clojure to do just about anything without ever needing to define anything but functions and vars, but you’d be missing out. Clojure supports some very nice ways to make your code more expressive – and more structured – when dealing with operations on data types.

The Expression Problem

The Expression Problem refers to the problem of making a language extensible. Software manipulates data types using operations. Sometimes, we want to add new operations, and have them work on existing data types. Sometimes, we want to add new data types, which will work with existing operations.

Object-oriented languages make it easy to add new data types (classes), by extending existing ones. I can make a new type that extends java.util.AbstractMap, and have access to the operations it defines. But if I want to add a new method to all the Maps in my software, I’ll have to touch a lot of code that I’ve already written to do it.

Functional languages tend towards the opposite: it’s easy to add new operations (functions), but harder to adapt the operation to various types. If I want to write a function in clojure that works on both vecs and maps, I’ll probably have to write two implementations and include an if in the calling function. If I then want to extend it to work on lists, I’ll have to write a third implementation and add another condition to the calling function.

Multimethods

Multimethods are, essentially, syntactic sugar for this situation. The before might look something like this

(declare do-a-thing-with-a-vector)
(declare do-a-thing-with-a-map)

(defn do-a-thing
  "Do a thing to a vec or a hash-map"
  [in]
  (cond
    (instance? clojure.lang.PersistentVector in)
      (do-a-thing-with-a-vector in)
    (instance? clojure.lang.PersistentArrayMap in)
      (do-a-thing-with-a-map in)))

(defn do-a-thing-with-a-vector [in]
  "A vector (via function)")

(defn do-a-thing-with-a-map [in]
  "A map (via function)")

Now, if I want to add an implementation for a list, I have to change the code in three places: I have to add a declare, add a case to do-a-thing, and then write the actual implementation.

Clojure’s multimethods are a syntactic improvement to this procedure. A multimethod includes a dispatch function, and several methods. The above code written using a multimethod looks something like this:

(defmulti do-a-thing class)

(defmethod do-a-thing clojure.lang.PersistentVector [in]
  "A vector (via multimethod)")

(defmethod do-a-thing clojure.lang.PersistentArrayMap [in]
  "A map (via multimethod)")

Now, to add a new operation, I just need to add a new defmethod. Expressiveness problem solved.

Multimethods are a flexible and generally awesome solution to this problem, but for this exact situation, another compelling option exists:

Protocols

Protocols are Clojure’s answer to Java-style interfaces. They define a type and its methods, without giving any implementation details. However, protocols differ from interfaces one major way: they can be applied to types that are already declared.

This is a very clever way of solving the expression problem, by getting around the limitations of OO described above. Here’s our very creative function, implemented as a protocol:

(defprotocol DoesAThing
  (do-a-thing [in] "Do a thing"))

(extend-protocol DoesAThing
  clojure.lang.PersistentVector
    (do-a-thing [in] "A vector (via protocol)")
  clojure.lang.PersistentArrayMap
    (do-a-thing [in] "A map (via protocol)"))

Protocols have the advantage of working with Java’s dispatch to improve performance. A quick-and-dirty benchmark puts the protocol implementation taking about half the execution time of the multimethod version. However, with such a trivial function, execution is dominated by dispatch; in real situations the difference is unlikely to be meaningful.

When to use what?

One detail Protocols have that multimethods don’t is the ability to group related methods. For example, if I wanted to create head and tail methods for collections, I could do so in a single protocol:

(defprotocol LispyColl
  (head [in] "Get the head")
  (tail [in] "Get the tail"))

Protocols also make explicit the relationships between types. You can use extends?, and satisfies? to inspect a type to see if it satisfies a given protocol; to do the same with a multimethod requires that you explicitly specify the relationship using derive.

On the other hand, multimethods are not limited to single dispatch on type–you can use multimethods for multiple dispatch on arbitrary attributes. Take this example, shamelessly adapted from clojuredocs.org:

(defmulti greeting :language)

(defmethod greeting "English" [_] "Hi")
(defmethod greeting "French" [_] "Salut")

(greeting {:language "English"}) ; => "Hi"

Ultimately, multimethods and protocols are designed to solve very similar problems. Protocols mirror and extend Java’s capabilities to gain greater performance and interoperability. Multimethods provide a less object-oriented and more lisp-y way of doing the same thing, with an added genericism that allows them to be much more flexible. If you need to dispatch on something other than class, multimethods are the way to go for sure.

My recommendation? If your problem fits – dispatch on types, benefits from inheritance – use protocols for performance and explicitness; otherwise, use multimethods.

Further reading

Afterword: It was pointed out to me by @swannodette that this article probably doesn’t give multimethods enough credit. Probably the biggest thing I failed to point out was that multimethods are a lot better for dispatching on things that aren’t types, such as values in maps, and for multiple dispatch. I’ve amended my conclusion section, but you’d best read up on multimethods on clojure.org lest you get the wrong idea.