Structured Clojure: Protocols and multimethods (oh my!)
You can use Clojure to do just about anything without ever needing to define anything but functions and vars, but you’d be missing out. Clojure supports some very nice ways to make your code more expressive – and more structured – when dealing with operations on data types.
The Expression Problem
The Expression Problem refers to the problem of making a language extensible. Software manipulates data types using operations. Sometimes, we want to add new operations, and have them work on existing data types. Sometimes, we want to add new data types, which will work with existing operations.
Object-oriented languages make it easy to add new data types (classes), by extending existing
ones. I can make a new type that extends
java.util.AbstractMap
, and have
access to the operations it defines. But if I want to add a new method to all the Maps in my software,
I’ll have to touch a lot of code that I’ve already written to do it.
Functional languages tend towards the opposite: it’s easy to add new operations (functions), but
harder to adapt the operation to various types. If I want to write a function in clojure that works on both
vecs
and maps
, I’ll probably have to write two implementations and include an if
in the calling
function. If I then want to extend it to work on lists
, I’ll have to write a third implementation
and add another condition to the calling function.
Multimethods
Multimethods are, essentially, syntactic sugar for this situation. The before might look something like this
(declare do-a-thing-with-a-vector)
(declare do-a-thing-with-a-map)
(defn do-a-thing
"Do a thing to a vec or a hash-map"
[in]
(cond
(instance? clojure.lang.PersistentVector in)
(do-a-thing-with-a-vector in)
(instance? clojure.lang.PersistentArrayMap in)
(do-a-thing-with-a-map in)))
(defn do-a-thing-with-a-vector [in]
"A vector (via function)")
(defn do-a-thing-with-a-map [in]
"A map (via function)")
Now, if I want to add an implementation for a list
, I have to change the code in three places:
I have to add a declare, add a case to do-a-thing
, and then write the actual implementation.
Clojure’s multimethods are a syntactic improvement to this procedure.
A multimethod includes a dispatch function, and several methods. The above code written
using a multimethod
looks something like this:
(defmulti do-a-thing class)
(defmethod do-a-thing clojure.lang.PersistentVector [in]
"A vector (via multimethod)")
(defmethod do-a-thing clojure.lang.PersistentArrayMap [in]
"A map (via multimethod)")
Now, to add a new operation, I just need to add a new defmethod
. Expressiveness problem solved.
Multimethods are a flexible and generally awesome solution to this problem, but for this exact situation, another compelling option exists:
Protocols
Protocols are Clojure’s answer to Java-style interfaces. They define a type and its methods, without giving any implementation details. However, protocols differ from interfaces one major way: they can be applied to types that are already declared.
This is a very clever way of solving the expression problem, by getting around the limitations of OO described above. Here’s our very creative function, implemented as a protocol:
(defprotocol DoesAThing
(do-a-thing [in] "Do a thing"))
(extend-protocol DoesAThing
clojure.lang.PersistentVector
(do-a-thing [in] "A vector (via protocol)")
clojure.lang.PersistentArrayMap
(do-a-thing [in] "A map (via protocol)"))
Protocols have the advantage of working with Java’s dispatch to improve performance. A quick-and-dirty benchmark puts the protocol implementation taking about half the execution time of the multimethod version. However, with such a trivial function, execution is dominated by dispatch; in real situations the difference is unlikely to be meaningful.
When to use what?
One detail Protocols have that multimethods don’t is the ability to group
related methods. For example, if I wanted to create head
and tail
methods
for collections, I could do so in a single protocol:
(defprotocol LispyColl
(head [in] "Get the head")
(tail [in] "Get the tail"))
Protocols also make explicit the relationships between types. You can use extends?
,
and satisfies?
to inspect a type to see if it satisfies a given protocol; to do the
same with a multimethod requires that you explicitly specify the relationship using derive
.
On the other hand, multimethods are not limited to single dispatch on type–you can use multimethods for multiple dispatch on arbitrary attributes. Take this example, shamelessly adapted from clojuredocs.org:
(defmulti greeting :language)
(defmethod greeting "English" [_] "Hi")
(defmethod greeting "French" [_] "Salut")
(greeting {:language "English"}) ; => "Hi"
Ultimately, multimethods and protocols are designed to solve very similar problems.
Protocols mirror and extend Java’s capabilities to gain greater performance and interoperability.
Multimethods provide a less object-oriented and more lisp-y way of doing the same thing,
with an added genericism that allows them to be much more flexible. If you need to dispatch
on something other than class
, multimethods are the way to go for sure.
My recommendation? If your problem fits – dispatch on types, benefits from inheritance – use protocols for performance and explicitness; otherwise, use multimethods.
Further reading
- This stackoverflow post does a great job of going over this information.
- Stuart Sierra’s “Solving the Expression Problem with Clojure 1.2” goes into much more detail about Protocols
Afterword: It was pointed out to me by @swannodette that this article probably doesn’t give multimethods enough credit. Probably the biggest thing I failed to point out was that multimethods are a lot better for dispatching on things that aren’t types, such as values in maps, and for multiple dispatch. I’ve amended my conclusion section, but you’d best read up on multimethods on clojure.org lest you get the wrong idea.