January 19, 2014

Just what does "code as data" mean anyway?

And, what is that so interesting about homoiconicity?

Recently, I’ve seen popular articles on macros and metaprogramming in Nimrod and Elixir. Both of these are wonderful metaprogramming systems, to be sure, but I couldn’t help but imagine some in the audience smirking inwardly – and occasionally less inwardly. What odd syntax! How out-of-place they are! If only there existed a homiconic language, they think sarcastically, that could eliminate the need for awkward and forced syntax.

Of course, these smug bastards are users of any of the LISP family of languages.

Lisps, in case you came here entirely by accident, are unique by virtue of having almost no syntax. You write each leaf of your AST delineated by parentheses, making nesting and order of evaluation quite explicit. Lisps also use parentheses to denote lists. So the source code is just lists of lists of symbols. Here’s quicksort in Clojure, for reference (don’t worry too much about understand it if you’re new to Clojure or lisps in general. It’s just for the feeling):

(defn qsort [coll]
  (if (empty? coll)
    coll
    (let [pivot (first coll)
          remainder (rest coll)]
      (concat
       (qsort (filter (partial > pivot) remainder))
       [pivot]
       (qsort (filter (partial <= pivot) remainder))))))

This can make it difficult to read for the uninitiated. Some like it, some hate it. But whatever your tastes, it makes writing macros a first-class feature of the language. But how?

Just what is a macro anyhow?

Usually, “macro” refers to some code construct that is evaluated at compile-time to modify code.

Most peoples’ first exposure to a macro is in C, where they look like this:

#define min(X, Y) ((X) < (Y) ? (X) : (Y))

In this case, macros aren’t much more than string subtitution. For example, it’s important that the parentheses are present, or whatever is passed in to the macro could break the syntax.

Lisp macros are nothing like that. In lisp, we’re not substituting strings, we’re substituting values into the AST directly. We can use the entire language to manipulate code in our macros, and so can implement almost any language construct.

Examples speak louder than words, so let’s imagine we want to make a meta-html domain-specific language that looks like this:

(html
  (body
    (h1 "Hello World")
    (p "How's it going?")))

You’d need a bunch of functions that look like this:

(defn html [& inner]
  (str "<html>" (apply str (map eval inner)) "</html>"))

(defn body ...)

But, nobody likes to see code repeated like that. But we can’t use a function to define a function. So, why not automate it with a macro?

(defmacro deftag [name]
  (list 'defn name ['& 'inner]
        (list 'str "<" (str name) ">"
              '(apply str inner)
              "</" (str name) ">")))

(deftag html)
(deftag body)
(deftag h1)
(deftag p)

See what I did there? The body of the macro actually returns lists of lists. Some of those lists are quoted ('), which causes them to be rendered literally instead of being evaluated. however, the calls to (str name) are left unquoted.

We can use macroexpand to examine the result:

(macroexpand-1 '(deftag blink))

; returns:
(clojure.core/defn blink [& inner]
  (clojure.core/str "<" "blink" ">"
                    (clojure.core/apply clojure.core/str inner)
                    "</" "blink" ">"))

Exactly what we expect, with some extra namespacing added.

This is the sort of thing that people usually mean when they say that lisp treats code as data. Lisp code is just nested lists of symbols that can be evaluated. Lisp macros let you create and manipulate these lists of symbols, using the ample facilities provided by the language for manipulating lists. That’s what homoiconicity means.

For a great example of an elegant DSL created this way, check out Korma.

If you want to learn more about macros in Clojure, you can find a short tutorial on Learn X in Y minutes.