June 30, 2014

DIY NoSQL part deux: interchangeable parts

After my last post on using Clojure’s STM as a quick-and-dirty in-memory datastore, I had an interesting discussion in the comments about the wisdom of implementing my example with a static global. Coincidentally, I attended a talk by Stuart Sierra at EuroClojure about this very thing, and started getting some ideas about how to make things better, which I want to share today.

Recall the core of our last example.

(def twits (atom []))
(defn clean-twit [twit] ...)

; ...

(defn get-twits [] @twits)
(defn get-twit [ii] (nth @twits ii))
(defn put-twit! [twit]
    (swap! twits #(take MAX-TWITS (conj % (clean-twit twit)))))

There are two problems with this simple code. The first is that our implementations aren’t pure functions, or even pure-ish; they reference that external global atom just sitting there in the twit namespace. The second is that, when we upgrade our code to use a “real” database, our implementations will have to be discarded. But we might still want to use an in-memory data store for testing or development, or even write a still-simpler mock.

We can solve both of these issues with a little structure. The solution I’m presenting here is inspired by Stuart Sierra’s Component library (and excellent talk), but scaled back a bit for the sake of simplicity, since we don’t (at the moment) have any complex dependencies to manage.

Part 1: Refactoring our old API

First, we add some records: One representing our existing storage, AtomStore, and one representing the new hotness, some sort of database, which we’ll call DBStore. We won’t implement anything using DBStore yet, but we plan to eventually.

(defrecord AtomStore [store])
(defrecord DBStore [conn])

Note that neither record contains any implementation detail, just a definition of what key it must contain. When we instantiate these, we’ll need to provide them with what they need - AtomStore with an (atom {}), and DBStore with a connection spec of some sort, depending on the actual DB library in use.
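
To make the instantiation step concrete, here is a minimal REPL-style sketch. It repeats the two record definitions so it runs on its own, and the connection map handed to DBStore is only a placeholder assumption, not a real connection spec.

```clojure
(defrecord AtomStore [store])
(defrecord DBStore [conn])

;; A record can be built with the positional constructor or with the
;; generated ->AtomStore factory function; both take one value per field.
(def mem-store (->AtomStore (atom {})))
(def db-store  (DBStore. {:url "placeholder"})) ; hypothetical connection spec

;; Fields are read like map keys.
(deref (:store mem-store)) ; the empty map inside the atom
(:conn db-store)           ; the placeholder connection map
```

Either construction style works; the factory function has the advantage that it can be passed around like any other function.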

Our next problem is using different implementations of the twit-storing machinery, without exposing this detail to the user. We’ll do this using protocols, since we’ll only really need to dispatch on whether or not we’re using AtomStore or DBStore, but this whole thing could as easily be done with regular maps and multimethods. However, I find the self-documenting nature of a protocol definition convenient and comforting, so we’re doing it that way.
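
For comparison, the maps-and-multimethods alternative might look roughly like the sketch below. The :kind key and its :atom and :db values are names made up for this illustration, and only get-twits is shown.

```clojure
;; Hypothetical alternative: each store is a plain map tagged with :kind,
;; and the multimethod dispatches on that tag.
(defmulti get-twits :kind)

(defmethod get-twits :atom [store]
  (get @(:store store) :twits))

(defmethod get-twits :db [store]
  ;; A real implementation would query (:conn store) here.
  (throw (UnsupportedOperationException. "not implemented in this sketch")))

(def mem-store {:kind  :atom
                :store (atom {:twits [{:name "a" :message "hi"}]})})

(get-twits mem-store) ; returns the vector stored under :twits
```

The behaviour is the same, but nothing in one place lists which operations a store must support, which is what the protocol definition gives us.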

First, we add a protocol formalizing our public API. We use the existing function names put-twit!, get-twit, and get-twits:

(defprotocol TweetStore
  (put-twit! [this twit])
  (get-twit [this ii])
  (get-twits [this]))

Now, we’ll rewrite our existing functions as the implementations of put-twit!, get-twit and get-twits on the AtomStore. Another change: instead of using a vector in our twit atom, we use a single hash-map, with a vector under the keyword :twits.

(extend-protocol TweetStore
  AtomStore
  (get-twits [this] (get @(:store this) :twits))
  (get-twit [this ii] (nth (get-twits this) ii))
  (put-twit! [this twit]
    ; take returns a seq, so later conj calls prepend and the newest twits come first
    (swap! (:store this)
           #(assoc %
              :twits (take MAX-TWITS                       ; Drop any old twits
                           (conj (or (:twits %) [])        ; Grab (:twits store) or []
                                 (clean-twit twit))))))))  ; Add the cleaned twit

Next, we’ll need to get a store object all the way down to those functions. One way to do this is to create a Ring middleware that injects it into the request map, so let’s just do that:

(defn wrap-store [handler store]
  (fn [req] (handler (assoc req :store store))))
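
To see what the middleware does, here is a small self-contained sketch that calls a wrapped handler directly with a fake request map. The echo-store handler and the :fake-store value are stand-ins invented for the example; no web server is involved.

```clojure
(defn wrap-store [handler store]
  (fn [req] (handler (assoc req :store store))))

;; A stand-in handler that just reports what it found under :store.
(defn echo-store [req]
  {:status 200
   :body   (str "store is " (:store req))})

;; Any value can be injected; a keyword keeps the example easy to read.
(def app (wrap-store echo-store :fake-store))

(app {:uri "/"}) ; the response body mentions :fake-store
```

Because the handler only ever looks at the request map, it never needs to know where the store came from.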

Our view functions will have to be changed to pass the store along:

(defn GET-index [{store :store :as request}]
  {:status 200
   :body (str "<html><body><h1>TOP TWITS</h1><ul>"
              (apply str (map twit-as-html (get-twits store)))
              "</ul>"
              "<form action=\".\" method=\"POST\">"
              "<input name=\"name\">"
              "<textarea name=\"message\"></textarea>"
              "<button>TWIT</button>"
              "</form>"
              "</body></html>")
   :headers {}})


(defn POST-index [{{name "name" message "message"} :params
                   store :store
                   :as request}]
  (put-twit! store {:name name :message message})
  (GET-index request))

And finally, our main method will have to use wrap-store with some instantiated store.

(defn -main []
  (let [port (Integer/parseInt (get (System/getenv) "PORT" "8080"))]
    (-> handler
        (wrap-params)
        (wrap-store (AtomStore. (atom {})))
        (run-jetty {:port port}))))

Part 2: Reaping the benefits

So what did we gain? Well, let’s see how implementing our DBStore goes. We’ll need to add implementations for TweetStore’s methods alongside the existing AtomStore implementation:


; Assumes query and insert! are referred from clojure.java.jdbc.
(extend-protocol TweetStore
  AtomStore
  (get-twits [this] (get @(:store this) :twits))
  (get-twit [this ii] (nth (get-twits this) ii))
  (put-twit! [this twit]
    ; take returns a seq, so later conj calls prepend and the newest twits come first
    (swap! (:store this)
           #(assoc %
              :twits (take MAX-TWITS                      ; Drop any old twits
                           (conj (or (:twits %) [])       ; Grab (:twits store) or []
                                 (clean-twit twit))))))   ; Add the cleaned twit

  DBStore
  (get-twits [this]
    (query (:conn this)
           ["SELECT name, message, timestamp
             FROM twits
             ORDER BY timestamp DESC
             LIMIT 5"]))

  (get-twit [this ii]
    (first (query (:conn this)
                  ["SELECT name, message, timestamp
                    FROM twits
                    ORDER BY timestamp DESC
                    LIMIT 1 OFFSET ?" ii])))

  (put-twit! [this twit]
    (insert! (:conn this) :twits (clean-twit twit))))

Try to ignore my questionable implementation; the key point here is that we only had to add lines. We’d also have to import some stuff from clojure.java.jdbc, which I skipped over, but overall the DBStore implementation exists peacefully alongside the AtomStore one, and none of the consumers of the TweetStore API have to worry about which is which; they just pass along the datastore that they were given.

The only other change: we’ll need to adjust -main to use DBStore instead of AtomStore:

(defn -main []
  (let [port (Integer/parseInt (get (System/getenv) "PORT" "8080"))]
    (-> handler
        (wrap-params)
        (wrap-store (DBStore. {:url "INSERT DB CONNECTION PARAMETERS HERE"}))
        (run-jetty {:port port}))))

And with that, we have a swappable SQL implementation of our data store. You can see how easy it would be to add more; although one might start questioning the sense of writing multiple datastore implementations all at once, it must be comforting to imagine that, should you make a wrong decision, you can easily change storage backends without having to change existing code.
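
As one more illustration of how cheap an extra backend is, here is a sketch of a read-only store that could serve as a test mock. FixedStore is a hypothetical name, and the protocol is repeated so the snippet runs on its own.

```clojure
(defprotocol TweetStore
  (put-twit! [this twit])
  (get-twit [this ii])
  (get-twits [this]))

;; Hypothetical test double: serves a fixed vector of twits and
;; silently ignores writes.
(defrecord FixedStore [twits]
  TweetStore
  (put-twit! [this twit] nil)
  (get-twit [this ii] (nth twits ii))
  (get-twits [this] twits))

(def store (->FixedStore [{:name "alice" :message "hello"}
                          {:name "bob"   :message "hi"}]))

(get-twits store)  ; both twits, in the order given
(get-twit store 1) ; bob's twit
```

A view function handed this store cannot tell it apart from AtomStore or DBStore, which is exactly what makes it useful in tests.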

This is a very common approach in Java, of course; write an interface for the datastore, and different implementations. One major difference is that, using protocols, we can separate different parts of the program into different chunks (e.g. protocols for UserStore, PostStore, CommentStore for a blog), without having to explicitly compose them into some super-interface. And, of course, we get to do it in Clojure, which I think requires no explaining.
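
A rough sketch of that separation, using two of the protocols named above: the method names, the MemoryBlogStore record and the sample data are all assumptions made for the example.

```clojure
(defprotocol UserStore
  (get-user [this id]))

(defprotocol PostStore
  (get-posts [this user-id]))

;; One record backs both protocols; no combined interface is declared.
(defrecord MemoryBlogStore [data])

(extend-protocol UserStore
  MemoryBlogStore
  (get-user [this id]
    (get-in @(:data this) [:users id])))

(extend-protocol PostStore
  MemoryBlogStore
  (get-posts [this user-id]
    (filter #(= user-id (:author %)) (:posts @(:data this)))))

(def store
  (->MemoryBlogStore
   (atom {:users {1 {:name "alice"}}
          :posts [{:author 1 :title "first"}
                  {:author 2 :title "other"}]})))

(get-user store 1)  ; alice's user map
(get-posts store 1) ; only the post written by user 1
```

Code that only reads users depends on UserStore alone, so a different record could later take over PostStore without touching it.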

There are many other problems that can benefit from this pattern; the example from Stuart’s talk involved an EmailService, a DBService, and a CustomerService which depends on both the db and the email (a dependency problem which the Component library aims to solve).
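
The shape of that example can be sketched by hand, without the Component library and without claiming to reproduce its API or the code from the talk. Every name below other than the three service names is invented for illustration, and the dependencies are wired manually.

```clojure
(defprotocol EmailService
  (send-email! [this to body]))

(defprotocol DBService
  (find-customer [this id]))

;; Fake implementations, good enough for a REPL session or a test.
(defrecord FakeEmail [outbox]
  EmailService
  (send-email! [this to body]
    (swap! outbox conj {:to to :body body})))

(defrecord FakeDB [customers]
  DBService
  (find-customer [this id] (get customers id)))

;; CustomerService receives its two dependencies as fields.
(defrecord CustomerService [db email])

(defn welcome! [service id]
  (let [customer (find-customer (:db service) id)]
    (send-email! (:email service)
                 (:email customer)
                 (str "Welcome, " (:name customer)))))

(def outbox (atom []))
(def service
  (->CustomerService (->FakeDB {7 {:name "Ann" :email "ann@example.com"}})
                     (->FakeEmail outbox)))

(welcome! service 7)
@outbox ; now holds one message addressed to ann@example.com
```

Wiring two dependencies by hand is manageable; once the graph grows, ordering and lifecycle become the tedious part, which is the gap Component is meant to fill.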

I encourage you to think of places where you’re perhaps defining things like static globals, and consider whether some restructuring now could save you a lot of trouble later.