June 2, 2014

"DIY NoSQL" in Clojure

Prototyping apps without a data store

I saw this post, “Do It Yourself NoSql”, recently, and it reminded me of something that Clojure does very, very well.

Clojure’s go-to concurrency enablers are its reference types: atoms, agents, and refs (the last of which are coordinated by its STM system). I’ve written about them before, but this seemed like an opportunity to provide a nice example and to say a little more about some of the things I often do to defer the need for a database.

A Bare Example

First, here’s a quick re-implementation of the original post’s counter. No special server configuration required, thanks to the JVM’s single-process architecture.

;; Implement some "DIY NoSQL" in clojure
;;
;; Lein Dependencies: [[org.clojure/clojure "1.6.0"]
;;                     [ring "1.1.0"]
;;                     ]

(ns diy-nosql.core
  (:require [ring.adapter.jetty :refer [run-jetty]]))

(def counter (atom 0))

(defn index [request]
  (swap! counter inc)
  ;; Ring handlers must return a response map, not a bare string
  {:status 200
   :headers {}
   :body (str "Hello World " @counter)})

(defn -main []
  (run-jetty index {:port 8080}))

Clojure’s atom is a simple, synchronous reference to a piece of data that is updated atomically. You can use swap! to “mutate” it (actually, to replace its contents with the result of applying a function to the current value), and you can read it with (deref counter), which is usually shortened via the @ reader macro to @counter. So, plain counter refers to the atom itself, while @counter refers to the value contained within.
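To make the difference between the atom and its value concrete, here is a small REPL-style sketch. The name hits is made up for illustration and is separate from the counter above.

```clojure
;; A throwaway atom (hypothetical name, not part of the app above).
(def hits (atom 0))

;; swap! applies a function to the current value, stores the result,
;; and returns the new value.
(swap! hits inc)     ; => 1
(swap! hits + 10)    ; => 11 (extra arguments are passed to the function)

;; deref and @ are two spellings of the same read.
(deref hits)         ; => 11
@hits                ; => 11

;; reset! replaces the value without looking at the old one.
(reset! hits 0)      ; => 0

;; The atom is a reference object; only its contents are a number.
(number? hits)       ; => false
(number? @hits)      ; => true
```

Because swap! may retry its function when two threads collide, the function you hand it should be free of side effects.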

But of course, this isn’t really the best way to go about things, at least for a problem of any complexity. The call to swap! in index is a very implementation-specific way to update the counter, and @counter an implementation-specific way to read it. For this pointless application this is totally fine, but what would be better is to separate the persistence and view layers a bit more.

Something more exciting

Here’s a less contrived, but still very contrived, example: a tiny message board that persists the last five messages posted to it. I’ve put it up on Heroku for your enjoyment.

The entire code is below:

;; An ephemeral notice board
(ns diy-nosql.core
  (:require [ring.adapter.jetty :refer [run-jetty]]
            [ring.middleware.params :refer [wrap-params]])
  (:import java.util.Date))

(def MAX-TWITS 5)


;; "Tables"

(def twits (atom []))


;; "Schema"

(def twit-keys [:name :message :timestamp])
(defn clean-twit [twit]
  (-> twit
    (select-keys twit-keys)
    (assoc :timestamp (System/currentTimeMillis))))


;; DB Methods

(defn get-twits [] @twits)
(defn get-twit [ii] (nth @twits ii))

(defn put-twit! [twit]
    (swap! twits #(take MAX-TWITS (conj % (clean-twit twit)))))


;; Views

(defn escape-html
  "Stolen from hiccup"
  [text]
  (.. ^String (str text)
    (replace "&"  "&amp;")
    (replace "<"  "&lt;")
    (replace ">"  "&gt;")
    (replace "\"" "&quot;")))

(defn twit-as-html [twit]
  (str "<li><strong>"
       (escape-html (:name twit))
       "</strong> - "
       (escape-html (:message twit))
       " <small>("
       (str (Date. (:timestamp twit)))
       ")</small>"
       "</li>"))

(defn GET-index [request]
  {:status 200
   :body (str "<html><body><h1>TOP TWITS</h1><ul>"
              (apply str (map twit-as-html (get-twits)))
              "</ul>"
              "<form action=\".\" method=\"POST\">"
              "<input name=\"name\">"
              "<textarea name=\"message\"></textarea>"
              "<button>TWIT</button></form>"
              "</body></html>")
   :headers {}})

(defn POST-index [{{name "name" message "message"} :params :as request}]
  (put-twit! {:name name :message message})
  (GET-index request))

(defn handler [{path :uri method :request-method :as request}]
  (case path
    "/" (case method
          :post (POST-index request)
          ; Otherwise
          (GET-index request))
    ; Otherwise
    {:status 404 :body "404 Not Found"}))


;; Run server

(defn -main []
  (let [port (Integer/parseInt (get (System/getenv) "PORT" "8080"))]
    (run-jetty (wrap-params handler) {:port port})))

I spent too much time fiddling with raw HTML there, but the important parts are near the top, so here they are again:


;; "Tables"

(def twits (atom []))

;; ...

;; DB Methods

(defn get-twits [] @twits)
(defn get-twit [ii] (nth @twits ii))

(defn put-twit! [twit]
    (swap! twits #(take MAX-TWITS (conj % (clean-twit twit)))))

Now there’s a layer of abstraction between the implementation of my data persistence (a simple atom) and the code that accesses it. If at some point in the future I wanted to switch to something more persistent, I could simply change the implementation of get-twits, get-twit and put-twit!. Simple!
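As a rough sketch of what such a switch might look like, here are the same three functions backed by an EDN file instead of an atom. The namespace, the twits.edn path, and the lock object are assumptions made up for illustration, and the clean-twit step is left out to keep it short. The views would not need to change.

```clojure
;; Hypothetical file-backed replacement for the atom-backed functions.
(ns diy-nosql.file-store
  (:require [clojure.edn :as edn]
            [clojure.java.io :as io]))

(def MAX-TWITS 5)
(def store-file "twits.edn")            ; assumed path, pick your own
(def ^:private store-lock (Object.))    ; serializes writers in this process

(defn- read-store
  "Reads the stored vector of twits, or [] if nothing was saved yet."
  []
  (if (.exists (io/file store-file))
    (edn/read-string (slurp store-file))
    []))

;; Same names and arities as before, so callers are untouched.
(defn get-twits [] (read-store))
(defn get-twit [ii] (nth (read-store) ii))

(defn put-twit! [twit]
  ;; Read-modify-write under a lock; newest twit first, capped at MAX-TWITS.
  (locking store-lock
    (let [updated (vec (take MAX-TWITS (cons twit (read-store))))]
      (spit store-file (pr-str updated))
      updated)))
```

Re-reading the file on every request is slow and only safe within a single process, so treat this as a stepping stone rather than a real datastore.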

This is my go-to when I’m prototyping. For some applications, data loss is not acceptable and the system needs to be upgraded before going to production. For others, it’s perfectly reasonable to start with an empty datastore on each server restart, and so the storage remains in-memory until some specification requires that it not. Sometimes, a small amount of data loss is ok, and so I start a worker thread to persist the atoms to a file every minute or so and load them on spin-up.
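The "persist every minute or so" variant might look something like the sketch below. It is not the code from any real project of mine: the namespace, the snapshot file name, and the function names are all placeholders, and it assumes the atom holds plain data that pr-str can print and clojure.edn can read back.

```clojure
;; Hypothetical periodic snapshot of an atom to disk.
(ns diy-nosql.snapshot
  (:require [clojure.edn :as edn]
            [clojure.java.io :as io])
  (:import [java.util.concurrent Executors TimeUnit]))

(def twits (atom []))
(def snapshot-file "twits-snapshot.edn")   ; assumed path

(defn save-snapshot!
  "Writes the current contents of the atom to the snapshot file."
  []
  (spit snapshot-file (pr-str (vec @twits))))

(defn load-snapshot!
  "Restores the atom from the snapshot file, if one exists."
  []
  (when (.exists (io/file snapshot-file))
    (reset! twits (edn/read-string (slurp snapshot-file)))))

(defn start-snapshot-worker!
  "Saves a snapshot every interval-seconds on one background thread.
  Returns the executor so the caller can .shutdown it later."
  [interval-seconds]
  (doto (Executors/newSingleThreadScheduledExecutor)
    (.scheduleAtFixedRate
      ;; An uncaught exception would cancel all future runs, so catch here.
      (fn [] (try (save-snapshot!)
                  (catch Exception e
                    (println "snapshot failed:" (.getMessage e)))))
      (long interval-seconds)
      (long interval-seconds)
      TimeUnit/SECONDS)))
```

In -main you would call (load-snapshot!) and then (start-snapshot-worker! 60) before starting Jetty. Anything posted after the last snapshot is lost on a crash, which is exactly the "small amount of data loss" trade-off.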

Hosting this application on Heroku also illustrates this data loss: Heroku spins down apps that don’t get traffic, and everything held in memory goes away with the process.

You don’t need to be using Clojure to take advantage of this separation of concerns, by the way. You could as easily write these utility functions in Python or Ruby or PHP, save the atom part. By keeping the implementation of the in-memory database separate from the implementation of database access, you’ll save yourself a lot of trouble when you move past the prototype phase and need to install a datastore.