Domain modelling with clojure.spec
Clojure.spec is, among other things, Clojure’s official answer to tools like Typed Clojure and Plumatic’s Schema. It represents an attempt to apply some validation to data and functions, without compromising Clojure’s dynamism and data-is-data philosophy. In this post, I’ll be working through a sample program by first outlining and modelling it with the help of clojure.spec, then using spec to guide me while I develop the implementation.
Today’s demonstration problem
The project I’ve chosen for this demo is an RSS feed fetcher and formatter. This is actually a port of an F# project a friend is working on, so I already had a type structure to port.
The program is pretty simple: it’s a tool to fetch the latest items from Hacker News’ RSS feed, then go through each of those items and fetch a cleaned-up version of the linked page via Mercury’s API. Specifically, we’ll need to accomplish a few tasks:
- Retrieve the RSS feed
- Parse the feed to retrieve its contents
- For each item:
  - Check whether its domain appears on a pre-configured blacklist (using regexes). This is important because Mercury simply doesn’t work on some domains, so we want to be able to skip those.
  - Retrieve the article’s content via the Mercury API
Make sense? Good, let’s get started!
Creating the project
I tend to use boot for my Clojure projects, so I created a build.boot file with the following:
(set-env!
 :source-paths #{"src"}
 :resource-paths #{"resources"}
 :dependencies '[[org.clojure/clojure "1.9.0-alpha17"]
                 [org.clojure/spec.alpha "0.1.123"]
                 [cheshire "5.7.1"]
                 [failjure "1.0.1"]
                 [org.clojure/core.async "0.3.443"]
                 [bidi "2.1.2"]
                 [http-kit "2.2.0"]
                 [clj-http "3.6.1"]])
I also need to make sure I’m using Clojure 1.9, so I pinned the version in the boot.properties file:
BOOT_CLOJURE_NAME=org.clojure/clojure
BOOT_CLOJURE_VERSION=1.9.0-alpha17
BOOT_VERSION=2.7.1
After running mkdir src and mkdir resources, I can run boot repl and start developing.
A Quick primer on namespaced keywords
I’ll assume you’re familiar with Clojure’s keywords. They look like this: :keyword.
You may have encountered keywords with two colons instead of one. These keywords are namespaced, and the double-colon syntax is shorthand for “qualify this keyword with the current namespace”. So, if I’m in (ns myproject.myns), ::keyword returns :myproject.myns/keyword.
A second shorthand lets you qualify a keyword using the alias of a required namespace. For example, if I’m in (ns myproject.myotherns) and I’ve run (require '[myproject.myns :as myns]), then ::myns/keyword also returns :myproject.myns/keyword.
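Both shorthands are easy to check at a REPL. Here’s a small sketch using the hypothetical namespace names from above; because myproject.myns doesn’t exist as a file here, it uses alias as a stand-in for the require call.

```clojure
(ns myproject.myns)

;; The reader resolves ::keyword against the current namespace.
(def plain ::keyword)

(ns myproject.myotherns)

;; Stand-in for (require '[myproject.myns :as myns]): the namespace already
;; exists (it was created by the ns form above), so we can alias it directly.
(alias 'myns 'myproject.myns)

;; ::myns/keyword is resolved through the alias, at read time.
(def aliased ::myns/keyword)

(println myproject.myns/plain aliased)
;; prints :myproject.myns/keyword :myproject.myns/keyword
```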
Creating the Domain definitions
For this project, I’ve decided to put all the domain definitions in a single namespace. This is because, besides being a way to usefully validate values, specs (like static types) offer a valuable sort of documentation, and keeping them in one place creates a very useful reference.
I created a new directory, src/hackynews, then opened up src/hackynews/domain.clj and added a namespace declaration:
(ns hackynews.domain
  (:require [clojure.spec.alpha :as s]
            [failjure.core :as f]))
Failjure is a library I maintain to help work with errors as values, and it turns out to play nicely with spec (at least, more nicely than thrown exceptions, which can’t really be specced).
Our domain definitions will not only define our data types, but also the steps in our process. We’ll go through and write our domain specs in three parts:
- Inputs
- Outputs
- Process Steps
Defining the Domain Inputs
I started by defining the structure of the domain inputs: namely, the feed.
(s/def ::rss-feed
  (s/keys
   :req-un [::title ::description ::link-uri ::items]))
Here, I define a single spec using s/keys, which checks that the given keys are present in a map. I also used s/def to register the spec under a name, which must be namespace-qualified.
This is already a perfectly valid and useful spec. It will ensure that its input is a map, and require some unqualified keys (hence, :req-un): title, description, link-uri, and items.
However, even though the spec checks for unqualified keys in the map, s/keys demands that I list them as namespaced keywords, for reasons I’ll explain right now.
We’ve already run into one of spec’s most interesting design decisions. Notice that I’ve specified nothing about what the values of these keys might be. That’s because s/keys gives me no way to spec the value of a feed’s :link-uri key directly. Instead, I can attach a spec to ::link-uri, which will then apply to the :link-uri key of any map checked by an s/keys spec that lists ::link-uri. And in fact this comes up right away, because each feed item also has a link-uri:
(s/def ::feed-item
  (s/keys
   :req-un [::title ::description ::link-uri ::comments-uri ::pub-date]))
You may have noticed that the feed item spec shares several keys with the rss-feed spec, so let’s constrain their values a little:
(s/def ::title string?)
(s/def ::description string?)
(s/def ::link-uri uri?)
(s/def ::comments-uri uri?)
Now, I’ve applied some additional validation to both the ::feed-item spec and the ::rss-feed spec. I’ve left out a spec for the ::pub-date key because, even though it appears in the RSS data, I won’t actually be using it at all.
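To see the keyword-attached specs at work, here’s a quick sketch with made-up sample data (the namespace name is also just for illustration). It checks a well-formed item, an item whose :link-uri is a string rather than a java.net.URI, and an item missing the unspecced :pub-date key, which s/keys still requires to be present.

```clojure
(ns example.feed-item-specs
  (:require [clojure.spec.alpha :as s]))

;; The value specs live on the namespaced keywords...
(s/def ::title string?)
(s/def ::description string?)
(s/def ::link-uri uri?)
(s/def ::comments-uri uri?)

;; ...and s/keys with :req-un applies them to the matching unqualified keys.
;; ::pub-date has no registered spec, so only its presence is checked.
(s/def ::feed-item
  (s/keys :req-un [::title ::description ::link-uri ::comments-uri ::pub-date]))

;; Made-up sample data.
(def item
  {:title        "Example post"
   :description  "Comments"
   :link-uri     (java.net.URI. "https://example.com/post")
   :comments-uri (java.net.URI. "https://news.ycombinator.com/item?id=1")
   :pub-date     "Mon, 01 Jan 2018 00:00:00 +0000"})

(s/valid? ::feed-item item)                                              ; => true
(s/valid? ::feed-item (assoc item :link-uri "https://example.com/post")) ; => false
(s/valid? ::feed-item (dissoc item :pub-date))                           ; => false
```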
Next, we can tie our two major specs together.
(s/def ::items (s/coll-of ::feed-item))
Here I’ve added an additional constraint to ::rss-feed, which is that the :items key must be a collection of (hence, coll-of) values that match the ::feed-item spec.
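One nice consequence of building specs out of smaller specs is that s/explain-data reports exactly where in a nested value things went wrong. A sketch with made-up data, where the one item in the feed has a string instead of a URI:

```clojure
(ns example.feed-specs
  (:require [clojure.spec.alpha :as s]))

(s/def ::title string?)
(s/def ::description string?)
(s/def ::link-uri uri?)
(s/def ::comments-uri uri?)
(s/def ::feed-item
  (s/keys :req-un [::title ::description ::link-uri ::comments-uri ::pub-date]))
(s/def ::items (s/coll-of ::feed-item))
(s/def ::rss-feed
  (s/keys :req-un [::title ::description ::link-uri ::items]))

;; Made-up data: the feed itself is fine, but its first item's link is a
;; plain string rather than a java.net.URI.
(def feed
  {:title       "Hacker News"
   :description "Sample description"
   :link-uri    (java.net.URI. "https://news.ycombinator.com/")
   :items       [{:title        "Example post"
                  :description  "Comments"
                  :link-uri     "https://example.com/post"
                  :comments-uri (java.net.URI. "https://news.ycombinator.com/item?id=1")
                  :pub-date     "Mon, 01 Jan 2018 00:00:00 +0000"}]})

(def problem (first (::s/problems (s/explain-data ::rss-feed feed))))

(:in problem) ; => [:items 0 :link-uri]
```

The :in vector is the path into the data (the feed, then :items, then index 0, then :link-uri), which makes a bad value easy to find even in a large parsed document.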
We also mentioned that we wanted to have a predefined blacklist of regular expressions, which we’ll use to skip links whose content we don’t want to fetch. Here’s what that spec looks like:
(s/def ::blacklist (s/coll-of #(instance? java.util.regex.Pattern %)))
As demonstrated here, any function with the signature (x) -> boolean can be used as a spec.
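Instead of an anonymous function, a named predicate works just as well and tends to read better in s/explain output. A sketch; regex? here is a hypothetical helper, not something clojure.core provides:

```clojure
(ns example.blacklist-spec
  (:require [clojure.spec.alpha :as s]))

;; Hypothetical helper: any one-argument predicate can serve as a spec.
(defn regex? [x]
  (instance? java.util.regex.Pattern x))

(s/def ::blacklist (s/coll-of regex?))

(s/valid? ::blacklist [#"youtube\.com" #"\.pdf$"]) ; => true
(s/valid? ::blacklist ["youtube.com"])             ; => false, a string is not a Pattern
(s/valid? ::blacklist [])                          ; => true, an empty blacklist is fine
```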
Defining the outputs
We don’t really need anything as the output except a list of ::feed-items with an extra key, ::content, which we’ll spec as a regular string:
(s/def ::content string?)
(s/def ::feed-item-with-content
  (s/and ::feed-item (s/keys :req-un [::content])))
Here, I’ve used s/and to combine two specs.
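Here’s a quick check of how the combined spec behaves, using a cut-down ::feed-item to keep it short. For comparison, it also defines the same spec with s/merge, spec’s combinator for merging map specs. The two agree on validation here; s/merge is generally the better fit if you later want to generate sample data, since s/and generates from its first spec and then filters through the rest, which would never produce a :content key in this case.

```clojure
(ns example.content-specs
  (:require [clojure.spec.alpha :as s]))

;; Simplified ::feed-item for this sketch; the article's version requires more keys.
(s/def ::title string?)
(s/def ::content string?)
(s/def ::feed-item (s/keys :req-un [::title]))

;; The article's approach: the value must satisfy both specs.
(s/def ::feed-item-with-content
  (s/and ::feed-item (s/keys :req-un [::content])))

;; An alternative for map specs specifically.
(s/def ::feed-item-with-content-merged
  (s/merge ::feed-item (s/keys :req-un [::content])))

(def with-content {:title "Example post" :content "<p>Hello</p>"})
(def without-content {:title "Example post"})

(map #(s/valid? % with-content)
     [::feed-item-with-content ::feed-item-with-content-merged])
;; => (true true)

(map #(s/valid? % without-content)
     [::feed-item-with-content ::feed-item-with-content-merged])
;; => (false false)
```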
Defining the process
Next, we can pre-spec the functions that will compose our overall program.
Let’s start from the bottom: we’ll need to be able to turn a ::feed-item into a ::feed-item-with-content; that’s the whole point! However, we can add two constraints:
- The content retrieval might fail, in which case we want the ::feed-item as a fallback.
- The item’s URL might be on the blacklist, so we’ll need access to the blacklist to check against. A failure of this check should also return the ::feed-item.
So, here’s our spec:
(s/def ::fetched-item-result
  (s/or
   :ok ::feed-item-with-content
   :error ::feed-item))
(s/def ::try-fetch-item-content
  (s/fspec
   :args (s/cat
          :blacklist ::blacklist
          :item ::feed-item)
   :ret ::fetched-item-result))
Here, we first defined a spec that represents either success or failure. In case of failure, we fall back on the unfetched feed item. We’ve also used s/fspec to define a spec for a function that accepts two arguments (the blacklist and a feed item) and returns something matching the ::fetched-item-result that we defined.
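To make the tagged result concrete, here’s a sketch that conforms two values against ::fetched-item-result, again with simplified stand-in specs. Note that s/or tries its branches in order; because an item with content is also a valid plain item, the :ok branch has to come first.

```clojure
(ns example.fetch-result
  (:require [clojure.spec.alpha :as s]))

;; Simplified stand-ins for the article's specs.
(s/def ::title string?)
(s/def ::content string?)
(s/def ::feed-item (s/keys :req-un [::title]))
(s/def ::feed-item-with-content
  (s/and ::feed-item (s/keys :req-un [::content])))

;; Every item with content is also a plain feed item, so :ok must be tried first.
(s/def ::fetched-item-result
  (s/or :ok    ::feed-item-with-content
        :error ::feed-item))

(s/conform ::fetched-item-result {:title "Example post" :content "<p>Hi</p>"})
;; => [:ok {:title "Example post", :content "<p>Hi</p>"}]

(s/conform ::fetched-item-result {:title "Example post"})
;; => [:error {:title "Example post"}]
```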
The s/cat here is a bit interesting. It describes a sequence of tagged values, in this case the argument list. The tags show up in spec’s error messages to help point out which argument failed, and they key the map returned by s/conform. s/cat, along with a few others such as s/* and s/alt, belongs to a branch of spec called “regular expression specs”, which are beyond the scope of this article (and problem) but worth reading about anyhow.
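The tags are easiest to see with s/conform and s/explain-data. A sketch using an argument spec shaped like the one in ::try-fetch-item-content, with the item loosened to map? for brevity:

```clojure
(ns example.args-cat
  (:require [clojure.spec.alpha :as s]))

(s/def ::blacklist (s/coll-of #(instance? java.util.regex.Pattern %)))

;; Same shape as the :args of ::try-fetch-item-content, simplified.
(s/def ::args (s/cat :blacklist ::blacklist :item map?))

;; Conforming an argument list yields a map keyed by the s/cat tags.
(def conformed (s/conform ::args [[#"example\.com"] {:title "Example post"}]))
(:item conformed) ; => {:title "Example post"}

;; On failure, :path names the tag and :in gives the argument's position.
(def problem
  (first (::s/problems (s/explain-data ::args [[#"example\.com"] "not a map"]))))
(:path problem) ; => [:item]
(:in problem)   ; => [1]
```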
Next, we’ll need a function that turns an RSS feed into a list of ::fetched-item-result values. We can spec that straightforwardly:
(s/def ::try-fetch-items
  (s/fspec
   :args (s/cat
          :blacklist ::blacklist
          :feed ::rss-feed)
   :ret (s/coll-of ::fetched-item-result)))
We pass in the blacklist because this function needs to hand it along to ::try-fetch-item-content for each item.
We’ll need a way to get the RSS feed, which will be a function that is given a URI:
(s/def ::get-rss-feed
  (s/fspec
   :args (s/cat :uri uri?)
   :ret (s/or
         :ok ::rss-feed
         :error f/failed?)))
Here, I use failjure’s failed? as a spec, which does a fine job if I may say so myself.
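Here’s a sketch of that pattern in isolation, assuming failjure is on the classpath as in build.boot, with ::rss-feed simplified to a single required key. One thing worth knowing: failjure’s failures are records, and records are maps, so an overly loose :ok branch (say, plain map?) would happily accept a failure.

```clojure
(ns example.failjure-spec
  (:require [clojure.spec.alpha :as s]
            [failjure.core :as f]))

(s/def ::title string?)
;; Simplified stand-in for ::rss-feed.
(s/def ::rss-feed (s/keys :req-un [::title]))

(s/def ::feed-result
  (s/or :ok    ::rss-feed
        :error f/failed?))

(first (s/conform ::feed-result {:title "Hacker News"}))    ; => :ok
(first (s/conform ::feed-result (f/fail "timed out")))      ; => :error
(first (s/conform ::feed-result (Exception. "boom")))       ; => :error
;; Neither branch matches a plain string, so conform returns ::s/invalid.
(s/invalid? (s/conform ::feed-result "nope"))               ; => true
```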
Finally, we’ll want one more function that ties everything together, accepting the blacklist and a feed url and returning a list of fetched items:
(s/def ::fetch-rss-feed-items
  (s/fspec
   :args (s/cat
          :blacklist ::blacklist
          :uri ::link-uri)
   :ret (s/or
         :ok (s/coll-of ::fetched-item-result)
         :error f/failed?)))
With the spec done, it’s time to see how it can help us write the code. After all, we haven’t actually done anything yet!
Developing the implementation
Now that our domain is laid out, the implementation becomes a matter of filling out those function specs we crafted so nicely. We wrote out the specs back-to-front, but for the sake of repl-driven development it’s probably a bit easier to write out the implementation the right way around, so that we have values to pass to the next step.
Here’s my namespace declaration:
(ns hackynews.impl
  (:require [hackynews.domain :as domain]
            [clojure.spec.alpha :as s]
            [clj-http.client :as http]
            [failjure.core :as f]
            [clojure.xml :as xml]
            [cheshire.core :refer [parse-string]]))
Setting up for development
Before beginning to write these functions, I prepared a little helper in a (comment) at the bottom of the file:
(comment
  (require '[clojure.spec.test.alpha :as stest])
  (stest/instrument)
  )
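The instrument function wraps every function that has a spec registered under its symbol, so that each call is checked against the spec’s :args. This makes spec errors very obvious. Note, however, that instrument does not check :ret specs, and that this is (necessarily, for Clojure) run-time checking. You also need to re-run instrument when you add or change a spec, or re-evaluate an instrumented function.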
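A small sketch of what instrumentation does and doesn’t catch. The get-rss-feed here is a hypothetical stub that never touches the network and deliberately returns a string in violation of its :ret spec; it is registered the same way as in this article, with s/def on the function’s symbol.

```clojure
(ns example.instrument-demo
  (:require [clojure.spec.alpha :as s]
            [clojure.spec.test.alpha :as stest]))

;; Stand-in for ::domain/get-rss-feed: takes a URI, promises a map.
(s/def ::get-rss-feed
  (s/fspec :args (s/cat :uri uri?)
           :ret map?))

;; Hypothetical stub: no network, and it breaks its own :ret spec on purpose.
(defn get-rss-feed [uri]
  (str "feed for " uri))

;; Registered like the article does it: s/def on the function's symbol.
(s/def get-rss-feed ::get-rss-feed)

(stest/instrument `get-rss-feed)

(defn call-result
  "Calls the instrumented function, returning :spec-error if instrument
  rejects the arguments."
  [arg]
  (try
    (get-rss-feed arg)
    (catch clojure.lang.ExceptionInfo _
      :spec-error)))

;; A URI passes the :args check; the bad return value goes unnoticed.
(call-result (java.net.URI. "https://news.ycombinator.com/rss"))
;; => "feed for https://news.ycombinator.com/rss"

;; A plain string fails (s/cat :uri uri?), so the stub is never called.
(call-result "https://news.ycombinator.com/rss")
;; => :spec-error
```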
Retrieving the feed
This is where having access to specs helped the most. I used clojure.xml to retrieve and parse the feed, which returns a somewhat verbose data structure of the form {:tag :rss :attrs {...} :content [{:tag :channel :content [{:tag :title ...} ...]}]}.
Getting this down into the format we want to work with ended up being most of the implementation:
(defn- parse-item [item-node]
  (reduce (fn [item node]
            (case (:tag node)
              :title       (assoc item :title (-> node :content first))
              :description (assoc item :description (-> node :content first))
              :link        (assoc item :link-uri (-> node :content first (java.net.URI.)))
              :comments    (assoc item :comments-uri (-> node :content first (java.net.URI.)))
              :pubDate     (assoc item :pub-date (-> node :content first))
              ;; Ignore any other child element instead of throwing
              ;; "No matching clause".
              item))
          {} (:content item-node)))
(defn- parse-channel [channel-node]
  (reduce (fn [feed node]
            (case (:tag node)
              :title       (assoc feed :title (-> node :content first))
              :description (assoc feed :description (-> node :content first))
              :link        (assoc feed :link-uri (-> node :content first (java.net.URI.)))
              ;; (fnil conj []) starts a vector so items keep feed order;
              ;; conj onto nil would build a list and reverse them.
              :item        (update feed :items (fnil conj []) (parse-item node))
              feed))
          {} (:content channel-node)))
(defn get-rss-feed [uri]
  (f/attempt-all [feed    (f/try* (xml/parse (str uri)))
                  channel (-> feed :content first)]
    (parse-channel channel)))
(s/def get-rss-feed ::domain/get-rss-feed)
While I was developing the above, I could check my progress with (s/explain ::domain/rss-feed result). s/explain takes a spec and a value, and tells you exactly where your value fails to conform to the spec (or prints “Success!” if it does conform). This gave me a lot more confidence in my implementation.
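As an example of that workflow, here’s a sketch that parses a tiny, made-up feed with clojure.xml using a first-draft parser that doesn’t handle item elements yet, then asks spec what’s wrong. ::items is loosened to coll? to keep it short, and the helper names are hypothetical.

```clojure
(ns example.explain-draft
  (:require [clojure.spec.alpha :as s]
            [clojure.xml :as xml]))

(s/def ::title string?)
(s/def ::description string?)
(s/def ::link-uri uri?)
(s/def ::items coll?) ; simplified for this sketch
(s/def ::rss-feed
  (s/keys :req-un [::title ::description ::link-uri ::items]))

;; Made-up feed: a channel with no <item> elements yet.
(def sample-xml
  (str "<rss version=\"2.0\"><channel>"
       "<title>Demo</title>"
       "<link>https://example.com/</link>"
       "<description>A demo feed</description>"
       "</channel></rss>"))

;; clojure.xml/parse accepts an InputStream as well as a URI string.
(defn parse-xml-string [^String s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

;; A first draft of parse-channel that doesn't handle <item> yet.
(defn parse-channel-draft [channel-node]
  (reduce (fn [feed node]
            (case (:tag node)
              :title       (assoc feed :title (-> node :content first))
              :description (assoc feed :description (-> node :content first))
              :link        (assoc feed :link-uri (-> node :content first (java.net.URI.)))
              feed))
          {} (:content channel-node)))

(def draft (-> (parse-xml-string sample-xml) :content first parse-channel-draft))

;; Reports a single problem: the map has no :items key yet.
(s/explain ::rss-feed draft)
```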
Fetching the items
Retrieving the items is a pretty straightforward operation: it simply requires making a request to Mercury’s JSON endpoint and adding the result to the item.
(defn- fetch-item-content [item]
  (f/attempt-all
    [req     {:query-params {:url (str (:link-uri item))}
              :headers {"x-api-key" "XXXXXXXXXXXXXXXx"}}
     resp    (f/try* (http/get "https://mercury.postlight.com/parser" req))
     ;; Wrapped in f/try* so a malformed JSON body also falls back to the item.
     content (f/try* (-> resp
                         (:body)
                         (parse-string true)
                         (:content)))]
    (assoc item :content content)
    (f/when-failed [e] item)))
(defn try-fetch-item-content [blacklist item]
  ;; re-matches needs a string, but :link-uri holds a java.net.URI.
  (if (some #(re-matches % (str (:link-uri item))) blacklist)
    item
    (fetch-item-content item)))
(s/def try-fetch-item-content ::domain/try-fetch-item-content)
(defn try-fetch-items [blacklist feed]
  (map #(try-fetch-item-content blacklist %) (:items feed)))
(s/def try-fetch-items ::domain/try-fetch-items)
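Both item-level steps can be exercised without hitting Mercury. A sketch, assuming failjure and cheshire from build.boot; blacklisted? and add-content are hypothetical helpers that mirror the blacklist check and the response handling above, with a string standing in for the HTTP response body. The second blacklist call shows a gotcha: re-matches must match the entire URL, so a bare pattern like #"youtube\.com" never fires (re-find would match anywhere in the string).

```clojure
(ns example.item-steps
  (:require [failjure.core :as f]
            [cheshire.core :refer [parse-string]]))

;; Hypothetical helper mirroring the check in try-fetch-item-content:
;; re-matches needs a string and must match the whole URL.
(defn blacklisted? [blacklist link-uri]
  (boolean (some #(re-matches % (str link-uri)) blacklist)))

;; Hypothetical helper mirroring fetch-item-content minus the HTTP call:
;; `body` stands in for the :body of a Mercury response.
(defn add-content [item body]
  (f/attempt-all
    [content (f/try* (:content (parse-string body true)))]
    (assoc item :content content)
    (f/when-failed [_] item)))

(def blacklist [#"https?://(www\.)?youtube\.com/.*"])
(def item {:title "Example post"})

(blacklisted? blacklist (java.net.URI. "https://www.youtube.com/watch?v=abc"))         ; => true
(blacklisted? [#"youtube\.com"] (java.net.URI. "https://www.youtube.com/watch?v=abc")) ; => false
(add-content item "{\"content\": \"<p>Hi</p>\"}") ; => {:title "Example post", :content "<p>Hi</p>"}
(add-content item "not json")                      ; => {:title "Example post"}
```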
Tying it together
The final piece of the puzzle was the overarching function, which turned out a bit anticlimactic:
(defn fetch-rss-feed-items [blacklist uri]
  (f/attempt-all
    [feed (get-rss-feed uri)]
    (try-fetch-items blacklist feed)))
(s/def fetch-rss-feed-items ::domain/fetch-rss-feed-items)
And with that, we have a working project! Now it’s pretty straightforward to hook this up to a template generator of one description or another and come up with a nice, readable summary of the day’s HN posts.
Conclusion
Using spec on this project was a bit overkill, not because it’s too small a project, but because I’m never going to touch it again. However, the benefits of spec, like the benefits of static typing and other validation systems, come mostly when someone else (or perhaps yourself, in a year or two) has to understand it to use or maintain it.
Here’s hoping that spec catches on as a standard for Clojure libraries!