May 14, 2014

The nitty-gritty of a Clojure web app

Or, how I made RedditLater, in excruciating detail

This post is a follow-up to, and supplement for, my article Anatomy of a Web App: How I Built RedditLater in Clojure, published on SitePoint. To make that article appeal to a mass-market audience, I had to cut most of the nitty-gritty details, which wouldn’t interest anyone who wasn’t already into Clojure (plebeians!). Those details still contain some good stuff, though, so I’ve turned them into this post.

So I hope you’re here to read about my fiddly minutiae, because off we go!

A Review

RedditLater runs on a single Heroku instance. In a nutshell, it logs users in to reddit, saves their auth tokens, accepts the posts they’d like to schedule into a queue, and then submits them with those tokens at the scheduled date and time. It uses MongoDB, hosted by MongoHQ, for persistence.

All the pieces

Data model

There are users and posts. Users are mostly just reddit usernames and auth tokens; posts contain everything the user wants in their reddit post, plus the schedule and some metadata for debugging: the number of post attempts made and the response from reddit’s API. All very straightforward.
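To make that concrete, here is roughly what a user and a post might look like as Clojure maps. This is a hypothetical sketch: the post fields used elsewhere in this article's code (type, title, link, subreddit, schedule, tz_offset, tries, posted) are kept, while :username, :token, and :response are assumed names for illustration. The still-pending? predicate mirrors the worker's upcoming-posts Mongo query in plain Clojure.

```clojure
(ns example.model)

;; Hypothetical records; :username, :token and :response are assumed names.
(def example-user
  {:username "someone" :token "not-a-real-token"})

(def example-post
  {:username  "someone"
   :type      "link"
   :title     "Look at this"
   :link      "http://example.com/"
   :subreddit "clojure"
   :schedule  #inst "2014-05-15T14:00:00.000-00:00"
   :tz_offset -14400000   ; ms, as clean-offset would produce from "4"
   :tries     0           ; post attempts made so far
   :posted    false
   :response  nil})       ; last reply from reddit's API, for debugging

(defn still-pending?
  "In-memory equivalent of the worker's upcoming-posts query:
  not posted yet, and fewer than 3 attempts (a missing :tries counts as 0)."
  [post]
  (and (not (true? (:posted post)))
       (< (or (:tries post) 0) 3)))
```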

Modules

RedditLater consists of a bunch of modules, listed here in the order in which the user encounters them:

  • core - Running the development and production servers and associated services
  • routes - Defining URLs within the app
  • views - Showing pages and accepting input
  • reddit_api - Communicating with reddit
  • analysis - Producing day/time histograms
  • auth - Authenticating with reddit on behalf of the user
  • forms - Validating forms
  • data - Saving and retrieving posts and users
  • worker - Doing the actual work of posting to reddit at the scheduled time

Core

The only remotely interesting thing about core is that it uses http-kit instead of the usual Ring Jetty adapter to serve pages. http-kit claims better performance; RedditLater has never gone down under a traffic surge, but I can’t say whether http-kit had anything to do with that.
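Swapping Jetty for http-kit barely touches the app, because a Ring handler is just a function from a request map to a response map. Here is a minimal sketch of that idea. The handler is a placeholder rather than RedditLater’s, and the 8080 fallback port is an assumption; Heroku does supply the port to bind in the PORT environment variable. The actual http-kit call, run-server, is shown in a comment so the sketch loads without http-kit on the classpath.

```clojure
(ns example.core)

;; A placeholder Ring handler: request map in, response map out.
;; It can be called directly, with no server running.
(defn handler [req]
  {:status  200
   :headers {"Content-Type" "text/plain"}
   :body    (str "Hello from " (:uri req))})

;; Heroku passes the port to bind in the PORT environment variable;
;; falling back to 8080 for local development is an assumption.
(defn port-from-env [env]
  (Integer/parseInt (get env "PORT" "8080")))

;; With http-kit on the classpath, starting the server is one call
;; (run-server returns a function that stops it):
;;   (require '[org.httpkit.server :refer [run-server]])
;;   (run-server handler {:port (port-from-env (System/getenv))})
```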

Routes

Routes aren’t too interesting. I did, however, borrow a macro that makes every view function take the whole request as a single req argument, instead of giving in to the temptation of Compojure’s destructuring:

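; Expands (r GET "/" f) into (GET "/" req# (f req#)), where req# is a
; generated symbol that Compojure binds to the whole request map.
(defmacro r [method path callback]
  `(~method ~path req# (~callback req#)))

; in use:
(defroutes auth-routes
  (r GET "/" auth/auth)
  (r GET "/callback" auth/auth-request-callback))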
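You can see what r buys you without Compojure at all. The sketch below pairs r with a hypothetical, stripped-down GET macro (not Compojure’s, which handles route parameters and much more). Like Compojure, it binds the entire request map when handed a bare symbol, and that behaviour is exactly what r relies on.

```clojure
(ns example.routes)

;; The r macro from above.
(defmacro r [method path callback]
  `(~method ~path req# (~callback req#)))

;; HYPOTHETICAL stand-in for Compojure's GET, just enough for this demo:
;; match on method and exact :uri, and bind the entire request map to
;; the given symbol (the Compojure behaviour r depends on).
(defmacro GET [path arg & body]
  `(fn [request#]
     (when (and (= :get (:request-method request#))
                (= ~path (:uri request#)))
       (let [~arg request#] ~@body))))

;; A view that receives the full request map.
(defn show-uri [req]
  {:status 200 :body (str "You asked for " (:uri req))})

;; Expands to (GET "/" req__N__auto__ (show-uri req__N__auto__)).
(def home-route (r GET "/" show-uri))
```

Because every view gets the same single argument, views stay uniform and can pull whatever they need (params, session, headers) out of req themselves.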

Views

RedditLater’s views use the Middleman/Enlive arrangement I’ve written about before: I design new features in Middleman, then hook up their functionality with Enlive’s HTML-transforming macros. For example, here’s how the upcoming-posts table is populated:

(enlive/defsnippet upcoming-post-row "templates/post-index.html"
  [:.upcoming.posts [:tr.post (enlive/nth-of-type 1)]]    ; Selector
  [{:keys [title link subreddit schedule _id tz_offset]}] ; Arguments

  [:.title :a] (enlive/do->
                (enlive/content title)
                (enlive/set-attr :href link))
  [:.subreddit] (enlive/content subreddit)
  [:.date] (enlive/content (helpers/format-date schedule tz_offset))
  [:.edit-link] (enlive/set-attr :href (str "/post/edit/" _id "/"))
  [:form] (enlive/do->
            (enlive/set-attr :action (str "/post/delete/" _id "/"))
            (enlive/set-attr :method "post")))

(enlive/defsnippet upcoming-post-table "templates/post-index.html" [:.upcoming.posts]
  [posts]
  [:tbody] (enlive/content (map upcoming-post-row posts)))

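The first defsnippet call defines a function, upcoming-post-row, whose prototype is the element matched by .upcoming.posts tr.post:nth-of-type(1) in the post-index.html template. When called with a post, it returns a copy of that element with each selector/transformation pair applied: the title and link filled in, the date formatted, and the edit link and delete form pointed at the post’s id. The second snippet, upcoming-post-table, fills the table’s tbody with one such row per post.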

This is a very handy way to do templating, if you’ve never tried it. I can’t recommend it enough.
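If the selector/transformation idea is new to you, here is a toy version of it in plain Clojure. It is not Enlive’s API: it only borrows the {:tag :attrs :content} node maps that Enlive parses HTML into, uses simple predicates where Enlive has real CSS-like selectors, and the prototype row is made up. The point is that a snippet is just "copy the prototype, then run each transformation over the nodes its selector matches".

```clojure
(ns example.templating
  (:require [clojure.string :as str]
            [clojure.walk :as walk]))

;; A made-up prototype row, as {:tag :attrs :content} node maps.
(def row-prototype
  {:tag :tr :attrs {:class "post"}
   :content [{:tag :td :attrs {:class "title"}
              :content [{:tag :a :attrs {:href "#"} :content ["Title"]}]}
             {:tag :td :attrs {:class "subreddit"} :content ["pics"]}]})

;; Toy "selectors": predicates on nodes.
(defn with-tag [tag]
  (fn [node] (and (map? node) (= tag (:tag node)))))

(defn with-class [cls]
  (fn [node]
    (and (map? node)
         (contains? (set (str/split (get-in node [:attrs :class] "") #"\s+"))
                    cls))))

;; Apply transformation f to every node matching pred.
(defn transform [tree pred f]
  (walk/prewalk #(if (pred %) (f %) %) tree))

;; Rough analogue of upcoming-post-row: the prototype itself is never
;; mutated; each call returns a transformed copy.
(defn upcoming-post-row [{:keys [title link subreddit]}]
  (-> row-prototype
      (transform (with-tag :a)
                 #(-> % (assoc :content [title]) (assoc-in [:attrs :href] link)))
      (transform (with-class "subreddit")
                 #(assoc % :content [subreddit]))))
```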

Reddit API

There’s nothing terribly exciting here. I use the clj-http-lite library for reasons related to reddit’s OAuth implementation, but the choice makes little difference for ordinary API calls.

One thing I never did was tightly wrap clj-http-lite’s exception-throwing behavior in some sort of monadic return value indicating success or failure, plus a value. Every so often an exception slips out and bites me.
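Here is a minimal sketch of the kind of wrapper I mean. It is not code from RedditLater: attempt is a hypothetical helper that turns anything thrown into the same [value error] pairs the form validation uses, and flaky-request is a made-up stand-in for a clj-http-lite call that fails.

```clojure
(ns example.reddit-api)

(defn attempt
  "Call (apply f args), returning [result nil] on success or
  [nil error-message] if an Exception is thrown."
  [f & args]
  (try
    [(apply f args) nil]
    (catch Exception e
      ;; Some exceptions have no message; fall back to the class name.
      [nil (or (.getMessage e) (str (class e)))])))

;; Hypothetical stand-in for an HTTP request that fails.
(defn flaky-request [url]
  (throw (ex-info (str "status 503 fetching " url) {:status 503})))
```

Callers then destructure [value err] instead of remembering to catch, so a failed request can never escape unhandled.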

Analysis

This module pulls statistics from reddit’s API about when popular posts were submitted, to power the analysis page.

Given a subreddit, it fetches 5 pages of 100 posts each from that subreddit’s most popular posts of the last month, then bins their posting days and times to produce the histograms on the analysis page. Because the fetching is fairly slow, results are cached for a day, which also limits API requests.
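The binning and the one-day cache can be sketched in a few lines. This is illustrative, not the module’s code: it assumes timestamps arrive as reddit’s created_utc values (Unix seconds) and bins them in UTC, and cached is a hypothetical helper with an injectable clock so it can be tested.

```clojure
(ns example.analysis
  (:import (java.time Instant ZoneOffset)))

(defn day-and-hour
  "Given Unix seconds (e.g. reddit's created_utc), return
  [day-of-week hour] in UTC, e.g. [:thursday 14]."
  [epoch-seconds]
  (let [t (.atZone (Instant/ofEpochSecond (long epoch-seconds)) ZoneOffset/UTC)]
    [(keyword (.toLowerCase (str (.getDayOfWeek t))))
     (long (.getHour t))]))

(defn histograms
  "Count posts per weekday and per hour of day."
  [timestamps]
  (let [pairs (map day-and-hour timestamps)]
    {:by-day  (frequencies (map first pairs))
     :by-hour (frequencies (map second pairs))}))

(def one-day-ms (* 24 60 60 1000))

(defn cached
  "Wrap f so each distinct argument list is recomputed at most once per
  ttl-ms. now-fn returns the current time in ms."
  [f ttl-ms now-fn]
  (let [cache (atom {})]
    (fn [& args]
      (let [now (now-fn)
            {:keys [at value]} (get @cache args)]
        (if (and at (< (- now at) ttl-ms))
          value
          (let [v (apply f args)]
            (swap! cache assoc args {:at now :value v})
            v))))))
```

In the app, the slow fetch-and-bin function for a subreddit would be wrapped as (cached f one-day-ms #(System/currentTimeMillis)), so each subreddit hits reddit at most once a day.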

Auth

OAuth was a total bitch to get working here. I had to switch HTTP libraries and do a whole crapload of testing. Not impressed.

I ended up using oauthentic, which does the bare minimum an OAuth library possibly could.

I hated this part a lot, but at least the auth burden on the app’s end is lightened substantially now that it’s working.
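For anyone who hasn’t been through it, the first leg of reddit’s OAuth2 flow is redirecting the user to an authorize URL, and the /callback route must check that the state it sent comes back before trading the code for a token. The sketch below builds that URL by hand in plain Clojure. It is not oauthentic’s API; the endpoint and parameter names follow reddit’s OAuth2 documentation as I understand it, and the scopes chosen are an assumption. The token exchange itself is not shown.

```clojure
(ns example.auth
  (:require [clojure.string :as str])
  (:import (java.net URLEncoder)))

(defn query-string
  "Form-encode an ordered sequence of [key value] pairs."
  [params]
  (str/join "&" (for [[k v] params]
                  (str (URLEncoder/encode (name k) "UTF-8") "="
                       (URLEncoder/encode (str v) "UTF-8")))))

(defn authorize-url
  "Where to redirect the user to grant access. The scopes are assumed."
  [client-id redirect-uri state]
  (str "https://www.reddit.com/api/v1/authorize?"
       (query-string [[:client_id client-id]
                      [:response_type "code"]
                      [:state state]
                      [:redirect_uri redirect-uri]
                      [:duration "permanent"]   ; ask for a refresh token
                      [:scope "identity submit"]])))

(defn valid-callback?
  "Only accept a code if the state we generated comes back unchanged."
  [expected-state params]
  (and (= expected-state (get params "state"))
       (some? (get params "code"))))
```

The permanent duration matters for an app like this, since posts go out long after the user has left the site.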

Forms

These forms use the monads-for-idiots form validation technique I’ve used before. Before being saved, posts are run through this gauntlet:


; Turn the form's string :id, if any, into a Mongo ObjectId under :_id
(defn clean-id [data]
  (if (nil? (:id data))
    [data nil]
    [(assoc data :_id (mongo/object-id (:id data))) nil]))

; Keep :link for link posts and :text for everything else
(defn clean-type [data]
  (if (= (:type data) "link")
    [(dissoc data :text) nil]
    [(dissoc data :link) nil]))

(defn clean-offset [data]
  (try
    ; Convert the submitted timezone offset from hours to milliseconds,
    ; flipping the sign
    (let [offset (:tz_offset data)
          hours (if (empty? offset) 0 (Double/parseDouble offset))
          ms (int (* hours -3600000))]
      [(assoc data :tz_offset ms) nil])
    (catch Exception e [nil "The timezone went funny somehow..."])))

; ... Many more

(defn clean [data usr]
  (->> [(select-keys data post-keys) nil]
       (err-bind clean-id)
       (err-bind clean-type)
       (err-bind clean-offset)
       (err-bind clean-schedule)
       (err-bind clean-submission)
       (err-bind clean-sendreplies)
       (err-bind (assert-not-empty :subreddit))
       (err-bind (assert-not-empty :title))
       (err-bind (check-schedule usr))
       (err-bind (check-owner usr))))

Hasn’t been a problem yet! It’s verbose, but complete and explicit.
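For reference, here is a minimal, self-contained sketch of the pattern. err-bind and assert-not-empty are illustrative versions written to fit how clean uses them, not copied from the app; clean-type is the one from above. Each step returns [value nil] or [nil error], and once a step fails, every later step is skipped and the first error comes out the end.

```clojure
(ns example.forms
  (:require [clojure.string :as str]))

;; Illustrative err-bind: pass an existing error through untouched,
;; otherwise feed the value to the next step.
(defn err-bind [f [value err]]
  (if (nil? err)
    (f value)
    [nil err]))

;; Illustrative assert-not-empty: returns a step that fails on a blank field.
(defn assert-not-empty [k]
  (fn [data]
    (if (str/blank? (get data k))
      [nil (str (name k) " can't be empty")]
      [data nil])))

;; clean-type, as above.
(defn clean-type [data]
  (if (= (:type data) "link")
    [(dissoc data :text) nil]
    [(dissoc data :link) nil]))

(defn clean-sketch [data]
  (->> [data nil]
       (err-bind clean-type)
       (err-bind (assert-not-empty :subreddit))
       (err-bind (assert-not-empty :title))))
```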

Data

Since there are only users and posts, the data module consists mostly of the functions get-user, put-user!, get-post!, put-post!, and delete-post!.

As usual when I start writing something in Clojure, for a long time these functions were backed not by a database but by in-memory atoms. Eventually, though, I had to add a persistent datastore.
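An atom-backed version of that interface might look like the sketch below. Function names follow the list above; the bodies and the random-UUID ids are illustrative assumptions. Because callers only ever see these functions, moving to a real database later means rewriting their bodies and nothing else.

```clojure
(ns example.data)

;; In-memory stand-ins for the users and posts collections.
(def users (atom {}))   ; username -> user map
(def posts (atom {}))   ; post id  -> post map

(defn get-user [username]
  (get @users username))

(defn put-user! [user]
  (swap! users assoc (:username user) user)
  user)

(defn get-post! [id]
  (get @posts id))

(defn put-post!
  "Save a post, giving it a fresh id if it doesn't have one yet."
  [post]
  (let [post (if (:_id post)
               post
               (assoc post :_id (str (java.util.UUID/randomUUID))))]
    (swap! posts assoc (:_id post) post)
    post))

(defn delete-post! [id]
  (swap! posts dissoc id)
  nil)
```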

Worker

Finally, the money shot! The worker module is started by core at start-up via start-posts-subscriber!, shown here along with the functions it relies on:


; Channel, for queuing posts
(def posts (lamina/channel))

(defn next-post
  "Take the next post from the channel and re-load it from Mongo for freshness"
  []
  (let [post @(lamina/read-channel posts)]
    (mongo/fetch-one :posts :where {:_id (:_id post)})))

(defn consume-post
  "Consume a single post from the queue, re-queue it if it is not yet due,
  and handle deletion events"
  []
  (let [post (next-post)]
    (try
      (if (do-post? post)

        ; Yes: post it!
        (do
          (prn "Attempting to post" (:_id post))
          (let [post (post-to-reddit! post)]
            ; Stop once it has posted or has used up its 3 tries
            (when-not (or (:posted post) (>= (or (:tries post) 0) 3))
              (enqueue-post post)))) ; Failure: requeue

        ; No: don't post
        (if (post-deleted? post)
          (swap! deleted disj (:_id post)) ; Remove from set
          (enqueue-post post))) ; Re-enqueue
      (catch Exception e (println e)))
    (Thread/sleep 3000))) ; Sleep for 3 seconds, even after an error

(defn posts-subscriber []
  ; Loop forever without retaining results; (doall (repeatedly consume-post))
  ; would hold on to an ever-growing sequence
  (while true (consume-post)))

(defn upcoming-posts []
  (mongo/fetch :posts :where {:posted {:$ne true}
                              :$or [{:tries {:$exists false}}
                                    {:tries {:$lt 3}}]}))

(defn start-posts-subscriber! []
  ; Queue up all unposted posts
  (doseq [post (upcoming-posts)]
    (enqueue-post post))

  ; Start the thread
  (.start (Thread. posts-subscriber)))

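start-posts-subscriber! fetches all unposted posts from Mongo, adds them to the Lamina channel that drives the whole thing, and starts the consumer thread. consume-post grabs the next post, posts it to reddit if it’s due (handling any errors), and otherwise puts it back on the queue unless it has been deleted. The Thread/sleep at the end makes the loop crawl along at a leisurely one post every 3 seconds. Since posts that aren’t due yet go back on the queue too, any given post is only looked at about once every 3 × N seconds, where N is the number of queued posts, so RedditLater doesn’t quite have to-the-second resolution.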
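The resolution cost of that round-robin design is easy to see in a toy, single-threaded model of the loop. This is a simulation, not RedditLater’s code: posts are made-up maps with an :id and a :due time in seconds, a persistent queue stands in for the Lamina channel, and each check advances a simulated clock by the sleep interval instead of actually sleeping.

```clojure
(ns example.worker-sim)

;; Toy model of the worker loop: FIFO queue, each check costs sleep-secs
;; of simulated time, and posts that aren't due go to the back of the queue.
(defn simulate
  "Return a map of post id -> simulated time (secs) at which it was posted."
  [posts sleep-secs]
  (loop [queue  (into clojure.lang.PersistentQueue/EMPTY posts)
         now    0
         posted {}]
    (if-let [post (peek queue)]
      (let [queue (pop queue)]
        (if (>= now (:due post))
          (recur queue (+ now sleep-secs) (assoc posted (:id post) now))
          (recur (conj queue post) (+ now sleep-secs) posted)))
      posted)))
```

With just three queued posts and a 3-second sleep, a post due at second 1 isn’t picked up until second 9: (simulate [{:id :p :due 1} {:id :q :due 100} {:id :r :due 100}] 3) returns {:p 9, :q 102, :r 105}. The lateness grows linearly with the queue length.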

Whenever a post is saved, it’s added to the post queue as well as to Mongo. All sorts of race conditions can happen here, but the big one is deletion, since a deleted post may still be sitting in the queue. To handle that case, a separate set of deleted post ids lives in memory and is checked before anything is posted.
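The deleted-ids set needs very little code. In this sketch, deleted and post-deleted? match the names used in consume-post, while mark-deleted! and check-queued are hypothetical helpers for the delete handler and the worker’s check respectively.

```clojure
(ns example.deletion)

;; Ids of posts deleted while they may still be sitting in the queue.
(def deleted (atom #{}))

;; Hypothetical hook, called when a user deletes a post.
(defn mark-deleted! [id]
  (swap! deleted conj id))

(defn post-deleted? [post]
  (contains? @deleted (:_id post)))

;; Hypothetical worker-side check: drop a deleted post and forget its id
;; (as consume-post does with (swap! deleted disj (:_id post))),
;; otherwise keep it.
(defn check-queued [post]
  (if (post-deleted? post)
    (do (swap! deleted disj (:_id post)) :dropped)
    :kept))
```

Removing the id once the post is dropped keeps the set from growing forever, since each deleted post can only be in the queue once.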

To mitigate other timing issues, the post is fetched fresh from the database on every attempt. There’s still a small window in which someone could save their post between when it gets posted and when that fact is written to the database, but nobody uses RedditLater rapidly or often enough for that to happen.

And that’s it! Using these simple tools, RedditLater has been running mostly happily for a solid year, and should continue to do so for many more.