March 27, 2012

Utility scripting with Clojure, as a Newbie

Disclaimer: I have been using Clojure for about two weeks, so I'm no expert. I'm writing this in the hope that a beginner's perspective is of some benefit.

In this post, I will outline the development of a small utility script in Clojure, in the hopes that, I don't know, it's helpful to you.

Why write your little utility scripts in Clojure? The REPL, stupid! Or more to the point, the integration between your editor (vim, for me) and the Clojure REPL served via Swank.

Setting Up

Lein is a great tool, but it's also picky. It wants you to have a project directory, complete with project.clj, before you can use "lein swank". So, if you want to use Clojure to write little scripts that don't really deserve their own project, I recommend you just create a single project that you'll subsequently use to provide the environment for all your scripts. Easy.

> lein new scratch

This will set up a project called "scratch" in whatever directory you happen to be in. Then, from inside the new project directory (that's where project.clj lives), start swank and we can get scripting.

> lein swank

Writing the Script

I am not one to spend a lot of time planning -- I prefer to dive in and get a proper feel for the problem. That may be the reason for my newfound enthusiasm.

When you -- when I -- write scripts using Clojure, I write them in little bits, usually from the inside out. In this case, I created a utility script to

  1. Crawl a directory structure given a root, and find the paths to all files ending in ".m"
  2. Pick out the files whose filename alone (ignoring the directory) occurs more than once
  3. Return a sorted list of duplicate pathnames, with duplicated files together

How to start? Easy: each of those represents a function. Write a bit of code, evaluate it using swank, see how the result turns out. Do that for a while, and you have a program.
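As a rough sketch of what that loop looks like for the innermost piece of step 1 (the function names here are made up for illustration and are not part of the final script): write a small form, evaluate it, look at the result, then wrap it in the next layer out.

```clojure
;; Innermost piece: does a plain filename end in ".m"?
;; re-matches must match the whole string, so "Foo.mm" is rejected.
(defn m-file-name? [s]
  (boolean (re-matches #".*\.m" s)))

;; Evaluate a couple of sample inputs at the REPL to see how it behaves.
(m-file-name? "Foo.m")   ; true
(m-file-name? "Foo.mm")  ; false

;; Next layer out: apply the predicate to a list of names.
(defn m-file-names [names]
  (filter m-file-name? names))

(m-file-names ["a.m" "b.h" "c.m"]) ; ("a.m" "c.m")
```

Once the pieces behave on plain strings, swapping in real File objects is a small step.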

In contravention of custom, I use Vim, not Emacs, to edit my Clojure code. Using Slimv, ",c" connects to swank, ",e" evaluates the current form. There's not much else to know.

The Final Code

(ns find-m-files
  (:import (java.io File)))

; Wrap methods
(defn is-file? [f] (.isFile f))
(defn is-dir? [f] (.isDirectory f))
(defn get-name [f] (.getName f))
(defn get-path [f] (.getPath f))

(defn get-m-files
  "Recursively get a list of File objects for files ending in .m in the given directory"
  [d]
  (let [ls (if (is-dir? d) (.listFiles d) nil)
        files (filter is-file? ls)
        dirs (filter is-dir? ls)]
    (if (nil? ls)
      nil ; d is not a directory, so there is nothing to list
      (flatten (concat
                 (filter (fn [f] (re-matches #".*\.m" (get-name f))) files) ; .m files
                 (map get-m-files dirs))))))

(defn dupes
  "Get a list of the duplicate elements in [s]. That list may itself contain duplicates."
  [s]
  (cond
    (empty? s) nil ; Break recursion
    (> (.indexOf (rest s) (first s)) -1) (cons (first s) (dupes (rest s)))
    :else (recur (rest s))))

(defn duplicate-filenames
  "Get a list of the paths to all files whose names are duplicated in the given directory."
  [d]
  (let [files (get-m-files (File. d))
        names (map get-name files)
        duped-names (set (dupes names))
        duped-files (filter (fn [f] (contains? duped-names (get-name f))) files)]
    (map get-path
         (sort #(compare (get-name %1) (get-name %2)) duped-files))))

The code may look more complex than, say, Python, but it wasn't written all at once. Lisp programs, as you may have heard and as I can now confirm, tend to grow almost organically, with many forms written and tested independently before being assembled into something working.

Another difference from Python: in Python, there's about one way to do this, with minor variations. In Lisp (Clojure, that is), there are probably 1000.
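To give a flavor of just one of those other ways, here is a sketch that leans on file-seq and group-by from clojure.core instead of hand-rolled recursion. The name duplicate-m-paths is invented for this example, and it matches names with .endsWith rather than a regex, so treat it as an illustration rather than a drop-in replacement.

```clojure
(import 'java.io.File)

(defn duplicate-m-paths
  "Hypothetical alternative: paths of all .m files under root whose
  filename occurs more than once, with same-named files adjacent."
  [root]
  (->> (file-seq (File. root))                    ; every file and dir under root
       (filter #(and (.isFile %)
                     (.endsWith (.getName %) ".m")))
       (group-by #(.getName %))                   ; filename -> [File ...]
       (filter (fn [[_ files]] (> (count files) 1)))
       (sort-by first)                            ; order groups by filename
       (mapcat (fn [[_ files]] (map #(.getPath %) files)))))
```

Grouping by filename keeps the duplicated files together for free, so there is no separate "find the dupes" step at all.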

Also, if you have a way to write the "dupes" function without the non-tail recursion in there, tell me. I'm still new at this.
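One possible answer, offered as a sketch: count occurrences with frequencies (from clojure.core) and keep whatever shows up more than once. The name dupes-by-count is hypothetical, and unlike dupes it returns a set rather than a list, which is all duplicate-filenames needs since it wraps the result in set anyway.

```clojure
(defn dupes-by-count
  "Return the set of elements that occur more than once in s."
  [s]
  (set (for [[x n] (frequencies s)   ; frequencies gives a map of element -> count
             :when (> n 1)]
         x)))

(dupes-by-count ["a.m" "b.m" "a.m" "c.m" "b.m" "a.m"]) ; #{"a.m" "b.m"}
```

There is no explicit recursion here, so there is nothing to blow the stack on a long list of names.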