Robot Has No Heart

Xavier Shay blogs here

Exploring data with Clojure, Incanter, and Leiningen

I’m working through Machine Learning in Action at the moment, and it’s done in Python. I don’t really know Python, but I’d prefer to learn Clojure, so I’m redoing the code samples.

This blog post shows how to read a CSV file, manipulate it, then graph it. Turns out Clojure is pretty good for this, in combination with the Incanter library (think R for the JVM). It took me a while to get an environment set up since I’m unfamiliar with basically everything.

Install Clojure

I already had it installed so can’t remember if there were any crazy steps to get it working. Hopefully this is all you need:

brew install clojure

Install Leiningen

Leiningen is a build tool that does many things, but most importantly for me, it manages the classpath. I was jumping through all sorts of hoops trying to get Incanter running without it.

There are easy-to-follow instructions in the README.

*UPDATE:* As suggested in the comments, you can probably just `brew install lein` here, which gets you Leiningen and Clojure in one command.

Create a new project

lein new hooray-data && cd hooray-data

Add Incanter as a dependency to the project.clj file, and also a main target:

(defproject clj "1.0.0-SNAPSHOT"
  :description "FIXME: write"
  :dependencies [[org.clojure/clojure "1.2.0"]
                 [org.clojure/clojure-contrib "1.2.0"]
                 [incanter "1.2.3-SNAPSHOT"]]
  :main hooray_data.core)

Add some Incanter code to src/hooray_data/core.clj

(ns hooray_data.core
  (:gen-class)
  (:use (incanter core stats charts io datasets)))

; Draw a histogram of 1000 samples from a standard normal distribution
(defn -main [& args]
  (view (histogram (sample-normal 1000))))

Then fire it up:

lein deps
lein run

If everything runs to plan you’ll see a pretty graph.

Code

First, a simple categorized scatter plot. read-dataset works with both URLs and files, which is pretty handy.

(ns hooray_data.core
  (:use (incanter core stats charts io)))

; Sample data set provided by Incanter
(def plotData (read-dataset
            "https://raw.github.com/liebke/incanter/master/data/iris.dat"
            :delim \space
            :header true))

(def plot (scatter-plot
            (sel plotData :cols 0)
            (sel plotData :cols 1)
            :x-label "Sepal Length"
            :y-label "Sepal Width"
            :group-by (sel plotData :cols 4)))

(defn -main [& args]
  (view plot))
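To get a feel for what the `:delim` and `:header` options are asking for, here is a rough sketch in plain Clojure of reading space-delimited text with a header row. This is not how Incanter implements `read-dataset`; `parse-delimited` and `sample-text` are names I made up for illustration, the sample rows are invented, and every value is left as a string.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: splits text into lines, splits each line on delim
;; (a regex), and uses the first line as the column names.
(defn parse-delimited [text delim]
  (let [[header & rows] (map #(str/split % delim) (str/split-lines text))
        ks (map keyword header)]
    ;; One map per data row, keyed by the header names
    (map #(zipmap ks %) rows)))

;; Made-up sample in the same shape as a space-delimited file with a header
(def sample-text
  "length width species\n5.1 3.5 setosa\n7.0 3.2 versicolor")

(parse-delimited sample-text #" ")
```

Each row comes back as a map keyed by column name, which is roughly the shape you want before selecting columns to plot.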

Second, the same data but normalized, so that each column is shifted and scaled to run from 0 to 1. The graph will look the same apart from the axis scales, but the underlying data is now ready for some more math.

(ns hooray_data.core
  (:use (incanter core stats charts io)))

; Sample data set provided by Incanter
(def data (read-dataset
            "https://raw.github.com/liebke/incanter/master/data/iris.dat"
            :delim \space
            :header true))

; Returns a function that applies f (e.g. min or max) to each column of its argument
(defn extract [f]
  (fn [data]
     (map #(apply f (sel data :cols %)) (range 0 (ncol data)))))

; Builds n copies of row, so a single row can be combined with every row of a matrix
(defn fill [n row] (map (fn [x] row) (range 0 n)))

(defn matrix-row-operation [operand row matrix]
  (operand matrix
    (fill (nrow matrix) row)))

; Subtract each column's minimum, then divide by each column's remaining maximum
; Probably could be much nicer using `reduce`
(defn normalize [matrix]
  (let [shifted (matrix-row-operation minus ((extract min) matrix) matrix)]
   (matrix-row-operation div ((extract max) shifted) shifted)))

(def normalized-data
  (normalize (to-matrix (sel data :cols [0 1]))))

(def normalized-plot (scatter-plot
            (sel normalized-data :cols 0)
            (sel normalized-data :cols 1)
            :x-label "Sepal Length"
            :y-label "Sepal Width"
            :group-by (sel data :cols 4)))

(defn -main [& args]
  (view normalized-plot))

I was kind of hoping the normalize function would have already been written for me in a standard library, but I couldn’t find it.
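For comparison, here is a minimal sketch of the same min-max normalization in plain Clojure, with no Incanter matrices involved. It assumes the data is a sequence of equal-length numeric vectors (one per row) and that no column is constant, since a constant column has a range of zero. The names `column-extremes` and `normalize-rows` are my own, not from any library.

```clojure
;; Applies f (e.g. min or max) down each column of rows.
;; (apply map f rows) hands f the first element of every row, then the second, etc.
(defn column-extremes [f rows]
  (apply map f rows))

;; Rescales every column to the range 0 to 1: (x - min) / (max - min).
;; Assumes no column is constant, otherwise the range is zero.
(defn normalize-rows [rows]
  (let [mins   (column-extremes min rows)
        maxes  (column-extremes max rows)
        ranges (map - maxes mins)]
    (map (fn [row]
           (mapv (fn [x lo r] (/ (- x lo) r)) row mins ranges))
         rows)))

(normalize-rows [[1.0 10.0] [2.0 30.0] [3.0 20.0]])
;=> ([0.0 0.0] [0.5 1.0] [1.0 0.5])
```

The `(apply map f rows)` trick does the per-column work that `extract` does above, and mapping over each row with the column minimums and ranges replaces the `fill` step.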

I’ll report back if anything else of interest comes up as I’m working through the book.