February 20, 2016

Plotting Reddit post frequencies with d3

This week I added a better graph to Later (for Reddit)’s post timing analysis page. Previously, the most common time and the most common day were plotted on separate bar charts. Then I noticed that I got called out on this in an article on Medium, which offered a pretty sensible alternative that I wasted no time in ripping off wholesale. The bar graphs are still there, but I also added a 7x24 color-coded grid plot, which gives a more accurate view of exactly when the most popular posts were posted.

I couldn’t find an out-of-the-box way to do exactly this, so I spent an evening doing up a custom chart using D3.js. I have a weird relationship with D3 – it’s super powerful and does a great job at a variety of tasks, but I’ve simply never quite grokked how to work with it well enough to get anything done without substantial experimentation. Nevertheless, I get a bit closer every time. Here’s how I made the chart.

The data

I won’t go into too much detail about how the data is fetched (hint: it’s from the /<subreddit>/top.json?t=month endpoint), but I will say that for the purposes of this chart, it’s just a 168-element array of data with 24 datapoints for each day of the week (7x24 = 168).
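For the curious, here is a minimal sketch of how post timestamps could be folded into that 168-element array. It is plain JavaScript rather than CoffeeScript so it runs in Node as-is, and it is an illustration rather than the production code: the bucketPosts name is made up, and it assumes each post carries a created_utc timestamp in seconds, that bucketing in UTC is acceptable, and that each datapoint is a simple count.

```javascript
// Hypothetical helper: fold post timestamps into 7 x 24 = 168 hourly buckets.
// Assumes each post has a `created_utc` field (seconds since the epoch)
// and that bucketing in UTC is good enough.
function bucketPosts(posts) {
  const counts = new Array(7 * 24).fill(0);
  for (const post of posts) {
    const date = new Date(post.created_utc * 1000);
    const day = date.getUTCDay();    // 0 = Sunday ... 6 = Saturday
    const hour = date.getUTCHours(); // 0 ... 23
    counts[day * 24 + hour] += 1;
  }
  return counts;
}

// Two made-up posts: Thursday 1970-01-01 00:30 UTC and Sunday 1970-01-04 05:00 UTC.
const sample = [{ created_utc: 1800 }, { created_utc: 3 * 86400 + 5 * 3600 }];
const buckets = bucketPosts(sample);
console.log(buckets.length); // 168
```

The index day * 24 + hour puts Sunday's 24 hours first, which matches a grid whose rows run Sunday through Saturday.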

Tweakable parameters

The first thing in the code is a set of parameters that define the dimensions of the chart. It’s a static 700x200, none of this responsive nonsense. For the box width and height calculation, I fudged a bit by dividing by 25 (instead of 24) for the width and 8 (instead of 7) for the height, to allow some margin space.

(Oh yeah, I used CoffeeScript too. It should read like JavaScript but better.)

render_calendar_chart = (data) ->
    width = 700
    height = 200
    top_margin = 50
    left_margin = 130
    box_w = ((width - left_margin) / 25) + 1
    box_h = ((height - top_margin) / 8) + 1
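To make the fudge factor concrete, here is the same arithmetic worked through in plain JavaScript (camelCase names standing in for the CoffeeScript ones). The gridRight and gridBottom values are my own addition: they assume each cell takes up its box plus the 1px gap that the scales use later on.

```javascript
// Same numbers as the parameters above.
const width = 700, height = 200;
const topMargin = 50, leftMargin = 130;
const boxW = (width - leftMargin) / 25 + 1; // about 23.8
const boxH = (height - topMargin) / 8 + 1;  // 19.75

// Assumption: each cell occupies its box plus a 1px gap.
const gridRight = leftMargin + (boxW + 1) * 24; // right edge of the 24 bands
const gridBottom = topMargin + (boxH + 1) * 7;  // bottom edge of the 7 bands

console.log({ boxW, boxH, gridRight, gridBottom });
```

With these particular values the bands end at roughly 725 horizontally, a little past the 700 width, while the vertical extent stays inside 200. Whether that matters depends on how the surrounding page treats SVG overflow, so it is worth checking the last column in your own layout.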

Setting up the colors

I spent some time googling around on this, but it turned out to be really, really easy. You can create a scale that maps the extent of your data to a gradient between two colors like this:

    heatmap = d3.scale.linear()
        .domain(d3.extent(data))
        .range(["#cccccc", "#4989AF"])

In the above, heatmap is a function that accepts a value (in this case, a number from the dataset) and returns a color. Later, this is used directly with the fill attribute of each square to assign it a color.
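If you want a feel for what such a scale does under the hood, here is a rough stand-in in plain JavaScript with no D3 involved. It is a sketch of per-channel linear interpolation, not D3's actual implementation; makeHeatmap and hexToRgb are made-up names, and D3's own interpolator may format its output differently.

```javascript
// Parse "#rrggbb" into [r, g, b].
function hexToRgb(hex) {
  return [1, 3, 5].map((i) => parseInt(hex.slice(i, i + 2), 16));
}

// Hypothetical stand-in for the heatmap scale: linear interpolation per channel.
function makeHeatmap([min, max], [fromHex, toHex]) {
  const from = hexToRgb(fromHex);
  const to = hexToRgb(toHex);
  return (value) => {
    // Guard against a flat dataset, where max === min.
    const t = max === min ? 0 : (value - min) / (max - min);
    const channels = from.map((c, i) => Math.round(c + (to[i] - c) * t));
    return "#" + channels.map((c) => c.toString(16).padStart(2, "0")).join("");
  };
}

const data = [0, 3, 10, 7];
const extent = [Math.min(...data), Math.max(...data)]; // plays the role of d3.extent
const heatmap = makeHeatmap(extent, ["#cccccc", "#4989AF"]);
console.log(heatmap(0), heatmap(10)); // #cccccc #4989af
```

The smallest value in the data gets the grey end of the gradient, the largest gets the blue end, and everything else lands proportionally in between.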

Configuring the grid scale

It’s pretty easy to see in advance that we’re going to want to map the index of our data to some x- and y-position. The most d3-y way to do this is to create some scale functions that we can use to map the data index to the offset.

Our x scale will have a domain of [0, 24) (for hours of the day), with bands of size (box_w + 1) (the 1 is the margin between boxes). So, we represent our xScale like so:

    xScale = d3.scale.ordinal()
        .domain(d3.range(0, 24, 1))
        .rangeBands([0, (box_w + 1) * 24])

The y scale is very similar, but for [0, 7) and using the height instead of the width:


    yScale = d3.scale.ordinal()
        .domain(d3.range(0, 7, 1))
        .rangeBands([0, (box_h + 1) * 7])
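As a mental model, an ordinal scale with range bands and no padding just splits the range into equal slices and maps each domain value to the start of its slice. Here is a rough plain-JavaScript stand-in; bandScale is a made-up name, and this is a sketch of the idea rather than D3's code (it does not handle padding or unknown values).

```javascript
// Hypothetical stand-in for an ordinal scale with rangeBands and no padding:
// the range is split into equal bands, and each domain value maps to the
// start of its band. Unknown values are not handled here.
function bandScale(domain, [start, stop]) {
  const step = (stop - start) / domain.length;
  const scale = (value) => start + domain.indexOf(value) * step;
  scale.rangeBand = () => step;
  return scale;
}

const boxW = 23.8; // (700 - 130) / 25 + 1, from the parameters above
const hours = Array.from({ length: 24 }, (_, i) => i); // like d3.range(0, 24, 1)
const xScale = bandScale(hours, [0, (boxW + 1) * 24]);

console.log(xScale(0), xScale(1), xScale.rangeBand());
```

Because the range is (box_w + 1) * 24 wide and there are 24 values, each band comes out box_w + 1 wide: one box plus a 1px gap.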

Creating the grid

The chart itself will be rendered with svg, which makes everything a bit easier for us. First, we need to create an svg element; in d3, that looks like this:

    chart = d3.select('#grid')
        .append('svg')
        .attr('class', 'chart')
        .attr('width', width)
        .attr('height', height)

Congrats, there’s a width x height svg element inside your #grid now. Next, we need to render all the cells. The trick here is to correctly compute the (x, y) offset of each rectangle. I wanted them to be spaced 1 pixel apart, and arranged in 7 rows and 24 columns. Since the data was just a big list of numbers, this means some modulo math was required to “re-shape” our data:

  • We want to pass the index modulo 24 to the xScale
  • We want to pass Math.floor of the index / 24 to the yScale.

To position the boxes, I used the above formulas along with the transform svg attribute, for which the value is translate(<dx>, <dy>). This probably could have been done better with d3 scales, but as I said, I don’t have a ton of experience.
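The index math is easy to sanity-check outside the browser. Here is a small plain-JavaScript sketch; cellTransform is a made-up helper, and the step sizes are round numbers chosen for illustration rather than the real box sizes.

```javascript
// Hypothetical helper: turn a flat index (0..167) into a grid cell and an
// SVG translate() string. `stepX` and `stepY` are the band sizes
// (box size plus the 1px gap).
function cellTransform(index, { stepX, stepY, leftMargin, topMargin }) {
  const hour = index % 24;            // column
  const day = Math.floor(index / 24); // row, 0 = first day in the data
  const dx = hour * stepX + leftMargin;
  const dy = day * stepY + topMargin;
  return { hour, day, transform: `translate(${dx}, ${dy})` };
}

// Round numbers for illustration only.
const layout = { stepX: 25, stepY: 20, leftMargin: 130, topMargin: 50 };
console.log(cellTransform(0, layout).transform);   // translate(130, 50)
console.log(cellTransform(25, layout).transform);  // translate(155, 70)
console.log(cellTransform(167, layout).transform); // translate(705, 170)
```

Index 25 is the second hour of the second day, and index 167 is the last hour of the last day, so the two ends of the array land in opposite corners of the grid.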

An aside about how d3 works: first, we create a selection by using the .selectAll("rect") method on the chart. Then, we bind the data to that selection using .data(data). Finally, we use the ever-mysterous enter method.

Enter lets you assign properties only to new (“entering”) elements of your graphic. It will also let you create elements, as I’ve done by calling .append("rect") after the .enter call. More sophisticated users can use this to add fancy animations and whatnot, but I was satisfied with this:


    chart.selectAll("rect")
        .data(data)
        .enter().append("rect")
            .attr("width", box_w)
            .attr("height", box_h)
            .attr("fill", heatmap)
            .attr("transform", (_, ii) ->
                dx = xScale(ii % 24) + left_margin
                dy = yScale(Math.floor(ii / 24)) + top_margin
                "translate(#{dx}, #{dy})")
            .append("svg:title")
            .text((d) -> "#{d}")
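If enter still feels mysterious, a toy model may help. The plain-JavaScript sketch below mimics an index-based join, where data and elements are paired up by position. It is an illustration only, not how D3 is implemented, and the join function is a made-up name.

```javascript
// Toy model of an index-based data join. This is an illustration, not D3's code.
// `existing` stands for elements already on the page, `data` for the new array.
function join(existing, data) {
  const shared = Math.min(existing.length, data.length);
  return {
    update: data.slice(0, shared), // data paired with elements that already exist
    enter: data.slice(shared),     // data with no element yet: these get appended
    exit: existing.slice(shared),  // elements left without data
  };
}

// First render: no rects exist, so all 168 data points are "entering".
const first = join([], new Array(168).fill(0));
console.log(first.enter.length); // 168

// Re-render with the same amount of data: nothing enters.
const second = join(new Array(168).fill("rect"), new Array(168).fill(0));
console.log(second.enter.length); // 0
```

On a freshly created svg there are no rect elements yet, so every datapoint is entering and gets its own rectangle. That is why a chart that is only rendered once can get away with handling enter alone.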

The calls to attr can accept either a literal value (e.g. box_w, box_h above), or a function that will be called with the data point’s value, and its index. By providing the heatmap function for the fill attribute, each datapoint is automatically mapped to a color. Neat!
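The value-or-function behaviour is simple to picture. This plain-JavaScript toy (resolveAttr is a made-up name, not a D3 function) shows the idea: a function gets called with the datum and its index, and anything else is used as-is.

```javascript
// Toy version of how an attribute value might be resolved per element:
// call it with (datum, index) if it is a function, otherwise use it as-is.
function resolveAttr(value, datum, index) {
  return typeof value === "function" ? value(datum, index) : value;
}

const data = [5, 12, 9];
const widths = data.map((d, i) => resolveAttr(20, d, i));
const labels = data.map((d, i) => resolveAttr((d, i) => `#${i}: ${d}`, d, i));
console.log(widths); // [ 20, 20, 20 ]
console.log(labels); // [ '#0: 5', '#1: 12', '#2: 9' ]
```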

Adding axes

The bulk of the code goes into creating the hour and day axes and their labels.

We can create axes from scales very easily. Since we want the labels to just be the numbers for the x (hours) axis, we can re-use xScale.

    xAxis = d3.svg.axis()
        .scale(xScale)
        .orient("top")

For the y axis, we want the days of the week as the labels, so we recreate yScale with a different domain before creating the axis.

    yAxisScale = d3.scale.ordinal()
        .domain(["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"])
        .rangeBands([0, (box_h + 1) * 7])

    yAxis = d3.svg.axis()
        .scale(yAxisScale)
        .orient("left")
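As far as I can tell, when an axis is built from a scale with range bands, D3 places each tick label at the middle of its band rather than at its start. Assuming that is right, the label offsets can be sketched in plain JavaScript like this (labelOffsets is a made-up helper, not part of D3):

```javascript
const days = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"];
const boxH = 19.75;    // (200 - 50) / 8 + 1, from the parameters above
const band = boxH + 1; // (box_h + 1) * 7 split into 7 equal bands

// Hypothetical helper: y offset of the middle of each day's band.
function labelOffsets(labels, band) {
  return labels.map((label, i) => ({ label, y: i * band + band / 2 }));
}

const offsets = labelOffsets(days, band);
console.log(offsets[0], offsets[6]);
```

Since the day scale uses the same range as yScale, each label should line up with the vertical middle of its row of boxes, give or take the 1px gap.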

Now, we can create graphics objects for the axes and append them to the chart, with a little tweaking to position and style them:

    chart.append("g")
        .attr("class", "y axis")
        .attr("transform", "translate(#{left_margin}, #{top_margin})")
        .attr("fill", "#666666")
        .call(yAxis)

    chart.append("g")
        .attr("class", "x axis")
        .attr("transform", "translate(#{left_margin}, #{top_margin})")
        .attr("fill", "#666666")
        .call(xAxis)

Finally, a label for the x axis (the y-axis is self-explanatory):

    chart.append("text")
        .attr("text-anchor", "start")
        .attr("transform", "translate(#{left_margin}, 15)")
        .text("Hour")

Et voila! We have a fully-functional (if not gorgeous) chart to work with.
