August 1, 2014

Top Github Languages of 2014 (So far)

It’s that time of year again! Today, we’ll look at just over half a year’s worth of Github data to draw unsubstantiated conclusions about the relative popularity of programming languages. OK, let’s go!

Showing my work

As I did last year, I used Google BigQuery to get data from the Github Archive. Here’s the query for this year:

SELECT repository_language, count(repository_language) AS repos_by_lang
FROM [githubarchive:github.timeline]
WHERE repository_fork == "false"
AND type == "CreateEvent"
AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('2014-01-01 00:00:00')
AND PARSE_UTC_USEC(repository_created_at) < PARSE_UTC_USEC('2014-07-31 00:00:00')
GROUP BY repository_language
ORDER BY repos_by_lang DESC
LIMIT 100

I also made the same query for the same period in 2013 and 2012, so that we’d have something to compare our results to.
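If you want to rerun this for other years without hand-editing dates, here is a minimal sketch that fills the year into the query text above. The names `QUERY_TEMPLATE` and `query_for_year` are made up for illustration, and this is not necessarily how the original queries were produced; it only builds the query string, which you would then paste into BigQuery.

```python
# Hypothetical helper: same query as above, with the year as a placeholder.
QUERY_TEMPLATE = """\
SELECT repository_language, count(repository_language) AS repos_by_lang
FROM [githubarchive:github.timeline]
WHERE repository_fork == "false"
AND type == "CreateEvent"
AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('{year}-01-01 00:00:00')
AND PARSE_UTC_USEC(repository_created_at) < PARSE_UTC_USEC('{year}-07-31 00:00:00')
GROUP BY repository_language
ORDER BY repos_by_lang DESC
LIMIT 100"""


def query_for_year(year):
    """Return the query text for January 1 through July 30 of `year`."""
    # int() guards against accidentally formatting in arbitrary text.
    return QUERY_TEMPLATE.format(year=int(year))


if __name__ == "__main__":
    for year in (2012, 2013, 2014):
        print(query_for_year(year))
        print()
```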

Then, I did some Python munging, which for your benefit I’ve tossed in a gist. I wanted to pull out the ranks to compare between years, and also the raw numbers, for all languages which have cracked the top 20 since 2012.
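The gist isn’t reproduced here, but the munging described above can be sketched roughly as follows. This is a guess at the shape of it, not the actual gist: it assumes each year’s query results have been loaded as a list of (language, count) pairs, and the function names `ranks` and `compare_years` are invented for this example.

```python
def ranks(rows):
    """Map language -> 1-based rank, given (language, count) rows."""
    # Skip rows with no detected language (assumed to arrive as None or "").
    named = [(lang, count) for lang, count in rows if lang]
    ordered = sorted(named, key=lambda row: row[1], reverse=True)
    return {lang: i for i, (lang, _) in enumerate(ordered, start=1)}


def compare_years(results_by_year, top_n=20):
    """Collect (rank, count) per year for every language that reached
    the top `top_n` in at least one year.

    results_by_year: {year: [(language, count), ...]}
    Returns {language: {year: (rank, count)}}; a language missing from a
    year's results gets (None, None) for that year.
    """
    rank_by_year = {y: ranks(rows) for y, rows in results_by_year.items()}
    count_by_year = {y: dict(rows) for y, rows in results_by_year.items()}

    # Every language that cracked the top_n in any of the years.
    keep = {
        lang
        for year_ranks in rank_by_year.values()
        for lang, rank in year_ranks.items()
        if rank <= top_n
    }

    years = sorted(results_by_year)
    return {
        lang: {
            y: (rank_by_year[y].get(lang), count_by_year[y].get(lang))
            for y in years
        }
        for lang in keep
    }
```

Calling `compare_years({2012: rows_2012, 2013: rows_2013, 2014: rows_2014})` would give one entry per language, with its rank and raw count for each year, which is enough to build a comparison table.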

Results

So, here are the results.

Keep in mind that these results are surely skewed, since they rely heavily on the quality of Github’s repo language assignment heuristics. In fact, given the huge variation between years, changes in those heuristics are probably responsible for several languages’ presence on the list.

Some of the more notable points:

  • The jumps for CSS, R, and TeX, and the fall of Tcl and Prolog, can probably be ascribed to bugs in, and/or improvements to, Github’s heuristics for detecting languages.

  • After JavaScript, the next language demonstrating the expected monotonic year-over-year increases is Objective-C. Again, highly suspicious.

  • Since last year, C has seen big jumps, while Python, PHP, and C++ suffered for it. C actually gained almost 100,000 newly created repositories this year, although it might have stolen some of those from C++.

  • Java was the biggest gainer, jumping up by almost 100,000 repos, although it apparently lost

  • People like to put their Vim and Emacs config files on Github.

  • The rise of Lua is pretty interesting. I wonder if there’s some major project or product I don’t know about driving that.

Looking just at the numbers for 2014 (and ignoring languages that aren’t really programming languages), you can see some clear “tiers”. JavaScript is in the 300k+ tier by itself, while Java and Ruby share the 200k tier. C and PHP have nearly the same count, around 170k, while Python is close enough to be lumped in with those.

After that come C++, Objective-C, and C#, a trio of C-variants rounding off the top ten at around 60-80k. (Please keep pedantry about C#’s lineage in a single comment thread.)

Shell is off by itself, head of the minor-league languages. R and CoffeeScript are 20k+, Go and Perl hover lonelily around 16k and 13k respectively, and then Scala, Lua, Clojure, and Haskell occupy a continuum from 7k to 10k.

But that’s enough of my Ouija-board ramblings. What does your confirmation bias tell you about this data?

Other Resources

For those of you making serious decisions on the basis of this analysis, I recommend also checking out RedMonk’s rankings for January 2014. And for something more considered when choosing which language to use, how about Thoughtworks’ Technology Radar?