August 30, 2013

Top GitHub Languages for 2013 (so far)

I just discovered the GitHub Archive, a dataset of GitHub events queryable using Google BigQuery. What fun! So I decided to count how many repositories were created this year, grouped by language.

SELECT repository_language, count(repository_language) AS repos_by_lang
FROM [githubarchive:github.timeline]
WHERE repository_fork == "false"
AND type == "CreateEvent"
AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('2013-01-01 00:00:00')
AND PARSE_UTC_USEC(repository_created_at) < PARSE_UTC_USEC('2013-08-30 00:00:00')
GROUP BY repository_language
ORDER BY repos_by_lang DESC
LIMIT 100

The results:

Top 20 Languages for 2013

By # of repositories created on Github so far this year:

Commentary

Hey, Clojure cracked the top 20! It’s neck-and-neck with Haskell, too.

The top 10 are no surprise at all, although the ranking clearly reflects GitHub’s early popularity with the Ruby crowd and a general skew towards web languages.

The high positions of Shell and VimL look pretty odd, but can probably be explained by people putting their dotfiles on GitHub.

Prolog is a big surprise here. If anyone can explain that, I’d be interested.

Maybe we could learn more if we had the 2012 rankings for the same period (Jan. 1 - Aug. 30). So here is the same query with the dates changed to 2012:

SELECT repository_language, count(repository_language) AS repos_by_lang
FROM [githubarchive:github.timeline]
WHERE repository_fork == "false"
AND type == "CreateEvent"
AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('2012-01-01 00:00:00')
AND PARSE_UTC_USEC(repository_created_at) < PARSE_UTC_USEC('2012-08-30 00:00:00')
GROUP BY repository_language
ORDER BY repos_by_lang DESC
LIMIT 100

Top 20 in 2012

By # of repositories created on Github from Jan. 1 through Aug. 30, 2012

I’m not sure I trust the raw numbers here, since they are so much lower than in 2013, but hopefully the rankings are accurate.
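
One way to sidestep the difference in raw totals is to compare each language's share of that year's repositories instead of its count. Here is a minimal Python sketch, assuming the query results have been exported as (language, count) pairs. The function name `language_shares` is my own, and the languages and counts below are placeholders, not the real results:

```python
def language_shares(rows):
    """Convert (language, repo_count) pairs into {language: share of total}."""
    total = sum(count for _, count in rows)
    if total == 0:
        return {}
    return {language: count / total for language, count in rows}


# Placeholder rows, NOT the actual query output.
sample_2012 = [("LangA", 50), ("LangB", 30), ("LangC", 20)]
sample_2013 = [("LangA", 500), ("LangB", 250), ("LangC", 250)]

shares_2012 = language_shares(sample_2012)
shares_2013 = language_shares(sample_2013)

for language in shares_2012:
    change = shares_2013[language] - shares_2012[language]
    print(f"{language}: {shares_2012[language]:.1%} -> "
          f"{shares_2013[language]:.1%} ({change:+.1%})")
```

Note that the shares are relative to whatever rows are passed in, so with the LIMIT 100 in the queries they would be shares of the top 100 languages rather than of all repositories.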

Some highlights:

  • Perl appears to have suffered a drop in 2013 compared to 2012
  • Tcl appears out of nowhere in 2012. Maybe a quirk of the language recognition GitHub applies?
  • Groovy fell out of the top 20 in 2013 (it dropped to 22nd)
  • Go was more popular than Scala in 2012, but less popular in 2013. I compare those two because I think people are using them to solve similar problems.
  • CSS showed up nowhere in 2012
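
Comparisons like these can be made mechanically by ranking both result sets and lining them up. This is a rough Python sketch, assuming each year's results are available as (language, count) pairs. The helper names `ranks` and `rank_changes` are hypothetical, and the sample rows are placeholders rather than the real data:

```python
def ranks(rows):
    """Map each language to its 1-based rank, highest repo count first."""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return {language: position
            for position, (language, _) in enumerate(ordered, start=1)}


def rank_changes(old_rows, new_rows):
    """Return {language: (old_rank, new_rank)}.

    None marks a language that is missing from one of the years.
    """
    old, new = ranks(old_rows), ranks(new_rows)
    return {language: (old.get(language), new.get(language))
            for language in sorted(set(old) | set(new))}


# Placeholder rows, NOT the actual query output.
rows_2012 = [("LangA", 3), ("LangB", 2)]
rows_2013 = [("LangB", 5), ("LangC", 4), ("LangA", 1)]

changes = rank_changes(rows_2012, rows_2013)
for language, (old_rank, new_rank) in changes.items():
    print(language, old_rank, "->", new_rank)
```

A language that shows up in only one year (like CSS or Tcl above) would appear with None on the other side.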

Well, that’s all the analysis I care to do today, but I submit this data for discussion. Who else has opinions?

Oh, before I go:

The Full Results (i.e. the top 100)

2013

2012

Disclaimer

There are plenty of reasons why this analysis of GitHub data might not be accurate:

  • Who knows how accurate the GitHub Archive is?
  • GitHub users and open source projects are not a representative sample of all programmers or of programming everywhere
  • Maybe I screwed up the query