Top Github Languages for 2013 (so far)
I just discovered the Github Archive, a dataset of Github events queryable using Google BigQuery. What fun! So I decided to count how many repositories have been created this year, broken down by language.
SELECT repository_language, count(repository_language) AS repos_by_lang
FROM [githubarchive:github.timeline]
WHERE repository_fork == "false"
AND type == "CreateEvent"
AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('2013-01-01 00:00:00')
AND PARSE_UTC_USEC(repository_created_at) < PARSE_UTC_USEC('2013-08-30 00:00:00')
GROUP BY repository_language
ORDER BY repos_by_lang DESC
LIMIT 100
The results:
Top 20 Languages for 2013
By # of repositories created on Github so far this year:
Commentary
Hey, Clojure cracked the top 20! It’s neck-and-neck with Haskell, too.
The top 10 are no surprise at all, although the list clearly reflects Github’s early popularity with the Ruby crowd and a general skew towards web languages.
The high positions of Shell and VimL are pretty odd, but can be explained by people putting their dotfiles on Github.
Prolog is a big surprise here. If anyone can explain that, I’d be interested.
Maybe we could learn more if we had the 2012 rankings for the same period (Jan. 1 - Aug. 30). So here are those:
SELECT repository_language, count(repository_language) AS repos_by_lang
FROM [githubarchive:github.timeline]
WHERE repository_fork == "false"
AND type == "CreateEvent"
AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('2012-01-01 00:00:00')
AND PARSE_UTC_USEC(repository_created_at) < PARSE_UTC_USEC('2012-08-30 00:00:00')
GROUP BY repository_language
ORDER BY repos_by_lang DESC
LIMIT 100
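The two queries above differ only in the year, so one way to avoid copy-and-paste mistakes is to generate them from a single template. Here is a minimal Python sketch of that idea. The names `QUERY_TEMPLATE` and `build_query` are my own placeholders, and the code only builds the query text; it does not talk to BigQuery.

```python
# Query text copied from the post, with the year turned into a placeholder.
QUERY_TEMPLATE = """\
SELECT repository_language, count(repository_language) AS repos_by_lang
FROM [githubarchive:github.timeline]
WHERE repository_fork == "false"
AND type == "CreateEvent"
AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('{year}-01-01 00:00:00')
AND PARSE_UTC_USEC(repository_created_at) < PARSE_UTC_USEC('{year}-08-30 00:00:00')
GROUP BY repository_language
ORDER BY repos_by_lang DESC
LIMIT 100"""


def build_query(year):
    """Return the query text for Jan. 1 up to Aug. 30 of the given year."""
    # int() rejects anything that is not a plain year before it reaches the SQL.
    return QUERY_TEMPLATE.format(year=int(year))


if __name__ == "__main__":
    print(build_query(2012))
```

Pasting the output of `build_query(2012)` or `build_query(2013)` into the BigQuery console should give the same text as the hand-written queries.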
Top 20 in 2012
By # of repositories created on Github from Jan. 1 through Aug. 30, 2012
I’m not sure I trust the raw numbers here, which are so much lower than in 2013, but the rankings are hopefully accurate.
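One way around the doubt about raw totals would be to compare each language's share of that year's repositories instead of its count. A rough Python sketch, assuming the query results have been loaded into a dict of language name to count; `to_shares` and the `LangA`/`LangB` inputs are made-up placeholders, not the real results:

```python
def to_shares(counts):
    """Convert {language: repo_count} into {language: fraction of the total}."""
    total = sum(counts.values())
    if total == 0:
        # Nothing to normalize; avoid dividing by zero.
        return {lang: 0.0 for lang in counts}
    return {lang: n / total for lang, n in counts.items()}


if __name__ == "__main__":
    # Placeholder counts purely for illustration.
    print(to_shares({"LangA": 3, "LangB": 1}))
```

Shares from two different years can then be compared directly, even if one year's archive is missing events.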
Some highlights:
- Perl appears to have suffered a drop in 2013 compared to 2012
- Tcl appears out of nowhere in 2012. Maybe a quirk of the language recognition Github applies?
- Groovy fell out of the top 20 in 2013 (it dropped to 22nd)
- Go was more popular than Scala in 2012, but less popular in 2013. I compare those two because I think people are using them to solve similar problems.
- CSS showed up nowhere in 2012
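Comparisons like the ones above come down to lining up each language's rank in the two lists. A small Python sketch of that bookkeeping, assuming each year's results are available as a list of language names ordered from most to fewest repositories; `rank_changes` and the `Lang*` names are hypothetical placeholders, not the actual rankings:

```python
def rank_changes(earlier, later):
    """Map each language to (rank in earlier list, rank in later list).

    Ranks are 1-based; None means the language is absent from that list.
    """
    earlier_rank = {lang: i for i, lang in enumerate(earlier, start=1)}
    later_rank = {lang: i for i, lang in enumerate(later, start=1)}
    # Keep the earlier list's order, then append languages new in the later list.
    languages = list(earlier) + [l for l in later if l not in earlier_rank]
    return {l: (earlier_rank.get(l), later_rank.get(l)) for l in languages}


if __name__ == "__main__":
    # Placeholder rankings purely for illustration.
    for lang, (old, new) in rank_changes(
        ["LangA", "LangB", "LangC"], ["LangB", "LangA", "LangD"]
    ).items():
        print(lang, old, new)
```

A language with `None` on one side either appeared out of nowhere or dropped off the list, which is the kind of thing worth a second look.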
Well, that’s all the analysis I care to do today, but I submit this data for discussion. Who else has opinions?
Oh, before I go:
The Full Results (i.e. the top 100)
2013
2012
Disclaimer
Here are some reasons why an analysis of Github data might not be accurate:
- Who knows how accurate the Github Archive is
- Github users/open source projects are not a representative demographic of all programmers or programming everywhere
- Maybe I screwed up the query