Aug 30, 2013

Top Github Languages for 2013 (so far)

I just discovered the Github Archive, a dataset of Github events queryable using Google BigQuery. What fun! So I decided to count how many repositories have been created this year by language.

SELECT repository_language, count(repository_language) AS repos_by_lang
FROM [githubarchive:github.timeline]
WHERE repository_fork == "false"
AND type == "CreateEvent"
AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('2013-01-01 00:00:00')
AND PARSE_UTC_USEC(repository_created_at) < PARSE_UTC_USEC('2013-08-30 00:00:00')
GROUP BY repository_language
ORDER BY repos_by_lang DESC
LIMIT 100

The results:

Top 20 Languages for 2013

By # of repositories created on Github so far this year:

Rank  Language            # Repositories Created
1     JavaScript          264131
2     Ruby                218812
3     Java                157618
4     PHP                 114384
5     Python              95002
6     C++                 78327
7     C                   67706
8     Objective-C         36344
9     C#                  32170
10    Shell               28561
11    CSS                 17813
12    Perl                15412
13    CoffeeScript        11133
14    VimL                7857
15    Scala               6918
16    Go                  6884
17    Prolog              5829
18    Clojure             4904
19    Haskell             4681
20    Lua                 4048

Commentary

Hey, Clojure cracked the top 20! It's neck-and-neck with Haskell, too.

The top 10 are no surprise at all, although the list clearly shows artifacts of Github's early popularity with the Ruby crowd and a general skew towards web languages.

The high positions of Shell and VimL look odd, but can be explained by people putting their dotfiles on Github.

Prolog is a big surprise here. If anyone can explain that, I'd be interested.

Maybe we could learn more from the 2012 rankings for the same period (Jan. 1 to Aug. 30). So here are those:

SELECT repository_language, count(repository_language) AS repos_by_lang
FROM [githubarchive:github.timeline]
WHERE repository_fork == "false"
AND type == "CreateEvent"
AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('2012-01-01 00:00:00')
AND PARSE_UTC_USEC(repository_created_at) < PARSE_UTC_USEC('2012-08-30 00:00:00')
GROUP BY repository_language
ORDER BY repos_by_lang DESC
LIMIT 100
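The two queries differ only in the year, so they could be generated from one template. Here is a minimal Python sketch of that idea; `QUERY_TEMPLATE` and `build_query` are names made up for illustration, and the script only prints the query text (it does not talk to BigQuery).

```python
# Hypothetical helper: rebuilds the query text used in this post for a given year.
# The SQL is copied verbatim from the post; only the year and limit are filled in.
QUERY_TEMPLATE = """SELECT repository_language, count(repository_language) AS repos_by_lang
FROM [githubarchive:github.timeline]
WHERE repository_fork == "false"
AND type == "CreateEvent"
AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('{year}-01-01 00:00:00')
AND PARSE_UTC_USEC(repository_created_at) < PARSE_UTC_USEC('{year}-08-30 00:00:00')
GROUP BY repository_language
ORDER BY repos_by_lang DESC
LIMIT {limit}"""


def build_query(year, limit=100):
    """Return the query text covering Jan 1 up to (not including) Aug 30 of `year`."""
    return QUERY_TEMPLATE.format(year=int(year), limit=int(limit))


if __name__ == "__main__":
    for year in (2012, 2013):
        print(build_query(year))
        print()
```

Casting `year` and `limit` to `int` keeps anything other than a number out of the query text, which seems a sensible precaution when building SQL from a string template.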

Top 20 in 2012

By # of repositories created on Github from Jan. 1 up to (but not including) Aug. 30, 2012, as the query's date filter is written:

Rank  Language            # Repositories Created
1     Ruby                344825
2     JavaScript          296564
3     Java                265223
4     C                   212393
5     PHP                 173938
6     Python              173727
7     C++                 93764
8     Shell               72006
9     Perl                48620
10    C#                  43665
11    Objective-C         41536
12    VimL                18077
13    Go                  16224
14    CoffeeScript        15722
15    Scala               14262
16    Haskell             10402
17    Clojure             9748
18    Tcl                 9633
19    Emacs Lisp          8567
20    Groovy              6973

I'm not sure I trust the raw numbers here, which are so much higher than in 2013, but the rankings are hopefully accurate.
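Since the raw totals differ so much between the two years, one way to compare them is by share rather than by count. The sketch below is only an illustration: the counts are copied from the top 10 rows of the two tables above, each share is relative to that year's top-10 total (not to all repositories), and the `shares` and `fmt` helpers are names invented here.

```python
# Counts copied from the top 10 rows of the 2013 and 2012 tables in this post.
TOP10_2013 = {
    "JavaScript": 264131, "Ruby": 218812, "Java": 157618, "PHP": 114384,
    "Python": 95002, "C++": 78327, "C": 67706, "Objective-C": 36344,
    "C#": 32170, "Shell": 28561,
}
TOP10_2012 = {
    "Ruby": 344825, "JavaScript": 296564, "Java": 265223, "C": 212393,
    "PHP": 173938, "Python": 173727, "C++": 93764, "Shell": 72006,
    "Perl": 48620, "C#": 43665,
}


def shares(counts):
    """Return each language's fraction of the summed counts."""
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


def fmt(share):
    """Format a share as a percentage, or '-' if the language is absent."""
    return "-" if share is None else f"{share:.1%}"


if __name__ == "__main__":
    s2012, s2013 = shares(TOP10_2012), shares(TOP10_2013)
    print(f"{'Language':<14}{'2012':>8}{'2013':>8}")
    # Languages that made only one year's top 10 get a '-' in the other column.
    for lang in sorted(set(s2012) | set(s2013)):
        print(f"{lang:<14}{fmt(s2012.get(lang)):>8}{fmt(s2013.get(lang)):>8}")
```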

Some highlights:

  • Perl appears to have suffered a drop in 2013 compared to 2012 (from 9th to 12th)
  • Tcl appears out of nowhere in 2012 (18th, versus 53rd in 2013). Maybe a quirk of the language recognition Github applies?
  • Groovy fell out of the top 20 in 2013 (it dropped from 20th to 22nd)
  • Go was more popular than Scala in 2012, but slightly less in 2013. I compare those two because I think people are using them to solve similar problems.
  • CSS was nowhere near the top 20 in 2012 (46th), but is 11th in 2013
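The rank movements behind these highlights can be checked with a few lines of Python. This is just a sketch: the rank pairs are copied by hand from the full tables at the end of this post, and `RANKS` and `rank_change` are names chosen for illustration.

```python
# (rank in 2012, rank in 2013), copied from the full result tables in this post.
RANKS = {
    "Perl": (9, 12),
    "Tcl": (18, 53),
    "Groovy": (20, 22),
    "Go": (13, 16),
    "Scala": (15, 15),
    "CSS": (46, 11),
}


def rank_change(lang):
    """Places gained from 2012 to 2013; negative means the language fell."""
    rank_2012, rank_2013 = RANKS[lang]
    return rank_2012 - rank_2013


if __name__ == "__main__":
    # List the biggest climbers first and the biggest drops last.
    for lang in sorted(RANKS, key=rank_change, reverse=True):
        rank_2012, rank_2013 = RANKS[lang]
        print(f"{lang:<8}{rank_2012:>4} -> {rank_2013:<4}({rank_change(lang):+d})")
```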

Well, that's all the analysis I care to do today, but I submit this data for discussion. Who else has opinions?

Oh, before I go:

The Full Results (i.e. the top 100)

2013

Rank  Language            # Repositories Created
1     JavaScript          264131
2     Ruby                218812
3     Java                157618
4     PHP                 114384
5     Python              95002
6     C++                 78327
7     C                   67706
8     Objective-C         36344
9     C#                  32170
10    Shell               28561
11    CSS                 17813
12    Perl                15412
13    CoffeeScript        11133
14    VimL                7857
15    Scala               6918
16    Go                  6884
17    Prolog              5829
18    Clojure             4904
19    Haskell             4681
20    Lua                 4048
21    Puppet              3437
22    Groovy              3372
23    R                   2980
24    Emacs Lisp          2919
25    ActionScript        2413
26    Matlab              2395
27    Arduino             2238
28    Erlang              2061
29    OCaml               2049
30    Visual Basic        1854
31    ASP                 1268
32    Processing          1207
33    Common Lisp         1153
34    Assembly            1051
35    Logos               1027
36    TypeScript          972
37    Dart                950
38    D                   936
39    Delphi              901
40    Scheme              882
41    FORTRAN             794
42    PowerShell          771
43    XML                 632
44    Racket              610
45    Elixir              573
46    ColdFusion          507
47    XSLT                496
48    Apex                484
49    F#                  473
50    Haxe                455
51    Verilog             444
52    Julia               387
53    Tcl                 338
54    AutoHotkey          338
55    Vala                321
56    VHDL                313
57    Rust                282
58    LiveScript          192
59    SuperCollider       151
60    Standard ML         139
61    AppleScript         121
62    DOT                 118
63    Ada                 109
64    Coq                 99
65    OpenEdge ABL        86
66    Gosu                76
67    Pure Data           73
68    Smalltalk           63
69    Kotlin              61
70    Lasso               57
71    Eiffel              55
72    Io                  53
73    M                   53
74    XQuery              52
75    Nemerle             49
76    Scilab              44
77    Objective-J         43
78    Awk                 42
79    Slash               38
80    XProc               35
81    Xtend               33
82    Nimrod              31
83    CLIPS               24
84    Boo                 24
85    Ceylon              23
86    ooc                 22
87    MoonScript          22
88    DCPU-16 ASM         19
89    Rebol               17
90    Factor              17
91    Ragel in Ruby Host  15
92    Bro                 14
93    Dylan               13
94    Monkey              12
95    Nu                  11
96    Arc                 10
97    Augeas              9
98    PogoScript          8
99    Turing              6
100   XC                  5

2012

Rank  Language            # Repositories Created
1     Ruby                344825
2     JavaScript          296564
3     Java                265223
4     C                   212393
5     PHP                 173938
6     Python              173727
7     C++                 93764
8     Shell               72006
9     Perl                48620
10    C#                  43665
11    Objective-C         41536
12    VimL                18077
13    Go                  16224
14    CoffeeScript        15722
15    Scala               14262
16    Haskell             10402
17    Clojure             9748
18    Tcl                 9633
19    Emacs Lisp          8567
20    Groovy              6973
21    Lua                 6474
22    Erlang              5784
23    ActionScript        4777
24    Puppet              3926
25    R                   3386
26    Matlab              2828
27    D                   2740
28    Common Lisp         2529
29    Arduino             2459
30    Assembly            1882
31    Visual Basic        1821
32    Vala                1614
33    Scheme              1565
34    Delphi              1370
35    OCaml               1330
36    Smalltalk           1313
37    FORTRAN             1269
38    Dart                1174
39    ASP                 1042
40    HaXe                983
41    ColdFusion          966
42    Prolog              956
43    F#                  670
44    PowerShell          652
45    Racket              614
46    CSS                 530
47    Verilog             523
48    VHDL                473
49    Eiffel              406
50    Parrot              270
51    Apex                265
52    AutoHotkey          258
53    Rust                234
54    Scilab              230
55    DCPU-16 ASM         229
56    XML                 206
57    Elixir              189
58    Ada                 182
59    Coq                 174
60    XQuery              155
61    Julia               151
62    Pure Data           147
63    SuperCollider       131
64    Standard ML         127
65    XSLT                102
66    Kotlin              98
67    Powershell          93
68    Io                  92
69    Objective-J         84
70    TypeScript          81
71    OpenEdge ABL        76
72    Nemerle             61
73    AppleScript         57
74    Haxe                54
75    Gosu                47
76    Factor              44
77    Logos               43
78    Processing          40
79    Logtalk             34
80    Dylan               34
81    Nimrod              32
82    Ceylon              32
83    ooc                 30
84    Opa                 30
85    Boo                 27
86    Fancy               26
87    Turing              26
88    Mirah               22
89    Max/MSP             21
90    Bro                 17
91    Xtend               14
92    Rebol               13
93    LiveScript          12
94    Lasso               11
95    Arc                 11
96    Augeas              8
97    DOT                 6
98    Fantom              5
99    Awk                 5
100   Max                 4

Disclaimer

There are a lot of reasons why analysing Github data might not be accurate:

  • Who knows how accurate the Github Archive is
  • Github users/open source projects are not a representative demographic of all programmers or programming everywhere
  • Maybe I screwed up the query

Further Reading