October 30, 2014

What language should you learn?

I travel a lot these days. I’d call myself a “digital nomad” as a shorthand, if there were any way to say it without sounding impossibly smug. Let’s just say I’m homeless but employed, and my wife and I live in Airbnbs.

One of the challenges of moving around so much is dealing with language barriers. For the most part, even in places where English isn’t widely understood, it’s perfectly possible to get whatever you need with gestures, chief among them pointing and holding up money. It’s the little things that are harder when you can’t speak the language.

By way of example, I spent much of today wandering the streets of Istanbul in search of somewhere I could buy a simple envelope, because it turns out that without a Staples around I’m completely incapable of purchasing office supplies. I bet somewhere there’s a whole bazaar full of old men with long grey beards flaunting staplers and paper clips – that’s how it seems to go here – but I didn’t happen across anyone who spoke enough English to ask.

So, I really prefer having at least some grasp of the language of wherever I’m going, but learning a language is pretty tough work. Being a nerd as well as a clueless putz, I decided today to compute empirically which languages are the best to learn if one were, hypothetically, to travel around randomly on the basis of wherever has the cheapest airline fares from here.

Get to the point, dammit.

I’ve been wondering about this for a little while, but it turns out to be a trending topic on Quora today too, so I thought I might as well put the effort in and just get some numbers. Specifically, one of the answers on Quora linked to [this page, listing the most widely spoken languages in the world along with which countries they’re spoken in](http://en.wikipedia.org/wiki/List_of_most_widely_spoken_languages_(by_number_of_countries%29). It might seem that most of the work is done here, but the problem not addressed by this list is that of overlap: some countries are counted under more than one language, so the per-language totals can’t simply be added up. If you came to Canada after learning French on the strength of this list, you’d probably be mighty disappointed.

Debating the merits of different ways of measuring how many people speak a language in each country was definitely something I wanted to avoid, so this list is perfect. We have a simple job: take this data, and figure out which sets of N languages cover the most countries.

Source Code

If you’re interested in the code, I’ve put the commented Clojure source in this gist for your perusal.
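For anyone who doesn’t read Clojure, here is a minimal Python sketch of the same idea. It is not the code from the gist: the function names are hypothetical, and the tiny language-to-country table is made up for illustration rather than taken from the Wikipedia page. It just tries every set of N languages and counts the distinct countries in the union, so a country listed under two languages is only counted once.

```python
from itertools import combinations

# Hypothetical toy data: language -> countries where it is listed.
# Illustrative only; these are NOT the Wikipedia figures used in the post.
COUNTRIES_BY_LANGUAGE = {
    "English": {"Canada", "India", "Nigeria", "Cameroon", "Australia"},
    "French": {"Canada", "France", "Cameroon", "Morocco", "Senegal", "Belgium"},
    "Arabic": {"Morocco", "Egypt", "Iraq"},
    "Spanish": {"Spain", "Mexico"},
}


def coverage(languages, table):
    """Return the set of countries covered by at least one of the languages."""
    covered = set()
    for language in languages:
        covered |= table[language]  # set union, so overlaps count once
    return covered


def best_sets(table, n, top=5):
    """Rank every n-language combination by distinct countries covered."""
    ranked = [
        (len(coverage(combo, table)), combo)
        for combo in combinations(sorted(table), n)
    ]
    # Highest coverage first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda entry: (-entry[0], entry[1]))
    return ranked[:top]


if __name__ == "__main__":
    for count, combo in best_sets(COUNTRIES_BY_LANGUAGE, 2):
        print(", ".join(combo), count)
```

Brute force is fine at this scale: with a few dozen languages there are only a few thousand pairs and triples to check.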

The Results + Discussion

If you can only learn two languages, you should learn English and French. Here are the top pairings:

| Pairing | # Countries |
| --- | --- |
| English, French | 95 |
| Arabic, English | 91 |
| English, Turkic | 90 |
| English, Spanish | 89 |
| English, Portuguese | 79 |
| English, Russian | 79 |
| Persian, English | 77 |
| German, English | 75 |
| Italian, English | 74 |
| Dutch, English | 73 |
| Chinese, English | 71 |
| Indonesian, English | 71 |
| Tamil, English | 71 |
| English, Swedish | 70 |
| English, Romanian | 70 |
| Bengali, English | 69 |
| English, Hindi | 69 |
| Turkic, French | 54 |

I took an arbitrary sample there because it’s interesting to me that the top pairing without English (Turkic + French) gets you by in significantly fewer countries than just English. Lucky us.

Boringly, the top result is what you would predict from the Wikipedia article anyhow. I thought there might be more overlap between English and French, but perhaps that’s just because, being Canadian, I’m so used to seeing the two together. Actually, most of the results are just English + (other languages in descending order of speakers).

Still, this is good news for us native English speakers: French and English actually overlap a lot linguistically. About 30% of English words have French roots.

What about learning three languages? Perhaps the results of that will be less boring. If you’re more on-the-ball, here’s how you’ll do:

| Languages | # Countries |
| --- | --- |
| English, Turkic, French | 117 |
| English, Spanish, French | 115 |
| Arabic, English, French | 114 |
| Arabic, English, Spanish | 112 |
| English, Spanish, Turkic | 111 |
| Arabic, English, Turkic | 111 |
| English, Russian, French | 106 |
| English, Portuguese, French | 105 |
| Persian, English, French | 104 |
| Arabic, English, Portuguese | 102 |
| English, Turkic, Portuguese | 101 |
| Arabic, English, Russian | 101 |
| English, Russian, Spanish | 100 |
| Italian, English, French | 100 |

There you have it: your third language should be Turkic. It makes sense: Arabic and French overlap in northwest Africa, so Arabic adds fewer new countries on top of English and French than Turkic does.
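To put numbers on that, the gain from each candidate third language can be read straight off the two tables. A small Python sketch, with hypothetical variable names and the counts copied from the tables above:

```python
# Countries covered by English + French (from the pairings table).
ENGLISH_FRENCH = 95

# Countries covered by English + French + a third language
# (from the three-language table).
WITH_THIRD_LANGUAGE = {
    "Turkic": 117,
    "Spanish": 115,
    "Arabic": 114,
    "Russian": 106,
    "Portuguese": 105,
    "Persian": 104,
    "Italian": 100,
}

# Extra countries each third language adds on top of English + French.
gains = {
    language: total - ENGLISH_FRENCH
    for language, total in WITH_THIRD_LANGUAGE.items()
}

for language, gain in sorted(gains.items(), key=lambda item: -item[1]):
    print(f"{language}: +{gain}")
```

Turkic adds 22 countries and Arabic only 19, even though Arabic edges out Turkic (91 to 90) when each is paired with English alone.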

I’m most intrigued by the English-Spanish-French triple, actually. There’s a lot of linguistic overlap between Spanish and French too, so this is almost certainly the easiest triple to learn for native English speakers.

So there you have it: learn you some French. Bonne chance, et au revoir!