Determining which languages are related to which is significantly more difficult than you might expect.
Take English, for example. Most people know that it's a Germanic language, related not only to German but to Dutch, Flemish, and (more distantly) the Scandinavian languages. This shows most strongly in the basic vocabulary; the majority of our common verbs and nouns, as well as pronouns, prepositions, and conjunctions, are of Anglo-Saxon -- i.e., Germanic -- origin.
However, consider the first sentence in this post. The words which, are, to, is, more, than, you, and might are Germanic; but the more complicated words (determining, languages, related, significantly, difficult, and expect) -- the ones that carry most of the meaning -- are all Latin in origin. So is English actually a Romance language?
It isn't, of course, but a superficial look at the language might well push you to the wrong conclusion. Most of our words of Greek and Latin origin came in either via the Norman French spoken by the conquerors and ruling class who arrived in England during the eleventh century, or as later borrow-words that slipped into common parlance from their use in legal, scientific, and religious contexts. English, in fact, has borrowed words from just about every language it's come into contact with. A few interesting ones:
- algorithm (Arabic)
- loot (Hindi)
- torso (Italian)
- ketchup (possibly Chinese, by way of Malay)
- easel (Dutch)
- sauna (Finnish)
- amen (Hebrew)
- chess and checkmate (Farsi)
- coffee (Turkish)
- icon (Greek, possibly via Russian)
- chocolate (Nahuatl)
- hurricane (Taino)
- tattoo (Samoan)
We English-speaking linguists are lucky, because the written records for English and its antecedents are generally excellent. We have a highly detailed map of how the language evolved, and even in the case of borrow-words, we can often pinpoint not only where they came from, but when they entered the English language. Things are far murkier with languages that have a poorer -- or completely nonexistent -- written history. In that case, we're left with the immense task of using similarities in word roots and syntactic structure as the basis for inferring where a language fits in the overall family tree.
And sometimes even that isn't enough. There are a good number of languages for which we have been unable to establish a clear relationship to any other; these are called language isolates, and include Basque, Sandawe (a language spoken in Tanzania), Zuni, Huave (an indigenous language in Mexico), Burushaski (spoken by about 100,000 people in Pakistan), and -- amazingly -- Japanese and Korean.
In fact, the latter two are why this topic comes up today. Both Japanese and Korean are of unclear relationship to each other and to the other languages in the region. The Japanese writing system is largely borrowed from Chinese; kanji is a logographic script that uses many characters identical to those in Chinese (although the pronunciations, and some of the meanings/connotations, are completely different). Korean writing, on the other hand, is of known provenance; the script (hangul) is a phonetic alphabet that was invented by the fifteenth-century King Sejong to give the speakers of Korean a standard, easily learned way of writing the language.
So despite the languages' complex and well-studied writing systems, the historical records of Japanese and Korean don't help us much in establishing how they fit in with other Asian language families. But some research published last week in Nature, which I found out about from loyal Skeptophilia reader Gil Miller, has proposed a solution to the mystery. Using computational analysis to map out not only the related features between the languages but their degree of separation -- analogous to the genetic bootstrap analysis used by evolutionary biologists to determine when the common ancestor of two species existed -- the researchers figured out that not only are Japanese and Korean distantly related to each other, they're also related to Mongolian, to the Tungusic languages of eastern Siberia and Manchuria, and to... Turkish!
"We have languages, archaeology and genetics which all have dates. So we just looked to see if they correlated," said study co-author Martine Robbeets, of the Max Planck Institute for the Science of Human History, in an interview with New Scientist. "We all identify ourselves with language. It’s our identity. We often picture ourselves as one culture, one language, one genetic profile. Our study shows that like all populations, those in Asia are mixed."