What science, especially natural history and genetics, tells us is nearly the opposite of what historians and linguists have been saying for over a century. In particular, they have underestimated the time scales involved by an order of magnitude.

Introduction: the two-hundred-year-old question

            There is now a revolution in progress in our understanding of our past. Science has finally answered the 200-year-old question of why people from India to Iceland speak languages clearly related to one another. All non-African humans and their languages can be traced to about a thousand individuals in South Asia 60,000 years ago. Two major events during the Pleistocene, a gene mutation about 80,000 years ago and a massive volcanic eruption 73,000 years ago, played a crucial role in triggering the evolution and spread of Indo-Europeans and their languages.

            It is natural history, not linguistics, that has cut the Gordian Knot of Indo-European origins. Natural history and archaeology both show there were two waves of migration out of India into Eurasia and Europe, during the prehistoric (c. 40,000 YBP, or Years Before Present) and the protohistoric (c. 10,000 YBP) periods. Further, it is Sanskrit, not any Proto-Indo-European, that has left its mark on Indo-European languages. It may further be said that Sanskrit is to linguistics what mathematics has been to the natural sciences.

            With slightly less confidence it may be said that the Vedas and the later Sanskrit (of the Upanishads, the epics and the classical period) were all products of a period of intense cultivation of language lasting thousands of years. They were carefully constructed by drawing upon Gauda (northern) and Dravida (southern) sources prevailing in the Indian subcontinent around 10,000 years ago, if not earlier. This accounts for the so-called ‘Dravidian’ features in the Rig Veda as well as the extraordinary perfection of Sanskrit grammar. They are the product of a culture that took the science of language, that is, etymology and grammar, to heights that were never again to be attained.

            The same natural history suggests there may be a similar story of East and Southeast Asian peoples and languages, almost a mirror image of the birth and spread of Indo-Europeans. It is a story that remains to be told. Thus, the picture given by science is the exact opposite of the Aryan invasion-migration theories favored by linguists for over a century. Above all, it may be said with confidence that historians, and linguists in particular, have compressed the time scales by an order of magnitude, driven by the compulsion to fit history within the 6000 years mandated by the Biblical belief in Creation.

‘Discovery’ of Sanskrit

            Unlike most academic disciplines, Indology (i.e. Western study of India) and its offshoot of Indo-European studies can be dated almost to the day. In a lecture in Kolkata (then Calcutta) delivered on 2 February 1786 (and published in 1788), Sir William Jones, a forty-year-old British jurist in the service of the East India Company, observed:

            “The Sanskrit language, whatever be its antiquity, is of a wonderful structure; more perfect than the Greek, more copious than the Latin, and more exquisitely refined than either, yet bearing to both of them a stronger affinity, both in the roots of verbs and the forms of grammar, than could possibly have been produced by accident; so strong indeed, that no philologer could examine them all three, without believing them to have sprung from some common source, which, perhaps, no longer exists…”

            This influential statement is well known, but the errors Jones committed are not, such as his dating of Indian tradition based on the Biblical superstition that the world was created on Sunday, 23rd of October 4004 BCE at 9:00 AM (time zone not specified). The date was first derived by the Irish bishop James Ussher (1581 – 1656) from a literal reading of the Bible combined with the belief that the world would end 2000 years after Christ, that is, twelve years ago.

            While it sounds comical today, it was taught as history through most of the nineteenth century even though both Darwin’s theory of evolution and geology had determined the earth had to be millions of years old to support fossils and the enormous diversity of life forms. Even this very greatly underestimated its age. (The current estimate for the age of the earth is about 4.5 billion years.)

Figure 1: Biblical time line contrasted with the scientifically determined chronology

Bible as history

            Jones was a capable linguist and knew some Sanskrit. His task was to study Indian texts and understand Hindu law to help administer British justice in a manner acceptable to Hindus. In his study of Hindu texts like the Puranas he came across dates that went much further back than the Biblical date for Creation. He dismissed them as superstitions (for failing to agree with the Biblical superstition) and imposed a chronology on Indian history and tradition to fit within the Biblical framework.

            This was to have fateful consequences for the study of India over the succeeding two centuries down to the present. To cite an example, Indian tradition going back at least to the mathematician Aryabhata (476 – 550 CE) has held that the Kali Age began with the Mahabharata War in 3102 BCE. This marks the end of an era known as the Vedic Age. Accepting it takes the beginning of the Vedic period, as well as several dynasties like the Ikshwakus, to 6000 BCE and earlier. This is millennia before the Biblical date for Creation, which men like Jones could not accept.

            Dates based on the Biblical chronology were accepted as historically valid by most Western scholars, including F. Max Müller, the most influential of them. He explicitly stated that he took the Biblical account, including the date, to be historical. Most of them were classical scholars or students of religion and had no inkling of science. The widely quoted dates of 1500 BCE for the Aryan invasion and 1200 BCE for the Rig Veda were imposed to make them conform to the Biblical date of 4004 BCE.

            The situation has not changed much in the succeeding two centuries. Indologists like Wendy Doniger, Diana Eck, Michael Witzel and their Indian counterparts like Romila Thapar have little comprehension of the revolution in our understanding of the past brought about by science in the past two decades. They continue to quote 1200 BCE for the Rig Veda without mentioning that it rests on the authority of a 400-year-old Biblical superstition! (Some ‘scholars’ like Doniger and Thapar don’t know any Sanskrit either, but that is a different matter.) The main point is that they know no more science than their predecessors of a century and more ago.

Language puzzle, linguistic inadequacy

            To return to Jones and his successors, in their ignorance of science it was natural they should have come up with speculative theories to account for similarities between Sanskrit and European languages, especially Greek and Latin. Being linguists, they created a field called philology, the comparative study of languages and cultures, but it soon got mixed up with crackpot theories on race and language, like the ‘Aryan’ race speaking ‘Aryan’ languages, which somehow ended up in Nazi Germany! There was even an ‘Aryan’ science movement that demonized Einstein and his ‘Jewish’ physics! These race theories were denounced by scientists, especially in the twentieth century, but politics and prejudice kept them alive for over a century. In addition to the Nazi ideology, British colonial policy used race as a way of classifying its Indian subjects.

Figure 2: Distribution of Indo-European languages. The blank (white) portion must be included as part of the Gauda-Dravida group greatly antedating the Indo-European family.

            Setting aside such aberrations, Jones did raise a legitimate question: why do people from India and Sri Lanka to Ireland and Iceland speak languages clearly related to one another, and why have they done so for more than two thousand years? This fact has been widely noted and a few examples help illustrate the point. What is deva in Sanskrit becomes deus in Latin, theos in Greek and dieu in French. Similarly, agni for fire in Sanskrit becomes ignis in Latin, from which we get the English words ignite and ignition. Amusingly, the famous Russian drink vodka has its Sanskrit cognate in udaka, both meaning water. And there are many more, far too many to be seen as coincidence. Prejudice and politics aside, this basic question remains.

            Over the past two hundred years many theories have been created to account for these similarities. They are based mostly on superficial phonetic similarities, and none has proved satisfactory. Even without the confusion due to race theories, these explanations yield glaring inconsistencies. To take one example, using the same data and the same methods, some scholars have argued that a branch of Indo-Europeans called ‘Aryans’ invaded India, while others claim the reverse: that Aryans (or Indo-Europeans) originated in India and migrated to Eurasia and Europe, taking their language(s) with them. The Aryan invasion theory (AIT) of course holds the former view, that the invading Aryans were the eastern branch of Indo-Europeans whose original homeland was in Eurasia or Europe.

March of science

            With the benefit of hindsight one can see that the science needed to unlock the language mystery did not become available until about twenty years ago. It is only in the last few decades, with the emergence of molecular biology after World War II and especially gene sequencing and genome research in the past decade and more, that we have been able to trace the origin and spread of Indo-Europeans and their languages. Two areas of natural history, the distribution of mitochondrial DNA and Y-chromosomes (and haplogroups) in the world’s population groups and the fate of humans in the face of natural events, account for the spread of Indo-Europeans and their languages from a group of perhaps as few as a thousand 60,000 years ago to well over two billion speakers today.

            What has allowed us to unlock the mysteries of IE origins is science, especially natural history and population genetics. Population genetics was founded by Sir Ronald Fisher, Sewall Wright and J.B.S. Haldane. Fisher, a geneticist as well as a statistician, had two outstanding students, C. Radhakrishna Rao (C.R. Rao) and Luigi Luca Cavalli-Sforza. Rao became known as the world’s greatest mathematical statistician, while Cavalli-Sforza carried forward Fisher’s work in population genetics, combining microbiology with mathematical genetics. If we are able to unlock the secrets of our origins, it is thanks to these pioneers. The material presented here, especially in the second part, draws heavily on the work of Cavalli-Sforza and his colleagues. (This author had the good fortune of working with C.R. Rao while a student in the U.S.)

            What is extraordinary in all this is the depth and power of scientific analysis needed to unlock the puzzle. Linguistics, the principal tool used for over two hundred years, has proved unequal to the task of unlocking the mystery of our origins. The creation of the Vedic and Sanskrit languages in India, going back perhaps 10,000 years or more, was crucial in the evolution of the final phase of Indo-European languages.

            Also remarkable are the immense time scales involved: not thousands but tens of thousands of years. Even this is minuscule by evolutionary standards. We Indo-Europeans (and our ancestors, the Gauda-Dravidas and Afro-Indians) have been on the planet for barely 65 thousand years, while dinosaurs ruled the earth for well over a hundred million years. What follows next is a brief account of our origin and spread.

Gauda-Dravida before Indo-European

            Ever since Sir William Jones in 1786 noticed similarities between Sanskrit and European languages, the question of how people from Sri Lanka and Assam to Ireland and Iceland happen to speak languages clearly related to one another has remained one of the great unsolved problems of history. The usual explanation, at least in India, is the famous, now infamous, Aryan invasion theory or AIT. It claims that bands of invading ‘Aryan’ tribes brought both the ancestor of the Sanskrit language and the Vedic literature from somewhere in Eurasia or even Europe.

            This was the result of scholars assuming that the ancestors of Indians and Europeans must at one time have lived in a common place speaking a common language before they spread across Asia, Eurasia and Europe carrying their language which later split into different languages. They called these speakers Indo-Europeans and their languages—from North India to Europe—the Indo-European family. They called the original language Proto-Indo-European or PIE, a term sometimes applied to its speakers also.

            European linguists soon followed up on these ideas but in their newfound enthusiasm committed two egregious blunders. First, they borrowed the Sanskrit word Arya, which only means civilized, and turned it into a geographical and then a racial term by applying it to the people and languages of North India. (The correct term for North India is Gauda, just as Dravida refers to the south.) Next, they placed South Indian languages in a totally different category called the Dravidian family, excluding them from nearly all discourse about Indo-Europeans. In reality South Indian languages are much closer to Sanskrit in both grammar and vocabulary, whereas the closeness of European languages to Sanskrit is limited to vocabulary. Science now tells us that Indo-Europeans were a later offshoot of Gauda-Dravida speakers.

            This point, the closeness of the so-called Dravidian languages to Sanskrit, needs to be emphasized, for keeping the two separated continues to be part of a political and academic agenda. In truth, there is no reason to suppose that Gauda and Dravida languages, including Sanskrit, ever remained in separate, exclusive domains. Some covert Aryan theorists like Thomas Trautmann go to the extent of claiming that the Dravidian family was ‘discovered’ by Bishop Robert Caldwell in 1856, just as Sanskrit was ‘discovered’ by Jones in 1786. The truth is that by then they had a two-thousand-year history of coexistence and at no time were the Dravida people ignorant of Sanskrit.

            The Aryan myth and the idea of the invasion (AIT) were taught as history for nearly a century until archaeologists discovered the Harappan or Indus Valley civilization. The theory continues to be taught in one form or another in spite of the many contradictions highlighted by archaeologists like Jim Shaffer and B.B. Lal as well as natural scientists like Sir Julian Huxley, L. Cavalli-Sforza and others. Politics and entrenched academic interests have succeeded in keeping alive this two-hundred-year-old ad hoc hypothesis, but science has put an end to its survival while at the same time opening a vast new window on the origin and spread of Indo-Europeans.

            Recent discoveries in natural history and population genetics, especially in the past two decades, have changed our understanding of Indo-European origins in ways that were totally unexpected. The picture, still a bit hazy, highlights the fact that theories like the AIT are naïve and simplistic. To begin with, they very greatly underestimate the time scales involved and also ignore the revolutionary impact of natural history on humans in the past hundred thousand years. It is science, not linguistic theories, that helps us unlock the mystery of Indo-Europeans.

A volcano and a gene mutation

            Our story takes us to Africa some hundred thousand years ago. Our ancestors, called ‘anatomically modern humans’, have been located in fossils in East Africa dating to about that time or a bit earlier. We were not the only humans then existing: there were several other ‘humanoid’ species in Asia and Africa, among which the now extinct Neanderthals are the best known. What separates us from them is that we have survived and they have not. In addition, we are a speaking species, with language, without which civilization as we know it is inconceivable. So it is the origin of spoken language that we must speak of, and not just phonetic similarities; with some effort we can find phonetic similarities between any two languages.

Figure 3: FOXP2 gene whose mutation gave our ancestors spoken language

            This means that before speaking of Indo-European, Proto-Indo-European or any other language, we must ask ourselves when humans began to speak in the first place. The answer is provided by the discovery of the mutation of a gene known as FOXP2. It is a complex ‘transcription’ gene that controls both verbalization and grammar. The time when the mutation actually occurred cannot be pinpointed, but based on the evidence of the extinction of all other human species following the Toba super-volcanic eruption about 73,000 years ago, we may place it around 80,000 years before present. The exact date of the mutation doesn’t matter: what is important is that only our ancestors, endowed with spoken language, survived.

            Then, around 73,000 years ago, there was a massive volcanic eruption on the island of Sumatra known as the Toba Explosion. It is the greatest volcanic explosion known, nearly 3000 times the magnitude of the 1980 Mount St. Helens explosion. It resulted in a six-year-long ‘volcanic winter’ (like a nuclear winter) followed by a 6,000-year-long freeze, resulting in the extinction of all the human species on the planet except a few thousand of our ancestors in Africa and the Neanderthals. In particular, all non-speaking humanoids became extinct. As a result only speech-capable humans survived this catastrophe. This means all of us are descended from this small group of Africans capable of speech. (Neanderthals became extinct 30,000 years ago.)

Figure 4: Scale of the Toba Explosion, the greatest volcanic eruption on record

Indo-Europeans, two waves

            This was the situation until about 65,000 years ago when small groups of our African ancestors made their way to South Asia traveling along the Arabian coast. All non-Africans living today are descended from these one thousand or so original settlers in South Asia. They flourished in a small area for some ten thousand years in South-Central India. Their small number living in a small area meant a single language would have sufficed. This was the primordial language of our ancestors. My colleagues and I call it Proto-Afro-Indian. No trace of it has survived.

            For the next ten thousand years or so they led a precarious existence by hunting and gathering. About 52,000 years ago there was a dramatic warming in climate. This led to an increase in both population and territory. It was followed by a mass extinction of animals, probably due to over-hunting. Shortly after this, about 45,000 years ago, small groups left the Indian subcontinent in search of better hunting territory and made their way to Eurasia and Europe. These were the first Indo-Europeans. The language they took with them, possibly more than one, was descended from the primordial Afro-Indian and became the first Indo-European. We have no idea what it was like. So we may surmise the following scenario.

African ancestors → Afro-Indians → South Asians → Indo-Europeans (first wave)

            All this took place during the last Ice Age, in what scientists call the Pleistocene. Towards the end of the Ice Age, about 11,000 years ago, agriculture originating in tropical Asia (India and Southeast Asia) replaced hunting-gathering, leading to much larger populations. Important domestic animals including the horse were also domesticated in the region. (There is no truth to the claim that horses were unknown in India before the Aryan invaders brought them.) There were now several languages in north and south India, which my colleagues and I call Gauda and Dravida languages. (Arya means civilized and is inappropriate as a name for a region or language.)

Out of Africa: Courtesy Bradshaw Foundation, Oxford and the National Geographic Magazine

            There were two major developments during the Holocene, the period after the Ice Age beginning some 10,000 years ago. First, there was intense activity leading eventually to the creation of the Vedas and the language that became Sanskrit by incorporating features found in both northern (Gauda) and southern (Dravida) sources. This accounts for the so-called Dravidian features found in the Vedas as well as the closeness of Dravidian grammars to Sanskrit grammar. The other was a second wave of people out of India who took with them both Sanskrit-related languages and agricultural skills, along with domestic animals, including rats and mice! This accounts for the closeness of Sanskrit to European languages, in vocabulary if not grammar.

South Asians (Gauda-Dravida) → Indo-Europeans (second wave)

Period: 73 K BP
Climate and human activity: Toba destroys humans & vegetation in South Asia, giving rise to a 6-year ‘volcanic winter’ and a 6000 to 10,000 year freeze.
Language development: Toba explosion eliminates all humans without speech; only Neanderthals and our speaking African ancestors survive.

Period: 65 K BP
Climate and human activity: Groups of Africans settle in South Asia (India) and along the Arabian coast, taking a coastal route. World population down from about 60 million to a few thousand.
Language development: Our African ancestors arrive in India bringing their language. It is the ancestor of our languages, the Primordial Afro-Indo-European.

Period: 65 K – 52 K BP
Climate and human activity: Hunting-gathering: small population in a state of genetic drift. Cold period. Dramatic warming c. 52 K BP allows population and habitation to expand. Migration East (East Asia, Australia).
Language development: Cold phase: population and area small enough for a single language to suffice. More languages evolve over the next 10,000 years and more.

Period: 52 K – 40 K BP
Climate and human activity: Temporary warming leads to increase in population, area, flora and fauna. Overhunting causes depletion of fauna.
Language development: Expansion results in the birth of several regional languages and dialects: Gauda (northern) and Dravida (southern).

Period: 50 K – 35 K BP
Climate and human activity: Depletion of fauna due to over-hunting sends people in search of better hunting grounds to Eurasia and Europe. First Indo-Europeans.
Language development: Indo-Europeans, first wave, with languages from India, move to Eurasia and Europe. No trace of their languages survives.

Period: 35 K – 11 K BP
Climate and human activity: Late Pleistocene, transition to Holocene. Beginning of agriculture and domestication of animals: pigs, sheep, cattle, and horse.
Language development: Spread of agriculture and movement north. Beginning of Sarasvati settlements.

Period: 11 K BP…
Climate and human activity: Transition to the Holocene. Expansion of agriculture and domesticated stock into West Asia, Eurasia, Europe. Second wave of Indo-Europeans.
Language development: Creation of Sanskrit and the Vedic from Gauda and Dravida sources. The second wave takes Sanskritic terms into Eurasia & European languages.

Summary of Indo-European transitions (K = 1000, BP = Before Present)

            This means there were two major waves of Indo-Europeans, both out of India into the north and west. We know of the first (c. 45,000 BP) only from genetic studies of modern populations around the world. We have no idea what their languages were like. The second, and much more recent, occurred at the Pleistocene-Holocene transition some 10,000 years ago. It has left many traces in archaeology, genetics, culture, and above all in the Sanskritic imprint on the languages of Europe and Eurasia. This is supplemented by genetic and other scientific data relating to animals that accompanied them, including rats and mice!

Finale: why are India and Sanskrit so pivotal?

            Sanskrit, or what led up to it, therefore played a crucial role. Sanskrit grew along two parallel tracks: Vedic and what became classical. As Sri Aurobindo pointed out a century ago, the Rig Veda, the world’s oldest literature, was the culmination, not the beginning, of a long effort that must have occupied thousands of years. Everything that followed is a simplification and in some ways a degeneracy, even the later Vedas like the Yajur. Its creators must have recognized that they had created something extraordinarily precious because they put enormous effort into preserving it through hundreds of generations of teachers and pupils, as well as devising methods like ghana-patha, pada-patha and the like to facilitate the preservation.

            While less sophisticated than the Vedic, the later classical Sanskrit was also a carefully constructed language, as the word ‘Samskrita’ indicates. This explains the extraordinary perfection of its grammar: the grammar used by Kalidasa 2000 years ago is the same as what we use today. This is not true of any other language, and it is no accident. Since the idea that it was brought by invading Aryans has been demolished by science, we must look to indigenous sources. Sanskrit is and will always remain the lynchpin of linguistics, not any PIE or anything else. Sanskrit can do without PIE, and has for thousands of years, but Indo-European Studies will collapse without Sanskrit.

            India was (and is) pivotal because of its strategic location and climate. Both land and sea routes—east-west as well as north-south—are accessible from India. Also, India enjoys a subtropical climate that allows both tropical and temperate flora and fauna to flourish.

            The picture given here is by no means definitive but is decidedly more in agreement with scientific data and the fossil record than linguistic theories like the AIT, which must now be relegated to the dustbin of history. Many details remain to be filled in, but any new theory must account for scientific data, especially from natural history and genetics, and also take into account the vast time scales involved. Such momentous developments as the evolution and spread of languages over half the world cannot be squeezed into a few thousand years like the Biblical account of Creation in 4004 BCE on which the AIT was based.

Note: The author gratefully acknowledges valuable suggestions and help from Dr. Stephen Oppenheimer, Dr. David Frawley, Dr. Premendra Priyadarshi and Dr. Rosalie Wolfe. The material presented here is a summary only, keeping in mind the fact that not all readers will be familiar with the highly technical details relating to the population genetics of humans as well as of the flora and fauna on which it rests. It should be seen only as a framework for future presentations and research. The author is currently working on the book Genes of Time and the Birth of History, in which fuller details will be provided. The author would also like to remark that the research and the methodology followed here owe nothing to the so-called Out of India Theory or OIT, which the author sees as little better than the now discredited AIT.

