Berkeley researchers use genomics to unravel India’s past

India is the most populous country in the world, home to more than 1.4 billion people and more than 5,000 anthropologically well-defined ethno-linguistic and religious groups. However, Indian genomes have been largely left out of the conversation about human history.

Researchers at UC Berkeley in Priya Moorjani’s lab were interested in examining questions related to India’s genomic history. They worked with LASI-DAD, the Longitudinal Aging Study in India – Diagnostic Assessment of Dementia, which is a comprehensive population-based prospective cohort study that collected nationally representative genomic data from Indians.

The genomics analysis, part of the umbrella survey of aging and age-associated diseases in India, is the “largest Indian genetics [study] when it comes to whole genome sequences,” says Moorjani. This dataset has allowed Elise Kerdoncuff and Laurits Skov, postdocs in the lab, to explore and answer questions about Indian population history.

Using 2,762 high-coverage genomes from India, which included individuals from most geographic regions, speakers of all major languages, and communities, the Moorjani lab’s research confirmed previous work showing several main genetic groups in India. These include the Ancestral Northern Indians, who are closely related to West Eurasians, and the Ancestral Southern Indians, who are not closely related to any present-day population. Using computational models of different ancestries and seeking the best match, they confirmed that most Indians’ ancestry comes from three main sources: Steppe pastoralists, Iranian farmers, and South Asian hunter-gatherers.

Curious about the source of the Iranian farmer-related ancestry, the lab applied statistical tools to genomic information from fifteen ancient groups, spanning the Neolithic to the Iron Age, to try to match the source. They found that ancient DNA from 4th millennium BC Sarazm, an Early Bronze Age settlement site in Tajikistan, was the best match.

According to Kerdoncuff, when they reached out to experts from other disciplines to confirm their findings, they found the archaeologists “were not surprised at all” by this revelation. Archaeological studies have documented trade routes between Sarazm, located in what is today north-western Tajikistan, and South Asia. Even more striking, one of the individuals found in Sarazm, where the ancient human DNA sample was acquired, had shell bangles that are materially and stylistically similar to ones found at Neolithic sites in India, so the genetic and archaeological evidence both support this ancestry.

One question the authors wanted to answer was whether there is evidence of multiple waves of migration from Africa to India, and whether in “the same wave, people went to Europe and to South Asia,” according to Kerdoncuff. By measuring the separation time of Indians and sub-Saharan Africans, they estimated that the two groups last shared a common ancestor about 54,000 years ago. This is similar to the timeline for Europeans and East Asians, which further supports the idea that a single major migration from Africa likely occurred.

They were also interested in looking for archaic DNA and evidence of interbreeding between ancestral Indian populations and Neanderthals and Denisovans. Using a tool developed by Laurits Skov, they hunted for Neanderthal and Denisovan DNA segments in these Indian genomes. They were able to recover “more variable segments of Neanderthals in South Asia compared to other populations,” according to Kerdoncuff. This is notable because each Indian genome does not carry more Neanderthal DNA than other non-African genomes, still around 1.5% per individual. Rather, the diversity of the dataset allowed them to find many previously unknown segments.

A previous, similar study led by Laurits Skov used about 27,000 Icelandic individuals, compared with about 2,700 in the Indian study. Even so, the Indian study was able to reconstruct 1.6 Gb of Neanderthal DNA from present-day Indians, 50% more than the Icelandic study despite a sample size ten times smaller.

“This shows you how the complex evolutionary history of India including recent mixing and founder events has shaped the genetic variation and legacy of archaic ancestry in present-day Indians,” Moorjani notes. Kerdoncuff believes this showcases the importance of generating unique data sets and including more individuals from underrepresented populations.

Current databases of human genetic information, such as 1000 Genomes, All of Us, and UK Biobank, are centered on European ancestry. There have been calls to diversify these databases because this has implications not only for ancestry but also for health. Kerdoncuff also found that, due to recent founder events, there is increased homozygosity in the Indian genomes. This can have health implications, as it predicts a high burden of deleterious variants and an increased risk of recessive diseases.

Studying these diverse datasets not only lets researchers unravel our shared human history but also allows for more personalized healthcare. If the available datasets come primarily from people of European ancestry, the genetic impact on health will be poorly understood in, or not applicable to, large parts of the world. Kerdoncuff stresses the importance of pushing for more diverse data sets to be used in human genetics: “It’s really important to study diverse [and] understudied populations, for them, of course, but also for all of us.”