Over the past twenty years, genetic research has made remarkable progress, accumulating genomic data from hundreds of thousands of individuals, including people who lived in ancient times.
This influx of information holds the promise of uncovering the roots of human genetic diversity, ultimately allowing us to construct a comprehensive map of global human genealogy.
Until now, two obstacles stood in the way of this ambitious goal: combining genome sequences from many different databases, and developing algorithms capable of handling datasets of this size.
Visualizing inferred human ancestral lineages over time and space. Each line represents an ancestor-descendant relationship in our inferred genealogy of modern and ancient genomes. The width of a line corresponds to how many times the relationship is observed, and lines are colored on the basis of the estimated age of the ancestor. (CREDIT: Science)
Researchers from the University of Oxford’s Big Data Institute have now unveiled a method designed to overcome both challenges. It can integrate data from multiple sources and scale to millions of genome sequences.
Dr. Yan Wong, an evolutionary geneticist at the Big Data Institute and one of the lead authors, said, "We have essentially built an immense family tree, a genealogy for all of humanity, that faithfully models the historical processes underlying the genetic diversity we observe in modern humans. This genealogy lets us see how every individual's genetic sequence relates to that of every other individual, across all points in the genome."
Because each point in a person's genome is inherited from only one parent, either the mother or the father, the ancestry of each point can be represented as a tree.
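Because recombination shuffles DNA between generations, neighboring stretches of the genome can have different trees. The full collection of these trees, called a "tree sequence" or "ancestral recombination graph," traces genetic segments back through time to the ancestors in whom genetic variants first arose.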
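To make the idea concrete, here is a minimal sketch in plain Python, not the study's software. It assumes a hypothetical 100-unit stretch of genome, three sampled genomes (nodes 0, 1 and 2) and a single recombination breakpoint at position 60; all node IDs and coordinates are invented for illustration. Each interval has its own tree, stored as a map from child to parent.

```python
# Toy tree sequence: a list of (start, end, parent_map) entries.
# All node IDs and coordinates are hypothetical, chosen for illustration.
# Samples are nodes 0, 1, 2; higher-numbered nodes are inferred ancestors.
TREE_SEQUENCE = [
    # On [0, 60): samples 0 and 1 share ancestor 3; 3 and 2 share root 4.
    (0, 60, {0: 3, 1: 3, 2: 4, 3: 4}),
    # On [60, 100): after a recombination, samples 1 and 2 share ancestor 5.
    (60, 100, {1: 5, 2: 5, 0: 6, 5: 6}),
]


def tree_at(position):
    """Return the child -> parent map of the tree covering a genome position."""
    for start, end, parents in TREE_SEQUENCE:
        if start <= position < end:
            return parents
    raise ValueError(f"position {position} is outside the sequence")


def lineage(node, position):
    """Trace a node back through its ancestors at one genome position."""
    parents = tree_at(position)
    path = [node]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path


def mrca(a, b, position):
    """Most recent common ancestor of two nodes in the tree at a position."""
    ancestors_of_a = set(lineage(a, position))
    for node in lineage(b, position):
        if node in ancestors_of_a:
            return node
    return None


if __name__ == "__main__":
    for pos in (10, 80):
        print(pos, lineage(1, pos), "MRCA(0, 1) =", mrca(0, 1, pos))
```

Tracing sample 1 at positions 10 and 80 shows its closest relative changing across the breakpoint: samples 0 and 1 share ancestor 3 on the left interval, but on the right interval their most recent common ancestor is the older node 6.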
Lead author Dr. Anthony Wilder Wohns, who carried out the research while completing his PhD at the Big Data Institute and is now a postdoctoral researcher at the Broad Institute of MIT and Harvard, said, "In essence, we are reconstructing the genomes of our ancestors and using them to build a vast network of relationships. This network lets us estimate when and where these ancestors lived.
What sets our approach apart is that it makes minimal assumptions about the underlying data, so we can include both modern and ancient DNA samples."
"The very earliest ancestors we identify trace back in time to a geographic location that is in modern Sudan."
The study combined data from eight databases of modern and ancient human genomes, totaling 3,609 individual genome sequences from 215 populations. The ancient genomes came from sites around the world and ranged in age from a few thousand to more than 100,000 years.
"These ancestors lived up to and over 1 million years ago, which is much older than current estimates of 250,000 to 300,000 years for the age of Homo sapiens. So bits of our genome have been inherited from individuals who we wouldn’t recognize as modern humans," Dr. Wohns said.
Using these algorithms, the researchers inferred which common ancestors must be present in the evolutionary trees to explain the observed patterns of genetic variation. The resulting network contained almost 27 million ancestors.
Visualization of the nonparametric estimator of ancestor geographic location for HGDP, SGDP, Neanderthal, Denisovan, and Afanasievo samples on chromosome 20. (CREDIT: Science)
By adding the geographic location of each sampled genome, the authors used the network to infer where the predicted common ancestors most likely lived. The results recapitulated key events in human evolutionary history, such as the migration out of Africa.
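The location step can be illustrated with a deliberately simplified sketch. It assumes each ancestor is placed at the geographic midpoint of its children's locations, working upward from the sampled genomes. The sample names, coordinates and two-ancestor genealogy are hypothetical, and the study's actual estimator may differ in details such as how it weights children or combines many trees. Averaging 3D unit vectors rather than raw latitude and longitude avoids distortions near the poles and across the antimeridian.

```python
import math

# Hypothetical sample locations as (latitude, longitude) in degrees.
SAMPLE_LOCATIONS = {
    "sample_a": (9.0, 38.7),    # roughly Addis Ababa
    "sample_b": (-1.3, 36.8),   # roughly Nairobi
    "sample_c": (48.9, 2.4),    # roughly Paris
}

# Hypothetical genealogy: each ancestor maps to its children.
CHILDREN = {
    "anc_1": ["sample_a", "sample_b"],
    "anc_2": ["anc_1", "sample_c"],
}


def to_vector(lat, lon):
    """Convert latitude/longitude in degrees to a 3D unit vector."""
    phi, lam = math.radians(lat), math.radians(lon)
    return (math.cos(phi) * math.cos(lam),
            math.cos(phi) * math.sin(lam),
            math.sin(phi))


def to_lat_lon(x, y, z):
    """Convert a 3D vector (any length) back to latitude/longitude degrees."""
    return (math.degrees(math.atan2(z, math.hypot(x, y))),
            math.degrees(math.atan2(y, x)))


def midpoint(points):
    """Geographic midpoint: average the unit vectors, project back to the sphere."""
    vectors = [to_vector(lat, lon) for lat, lon in points]
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(3)]
    return to_lat_lon(*mean)


def locate(node, cache=None):
    """Place a node at the midpoint of its children's (recursive) locations."""
    cache = {} if cache is None else cache
    if node in SAMPLE_LOCATIONS:
        return SAMPLE_LOCATIONS[node]
    if node not in cache:
        cache[node] = midpoint([locate(child, cache) for child in CHILDREN[node]])
    return cache[node]


if __name__ == "__main__":
    for ancestor in CHILDREN:
        lat, lon = locate(ancestor)
        print(f"{ancestor}: lat {lat:.1f}, lon {lon:.1f}")
```

In this toy genealogy, anc_1 lands in East Africa between its two descendants, while the older anc_2 falls between East Africa and Europe, pulled equally toward each of its two children.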
Although the genealogical map is already a rich resource, the research team intends to make it more comprehensive by integrating new genetic data as it becomes available. Because tree sequences store data so efficiently, the dataset could readily accommodate millions more genomes.
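A toy example hints at why tree sequences are compact: a parent-child relationship that persists across neighboring trees can be stored once, together with the genomic interval it spans, instead of once per tree. The three trees below are hypothetical, each differing from the previous one by a single moved branch, and to_edges is an illustrative helper name; edge tables in the tskit library are built on the same (left, right, parent, child) idea, though real formats hold much more.

```python
# Three hypothetical local trees, each a child -> parent map.
# Tree i covers the genome interval [BREAKPOINTS[i], BREAKPOINTS[i + 1]).
TREES = [
    {0: 4, 1: 4, 2: 5, 3: 5, 4: 6, 5: 6},
    {0: 4, 1: 5, 2: 5, 3: 5, 4: 6, 5: 6},  # sample 1 moved from 4 to 5
    {0: 4, 1: 5, 2: 5, 3: 4, 4: 6, 5: 6},  # sample 3 moved from 5 to 4
]
BREAKPOINTS = [0, 40, 70, 100]


def to_edges(trees, breakpoints):
    """Compress consecutive trees into (left, right, parent, child) edges."""
    edges = []
    open_edges = {}  # (parent, child) -> left end of the interval it covers
    for i, tree in enumerate(trees):
        left = breakpoints[i]
        current = {(parent, child) for child, parent in tree.items()}
        # Close edges that do not continue into this tree.
        for key in list(open_edges):
            if key not in current:
                edges.append((open_edges.pop(key), left, *key))
        # Open edges that first appear in this tree.
        for key in current:
            open_edges.setdefault(key, left)
    # Close every edge still open at the end of the sequence.
    for key, start in open_edges.items():
        edges.append((start, breakpoints[-1], *key))
    return edges


if __name__ == "__main__":
    edges = to_edges(TREES, BREAKPOINTS)
    naive = sum(len(tree) for tree in TREES)
    print(f"{naive} per-tree entries vs {len(edges)} edges")
    for edge in sorted(edges):
        print(edge)
```

Here the 18 parent-child entries of the three separate trees compress to 8 interval-tagged edges. Since neighboring trees along a real genome usually differ by only a few branches, the savings grow with the number of trees.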
Dr. Wong remarked, "This study lays the groundwork for the next phase of DNA sequencing. As the quality of genome sequences from both modern and ancient DNA samples improves, the accuracy of the trees will likewise improve. Eventually, we will be poised to construct a unified map that elucidates the lineage of all human genetic diversity observable today."
Dr. Wohns added, "While humans are the primary focus of this study, the methodology is applicable to most organisms, ranging from orangutans to bacteria. It holds particular promise in medical genetics, helping to distinguish genuine associations between genetic regions and diseases from spurious connections that arise from our shared ancestral history."
Note: Materials provided above by The Brighter Side of News. Content may be edited for style and length.