Found at last: the human genome’s missing pieces


Researchers had long wondered where the human genome’s “missing pieces” were hiding – and what had hidden them from view.

The mystery had tantalized geneticists for 15 years: scores of genes and millions of bases of human DNA sequence had no home on maps of the human genome. Where in the human genome were they hiding?

Now two papers by Giulio Genovese and colleagues provide the answer: much of this sequence is hiding in and around the centromeres of human chromosomes, as euchromatic rafts that float in oceans of heterochromatin. These “missing pieces” of the human genome contain many expressed genes; at least one can, when mutated, contribute to neurological disease.

Genovese’s discoveries inform a revised map of the human genome sequence released by the Genome Reference Consortium, the first major revision of the human genome reference since 2009.

Scientists had long assumed that finding the human genome’s “missing pieces” would require a future genome-sequencing technology that could sequence chromosomes from end to end. But Genovese, who studied mathematics before becoming a geneticist, conceived a way to find the answer in whole genome sequence data that were already available.

Genovese’s approach exploits a special property of admixed populations, which form when populations from different continents (such as Africans, Europeans, and Native North Americans) encounter each other and exchange genetic material. The resulting populations – such as African Americans and Latinos – have genomes that are mosaics of long segments inherited from the founding populations. Because recombination breaks up these segments only a few times per chromosome in each generation – and because these population mixtures were relatively recent events, occurring in the past 2-20 generations – Genovese reasoned that a “missing piece” would, in most people, show the same ancestry as other sequences from the same genomic region. He found genetic variants in these cryptic DNA sequences, typed them in genome data from hundreds of African Americans and Latinos, and compared these genotypes to maps of “local ancestry” in each genome. The region whose local ancestry tracked a variant most closely across individuals pointed to where that missing piece belonged.
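
The reasoning above can be illustrated with a toy sketch. It is not the statistical method used in the papers. It simply assumes that each person's local ancestry has been summarized, window by window, as the number of chromosome copies (0, 1 or 2) inherited from one founding population, and it scores each window by how well that ancestry tracks a variant found in an unplaced sequence. The function names, the window layout and the six-person data set are all invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def place_by_admixture(allele_counts, local_ancestry):
    """Score every genomic window as a candidate home for an unplaced variant.

    allele_counts[i]      copies (0, 1, 2) of the variant allele in person i
    local_ancestry[i][w]  chromosome copies (0, 1, 2) that person i inherited
                          from one founding population in window w
    Returns (index of the best-scoring window, list of per-window scores).
    """
    n_windows = len(local_ancestry[0])
    scores = []
    for w in range(n_windows):
        ancestry_at_w = [person[w] for person in local_ancestry]
        # The allele may be enriched in either founding population,
        # so the sign of the correlation is ignored.
        scores.append(abs(pearson(allele_counts, ancestry_at_w)))
    best_window = max(range(n_windows), key=scores.__getitem__)
    return best_window, scores


# Hypothetical data: six admixed individuals, four genomic windows.
local_ancestry = [
    [2, 2, 1, 0],
    [0, 0, 1, 2],
    [1, 2, 2, 1],
    [1, 0, 0, 1],
    [2, 1, 1, 1],
    [0, 1, 1, 0],
]
# Genotypes of a variant in an unplaced sequence. They were constructed to
# follow the ancestry in window index 2, with one discordant individual.
allele_counts = [1, 1, 2, 0, 1, 0]

best_window, scores = place_by_admixture(allele_counts, local_ancestry)
print(best_window, [round(s, 2) for s in scores])
```

On this constructed data the window at index 2 scores highest, and the neighboring window scores next because ancestry segments are long and adjacent windows tend to share ancestry. With real data, hundreds of genomes and a proper statistical model would be needed to separate the true location from its neighbors.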

The results were surprising, but analysis of data from cytogenetic experiments quickly confirmed Genovese’s mappings – in dozens of cases, and with no apparent contradictions of his mathematical predictions.

Perhaps most surprising, Genovese found that these “missing pieces” of the human genome contained many expressed genes, including many novel genes that are “cryptic paralogs” of genes that were already known.

The work is described in two papers just published in Nature Genetics and the American Journal of Human Genetics. In the second paper, Genovese found that Latino genomes, which reflect three-way admixture, are even more powerful for finding the human genome’s “missing pieces”.

References

Genovese G, Handsaker RE, Li H, Altemose N, Lindgren AM, Chambert K, Pasaniuc B, Price AL, Reich D, Morton CC, Pollak MR, Wilson JG, McCarroll SA. Using population admixture to help complete maps of the human genome. Nature Genetics 45: 406-14, 2013.
Genovese G, Handsaker RE, Li H, Kenny EE, McCarroll SA. Mapping the human genome’s missing sequence by three-way admixture in Latino genomes. American Journal of Human Genetics 93: 411-21, 2013.