PMID: 19997067

Building the Sequence Map of the Human Pan-genome

Abstract

Here we integrate the de novo assembly of an Asian and an African genome with the NCBI reference human genome, as a step toward constructing the human pan-genome. We identified approximately 5 Mb of novel sequences not present in the reference genome in each of these assemblies. Most novel sequences are individual or population specific, as revealed by their comparison to all available human DNA sequence and by PCR validation using the human genome diversity cell line panel. We found novel sequences present in patterns consistent with known human migration paths. Cross-species conservation analysis of predicted genes indicated that the novel sequences contain potentially functional coding regions. We estimate that a complete human pan-genome would contain approximately 19-40 Mb of novel sequence not present in the extant reference genome. The extensive amount of novel sequence contributing to the genetic variation of the pan-genome indicates the importance of using complete genome sequencing and de novo assembly.

Citing Articles

The evolution, variation and expression patterns of the annexin gene family in the maize pan-genome.

Liu X, Zhang M, Zhao X, Shen M, Feng R, Wei Q. Sci Rep. 2025; 15(1):5711.

PMID: 39962090 PMC: 11832922. DOI: 10.1038/s41598-025-89119-5.


Gastric cancer genomics study using reference human pangenomes.

Jiao D, Dong X, Fan S, Liu X, Yu Y, Wei C. Life Sci Alliance. 2025; 8(4).

PMID: 39870503 PMC: 11772497. DOI: 10.26508/lsa.202402977.


Pangenome graphs and their applications in biodiversity genomics.

Secomandi S, Gallo G, Rossi R, Rodriguez Fernandes C, Jarvis E, Bonisoli-Alquati A. Nat Genet. 2025; 57(1):13-26.

PMID: 39779953 DOI: 10.1038/s41588-024-02029-6.


The developments and prospects of plant super-pangenomes: Demands, approaches, and applications.

He W, Li X, Qian Q, Shang L. Plant Commun. 2024; 6(2):101230.

PMID: 39722458 PMC: 11897476. DOI: 10.1016/j.xplc.2024.101230.


Pan-genome wide identification and analysis of the gene family in sunflowers ( L.) revealed their intraspecies diversity and potential roles in abiotic stress tolerance.

Zhang C, Li H, Yin J, Han Z, Liu X, Chen Y. Front Plant Sci. 2024; 15:1499024.

PMID: 39606674 PMC: 11598334. DOI: 10.3389/fpls.2024.1499024.

