PMID: 35591888

A Bioinformatics Pipeline for Estimating Mitochondrial DNA Copy Number and Heteroplasmy Levels from Whole Genome Sequencing Data

Abstract

Mitochondrial diseases are a heterogeneous group of disorders that can be caused by mutations in the nuclear or mitochondrial genome. Mitochondrial DNA (mtDNA) variants may exist in a state of heteroplasmy, where only a fraction of mtDNA molecules harbor a variant, or homoplasmy, where all mtDNA molecules carry the same variant. The relative quantity of mtDNA in a cell, or copy number (mtDNA-CN), is associated with mitochondrial function, human disease, and mortality. To facilitate accurate identification of heteroplasmy and quantification of mtDNA-CN, we built a bioinformatics pipeline that takes whole genome sequencing data and outputs mitochondrial variants and mtDNA-CN. We incorporate variant annotations to facilitate determination of variant significance. Our pipeline yields uniform coverage by remapping to a circularized chrM and by recovering reads falsely mapped to nuclear-encoded mitochondrial sequences. Notably, we construct a consensus chrM sequence for each sample and recall heteroplasmy against the sample's unique mitochondrial genome. We observe an approximately 3-fold increased association with age for heteroplasmic variants in non-homopolymer regions and are better able to capture genetic variation in the D-loop of chrM compared to existing software. Our bioinformatics pipeline captures features of mitochondrial genetics that are important in understanding how mitochondrial dysfunction contributes to disease more accurately than existing pipelines.
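The quantities named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes the commonly used coverage-ratio estimate of mtDNA-CN (ploidy times mitochondrial depth over autosomal depth), a heteroplasmy level defined as the fraction of reads supporting the variant, and a reference rotated by a fixed offset to handle the circular genome. The function names, the 3%/97% calling thresholds, and the shift value in the usage note are hypothetical choices made for illustration only.

```python
# Illustrative sketch only; names and thresholds below are assumptions,
# not taken from the published pipeline.

CHRM_LENGTH = 16569  # length of the rCRS human mitochondrial reference


def mtdna_copy_number(mt_mean_depth: float,
                      autosomal_mean_depth: float,
                      ploidy: int = 2) -> float:
    """Estimate mtDNA copies per cell from a coverage ratio.

    Assumes autosomes are present at `ploidy` copies per cell, so
    CN = ploidy * (mean chrM depth / mean autosomal depth).
    """
    if autosomal_mean_depth <= 0:
        raise ValueError("autosomal_mean_depth must be positive")
    return ploidy * mt_mean_depth / autosomal_mean_depth


def heteroplasmy_level(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must be between 0 and total_reads")
    return alt_reads / total_reads


def classify_variant(level: float,
                     min_het: float = 0.03,
                     max_het: float = 0.97) -> str:
    """Label a variant by its heteroplasmy level.

    The cutoffs are hypothetical defaults; real pipelines choose them
    based on sequencing depth and error rate.
    """
    if level < min_het:
        return "below_threshold"
    if level > max_het:
        return "homoplasmic"
    return "heteroplasmic"


def unshift_position(shifted_pos: int,
                     shift: int,
                     length: int = CHRM_LENGTH) -> int:
    """Map a 1-based position on a rotated reference back to the original.

    The rotated reference is assumed to start at original position
    `shift + 1`, so reads spanning the original start/end junction
    align contiguously on the rotated copy.
    """
    return (shifted_pos - 1 + shift) % length + 1
```

For example, a sample with a mean chrM depth of 3000 and a mean autosomal depth of 30 gives `mtdna_copy_number(3000.0, 30.0)` equal to 200 copies per cell, and a site with 25 variant reads out of 100 has a heteroplasmy level of 0.25. With a hypothetical shift of 8000, position 8570 on the rotated reference maps back to original position 1.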

Citing Articles

Bioinformatics Tools for NGS-Based Identification of Single Nucleotide Variants and Large-Scale Rearrangements in Mitochondrial DNA.

Barresi M, Dal Santo G, Izzo R, Zauli A, Lamantea E, Caporali L. BioTech (Basel). 2025; 14(1).

PMID: 39982276 PMC: 11843820. DOI: 10.3390/biotech14010009.


Mitochondrial heteroplasmy improves risk prediction for myeloid neoplasms.

Hong Y, Pasca S, Shi W, Puiu D, Lake N, Lek M. Nat Commun. 2024; 15(1):10133.

PMID: 39578475 PMC: 11584845. DOI: 10.1038/s41467-024-54443-3.


Best practices for germline variant and DNA methylation analysis of second- and third-generation sequencing data.

Bonfiglio F, Legati A, Lasorsa V, Palombo F, De Riso G, Isidori F. Hum Genomics. 2024; 18(1):120.

PMID: 39501379 PMC: 11536923. DOI: 10.1186/s40246-024-00684-8.


Quantifying constraint in the human mitochondrial genome.

Lake N, Ma K, Liu W, Battle S, Laricchia K, Tiao G. Nature. 2024; 635(8038):390-397.

PMID: 39415008 PMC: 11646341. DOI: 10.1038/s41586-024-08048-x.


Mitochondrial DNA copy number variation in asthma risk, severity, and exacerbations.

Xu W, Hong Y, Hu B, Comhair S, Janocha A, Zein J. J Allergy Clin Immunol. 2024.

PMID: 39237012 PMC: 11875079. DOI: 10.1016/j.jaci.2024.08.022.

