An Analytical Framework in the General Coalescent Tree Setting for Analyzing Polymorphisms Created by Two Mutations
Overview
This paper presents an analytical framework for analyzing polymorphisms created by two mutation events in samples of DNA sequences modeled in the general coalescent tree setting. I developed the framework by deriving analytical formulas for the numbers of topologies of genealogies with two mutation events. This approach makes it possible to analyze polymorphisms in large samples of DNA sequences at a non-recombining locus under various evolutionary scenarios. In particular, the framework allows one to estimate the probability of polymorphism data created by two mutation events, as well as the ages of those events. Based on these results, I extended the definition of the site frequency spectrum by classifying pairs of polymorphic sites into groups and presented analytical expressions for computing the expected sizes of these groups. Within the framework, I also designed a Bayesian approach for inferring the haplotype of the most recent common ancestor at two polymorphic sites. Lastly, I applied the framework to polymorphism data from the human APOE gene region under various demographic scenarios for the ancestral human population and explored the signature of linkage disequilibrium for inferring the ancestral haplotype at two polymorphic sites. Interestingly, the results show that the most frequent haplotype at two completely linked polymorphic sites is not always the most likely candidate for the haplotype of the most recent common ancestor.