An E-M Algorithm and Testing Strategy for Multiple-locus Haplotypes
This paper gives an expectation-maximization (EM) algorithm to obtain allele frequencies, haplotype frequencies, and gametic disequilibrium coefficients for multiple-locus systems. It permits high polymorphism and null alleles at all loci. This approach effectively deals with the primary estimation problems associated with such systems: there is not a one-to-one correspondence between phenotypic and genotypic categories, and sample sizes tend to be much smaller than the number of phenotypic categories. The EM method provides maximum-likelihood estimates and therefore allows hypothesis tests using likelihood-ratio statistics that have χ² distributions with large sample sizes. We also suggest a data resampling approach to estimate test statistic sampling distributions. The resampling approach is more computer intensive, but it is applicable to all sample sizes. A strategy to test hypotheses about aggregate groups of gametic disequilibrium coefficients is recommended. This strategy minimizes the number of necessary hypothesis tests while at the same time describing the structure of disequilibrium. These methods are applied to three unlinked dinucleotide repeat loci in Navajo Indians and to three linked HLA loci in Gila River (Pima) Indians. The likelihood functions of both data sets are shown to be maximized by the EM estimates, and the testing strategy provides a useful description of the structure of gametic disequilibrium. Following these applications, a number of simulation experiments are performed to test how well the likelihood-ratio statistic distributions are approximated by χ² distributions. In most circumstances the χ² approximation grossly underestimated the probability of type I errors. However, at times it also overestimated the type I error probability. Accordingly, we recommend hypothesis tests that use the resampling method.
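To make the EM idea concrete, the following is a minimal Python sketch of the simplest possible case: two codominant, biallelic loci with no null alleles. It is not the paper's multiple-locus algorithm. In this reduced setting only double heterozygotes have ambiguous phase. The function names (`em_two_locus`, `disequilibrium`), the 3×3 genotype-count layout, the uniform starting values, and the convergence tolerance are all assumptions made for illustration.

```python
def em_two_locus(n, tol=1e-10, max_iter=1000):
    """Estimate haplotype frequencies [AB, Ab, aB, ab] by EM.

    n[i][j] is the number of individuals carrying i copies of allele A
    and j copies of allele B (i, j in 0..2). Hypothetical data layout.
    """
    total = sum(sum(row) for row in n)
    if total == 0:
        raise ValueError("no individuals in the sample")

    # Haplotype counts contributed by genotypes whose phase is unambiguous.
    base = [
        2 * n[2][2] + n[2][1] + n[1][2],  # AB
        2 * n[2][0] + n[2][1] + n[1][0],  # Ab
        2 * n[0][2] + n[1][2] + n[0][1],  # aB
        2 * n[0][0] + n[1][0] + n[0][1],  # ab
    ]
    double_het = n[1][1]  # phase unknown: AB/ab or Ab/aB

    p = [0.25] * 4  # assumed starting point; EM may reach only a local maximum
    for _ in range(max_iter):
        # E-step: expected share of double heterozygotes that are AB/ab.
        cis = p[0] * p[3]
        trans = p[1] * p[2]
        w = cis / (cis + trans) if (cis + trans) > 0 else 0.5

        # M-step: re-estimate frequencies from expected haplotype counts.
        counts = [
            base[0] + w * double_het,
            base[1] + (1 - w) * double_het,
            base[2] + (1 - w) * double_het,
            base[3] + w * double_het,
        ]
        new_p = [c / (2 * total) for c in counts]

        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p


def disequilibrium(p):
    """Gametic disequilibrium D = p(AB) - p(A) * p(B)."""
    p_a = p[0] + p[1]
    p_b = p[0] + p[2]
    return p[0] - p_a * p_b
```

For example, `em_two_locus([[4, 2, 1], [3, 10, 3], [1, 2, 4]])` returns four frequencies that sum to one, and passing them to `disequilibrium` gives the corresponding two-locus coefficient. Extending this to many alleles, null alleles, and more than two loci, as the abstract describes, would require enumerating every genotype compatible with each phenotype in the E-step.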