
Confounding Factors in HGT Detection: Statistical Error, Coalescent Effects, and Multiple Solutions

Overview
Journal: J Comput Biol
Date: 2007 Jun 19
PMID: 17572027
Citations: 33
Abstract

Prokaryotic organisms share genetic material across species boundaries by means of a process known as horizontal gene transfer (HGT). This process has great significance for understanding prokaryotic genome diversification and unraveling their complexities. Phylogeny-based detection of HGT is one of the most commonly used methods for this task, and is based on the fundamental fact that HGT may cause gene trees to disagree with one another, as well as with the species phylogeny. Using these methods, we can compare gene and species trees, and infer a set of HGT events to reconcile the differences among these trees. In this paper, we address three factors that confound the detection of the true HGT events, including the donors and recipients of horizontally transferred genes. First, we study experimentally the effects of error in the estimated gene trees (statistical error) on the accuracy of inferred HGT events. Our results indicate that statistical error leads to overestimation of the number of HGT events, and that HGT detection methods should be designed with unresolved gene trees in mind. Second, we demonstrate, both theoretically and empirically, that based on topological comparison alone, the number of HGT scenarios that reconcile a pair of species/gene trees may be exponential. This number may be reduced when branch lengths in both trees are estimated correctly. This set of results implies that in the absence of additional biological information, and/or a biological model of how HGT occurs, multiple HGT scenarios must be sought, and efficient strategies for how to enumerate such solutions must be developed. Third, we address the issue of lineage sorting, how it confounds HGT detection, and how to incorporate it with HGT into a single stochastic framework that distinguishes between the two events by extending population genetics theories. 
This result is very important, particularly when analyzing closely related organisms, where coalescent effects may not be ignored when reconciling gene trees. In addition to these three confounding factors, we consider the problem of enumerating all valid coalescent scenarios that constitute plausible species/gene tree reconciliations, and develop a polynomial-time dynamic programming algorithm for solving it. This result bears great significance on reducing the search space for heuristics that seek reconciliation scenarios. Finally, we show, empirically, that the locality of incongruence between a pair of trees has an impact on the numbers of HGT and coalescent reconciliation scenarios.
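
To make the idea of enumerating coalescent reconciliations concrete, the following Python sketch counts coalescent histories for a gene tree and a species tree on the same leaf set. It is not the algorithm from the paper. It is a minimal illustration under stated assumptions: one sampled lineage per species, no HGT, trees given as nested tuples with string leaves, and the commonly used definition in which each internal gene-tree node is assigned to a species-tree branch that contains all of its descendant lineages and lies no lower than the branches assigned to its descendants. The function names (`clusters`, `count_coalescent_histories`) are hypothetical.

```python
from functools import lru_cache


def clusters(tree):
    """Return (leaf set of tree, clusters of its internal nodes in post-order)."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, internal = frozenset(), []
    for child in tree:
        child_leaves, child_internal = clusters(child)
        leaves |= child_leaves
        internal += child_internal
    internal.append(leaves)
    return leaves, internal


def count_coalescent_histories(species, gene):
    """Count coalescent histories by dynamic programming (illustrative sketch).

    A species-tree branch is identified by the cluster of the node below it;
    the full leaf set stands for the branch above the species-tree root.
    """
    s_leaves, s_nodes = clusters(species)
    g_leaves, _ = clusters(gene)
    if s_leaves != g_leaves:
        raise ValueError("trees must have the same leaf set")

    def parent(c):
        # Smallest species cluster that strictly contains c.
        return min((d for d in s_nodes if c < d), key=len)

    def lca(c):
        # Smallest species cluster that contains c.
        return min((d for d in s_nodes if c <= d), key=len)

    def path_up(c, top):
        # Branches from the lowest one that can host cluster c up to `top`.
        node = lca(c)
        path = [node]
        while node != top:
            node = parent(node)
            path.append(node)
        return path

    @lru_cache(maxsize=None)
    def ways(subtree, branch):
        # Histories for `subtree`, given that its root coalesces on `branch`.
        total = 1
        for child in subtree:
            if isinstance(child, str):
                continue  # leaves involve no coalescence
            child_cluster, _ = clusters(child)
            total *= sum(ways(child, b) for b in path_up(child_cluster, branch))
        return total

    # The gene-tree root can only coalesce above the species-tree root.
    return ways(gene, s_leaves)


if __name__ == "__main__":
    cat3 = (("A", "B"), "C")
    cat4 = ((("A", "B"), "C"), "D")
    bal4 = (("A", "B"), ("C", "D"))
    print(count_coalescent_histories(cat3, cat3))  # 2
    print(count_coalescent_histories(cat4, cat4))  # 5
    print(count_coalescent_histories(bal4, bal4))  # 4
    print(count_coalescent_histories(cat3, (("A", "C"), "B")))  # 1
```

Each (gene subtree, species branch) pair is evaluated once thanks to memoization, so the running time is polynomial in the number of leaves even though the number of histories can grow quickly. For matching caterpillar trees the counts follow the Catalan numbers (2, 5, 14, ...), which the first two calls illustrate.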

Citing Articles

Enumeration of coalescent histories for caterpillar species trees and p-pseudocaterpillar gene trees.

Alimpiev E, Rosenberg N Adv Appl Math. 2021; 131.

PMID: 34483422 PMC: 8415704. DOI: 10.1016/j.aam.2021.102265.


Roadblocked monotonic paths and the enumeration of coalescent histories for non-matching caterpillar gene trees and species trees.

Himwich Z, Rosenberg N Adv Appl Math. 2020; 113.

PMID: 32863514 PMC: 7450691. DOI: 10.1016/j.aam.2019.101939.


Enumeration of lonely pairs of gene trees and species trees by means of antipodal cherries.

Rosenberg N Adv Appl Math. 2019; 102:1-17.

PMID: 30983650 PMC: 6456302. DOI: 10.1016/j.aam.2018.09.001.


Horizontal Gene Transfer in Five Parasite Plant Species in Orobanchaceae.

Kado T, Innan H Genome Biol Evol. 2018; 10(12):3196-3210.

PMID: 30407540 PMC: 6294234. DOI: 10.1093/gbe/evy219.


Enumeration of compact coalescent histories for matching gene trees and species trees.

Disanto F, Rosenberg N J Math Biol. 2018; 78(1-2):155-188.

PMID: 30116881 PMC: 7661175. DOI: 10.1007/s00285-018-1271-5.