
A Machine Learning Framework Identifies Plastid-Encoded Proteins Harboring C3 and C4 Distinguishing Sequence Information

Overview
Date 2023 Jul 18
PMID 37462292
Abstract

C4 photosynthesis is known to have at least 61 independent origins across plant lineages, making it one of the most notable examples of convergent evolution. Of these more than 60 independent origins, a predicted 22-24, encompassing more than 50% of all known C4 species, occur within the Panicoideae, Arundinoideae, Chloridoideae, Micrairoideae, Aristidoideae, and Danthonioideae (PACMAD) clade of the Poaceae family. This clade therefore offers species well suited to studying the genomic changes associated with acquisition of the C4 photosynthetic trait. In this study, we take advantage of the growing availability of sequenced plastid genomes and employ a machine learning (ML) approach to screen for plastid genes harboring C3- and C4-distinguishing information in PACMAD species. We demonstrate that certain plastid-encoded protein sequences carry enough distinguishing information to train accurate ML C3/C4 classification models. Our RbcL-trained model, for example, classifies C3/C4 status with greater than 99% accuracy. Accurate prediction of photosynthetic type from individual sequences suggests biologically relevant, and potentially differing, roles for these sequence products in C3 versus C4 metabolism. With this ML framework, we have identified several key sequences, and specific residues within them, that are most predictive of C3/C4 status, including RbcL and subunits of the NAD(P)H dehydrogenase complex, further highlighting their potential significance in the evolution and/or maintenance of C4 photosynthetic machinery. This general approach can be applied to uncover intricate associations in other, similar genotype-phenotype relationships.
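The idea of finding sites that carry C3/C4 distinguishing information can be illustrated with a much simpler stand-in than a trained classifier. The sketch below is not the authors' framework: it assumes pre-aligned protein sequences of equal length and ranks alignment columns by the mutual information between the residue at that column and the photosynthetic-type label. The function names (`column_mutual_information`, `rank_sites`) and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def column_mutual_information(sequences, labels, column):
    """Mutual information (in bits) between the residue at one
    alignment column and the class label (e.g. "C3" or "C4")."""
    n = len(sequences)
    residues = [seq[column] for seq in sequences]
    joint_counts = Counter(zip(residues, labels))
    residue_counts = Counter(residues)
    label_counts = Counter(labels)

    mi = 0.0
    for (residue, label), count in joint_counts.items():
        p_joint = count / n
        p_residue = residue_counts[residue] / n
        p_label = label_counts[label] / n
        mi += p_joint * log2(p_joint / (p_residue * p_label))
    return mi


def rank_sites(sequences, labels):
    """Return (column, score) pairs sorted from most to least informative."""
    if len(sequences) != len(labels):
        raise ValueError("one label is required per sequence")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to equal length")
    scores = [
        (col, column_mutual_information(sequences, labels, col))
        for col in range(length)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy alignment, NOT real RbcL data: column 1 separates
# the two classes perfectly, column 4 only partially.
toy_sequences = ["MSAKT", "MSAKT", "MAAKS", "MAAKT"]
toy_labels = ["C3", "C3", "C4", "C4"]

ranking = rank_sites(toy_sequences, toy_labels)
for column, score in ranking:
    print(f"column {column}: {score:.3f} bits")
```

On real data, a score like this treats every sequence as independent, so closely related species sharing a residue by descent would inflate it; any serious analysis would need to account for phylogeny and validate on held-out lineages.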
