
A Semiautomated Approach to Gene Discovery Through Expressed Sequence Tag Data Mining: Discovery of New Human Transporter Genes

Overview
Journal: AAPS PharmSci
Specialty: Pharmacology
Date: 2003 Apr 26
PMID: 12713273
Citations: 4
Abstract

Identification and functional characterization of the genes in the human genome remain a major challenge. A principal source of publicly available information used for this purpose is the National Center for Biotechnology Information database of expressed sequence tags (dbEST), which contains over 4 million human ESTs. To extract the information buried in this data more effectively, we have developed a semiautomated method to mine dbEST for uncharacterized human genes. Starting with a single protein input sequence, a family of related proteins from all species is compiled. This entire family is then used to mine the human EST database for new gene candidates. Evaluation of putative new gene candidates in the context of a family of characterized proteins provides a framework for inference of the structure and function of the new genes. When applied to a test data set of 28 families within the major facilitator superfamily (MFS) of membrane transporters, our protocol found 73 previously characterized human MFS genes and 43 new MFS gene candidates. Development of this approach provided insights into the problems and pitfalls of automated data mining using public databases.
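The abstract describes the protocol only at a high level: compile a cross-species protein family from one seed sequence, search human ESTs with every family member, then separate hits that match characterized human genes from new candidates. The sketch below is a minimal illustration of that flow under stated assumptions, not the authors' implementation. A toy k-mer Jaccard similarity stands in for real sequence-similarity searches (such as BLAST), EST entries are assumed to be already-translated peptide fragments, and all identifiers, thresholds, and sequences (`compile_family`, `mine_ests`, `threshold=0.3`, the toy data) are hypothetical.

```python
from typing import Dict, Set, Tuple


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of k-mer sets (a toy stand-in for BLAST scores)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def compile_family(seed_id: str, proteins: Dict[str, str],
                   threshold: float = 0.3) -> Set[str]:
    """Iteratively recruit proteins similar to ANY current family member."""
    family = {seed_id}
    frontier = [seed_id]
    while frontier:
        query = frontier.pop()
        for pid, seq in proteins.items():
            if pid not in family and similarity(proteins[query], seq) >= threshold:
                family.add(pid)
                frontier.append(pid)  # new members are searched in turn
    return family


def mine_ests(family: Set[str], proteins: Dict[str, str], ests: Dict[str, str],
              known_human: Dict[str, str],
              threshold: float = 0.3) -> Tuple[Set[str], Set[str]]:
    """Split ESTs hit by the family into known-gene matches and new candidates."""
    known_hits: Set[str] = set()
    candidates: Set[str] = set()
    for est_id, est_seq in ests.items():
        # An EST is a family hit if it resembles any family member.
        if any(similarity(proteins[pid], est_seq) >= threshold for pid in family):
            # Hits resembling an already characterized human gene are "known".
            if any(similarity(seq, est_seq) >= threshold for seq in known_human.values()):
                known_hits.add(est_id)
            else:
                candidates.add(est_id)
    return known_hits, candidates


# Hypothetical toy data (protein-like strings, not real sequences).
proteins = {
    "seed": "ACDEFGHIKL",
    "homolog": "ACDEFGHIKLMN",
    "distant": "GHIKLMNPQR",
    "unrelated": "WWWYYYVVVT",
}
known_human = {"KNOWN1": "ACDEFGHIKL"}
ests = {"est1": "ACDEFGHIK", "est2": "KLMNPQRS", "est3": "WWWYYYVV"}

family = compile_family("seed", proteins)
known_hits, candidates = mine_ests(family, proteins, ests, known_human)
print(sorted(family), sorted(known_hits), sorted(candidates))
```

Because family compilation is iterative, `distant`, which is too dissimilar from the seed to pass the threshold directly, is still recruited through `homolog`; `est2` is then found only via `distant` and, matching no characterized human gene, is reported as a new candidate. This mirrors the abstract's point of mining with the whole family rather than the single input sequence.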

Citing Articles

Energy coupling mechanisms of MFS transporters.

Zhang X, Zhao Y, Heng J, Jiang D. Protein Sci. 2015; 24(10):1560-79.

PMID: 26234418 PMC: 4594656. DOI: 10.1002/pro.2759.


Structure of the YajR transporter suggests a transport mechanism based on the conserved motif A.

Jiang D, Zhao Y, Wang X, Fan J, Heng J, Liu X. Proc Natl Acad Sci U S A. 2013; 110(36):14664-9.

PMID: 23950222 PMC: 3767500. DOI: 10.1073/pnas.1308127110.


Sialin (SLC17A5) functions as a nitrate transporter in the plasma membrane.

Qin L, Liu X, Sun Q, Fan Z, Xia D, Ding G. Proc Natl Acad Sci U S A. 2012; 109(33):13434-9.

PMID: 22778404 PMC: 3421170. DOI: 10.1073/pnas.1116633109.


Validation of an NSP-based (negative selection pattern) gene family identification strategy.

Frank RL, Kandoth C, Ercal F. BMC Bioinformatics. 2008; 9 Suppl 9:S2.

PMID: 18793465 PMC: 2537573. DOI: 10.1186/1471-2105-9-S9-S2.
