Automatic Extraction of Gene and Protein Synonyms from MEDLINE and Journal Articles

Overview
Journal Proc AMIA Symp
Date 2002 Dec 5
PMID 12463959
Citations 14
Abstract

Genes and proteins are often associated with multiple names, and more names are added as new functional or structural information is discovered. Because authors often alternate between these synonyms, information retrieval and extraction benefit from identifying these synonymous names. We have developed a method to automatically extract synonymous gene and protein names from MEDLINE and journal articles. We first identified patterns that authors use to list synonymous gene and protein names. We developed SGPE (for synonym extraction of gene and protein names), a software program that recognizes these patterns and extracts candidate synonymous terms from MEDLINE abstracts and full-text journal articles. SGPE then applies a sequence of filters that automatically screen out terms that are not gene or protein names. In our evaluation, the method achieved an overall precision of 71% across MEDLINE abstracts and journal articles, and 90% precision on the more suitable full-text articles alone.
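
The two-stage design described in the abstract (pattern-based candidate extraction followed by filtering) can be illustrated with a minimal sketch. The abstract does not list the actual patterns or filters used by SGPE, so everything below is an assumption made for illustration: the two surface patterns ("A (also known as B)" and "A, also called B"), the stopword list, and the function names `extract_candidates`, `looks_like_gene_name`, and `extract_synonyms` are all hypothetical.

```python
import re

# Hypothetical token shape for a gene/protein name (an assumption, not SGPE's rule).
NAME = r"[A-Za-z0-9][A-Za-z0-9\-/]*"

# Hypothetical surface patterns authors might use to list synonyms.
PATTERNS = [
    # "A (also known as B)", "A (also called B)", "A (a.k.a. B)"
    re.compile(rf"\b({NAME}) \((?:also (?:known as|called)|a\.k\.a\.) ({NAME})\)"),
    # "A, also known as B" or "A, also called B"
    re.compile(rf"\b({NAME}), also (?:known as|called) ({NAME})"),
]

# Hypothetical list of common words that should never count as names.
STOPWORDS = {"the", "this", "that", "it", "gene", "protein"}


def extract_candidates(text):
    """Stage 1: return every (term, synonym) pair matched by a pattern."""
    pairs = []
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            pairs.append((match.group(1), match.group(2)))
    return pairs


def looks_like_gene_name(term):
    """Stage 2 filter (crude heuristic): reject stopwords and require
    at least one digit or uppercase letter in the term."""
    if term.lower() in STOPWORDS:
        return False
    return any(ch.isdigit() or ch.isupper() for ch in term)


def extract_synonyms(text):
    """Run both stages and keep pairs where both terms pass the filter."""
    return [
        (a, b)
        for a, b in extract_candidates(text)
        if looks_like_gene_name(a) and looks_like_gene_name(b)
    ]


if __name__ == "__main__":
    sample = (
        "Apo3 (also known as DR3) binds TRADD. "
        "The protein, also called it, was studied."
    )
    print(extract_candidates(sample))
    print(extract_synonyms(sample))
```

In this sketch the second sentence of the sample yields a spurious candidate pair in stage 1 that the stage 2 filter discards, which mirrors the role the abstract assigns to the filters. A real system would need far richer patterns and filters than these.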

Citing Articles

A study of deep active learning methods to reduce labelling efforts in biomedical relation extraction.
Nachtegael C, De Stefani J, Lenaerts T. PLoS One. 2023; 18(12):e0292356.
PMID: 38100453 PMC: 10723703. DOI: 10.1371/journal.pone.0292356.

Recent advances in biomedical literature mining.
Zhao S, Su C, Lu Z, Wang F. Brief Bioinform. 2020; 22(3).
PMID: 32422651 PMC: 8138828. DOI: 10.1093/bib/bbaa057.

Knowledge-based best of breed approach for automated detection of clinical events based on German free text digital hospital discharge letters.
Konig M, Sander A, Demuth I, Diekmann D, Steinhagen-Thiessen E. PLoS One. 2019; 14(11):e0224916.
PMID: 31774830 PMC: 6881027. DOI: 10.1371/journal.pone.0224916.

Exploring the Unexplored: Identifying Implicit and Indirect Descriptions of Biomedical Terminologies Based on Multifaceted Weighting Combinations.
Choi S. Comput Math Methods Med. 2016; 2016:1637580.
PMID: 27698678 PMC: 5029054. DOI: 10.1155/2016/1637580.

Large-scale biomedical concept recognition: an evaluation of current automatic annotators and their parameters.
Funk C, Baumgartner Jr W, Garcia B, Roeder C, Bada M, Cohen K. BMC Bioinformatics. 2014; 15:59.
PMID: 24571547 PMC: 4015610. DOI: 10.1186/1471-2105-15-59.

