Automatic Extraction of Gene and Protein Synonyms from MEDLINE and Journal Articles
Genes and proteins are often associated with multiple names, and more names are added as new functional or structural information is discovered. Because authors often alternate between these synonyms, information retrieval and extraction benefit from identifying synonymous names. We have developed a method to automatically extract synonymous gene and protein names from MEDLINE and journal articles. We first identified patterns that authors use to list synonymous gene and protein names. We then developed SGPE (for synonym extraction of gene and protein names), a software program that recognizes these patterns and extracts candidate synonymous terms from MEDLINE abstracts and full-text journal articles. SGPE then applies a sequence of filters that automatically screen out terms that are not gene or protein names. In our evaluation, the method achieved an overall precision of 71% across both MEDLINE and journal articles, and 90% precision on the more suitable full-text articles alone.
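The two-stage idea described above (pattern-based extraction of candidates, followed by filtering) can be illustrated with a minimal Python sketch. This is not the SGPE implementation: the abstract does not list the actual patterns or filters, so the cue phrases, the `TERM` token shape, the stopword list, and the function names `looks_like_gene_name` and `extract_synonym_pairs` are all assumptions made for illustration.

```python
import re

# ASSUMPTION: illustrative cue phrases only; the real SGPE patterns
# are not given in the abstract.
CUES = r"(?:also known as|also called|also termed|a\.k\.a\.)"
# ASSUMPTION: a crude single-token shape for a candidate name.
TERM = r"[A-Za-z][A-Za-z0-9\-/]*"
# Matches "X (also known as Y" or "X, also called Y".
PATTERN = re.compile(rf"({TERM})\s*[\(,]\s*{CUES}\s+({TERM})")

# ASSUMPTION: a toy stopword list standing in for the real filters.
STOPWORDS = {"the", "a", "an", "this", "that", "protein", "gene"}


def looks_like_gene_name(term):
    """Toy filter: reject stopwords and all-lowercase tokens without digits."""
    if term.lower() in STOPWORDS:
        return False
    return any(c.isdigit() for c in term) or any(c.isupper() for c in term)


def extract_synonym_pairs(text):
    """Stage 1: find candidate pairs by pattern. Stage 2: keep pairs passing the filter."""
    pairs = []
    for match in PATTERN.finditer(text):
        first, second = match.group(1), match.group(2)
        if looks_like_gene_name(first) and looks_like_gene_name(second):
            pairs.append((first, second))
    return pairs


sentence = (
    "We studied CDKN1A (also known as p21) and the tumour, "
    "also called cancer, in mice."
)
print(extract_synonym_pairs(sentence))  # [('CDKN1A', 'p21')]
```

In this sketch the pattern stage proposes both `CDKN1A`/`p21` and `tumour`/`cancer`, and the filter stage discards the second pair because neither token resembles a gene symbol. A realistic system would need multi-word names, many more patterns, and far stricter filters.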