PMID: 35252462

Extracting Semantics from Census-based Reference Data

Abstract

We present preliminary findings in extracting semantics from reference data generated by the United States Census Bureau. US Census reference data is based upon surveys designed to collect demographics and other socioeconomic factors by geographical regions. These data sets contain thousands of variables; this complexity makes the reference data difficult to learn, query, and integrate into analyses. Researchers often avoid working directly with US Census reference data and instead work with census-derived extracts capturing a much smaller subset of records. We propose to use natural language processing to extract the semantics of census-based reference data and to map census variables to known ontologies. This semantic processing reduces the large volume of variables into more manageable sets of conceptual variables that can be organized by meaning and semantic type.
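The reduction described in the abstract, from many raw variables to a smaller set of conceptual variables, can be illustrated with a minimal sketch. This is not the authors' pipeline: the variable codes and labels are hypothetical examples written in the style of census table labels, and the `CONCEPT_KEYWORDS` lexicon is a hypothetical stand-in for a lookup against a real ontology. It only shows the general idea of tokenizing each label and grouping variables by the concepts their labels mention.

```python
import re
from collections import defaultdict

# Hypothetical variable labels in the style of census table labels.
# Codes and wording are illustrative assumptions, not taken from the paper.
VARIABLES = {
    "V001": "Estimate!!Total!!Male!!Under 5 years",
    "V002": "Estimate!!Total!!Female!!Under 5 years",
    "V003": "Estimate!!Median household income in the past 12 months",
    "V004": "Estimate!!Total!!Income in the past 12 months below poverty level",
    "V005": "Estimate!!Total!!With health insurance coverage",
}

# Hypothetical keyword lexicon standing in for a real ontology lookup.
CONCEPT_KEYWORDS = {
    "sex": {"male", "female"},
    "age": {"age", "years"},
    "income": {"income"},
    "poverty": {"poverty"},
    "health insurance": {"insurance"},
}


def tokenize(label):
    """Lowercase a label and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", label.lower()))


def concepts_for(label):
    """Return the sorted concepts whose keywords appear as whole tokens in the label."""
    tokens = tokenize(label)
    return sorted(c for c, keywords in CONCEPT_KEYWORDS.items() if tokens & keywords)


def group_by_concept(variables):
    """Map each concept to the list of variable codes whose labels mention it."""
    groups = defaultdict(list)
    for code, label in variables.items():
        for concept in concepts_for(label):
            groups[concept].append(code)
    return dict(groups)


if __name__ == "__main__":
    for concept, codes in sorted(group_by_concept(VARIABLES).items()):
        print(f"{concept}: {', '.join(codes)}")
```

Matching on whole tokens rather than substrings keeps "female" from being counted as "male". A real system would replace the keyword lexicon with concept identifiers and semantic types drawn from an established ontology, as the abstract proposes.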
