
Utilization of Five Data Mining Algorithms Combined with Simplified Preprocessing to Establish Reference Intervals of Thyroid-related Hormones for Non-elderly Adults

Overview
Publisher BioMed Central
Date 2023 May 2
PMID 37131135
Abstract

Background: Despite extensive research on data mining algorithms, there is still no standard protocol for evaluating the performance of existing algorithms. This study therefore aims to provide a novel procedure that combines data mining algorithms with simplified preprocessing to establish reference intervals (RIs), and to assess the performance of five such algorithms objectively.

Methods: Two data sets were derived from a population undergoing physical examination. The Hoffmann, Bhattacharya, Expectation-Maximization (EM), kosmic, and refineR algorithms, each combined with two-step data preprocessing, were applied to the Test data set to establish RIs for thyroid-related hormones. The algorithm-calculated RIs were compared with standard RIs calculated from the Reference data set, in which reference individuals were selected according to strict inclusion and exclusion criteria. The methods were assessed objectively using a bias ratio (BR) matrix.
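The abstract does not define the bias ratio or state how the standard RIs were computed. As a minimal sketch, assuming the standard RIs are nonparametric 2.5th and 97.5th percentiles of the Reference data set, and assuming the BR for each limit is the absolute difference between the algorithm-derived and standard limits divided by the standard interval's SD (taken as (URL - LRL)/3.92, one definition used in the RI literature), a BR matrix could be assembled as below. The function names `nonparametric_ri`, `bias_ratio`, and `br_matrix` are hypothetical.

```python
import math


def nonparametric_ri(values, lower_pct=2.5, upper_pct=97.5):
    """Percentile-based RI; interpolates linearly between order statistics."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")

    def percentile(p):
        # Fractional rank on a 0..n-1 scale (the same convention as
        # numpy.percentile's default 'linear' method).
        pos = (len(data) - 1) * p / 100.0
        lo, hi = math.floor(pos), math.ceil(pos)
        return data[lo] + (data[hi] - data[lo]) * (pos - lo)

    return percentile(lower_pct), percentile(upper_pct)


def bias_ratio(estimated, standard):
    """Assumed BR per limit: |estimated - standard| / SD_RI,
    with SD_RI = (URL - LRL) / 3.92 taken from the standard RI."""
    lrl_std, url_std = standard
    sd_ri = (url_std - lrl_std) / 3.92
    return tuple(abs(e - s) / sd_ri for e, s in zip(estimated, standard))


def br_matrix(estimates, standards):
    """Rows: algorithms; columns: analytes; cell: (BR_lower, BR_upper).

    estimates: {algorithm: {analyte: (lower, upper)}}
    standards: {analyte: (lower, upper)}
    """
    return {
        algorithm: {
            analyte: bias_ratio(ri, standards[analyte])
            for analyte, ri in by_analyte.items()
        }
        for algorithm, by_analyte in estimates.items()
    }
```

Each cell can then be compared against whatever acceptance threshold the evaluator adopts; this sketch does not imply a particular threshold.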

Results: RIs of thyroid-related hormones were established. The TSH RIs established by the EM algorithm were highly consistent with the standard TSH RIs (BR = 0.063), although the EM algorithm appeared to perform poorly for the other hormones. The RIs calculated by the Hoffmann, Bhattacharya, and refineR methods for free and total triiodothyronine and for free and total thyroxine were close to, and matched, the standard RIs.
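The abstract does not describe how the Hoffmann and Bhattacharya methods were implemented. The sketch below shows the core idea of each under stated assumptions: `hoffmann_ri` regresses values on the normal quantiles of their plotting positions over an assumed central fitting range (the classic method reads the linear part of a probability plot by eye), and `bhattacharya_ri` regresses differences of log bin counts on bin boundaries, which for a Gaussian component gives a line with slope -h/σ² and x-intercept μ. Both return μ ± 1.96σ limits. The fitting range, bin width, `min_count` filter, and function names are illustrative assumptions, not the study's settings.

```python
import math
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # about 1.96


def _fit_line(xs, ys):
    """Ordinary least squares; returns (intercept, slope) of y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def hoffmann_ri(values, fit_range=(0.2, 0.8)):
    """Hoffmann-style RI: fit value = a + b*z over an assumed linear region,
    where z is the normal quantile of each value's plotting position."""
    data = sorted(values)
    n = len(data)
    std_normal = NormalDist()
    zs, xs = [], []
    for i, x in enumerate(data, start=1):
        p = (i - 0.5) / n  # plotting position of the i-th order statistic
        if fit_range[0] <= p <= fit_range[1]:
            zs.append(std_normal.inv_cdf(p))
            xs.append(x)
    intercept, slope = _fit_line(zs, xs)  # intercept ~ mean, slope ~ SD
    return intercept - Z975 * slope, intercept + Z975 * slope


def bhattacharya_ri(values, bin_width, min_count=20):
    """Bhattacharya-style RI from a histogram of (near-)Gaussian data.

    For a Gaussian, ln(count[k+1]) - ln(count[k]) is linear in the boundary
    between bins k and k+1, with slope -h/sigma**2 and x-intercept mu.
    """
    origin = min(values)
    counts = {}
    for v in values:
        k = round((v - origin) / bin_width)  # bins centred on origin + k*h
        counts[k] = counts.get(k, 0) + 1
    xs, ys = [], []
    for k, c in counts.items():
        nxt = counts.get(k + 1, 0)
        if c >= min_count and nxt >= min_count:
            xs.append(origin + (k + 0.5) * bin_width)  # shared bin boundary
            ys.append(math.log(nxt) - math.log(c))
    intercept, slope = _fit_line(xs, ys)
    if slope >= 0:
        raise ValueError("no descending Gaussian segment found")
    sigma = math.sqrt(-bin_width / slope)
    mu = -intercept / slope
    return mu - Z975 * sigma, mu + Z975 * sigma
```

Both methods are sensitive to which region is treated as linear; in practice the Bhattacharya fit is usually restricted to the visibly linear run of points, which is the choice `min_count` only crudely approximates here.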

Conclusion: An effective approach for objectively evaluating algorithm performance, based on the BR matrix, was established. The EM algorithm combined with simplified preprocessing can handle data with significant skewness, but its performance is limited in other scenarios. The other four algorithms perform well for data with Gaussian or near-Gaussian distributions. Selecting the algorithm according to the distribution characteristics of the data is recommended.
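The abstract reports that EM handled strongly skewed data (TSH) well but does not describe the EM configuration. As a minimal sketch, assuming a two-component Gaussian mixture fitted by Expectation-Maximization to log-transformed values, with the heavier component taken as the reference population, an EM-based RI could look like this. The log transform, the two-component model, and the names `em_two_gaussians` and `em_reference_interval` are assumptions for illustration.

```python
import math
from statistics import NormalDist, fmean, pstdev

_SQRT_2PI = math.sqrt(2 * math.pi)


def _pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * _SQRT_2PI)


def em_two_gaussians(xs, max_iter=500, tol=1e-8):
    """Fit a two-component 1-D Gaussian mixture by Expectation-Maximization.
    Returns (weights, means, sds)."""
    xs = list(xs)
    n = len(xs)
    s = sorted(xs)
    # Initialise each component from one half of the sorted data.
    mu = [fmean(s[: n // 2]), fmean(s[n // 2 :])]
    sd = [max(pstdev(s), 1e-6)] * 2
    w = [0.5, 0.5]
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibility of each component for each point.
        resp, ll = [], 0.0
        for x in xs:
            p = [w[j] * _pdf(x, mu[j], sd[j]) for j in (0, 1)]
            total = max(p[0] + p[1], 1e-300)  # guard against underflow
            ll += math.log(total)
            resp.append((p[0] / total, p[1] / total))
        # M-step: re-estimate weights, means, and SDs from responsibilities.
        for j in (0, 1):
            nj = max(sum(r[j] for r in resp), 1e-12)  # avoid division by zero
            w[j] = nj / n
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sd[j] = math.sqrt(max(var, 1e-12))
        # The log-likelihood never decreases under EM; stop once it plateaus.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return w, mu, sd


def em_reference_interval(values):
    """Assumed workflow: log-transform, fit the mixture, treat the heavier
    component as the reference population, back-transform mu +/- 1.96 sd."""
    logs = [math.log(v) for v in values if v > 0]
    w, mu, sd = em_two_gaussians(logs)
    j = 0 if w[0] >= w[1] else 1
    z = NormalDist().inv_cdf(0.975)
    return math.exp(mu[j] - z * sd[j]), math.exp(mu[j] + z * sd[j])
```

Treating the heavier component as the reference population is a simplifying assumption: it presumes that non-reference results form a minority of the data.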

Citing Articles

Insulin reference intervals in Brazilian adolescents by direct and indirect approaches: validation of a data mining method from laboratory data.

Freire M, Dias P, Souza T, Hirose C, Araujo P, Neves M. J Pediatr (Rio J). 2024; 100(5):512-518.

PMID: 38670169 PMC: 11361890. DOI: 10.1016/j.jped.2024.03.009.


Comparison of results and age-related changes in establishing reference intervals for CEA, AFP, CA125, and CA199 using four indirect methods.

Chen J, Fan L, Yang Z, Yang D. Pract Lab Med. 2024; 38:e00353.

PMID: 38221990 PMC: 10787276. DOI: 10.1016/j.plabm.2023.e00353.
