Multi-TGDR, a Multi-class Regularization Method, Identifies the Metabolic Profiles of Hepatocellular Carcinoma and Cirrhosis Infected with Hepatitis B or Hepatitis C Virus
Background: Over the last decade, metabolomics has evolved into a mainstream enterprise used by many laboratories worldwide. Like other "omics" data, metabolomics data typically have fewer samples than measured features, so selecting an optimal subset of features with a supervised classifier is imperative. We extended an existing feature selection algorithm, threshold gradient descent regularization (TGDR), to handle multi-class classification of "omics" data, and proposed two such extensions, referred to collectively as multi-TGDR. Both multi-TGDR frameworks were used to analyze a metabolomics dataset that compares the metabolic profiles of hepatocellular carcinoma (HCC) in patients infected with hepatitis B virus (HBV) or hepatitis C virus (HCV) with those of cirrhosis induced by HBV/HCV infection; the goal was to improve early-stage diagnosis of HCC.
Results: We applied two multi-TGDR frameworks to the HCC metabolomics data: one determines TGDR thresholds globally across classes, the other locally for each class. The multi-TGDR global model selected 45 metabolites with a 0% misclassification rate (the error rate on the training data) and a 3.82% 5-fold cross-validation (CV-5) predictive error rate. The multi-TGDR local model selected 48 metabolites with a 0% misclassification rate and a 5.34% CV-5 error rate.
Conclusions: One important advantage of multi-TGDR local is that it allows inference about which features are related specifically to a given class or classes. We therefore recommend multi-TGDR local: it has similar predictive performance and requires the same computing time as multi-TGDR global, but may also provide class-specific inference.
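The difference between global and local thresholds can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a multinomial logistic likelihood and the usual TGDR update, in which a coefficient moves only when the magnitude of its gradient is at least a fraction tau of a reference maximum. The function name `multi_tgdr`, the parameters `tau`, `step`, and `n_iter`, the unthresholded intercept update, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def multi_tgdr(X, y, n_classes, tau=0.8, step=0.01, n_iter=200, local=False):
    """Thresholded gradient ascent on a multinomial log-likelihood (sketch).

    local=False: one reference maximum over all (feature, class) gradients.
    local=True:  a separate reference maximum for each class.
    """
    n, p = X.shape
    Y = np.eye(n_classes)[y]          # one-hot class labels
    B = np.zeros((p, n_classes))      # coefficients, one column per class
    b0 = np.zeros(n_classes)          # intercepts (assumed not thresholded)
    for _ in range(n_iter):
        P = softmax(X @ B + b0)
        G = X.T @ (Y - P) / n         # gradient of the mean log-likelihood
        abs_g = np.abs(G)
        if local:
            ref = abs_g.max(axis=0, keepdims=True)  # maximum within each class
        else:
            ref = abs_g.max()                       # maximum across all classes
        mask = abs_g >= tau * ref     # update only the steepest directions
        B += step * G * mask
        b0 += step * (Y - P).mean(axis=0)
    return B, b0

# Synthetic example: 3 classes, 30 features, only the first 3 informative.
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 3, size=90)
X_demo = rng.normal(size=(90, 30))
X_demo[np.arange(90), y_demo] += 2.0  # feature k is shifted for class k

B_global, _ = multi_tgdr(X_demo, y_demo, 3, local=False)
B_local, _ = multi_tgdr(X_demo, y_demo, 3, local=True)
print("features selected (global):", int((B_global != 0).any(axis=1).sum()))
print("features selected (local): ", int((B_local != 0).any(axis=1).sum()))
```

In this sketch the nonzero pattern of each column of `B` indicates which features were selected for that class. With local thresholds every class is guaranteed to update at least one of its own coefficients in each iteration, which is one way to read the class-specific inference mentioned above.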