Noise Injection for Training Artificial Neural Networks: a Comparison with Weight Decay and Early Stopping

Overview

Journal: Med Phys

Specialty: Biophysics

Date: 2009 Nov 26

PMID: 19928111

Citations: 25

Authors

Richard M Zur

Yulei Jiang

Lorenzo L Pesce

Karen Drukker


Abstract

The purpose of this study was to investigate the effect of a noise injection method on the "overfitting" problem of artificial neural networks (ANNs) in two-class classification tasks. The authors compared ANNs trained with noise injection to ANNs trained with two other methods for avoiding overfitting: weight decay and early stopping. They also evaluated an automatic algorithm for selecting the magnitude of the noise injection. They performed simulation studies of an exclusive-or classification task with training datasets of 50, 100, and 200 cases (half normal and half abnormal) and an independent testing dataset of 2000 cases. They also compared the methods using a breast ultrasound dataset of 1126 cases. For simulated training datasets of 50 cases, the area under the receiver operating characteristic curve (AUC) was greater (by 0.03) when training with noise injection than when training without any regularization, and the improvement was greater than those from weight decay and early stopping (both of 0.02). For training datasets of 100 cases, noise injection and weight decay yielded similar increases in the AUC (0.02), whereas early stopping produced a smaller increase (0.01). For training datasets of 200 cases, the increases in the AUC were negligibly small for all methods (0.005). For the ultrasound dataset, noise injection had a greater average AUC than ANNs trained without regularization and a slightly greater average AUC than ANNs trained with weight decay. These results indicate that training ANNs with noise injection can reduce overfitting to a greater degree than early stopping and to a similar degree as weight decay.
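The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the network architecture, the noise distribution, or how the exclusive-or data were simulated, so the following are all assumptions made for illustration: Gaussian input noise redrawn at every epoch, a single hidden layer of 8 tanh units, the cluster layout in `make_xor`, and the helper names `make_xor`, `train_ann`, and `auc`. The automatic selection of the noise magnitude is not reproduced here; `noise_sd` is set by hand.

```python
import numpy as np

def make_xor(n, rng, spread=0.5):
    """Hypothetical XOR-style task: a case is abnormal (1) when its two
    features have opposite signs. Half of the cases fall in each class."""
    y = np.repeat([0, 1], n // 2)
    s1 = rng.choice([-1.0, 1.0], size=y.size)
    s2 = np.where(y == 1, -s1, s1)
    x = np.column_stack([s1, s2]) + spread * rng.standard_normal((y.size, 2))
    return x, y.astype(float)

def auc(scores, labels):
    """Empirical AUC via the Mann-Whitney statistic (ties count one half)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def train_ann(x, y, rng, noise_sd=0.0, hidden=8, epochs=1000, lr=0.5):
    """Train a one-hidden-layer network by full-batch gradient descent.
    With noise_sd > 0, fresh Gaussian noise is added to the inputs at
    every epoch (noise injection); noise_sd = 0 means no regularization."""
    w1 = 0.5 * rng.standard_normal((x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = 0.5 * rng.standard_normal(hidden)
    b2 = 0.0
    for _ in range(epochs):
        xn = x + noise_sd * rng.standard_normal(x.shape)
        h = np.tanh(xn @ w1 + b1)
        p = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))
        g = (p - y) / y.size                  # d(mean cross-entropy)/d(logit)
        gh = np.outer(g, w2) * (1.0 - h**2)   # back-propagate through tanh
        w2 -= lr * (h.T @ g)
        b2 -= lr * g.sum()
        w1 -= lr * (xn.T @ gh)
        b1 -= lr * gh.sum(axis=0)

    def predict(x_new):
        hn = np.tanh(x_new @ w1 + b1)
        return 1.0 / (1.0 + np.exp(-(hn @ w2 + b2)))

    return predict

rng = np.random.default_rng(0)
x_train, y_train = make_xor(50, rng)      # small training set, as in the study
x_test, y_test = make_xor(2000, rng)      # independent testing set
for sd in (0.0, 0.3):
    predict = train_ann(x_train, y_train, np.random.default_rng(1), noise_sd=sd)
    print(f"noise_sd={sd}: test AUC = {auc(predict(x_test), y_test):.3f}")
```

A single run such as this one says little by itself; a comparison like the one reported would need the experiment repeated over many independently drawn training sets, with the test AUC averaged across them.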

Citing Articles

Machine Learning and Metabolomics Predict Mesenchymal Stem Cell Osteogenic Differentiation in 2D and 3D Cultures.

Klontzas M, Vernardis S, Batsali A, Papadogiannis F, Panoskaltsis N, Mantalaris A J Funct Biomater. 2024; 15(12).

PMID: 39728167 PMC: 11680063. DOI: 10.3390/jfb15120367.

In vivo mapping of the chemical exchange relayed nuclear Overhauser effect using deep magnetic resonance fingerprinting.

Power I, Rivlin M, Shmuely H, Zaiss M, Navon G, Perlman O iScience. 2024; 27(11):111209.

PMID: 39569380 PMC: 11576397. DOI: 10.1016/j.isci.2024.111209.

Photonic probabilistic machine learning using quantum vacuum noise.

Choi S, Salamin Y, Roques-Carmes C, Dangovski R, Luo D, Chen Z Nat Commun. 2024; 15(1):7760.

PMID: 39237543 PMC: 11377531. DOI: 10.1038/s41467-024-51509-0.

DCE-Qnet: deep network quantification of dynamic contrast enhanced (DCE) MRI.

Cohen O, Kargar S, Woo S, Vargas A, Otazo R MAGMA. 2024; 37(6):1077-1090.

PMID: 39112813 DOI: 10.1007/s10334-024-01189-0.

DCE-Qnet: Deep Network Quantification of Dynamic Contrast Enhanced (DCE) MRI.

Cohen O, Kargar S, Woo S, Vargas A, Otazo R ArXiv. 2024.

PMID: 38827459 PMC: 11142325.
