A Machine Learning Approach Reveals That Bacterial Spore Levels in Organic Bulk Tank Milk Are Dependent on Farm Characteristics and Meteorological Factors
Nutritional Sciences
Bacterial spores in raw milk can lead to quality issues in milk and milk-derived products. Because these spores originate from farm environments, it is important to understand how farm-level factors contribute to spore levels. This study aimed to investigate the impact of farm management practices and meteorological factors on the levels of different spore types in organic raw milk using machine learning models. Raw milk from certified organic dairy farms (n = 102) located across 11 states was collected six times over one year and tested for standard plate count, psychrotolerant spore count, mesophilic spore count, thermophilic spore count, and butyric acid bacteria. At each sampling date, a survey on farm management practices was collected, and meteorological factors were obtained for the date of sampling as well as 1, 2, and 3 days prior. To address confounders, the dataset was stratified separately by use of a parlor for milking, number of years since organic certification, and pasture time, yielding sub-datasets. We constructed random forest regression models to predict log mesophilic spore count, log thermophilic spore count, and log butyric acid bacteria most probable number, as well as a random forest classification model to classify the presence of psychrotolerant spores in each raw milk sample. The summary statistics showed that spore levels varied considerably between certified organic farms but were only slightly higher than those reported for conventional farms in previous longitudinal studies. The variable importance plots from the models suggest that herd size, certification year, employee-related variables, and clipping and flaming of udders are important for spore levels in organic raw milk. The small effects of these variables, as shown in partial dependence plots, suggest a need for an individualized, risk-based approach to managing spore levels. Incorporating novel data streams has the potential to enhance the performance of the models as a real-time monitoring tool.