Scalable Clustering Algorithms for Continuous Environmental Flow Cytometry

Overview
Journal Bioinformatics
Specialty Biology
Date 2015 Oct 19
PMID 26476780
Citations 6
Abstract

Motivation: Recent technological innovations in flow cytometry now allow oceanographers to collect high-frequency data on particles in aquatic environments at a scale far surpassing that of conventional flow cytometers. The SeaFlow cytometer continuously profiles microbial phytoplankton populations across thousands of kilometers of the surface ocean. The data streams produced by instruments such as SeaFlow challenge the traditional sample-by-sample approach to cytometric analysis and highlight the need for scalable clustering algorithms that can extract population information from these large-scale, high-frequency data.

Results: We explore how available algorithms commonly used for medical applications perform at classifying such large-scale environmental flow cytometry data. We apply large-scale Gaussian mixture models to massive datasets using Hadoop. This approach outperforms current state-of-the-art cytometry classification algorithms in accuracy and can be coupled with manual or automatic partitioning of the data into homogeneous sections for further classification gains. We propose the Gaussian mixture model with partitioning approach for classification of large-scale, high-frequency flow cytometry data.
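
Illustration: The idea of fitting a Gaussian mixture model separately to each partition of a data stream can be sketched in a few lines. The Python below is a minimal, single-machine sketch and not the authors' Java/Hadoop implementation. It assumes one measurement channel, fixed-size partitions standing in for the manual or automatic partitioning, and synthetic data. All function names (`fit_gmm_1d`, `classify`, `fit_partitioned`, `synthetic_chunk`) are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(values, k, iterations=50):
    """Fit a k-component 1-D Gaussian mixture by EM.

    Returns (weights, means, variances), each a list of length k.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(ordered) / n
    spread = sum((x - centre) ** 2 for x in ordered) / n
    variances = [max(spread, 1e-6)] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in values:
            dens = [w * gaussian_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate parameters from the weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps the density finite
    return weights, means, variances


def classify(x, model):
    """Index of the component with the highest weighted density at x."""
    weights, means, variances = model
    scores = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    return scores.index(max(scores))


def fit_partitioned(stream, k, chunk_size):
    """Fit one mixture per fixed-size chunk of a time-ordered stream."""
    return [fit_gmm_1d(stream[i:i + chunk_size], k)
            for i in range(0, len(stream), chunk_size)]


def synthetic_chunk(rng, centres, n_per_population):
    """Made-up data: n points per population, unit standard deviation."""
    points = [rng.gauss(c, 1.0) for c in centres for _ in range(n_per_population)]
    rng.shuffle(points)
    return points
```

Calling `fit_partitioned(stream, k=2, chunk_size=400)` on a stream whose populations drift between chunks returns one model per chunk, so each section is classified against parameters fitted to its own data rather than to a single global mixture.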

Availability and Implementation: Source code is available for download at https://github.com/jhyrkas/seaflow_cluster and is implemented in Java for use with Hadoop.

Contact: hyrkas@cs.washington.edu

Supplementary Information: Supplementary data are available at Bioinformatics online.
