
A Fully Scalable Online Pre-processing Algorithm for Short Oligonucleotide Microarray Atlases

Overview
Specialty Biochemistry
Date 2013 Apr 9
PMID 23563154
Citations 16
Abstract

Rapid accumulation of large and standardized microarray data collections is opening up novel opportunities for holistic characterization of genome function. The limited scalability of current preprocessing techniques has, however, formed a bottleneck for full utilization of these data resources. Although short oligonucleotide arrays constitute a major source of genome-wide profiling data, scalable probe-level techniques have been available only for a few platforms, and they rely on pre-calculated probe effects from restricted reference training sets. To overcome these key limitations, we introduce a fully scalable online-learning algorithm for probe-level analysis and pre-processing of large microarray atlases involving tens of thousands of arrays. In contrast to the alternatives, our algorithm scales linearly with sample size and is applicable to all short oligonucleotide platforms. The model can use the most comprehensive data collections available to date to pinpoint individual probes affected by noise and biases, providing tools to guide array design and quality control. This is the only available algorithm that can learn probe-level parameters through sequential hyperparameter updates on small consecutive batches of data, thus circumventing the extensive memory requirements of the standard approaches and opening up novel opportunities to take full advantage of contemporary microarray collections.
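The central idea above, learning probe-level parameters by updating hyperparameters one small batch at a time so that earlier arrays never need to be kept in memory, can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes a simplified model in which each probe's log intensity is an array-level signal plus a probe affinity plus Gaussian noise, and it places a conjugate inverse-gamma prior on each probe's noise variance. The function names, the prior values and the synthetic data are hypothetical.

```python
import random
from statistics import mean, median


def update_probe_hyperparameters(batch, alpha, beta):
    """Fold one batch of arrays into per-probe inverse-gamma hyperparameters.

    batch: list of arrays; each array is a list of log2 intensities, one value
           per probe of a single probeset (hypothetical data layout).
    alpha, beta: per-probe shape/scale values carried over from earlier batches.
    Returns updated (alpha, beta); the raw batch is not needed afterwards.
    """
    n_probes = len(alpha)
    # Simplification (assumption): estimate each probe's affinity from this batch alone.
    affinity = [mean(array[i] for array in batch) for i in range(n_probes)]
    alpha, beta = list(alpha), list(beta)
    for array in batch:
        centered = [array[i] - affinity[i] for i in range(n_probes)]
        # Per-array signal: the median over probes limits the pull of a bad probe.
        signal = median(centered)
        for i in range(n_probes):
            resid = centered[i] - signal
            # Conjugate inverse-gamma update for a Gaussian variance.
            alpha[i] += 0.5
            beta[i] += 0.5 * resid ** 2
    return alpha, beta


def posterior_variance(alpha, beta):
    """Posterior mean of each probe's noise variance, beta / (alpha - 1)."""
    return [b / (a - 1.0) for a, b in zip(alpha, beta)]


def simulate_batch(rng, n_arrays, noise_sd):
    """Synthetic probeset: shared array signal, fixed probe offsets, probe noise."""
    batch = []
    for _ in range(n_arrays):
        signal = rng.gauss(8.0, 1.0)
        batch.append([signal + 0.5 * i + rng.gauss(0.0, sd)
                      for i, sd in enumerate(noise_sd)])
    return batch


rng = random.Random(1)
noise_sd = [0.1, 0.1, 0.1, 2.0, 0.1]   # probe 3 is deliberately unreliable
alpha = [1.0] * len(noise_sd)          # weak prior (assumed values)
beta = [1.0] * len(noise_sd)
for _ in range(100):                   # 100 batches of 20 arrays each
    batch = simulate_batch(rng, 20, noise_sd)
    alpha, beta = update_probe_hyperparameters(batch, alpha, beta)
variances = posterior_variance(alpha, beta)
print([round(v, 3) for v in variances])
```

Only `alpha` and `beta` (two numbers per probe) persist between batches, so memory stays constant as arrays accumulate and total work grows linearly with the number of arrays. In this toy run the unreliable probe ends up with a posterior variance far above the others, which is the kind of signal that could flag individual probes for quality control.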

Citing Articles

A novel Bayesian framework for harmonizing information across tissues and studies to increase cell type deconvolution accuracy.

Deng W, Li B, Wang J, Jiang W, Yan X, Li N. Brief Bioinform. 2023; 24(1).

PMID: 36631398 PMC: 9851324. DOI: 10.1093/bib/bbac616.


Partial restoration of normal intestinal microbiota in morbidly obese women six months after bariatric surgery.

Koffert J, Lahti L, Nylund L, Salminen S, Hannukainen J, Salminen P. PeerJ. 2020; 8:e10442.

PMID: 33304658 PMC: 7700738. DOI: 10.7717/peerj.10442.


Associations between Pro- and Anti-Inflammatory Gastro-Intestinal Microbiota, Diet, and Cognitive Functioning in Dutch Healthy Older Adults: The NU-AGE Study.

van Soest A, Hermes G, Berendsen A, van de Rest O, Zoetendal E, Fuentes S. Nutrients. 2020; 12(11).

PMID: 33198235 PMC: 7697493. DOI: 10.3390/nu12113471.


Does entry to center-based childcare affect gut microbial colonization in young infants?

Hermes G, Eckermann H, de Vos W, de Weerth C. Sci Rep. 2020; 10(1):10235.

PMID: 32581284 PMC: 7314774. DOI: 10.1038/s41598-020-66404-z.


Gut Microbiota and Body Weight in School-Aged Children: The KOALA Birth Cohort Study.

Mbakwa C, Hermes G, Penders J, Savelkoul P, Thijs C, Dagnelie P. Obesity (Silver Spring). 2018; 26(11):1767-1776.

PMID: 30296366 PMC: 6646907. DOI: 10.1002/oby.22320.

