
Data-based Filtering for Replicated High-throughput Transcriptome Sequencing Experiments

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2013 Jul 4
PMID: 23821648
Citations: 131
Abstract

Motivation: RNA sequencing is now widely performed to study differential expression among experimental conditions. As tests are performed on a large number of genes, stringent false-discovery rate control is required at the expense of detection power. Ad hoc filtering techniques are regularly used to moderate this correction by removing genes with low signal, with little attention paid to their impact on downstream analyses.
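The trade-off described above can be illustrated with a small sketch. It assumes the Benjamini-Hochberg step-up procedure as the false-discovery rate correction (the abstract does not name one), and the p-values are invented for illustration. It also assumes the filter is independent of the test statistic under the null hypothesis, which is what makes filtering before correction legitimate.

```python
def bh_rejections(pvalues, alpha=0.05):
    """Count hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    k_max = 0
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * alpha.
    for k, p in enumerate(sorted(pvalues), start=1):
        if p <= k / m * alpha:
            k_max = k
    return k_max


# Hypothetical p-values: four genes with real signal, six low-count genes with none.
signal_p = [0.001, 0.008, 0.02, 0.03]
low_count_p = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

without_filter = bh_rejections(signal_p + low_count_p)  # correction over 10 tests
with_filter = bh_rejections(signal_p)                   # correction over 4 tests
print(without_filter, with_filter)
```

With these made-up values, removing the six uninformative genes before correction raises the number of rejections from 2 to 4, because each remaining p-value is compared against a less stringent cutoff.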

Results: We propose a data-driven method based on the Jaccard similarity index to calculate a filtering threshold for replicated RNA sequencing data. In comparisons with alternative data filters regularly used in practice, we demonstrate that the proposed method correctly filters lowly expressed genes, leading to increased detection power for moderately to highly expressed genes. Interestingly, this data-driven threshold varies among experiments, which underscores the value of estimating it from the data rather than fixing it in advance.
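As a rough illustration of how a Jaccard-based threshold could be chosen, the sketch below makes several assumptions that the abstract does not spell out: counts are already normalized, a gene is called "expressed" in a sample when its count exceeds the threshold, similarity is averaged over all replicate pairs within the same condition, and a gene is kept when its maximum count across samples exceeds the chosen threshold. The function names and toy data are hypothetical, and this is not the HTSFilter implementation.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two equal-length boolean sequences."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def global_jaccard(counts, conditions, s):
    """Mean Jaccard index over replicate pairs sharing a condition, at threshold s."""
    n_samples = len(conditions)
    # One boolean "expressed" vector per sample (column of the gene-by-sample table).
    expressed = [[row[j] > s for row in counts] for j in range(n_samples)]
    scores = [
        jaccard(expressed[i], expressed[j])
        for i, j in combinations(range(n_samples), 2)
        if conditions[i] == conditions[j]
    ]
    if not scores:
        raise ValueError("at least one condition needs two or more replicates")
    return sum(scores) / len(scores)


def choose_threshold(counts, conditions, candidates):
    """Pick the candidate threshold that maximizes replicate similarity."""
    return max(candidates, key=lambda s: global_jaccard(counts, conditions, s))


def filter_genes(counts, s):
    """Indices of genes whose maximum count across samples exceeds s."""
    return [i for i, row in enumerate(counts) if max(row) > s]


# Toy normalized counts: rows are genes, columns are samples (two replicates per condition).
conditions = ["A", "A", "B", "B"]
counts = [
    [0, 3, 1, 0],
    [2, 0, 0, 4],
    [50, 60, 55, 70],
    [200, 180, 20, 35],
    [5, 0, 8, 0],
    [100, 120, 110, 90],
]

best = choose_threshold(counts, conditions, candidates=[0, 5, 10, 30])
kept = filter_genes(counts, best)
print(best, kept)
```

On this toy table, too low a threshold lets noisy low-count genes disagree between replicates, while too high a threshold starts splitting genuinely expressed genes, so the similarity peaks at an intermediate value. That is consistent with the abstract's observation that the threshold depends on the experiment.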

Availability: The proposed filtering method is implemented in the R package HTSFilter available on Bioconductor.

Citing Articles

Profiling hippocampal neuronal populations reveals unique gene expression mosaics reflective of connectivity-based degeneration in the Ts65Dn mouse model of Down syndrome and Alzheimer's disease.

Alldred M, Ibrahim K, Pidikiti H, Lee S, Heguy A, Chiosis G. Front Mol Neurosci. 2025; 18:1546375.

PMID: 40078964 PMC: 11897496. DOI: 10.3389/fnmol.2025.1546375.


Comprehensive investigation of proteoglycan gene expression in breast cancer: Discovery of a unique proteoglycan gene signature linked to the malignant phenotype.

Buraschi S, Pascal G, Liberatore F, Iozzo R. Proteoglycan Res. 2025; 3(1).

PMID: 40066261 PMC: 11893098. DOI: 10.1002/pgr2.70014.


A comprehensive review and benchmark of differential analysis tools for Hi-C data.

Jorge E, Foissac S, Neuvial P, Zytnicki M, Vialaneix N. Brief Bioinform. 2025; 26(2).

PMID: 40037641 PMC: 11879411. DOI: 10.1093/bib/bbaf074.


Heterochrony in orthodenticle expression is associated with ommatidial size variation between Drosophila species.

Torres-Oliva M, Buchberger E, Buffry A, Kittelmann M, Guerrero G, Sumner-Rooney L. BMC Biol. 2025; 23(1):34.

PMID: 39901145 PMC: 11792340. DOI: 10.1186/s12915-025-02136-8.


Effect of exogenous treatment with zaxinone and its mimics on rice root microbiota across different growth stages.

Mazzarella T, Chialva M, de Souza L, Wang J, Votta C, Tiozon Jr R. Sci Rep. 2024; 14(1):31374.

PMID: 39732893 PMC: 11682185. DOI: 10.1038/s41598-024-82833-6.

