The Sum of Two Halves May Be Different from the Whole: Effects of Splitting Sequencing Samples Across Lanes

Overview

Journal Genes (Basel)

Publisher MDPI

Date 2022 Dec 23

PMID 36553532

Authors

Eleanor C Williams

Ruben Chazarra-Gil

Arash Shahsavari

Irina Mohorianu


Abstract

Advances in high-throughput sequencing (HTS) have enabled the characterisation of biological processes at an unprecedented level of detail; most hypotheses in molecular biology rely on analyses of HTS data. However, achieving increased robustness and reproducibility of results remains a major challenge. Although variability in results may be introduced at various stages, e.g., alignment, summarisation or detection of differential expression, one source of variability has been systematically overlooked: the sequencing design, which propagates through analyses and may introduce an additional layer of technical variation. We illustrate qualitative and quantitative differences arising from splitting samples across lanes in bulk and single-cell sequencing. For bulk mRNAseq data, we focus on differential expression and enrichment analyses; for bulk ChIPseq data, we investigate the effect on peak calling and the peaks' properties. At the single-cell level, we concentrate on identifying cell subpopulations, relying on markers used for assigning cell identities; both smartSeq and 10× data are presented. The observed reduction in the number of unique sequenced fragments limits the level of detail on which the different prediction approaches depend. Furthermore, sequencing stochasticity adds a weighting bias, compounded by variable sequencing depths and (as yet unexplained) sequencing bias. Consequently, we observe an overall reduction in sequencing complexity and a distortion in the biological signal across technologies, experimental contexts, organisms and tissues.
