
Addressing Challenges in the Production and Analysis of Illumina Sequencing Data

Overview
Journal BMC Genomics
Publisher BioMed Central
Specialty Genetics
Date 2011 Aug 2
PMID 21801405
Citations 70
Abstract

Advances in DNA sequencing technologies have made it possible to generate large amounts of sequence data very rapidly and at substantially lower cost than capillary sequencing. These new technologies have specific characteristics and limitations that must either be considered during project design or be addressed during data analysis. Specialist skills, at both the laboratory and the computational stages of project design and analysis, are crucial to the generation of high-quality data from these new platforms. The Illumina sequencers (including the Genome Analyzers I/II/IIe/IIx and the new HiScan and HiSeq) represent a widely used platform providing parallel readout of several hundred million immobilized sequences using fluorescent-dye reversible-terminator chemistry. Sequencing library quality, sample handling, instrument settings and sequencing chemistry have a strong impact on sequencing run quality. The presence of adapter chimeras and adapter sequences at the end of short-insert molecules, as well as increased error rates and short read lengths, complicates many computational analyses. We discuss here some of the factors that influence the frequency and severity of these problems and provide solutions for circumventing them. Further, we present a set of general principles for good analysis practice that enable problems with sequencing runs to be identified and dealt with.
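To illustrate the adapter problem mentioned in the abstract: when an insert is shorter than the read length, the sequencer reads through into the adapter, so the 3' end of the read carries full or partial adapter sequence. The sketch below shows one simple way such contamination could be detected and clipped. It is not the authors' method; the function name `trim_adapter`, its parameters, the thresholds, and the adapter string used in the example are all assumptions chosen for illustration.

```python
def trim_adapter(read: str, adapter: str,
                 min_overlap: int = 3,
                 max_mismatch_rate: float = 0.1) -> str:
    """Clip a full or partial 3' adapter from a read (hypothetical helper).

    Scans read positions left to right and, at each position, compares the
    remaining read suffix against the start of the adapter. The first
    position whose mismatch count is within the allowed rate is treated as
    the adapter start, and everything from there on is removed.
    """
    for start in range(len(read) - min_overlap + 1):
        # Compare only the part of the adapter that fits inside the read.
        overlap = min(len(adapter), len(read) - start)
        window = read[start:start + overlap]
        mismatches = sum(1 for r, a in zip(window, adapter) if r != a)
        if mismatches <= int(max_mismatch_rate * overlap):
            return read[:start]
    return read  # no adapter-like suffix found


# Assumed adapter prefix, used here purely as example input.
ADAPTER = "AGATCGGAAGAGC"

full = trim_adapter("ACGTACGTAC" + ADAPTER, ADAPTER)       # whole adapter present
partial = trim_adapter("ACGTACGTAC" + "AGATC", ADAPTER)    # read ends mid-adapter
clean = trim_adapter("ACGTACGTACGT", ADAPTER)              # no adapter present
```

A short `min_overlap` catches adapter fragments at the very end of a read but also raises the chance of clipping genuine sequence that happens to resemble the adapter start, so production tools typically tune this trade-off and weigh base qualities as well.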

Citing Articles

Next-Generation Sequencing Methods to Determine the Accuracy of Retroviral Reverse Transcriptases: Advantages and Limitations.

Martinez Del Rio J, Menendez-Arias L Viruses. 2025; 17(2).

PMID: 40006928 PMC: 11861041. DOI: 10.3390/v17020173.


GenoPipe: identifying the genotype of origin within (epi)genomic datasets.

Lang O, Srivastava D, Pugh B, Lai W Nucleic Acids Res. 2023; 51(22):12054-12068.

PMID: 37933851 PMC: 10711449. DOI: 10.1093/nar/gkad950.


Mapinsights: deep exploration of quality issues and error profiles in high-throughput sequence data.

Das S, Biswas N, Basu A Nucleic Acids Res. 2023; 51(14):e75.

PMID: 37378434 PMC: 10415152. DOI: 10.1093/nar/gkad539.


Metagenomic Analysis of DNA Viruses with Targeted Sequence Capture of Canine Lobular Orbital Adenomas and Normal Conjunctiva.

Schaefer E, Chu S, Wylie K, Wylie T, Griffith O, Pearce J Microorganisms. 2023; 11(5).

PMID: 37317137 PMC: 10223289. DOI: 10.3390/microorganisms11051163.


Porcine fungal mock community analyses: Implications for mycobiome investigations.

Arfken A, Frey J, Carrillo N, Dike N, Onyeachonamm O, Rivera D Front Cell Infect Microbiol. 2023; 13:928353.

PMID: 36844394 PMC: 9945231. DOI: 10.3389/fcimb.2023.928353.

