
FastGT: an Alignment-free Method for Calling Common SNVs Directly from Raw Sequencing Reads

Overview
Journal Sci Rep
Specialty Science
Date 2017 Jun 2
PMID 28566690
Citations 22
Abstract

We have developed a computational method that counts the frequencies of unique k-mers in FASTQ-formatted genome data and uses this information to infer the genotypes of known variants. FastGT can detect the variants in a 30x genome in less than 1 hour using ordinary low-cost server hardware. The overall concordance with the genotypes of two Illumina "Platinum" genomes is 99.96%, and the concordance with the genotypes of the Illumina HumanOmniExpress is 99.82%. Our method provides a k-mer database that can be used for the simultaneous genotyping of approximately 30 million single nucleotide variants (SNVs), including >23,000 SNVs from the Y chromosome. The source code of the FastGT software is available on GitHub (https://github.com/bioinfo-ut/GenomeTester4/).
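The general idea in the abstract (count allele-specific k-mers in raw reads, then infer a genotype from the counts) can be illustrated with a minimal Python sketch. This is not FastGT's implementation or its statistical model: the function names, the one-k-mer-per-allele simplification, and the count-ratio thresholds (`min_total`, `het_low`, `het_high`) are all hypothetical choices made for illustration only.

```python
from collections import Counter

# Translation table for complementing bases; other characters pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for line_number, line in enumerate(handle):
        if line_number % 4 == 1:
            yield line.strip().upper()


def count_target_kmers(reads, targets, k):
    """Count how often each target k-mer occurs in the reads, on either strand."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in targets:
                counts[kmer] += 1
            else:
                rc = reverse_complement(kmer)
                if rc in targets:
                    counts[rc] += 1
    return counts


def call_genotype(ref_count, alt_count, min_total=4, het_low=0.2, het_high=0.8):
    """Naive genotype call from allele k-mer counts (hypothetical thresholds)."""
    total = ref_count + alt_count
    if total < min_total:
        return "./."  # too few observations to call
    alt_fraction = alt_count / total
    if alt_fraction < het_low:
        return "0/0"
    if alt_fraction > het_high:
        return "1/1"
    return "0/1"


def genotype_variants(fastq_handle, variants, k, **call_kwargs):
    """Genotype known variants given as {name: (ref_kmer, alt_kmer)}."""
    targets = {kmer for pair in variants.values() for kmer in pair}
    counts = count_target_kmers(read_fastq_sequences(fastq_handle), targets, k)
    return {
        name: call_genotype(counts[ref], counts[alt], **call_kwargs)
        for name, (ref, alt) in variants.items()
    }
```

In this sketch, `variants` maps a variant name to a pair of k-mers that are assumed to be unique to the reference and alternative alleles, and `genotype_variants` accepts any open text handle containing FASTQ records. Because reads are never aligned, the cost is dominated by a single pass over the k-mers of each read, which is the property that makes alignment-free genotyping fast.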

Citing Articles

Space-efficient computation of k-mer dictionaries for large values of k.

Diaz-Dominguez D, Leinonen M, Salmela L Algorithms Mol Biol. 2024; 19(1):14.

PMID: 38581000 PMC: 10996146. DOI: 10.1186/s13015-024-00259-1.


SAKE: Strobemer-assisted k-mer extraction.

Leinonen M, Salmela L PLoS One. 2023; 18(11):e0294415.

PMID: 38019768 PMC: 10686461. DOI: 10.1371/journal.pone.0294415.


GeneToCN: an alignment-free method for gene copy number estimation directly from next-generation sequencing reads.

Pajuste F, Remm M Sci Rep. 2023; 13(1):17765.

PMID: 37853040 PMC: 10584998. DOI: 10.1038/s41598-023-44636-z.


DOCEST-fast and accurate estimator of human NGS sequencing depth and error rate.

Kaplinski L, Mols M, Puurand T, Remm M Bioinform Adv. 2023; 3(1):vbad084.

PMID: 37641716 PMC: 10460481. DOI: 10.1093/bioadv/vbad084.


Exploring the sorghum race level diversity utilizing 272 sorghum accessions genomic resources.

Ruperao P, Gandham P, Odeny D, Mayes S, Selvanayagam S, Thirunavukkarasu N Front Plant Sci. 2023; 14:1143512.

PMID: 37008459 PMC: 10063887. DOI: 10.3389/fpls.2023.1143512.

