
Multi-allelic Positional Burrows-Wheeler Transform

Overview
Publisher Biomed Central
Specialty Biology
Date 2019 Jun 7
PMID 31167638
Citations 6
Abstract

Background: Recent advances in whole-genome sequencing and SNP array technology have generated large amounts of genotype data, which require faster and more efficient methods for storage and search. The Positional Burrows-Wheeler Transform (PBWT) provides an appropriate data structure for bi-allelic data. As sample sizes increase, more multi-allelic sites are expected to be observed, so methods that handle multi-allelic genotype data are needed.

Results: In this paper, we introduce a multi-allelic version of the Positional Burrows-Wheeler Transform (mPBWT), based on the bi-allelic version, for compression and searching. The time complexity of constructing the data structure and of searching within a panel containing t-allelic sites increases by a factor of t.

Conclusion: Because the number of possible alleles t is small, the added time for the multi-allelic PBWT is negligible, and its running time is comparable to that of the bi-allelic PBWT.
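
The factor-of-t cost described in the Results can be illustrated with a minimal sketch of one construction step. This is not the authors' implementation: it assumes the standard bi-allelic PBWT update (a stable partition of the prefix array plus a running maximum for the divergence array) is generalized by keeping one bucket and one running maximum per allele. The names `mpbwt_step` and `build_mpbwt`, and the encoding of alleles as integers 0..t-1, are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def mpbwt_step(
    prefix: List[int],
    div: List[int],
    column: Sequence[int],
    k: int,
    t: int,
) -> Tuple[List[int], List[int]]:
    """Advance prefix and divergence arrays across site k (assumed sketch).

    prefix: haplotype indices sorted by reversed prefix up to site k.
    div:    div[i] is the start of the longest match between prefix[i]
            and prefix[i - 1] ending just before site k.
    column: column[h] is the allele (0..t-1) of haplotype h at site k.
    """
    bucket_prefix: List[List[int]] = [[] for _ in range(t)]
    bucket_div: List[List[int]] = [[] for _ in range(t)]
    # One running maximum per allele; k + 1 means "no match yet".
    running_max = [k + 1] * t

    for hap, d in zip(prefix, div):
        # This inner loop over all t alleles is the source of the factor t.
        for allele in range(t):
            if d > running_max[allele]:
                running_max[allele] = d
        a = column[hap]
        bucket_prefix[a].append(hap)
        bucket_div[a].append(running_max[a])
        running_max[a] = 0

    # Concatenate buckets in allele order (a stable counting sort).
    new_prefix = [h for bucket in bucket_prefix for h in bucket]
    new_div = [d for bucket in bucket_div for d in bucket]
    return new_prefix, new_div


def build_mpbwt(panel: Sequence[Sequence[int]], t: int) -> Tuple[List[int], List[int]]:
    """Run the step over every site of a haplotype panel (rows = haplotypes)."""
    m = len(panel)
    n = len(panel[0]) if m else 0
    prefix = list(range(m))
    div = [0] * m
    for k in range(n):
        column = [panel[h][k] for h in range(m)]
        prefix, div = mpbwt_step(prefix, div, column, k, t)
    return prefix, div
```

With t = 2 the step reduces to the familiar two-bucket bi-allelic update, which is why a small t keeps the overhead modest under these assumptions.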

Citing Articles

Haplotype Matching with GBWT for Pangenome Graphs.

Sanaullah A, Villalobos S, Zhi D, Zhang S bioRxiv. 2025.

PMID: 39975036 PMC: 11838520. DOI: 10.1101/2025.02.03.634410.


Discovery of runs-of-homozygosity diplotype clusters and their associations with diseases in UK Biobank.

Naseri A, Zhi D, Zhang S Elife. 2024; 13.

PMID: 38905121 PMC: 11249732. DOI: 10.7554/eLife.81698.


Minimal positional substring cover is a haplotype threading alternative to Li and Stephens model.

Sanaullah A, Zhi D, Zhang S Genome Res. 2023; 33(7):1007-1014.

PMID: 37316352 PMC: 10538481. DOI: 10.1101/gr.277673.123.


Computational graph pangenomics: a tutorial on data structures and their applications.

Baaijens J, Bonizzoni P, Boucher C, Della Vedova G, Pirola Y, Rizzi R Nat Comput. 2023; 21(1):81-108.

PMID: 36969737 PMC: 10038355. DOI: 10.1007/s11047-022-09882-6.


Minimal Positional Substring Cover: A Haplotype Threading Alternative to Li & Stephens Model.

Sanaullah A, Zhi D, Zhang S bioRxiv. 2023.

PMID: 36711469 PMC: 9881975. DOI: 10.1101/2023.01.04.522803.

