Multi-allelic Positional Burrows-Wheeler Transform
Background: Recent advances in whole-genome sequencing and SNP array technology have led to the generation of large amounts of genotype data. Such volumes of data require faster and more efficient methods for storing and searching it. The Positional Burrows-Wheeler Transform (PBWT) provides an appropriate data structure for bi-allelic data. As sample sizes grow, more multi-allelic sites are expected to be observed, so there is a need for methods that handle multi-allelic genotype data.
Results: In this paper, we introduce a multi-allelic version of the Positional Burrows-Wheeler Transform (mPBWT), built on the bi-allelic version, for compressing and searching genotype data. For a panel containing t-allelic sites, the time complexity of constructing the data structure and of searching within it increases by a factor of t.
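As a rough illustration of where a factor of t can come from, the Python sketch below generalizes the prefix and divergence array construction of Durbin's original bi-allelic PBWT to alleles encoded as integers 0..t-1. It is a sketch under stated assumptions, not the authors' mPBWT implementation: the function name build_mpbwt and the integer allele encoding are hypothetical. Where the bi-allelic algorithm keeps two running divergence maxima (one per allele), this version keeps t of them and updates all t for every haplotype. That gives O(tMN) time for M haplotypes and N sites.

```python
from typing import List, Tuple


def build_mpbwt(panel: List[List[int]], t: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Build PBWT prefix arrays a_k and divergence arrays d_k for a panel whose
    alleles are integers in range(t). Hypothetical helper, for illustration only."""
    m = len(panel)                      # number of haplotypes
    n = len(panel[0]) if m else 0       # number of sites
    a = list(range(m))                  # a_0: original haplotype order
    d = [0] * m                         # d_0: all zeros by convention
    prefix_arrays, divergence_arrays = [a], [d]

    for k in range(n):
        # One stable bucket per allele instead of the two (0/1) used in bi-allelic PBWT.
        bucket_a: List[List[int]] = [[] for _ in range(t)]
        bucket_d: List[List[int]] = [[] for _ in range(t)]
        # p[c]: running maximum divergence since the last haplotype placed in bucket c;
        # the initial value k + 1 means "no previous haplotype in this bucket" (empty match).
        p = [k + 1] * t
        for i in range(m):
            # Updating all t running maxima is the O(t) work per haplotype in this sketch.
            for c in range(t):
                if d[i] > p[c]:
                    p[c] = d[i]
            allele = panel[a[i]][k]
            bucket_a[allele].append(a[i])
            bucket_d[allele].append(p[allele])
            p[allele] = 0
        # Concatenating the buckets in allele order yields a_{k+1} and d_{k+1}.
        a = [h for bucket in bucket_a for h in bucket]
        d = [x for bucket in bucket_d for x in bucket]
        prefix_arrays.append(a)
        divergence_arrays.append(d)

    return prefix_arrays, divergence_arrays
```

In this sketch, a_k orders haplotypes by their reversed prefixes over sites 0..k-1. d_k[i] is the first site of the longest match ending at site k-1 between haplotypes a_k[i] and a_k[i-1], with d_k[i] = k when there is no match. With t = 2, the procedure reduces to the familiar bi-allelic construction.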
Conclusion: Because the number of possible alleles t is small, the additional time required by the multi-allelic PBWT will be negligible, making it comparable to the bi-allelic version of PBWT.