Making Sense of Score Statistics for Sequence Alignments

Overview

Journal: Brief Bioinform

Publisher: Oxford University Press

Specialty: Biology

Date: 2001 Jul 24

PMID: 11465063

Citations: 12

Authors

M Pagni

C V Jongeneel

Abstract

The search for similarity between two biological sequences lies at the core of many applications in bioinformatics. This paper aims to highlight a few of the principles that should be kept in mind when evaluating the statistical significance of alignments between sequences. The extreme value distribution, which in most cases describes the distribution of alignment scores between a query and a database, is introduced first. The effects of the similarity matrix and gap penalty values on the score distribution are then examined, and it is shown that the alignment statistics can undergo an abrupt phase transition. A few types of random sequence databases used in the estimation of statistical significance are presented, and the statistics employed by the BLAST, FASTA and PRSS programs are compared. Finally, the different strategies used to assess the statistical significance of the matches produced by profiles and hidden Markov models are presented.

Citing Articles

ULTRA-effective labeling of tandem repeats in genomic sequence.

Olson D, Wheeler T. Bioinform Adv. 2024; 4(1):vbae149.

PMID: 39575229 PMC: 11580682. DOI: 10.1093/bioadv/vbae149.

ULTRA-Effective Labeling of Repetitive Genomic Sequence.

Olson D, Wheeler T. bioRxiv. 2024.

PMID: 38895435 PMC: 11185745. DOI: 10.1101/2024.06.03.597269.

A Puzzling Anomaly in the 4-Mer Composition of the Giant Pandoravirus Genomes Reveals a Stringent New Evolutionary Selection Process.

Poirot O, Jeudy S, Abergel C, Claverie J. J Virol. 2019; 93(23).

PMID: 31534042 PMC: 6854483. DOI: 10.1128/JVI.01206-19.

Zona pellucida-binding protein 2 (ZPBP2) and several proteins containing BX7B motifs in human sperm may have hyaluronic acid binding or recognition properties.

Torabi F, Bogle O, Estanyol J, Oliva R, Miller D. Mol Hum Reprod. 2017; 23(12):803-816.

PMID: 29126140 PMC: 5909853. DOI: 10.1093/molehr/gax053.

Density-based hierarchical clustering of pyro-sequences on a large scale--the case of fungal ITS1.

Pagni M, Niculita-Hirzel H, Pellissier L, Dubuis A, Xenarios I, Guisan A. Bioinformatics. 2013; 29(10):1268-74.

PMID: 23539304 PMC: 3654712. DOI: 10.1093/bioinformatics/btt149.