
Identification and Characterization of Multi-species Conserved Sequences

Overview
Journal Genome Res
Specialty Genetics
Date 2003 Dec 6
PMID 14656959
Citations 158
Abstract

Comparative sequence analysis has become an essential component of studies aiming to elucidate genome function. The increasing availability of genomic sequences from multiple vertebrates is creating the need for computational methods that can detect highly conserved regions in a robust fashion. Towards that end, we are developing approaches for identifying sequences that are conserved across multiple species; we call these "Multi-species Conserved Sequences" (or MCSs). Here we report two strategies for MCS identification, demonstrating their ability to detect virtually all known actively conserved sequences (specifically, coding sequences) but very little neutrally evolving sequence (specifically, ancestral repeats). Importantly, we find that a substantial fraction of the bases within MCSs (approximately 70%) resides within non-coding regions; thus, the majority of sequences conserved across multiple vertebrate species has no known function. Initial characterization of these MCSs has revealed sequences that correspond to clusters of transcription factor-binding sites, non-coding RNA transcripts, and other candidate functional elements. Finally, the ability to detect MCSs represents a valuable metric for assessing the relative contribution of a species' sequence to identifying genomic regions of interest, and our results indicate that the currently available genome sequences are insufficient for the comprehensive identification of MCSs in the human genome.
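The abstract does not spell out the two MCS identification strategies, so the following is only a minimal sketch of the general idea of scanning a multi-species alignment for highly conserved windows. It is not the authors' method. The function names, the per-column identity score, the window size, and the threshold are all assumptions chosen for illustration, and the toy alignment is invented.

```python
from typing import List, Tuple


def column_identity(alignment: List[str]) -> List[float]:
    """Per-column fraction of non-reference rows that match the reference.

    Row 0 is treated as the reference (e.g. human). Columns where the
    reference has a gap score 0.0. This scoring rule is an assumption.
    """
    ref = alignment[0]
    others = alignment[1:]
    scores = []
    for i, base in enumerate(ref):
        if base == "-":
            scores.append(0.0)
            continue
        matches = sum(1 for seq in others if seq[i] == base)
        scores.append(matches / len(others))
    return scores


def conserved_segments(
    scores: List[float], window: int = 5, threshold: float = 0.8
) -> List[Tuple[int, int]]:
    """Return half-open [start, end) column ranges covered by windows
    whose mean score is at least `threshold`, merging windows that
    overlap or touch."""
    segments: List[Tuple[int, int]] = []
    for start in range(0, len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            end = start + window
            if segments and start <= segments[-1][1]:
                # Extend the previous segment instead of opening a new one.
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


# Hypothetical three-species toy alignment (reference first).
toy_alignment = [
    "ACGTACGTAAAAAAAA",  # reference
    "ACGTACGTCCCCCCCC",  # species 2
    "ACGTACGACGCGCGCG",  # species 3
]
toy_scores = column_identity(toy_alignment)
toy_segments = conserved_segments(toy_scores, window=5, threshold=0.8)
```

In this toy case the first eight columns are called as one conserved segment. A real analysis would also need a model of neutral evolution to set the threshold, which is the role the abstract assigns to ancestral repeats.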

Citing Articles

Tissue-aware interpretation of genetic variants advances the etiology of rare diseases.

Argov C, Shneyour A, Jubran J, Sabag E, Mansbach A, Sepunaru Y. Mol Syst Biol. 2024; 20(11):1187-1206.

PMID: 39285047 PMC: 11535248. DOI: 10.1038/s44320-024-00061-6.


Previously unmeasured genetic diversity explains part of Lewontin's paradox in a k-mer-based meta-analysis of 112 plant species.

Roberts M, Josephs E. bioRxiv. 2024.

PMID: 38798362 PMC: 11118579. DOI: 10.1101/2024.05.17.594778.


A quantitative genetic model of background selection in humans.

Buffalo V, Kern A. PLoS Genet. 2024; 20(3):e1011144.

PMID: 38507461 PMC: 10984650. DOI: 10.1371/journal.pgen.1011144.


Identification of clade-wide putative cis-regulatory elements from conserved non-coding sequences in Cucurbitaceae genomes.

Song H, Wang Q, Zhang Z, Lin K, Pang E. Hortic Res. 2023; 10(4):uhad038.

PMID: 37799630 PMC: 10548412. DOI: 10.1093/hr/uhad038.


The Goldfish Genome and Its Utility for Understanding Gene Regulation and Vertebrate Body Morphology.

Omori Y, Burgess S. Methods Mol Biol. 2023; 2707:335-355.

PMID: 37668923 DOI: 10.1007/978-1-0716-3401-1_22.

