
HiC-Hiker: a Probabilistic Model to Determine Contig Orientation in Chromosome-length Scaffolds with Hi-C

Overview
Journal Bioinformatics
Specialty Biology
Date 2020 May 6
PMID 32369554
Citations 8
Abstract

Motivation: De novo assembly of reference-quality genomes used to require enormous labor. In particular, building genome markers for ordering assembled contigs along chromosomes is extremely time-consuming, so such markers are available only for well-established model organisms. To resolve this issue, recent studies have demonstrated that Hi-C can be a powerful and cost-effective means of producing chromosome-length scaffolds for non-model species that lack genome marker resources, because the Hi-C contact frequency between two loci can be a good estimator of their genomic distance, even when there is a large gap between them. Indeed, state-of-the-art methods such as 3D-DNA are now widely used for locating contigs in chromosomes. However, reducing errors in contig orientation remains challenging because shorter contigs have fewer contacts with their neighboring contigs. These orientation errors lower the accuracy of gene prediction, read alignment, and synteny block estimation in comparative genomics.
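The claim that contact frequency tracks genomic distance is often expressed as a power-law decay, count(d) ≈ c0 · d^(-alpha). The sketch below assumes that form (it is a common simplification, not necessarily the contact model HiC-Hiker uses): it fits `c0` and `alpha` by least squares on a log-log scale and then inverts the fit to turn an observed contact count into a distance estimate. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(distances: Sequence[float], counts: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log(count) = log(c0) - alpha * log(distance).

    Returns (c0, alpha). The power-law form itself is an assumption.
    """
    xs = [math.log(d) for d in distances]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    c0 = math.exp(mean_y - slope * mean_x)  # intercept on the log scale
    return c0, -slope


def estimate_distance(count: float, c0: float, alpha: float) -> float:
    """Invert count = c0 * d**(-alpha) to estimate the genomic distance d."""
    return (c0 / count) ** (1.0 / alpha)
```

Because the count decays smoothly, the inversion stays informative even when the two loci sit on different contigs separated by an unknown gap, which is the property the scaffolding argument relies on.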

Results: To reduce these contig orientation errors, we propose a new algorithm, named HiC-Hiker, which has a firm grounding in probabilistic theory, rigorously models Hi-C contacts across contigs, and effectively infers the most probable orientations via the Viterbi algorithm. We compared HiC-Hiker and 3D-DNA using human and worm genome contigs generated from short reads, evaluated their performance, and observed a remarkable reduction in the contig orientation error rate, from 4.3% (3D-DNA) to 1.7% (HiC-Hiker). Our algorithm considers long-range information between distal contigs and precisely estimates Hi-C read contact probabilities among contigs, which may also be useful for determining the ordering of contigs.
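The Viterbi step can be illustrated with a deliberately simplified sketch. The assumptions here are not taken from the paper: the contig order is fixed, each contig is either forward (`+`) or reversed (`-`), only contacts between adjacent contigs are scored, adjacent contigs abut with no gap, and contact probability follows an assumed power law with exponent `alpha`. HiC-Hiker itself also draws on long-range information between distal contigs, so this first-order chain is not its model. All names and the contact-data layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical layout: contacts[i] lists (x, y) read-pair offsets, x in contig i and
# y in contig i+1, each measured in bp from that contig's own start.
Contact = Tuple[int, int]
FORWARD, REVERSE = 0, 1


def log_contact_prob(d: float, alpha: float = 1.0) -> float:
    """Unnormalized log p(d) under an assumed power-law decay p(d) ~ d**(-alpha)."""
    return -alpha * math.log(max(d, 1.0))  # clamp to avoid log(0)


def junction_distance(x: int, y: int, len_a: int, len_b: int, ori_a: int, ori_b: int) -> int:
    """Distance between offset x in contig a and offset y in contig b, with b placed after a."""
    to_junction_a = len_a - x if ori_a == FORWARD else x    # forward a ends at its own len_a
    from_junction_b = y if ori_b == FORWARD else len_b - y  # forward b starts at its own 0
    return to_junction_a + from_junction_b  # assumes no gap between the contigs


def pair_score(contacts: Sequence[Contact], len_a: int, len_b: int,
               ori_a: int, ori_b: int, alpha: float = 1.0) -> float:
    """Log-likelihood, up to a constant, of the contacts between two adjacent contigs."""
    return sum(log_contact_prob(junction_distance(x, y, len_a, len_b, ori_a, ori_b), alpha)
               for x, y in contacts)


def viterbi_orientations(lengths: Sequence[int], contacts: Sequence[Sequence[Contact]],
                         alpha: float = 1.0) -> List[str]:
    """Most probable orientation of each contig in a fixed order (first-order chain)."""
    n = len(lengths)
    if n == 0:
        return []
    score = [[0.0, 0.0]]  # best log-score so far, ending with contig 0 in each orientation
    back: List[List[int]] = [[0, 0]]
    for i in range(1, n):
        row, ptr = [0.0, 0.0], [0, 0]
        for cur in (FORWARD, REVERSE):
            cands = [score[i - 1][prev]
                     + pair_score(contacts[i - 1], lengths[i - 1], lengths[i], prev, cur, alpha)
                     for prev in (FORWARD, REVERSE)]
            best = 0 if cands[0] >= cands[1] else 1
            row[cur], ptr[cur] = cands[best], best
        score.append(row)
        back.append(ptr)
    # Trace back from the best final orientation.
    state = 0 if score[-1][0] >= score[-1][1] else 1
    path = [state]
    for i in range(n - 1, 0, -1):
        state = back[i][state]
        path.append(state)
    path.reverse()
    return ["+" if o == FORWARD else "-" for o in path]


if __name__ == "__main__":
    lengths = [1000, 1000, 1000]
    contacts = [[(990, 990), (980, 995)], [(5, 10), (15, 3)]]
    print(viterbi_orientations(lengths, contacts))  # ['+', '-', '+']
```

An unnormalized score suffices under these assumptions: reversing a contig only permutes the positions inside it, so the normalizing constant over all position pairs is identical for every orientation choice and cancels when candidates are compared.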

Availability and Implementation: HiC-Hiker is freely available at: https://github.com/ryought/hic_hiker.

Citing Articles

The Bioinformatic Applications of Hi-C and Linked Reads.

Jiang L, Quail M, Fraser-Govil J, Wang H, Shi X, Oliver K. Genomics Proteomics Bioinformatics. 2024; 22(4).

PMID: 38905513 PMC: 11580686. DOI: 10.1093/gpbjnl/qzae048.


A reference quality genome assembly for the jewel scarab Chrysina gloriosa.

Sylvester T, Hoover Z, Hjelmen C, Jonika M, Blackmon L, Alfieri J. G3 (Bethesda). 2024; 14(6).

PMID: 38630623 PMC: 11152064. DOI: 10.1093/g3journal/jkae084.


High-quality genome assembly and multi-omics analysis of pigment synthesis pathway in .

Ma X, Lu L, Yao F, Fang M, Wang P, Meng J. Front Microbiol. 2023; 14:1211795.

PMID: 37396365 PMC: 10308021. DOI: 10.3389/fmicb.2023.1211795.


A reference genome for Bluegill (Centrarchidae: Lepomis macrochirus).

Ludt W, Corbett E, Kattawar J, Chakrabarty P, Faircloth B. G3 (Bethesda). 2023; 13(3).

PMID: 36683458 PMC: 9997549. DOI: 10.1093/g3journal/jkad019.


EndHiC: assemble large contigs into chromosome-level scaffolds using the Hi-C links from contig ends.

Wang S, Wang H, Jiang F, Wang A, Liu H, Zhao H. BMC Bioinformatics. 2022; 23(1):528.

PMID: 36482318 PMC: 9730666. DOI: 10.1186/s12859-022-05087-x.

