NYU Bioinformatics Lab, Courant Institute of Mathematical Sciences

SUTTA

Scoring-and-Unfolding Trimmed Tree Assembler

Description

SUTTA is a new de novo DNA sequence assembler that relies on global search methods (e.g., branch-and-bound or beam search) to contain the complexity of the assembly problem. Its main features are:

  • Technology agnostic: supports different sequencing technologies with minimal changes to its architecture (currently long Sanger reads and short next-generation Illumina reads). It can potentially use different long-range technologies (mate pairs, optical maps, dilution sequencing). At present, it supports standard mate-pair information for both short and long reads.
  • Search strategy: each contig (a contiguous sequence of the genome) is assembled independently and dynamically, without constructing in advance the graph that describes the overlap relations among all reads.
  • Score-based: score functions evaluate DNA sequences while they are being assembled. These functions combine different structural properties (e.g., transitivity, coverage, mate pairs, physical maps).
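The search strategy and score-based design above can be illustrated with a toy beam search. This is a minimal sketch, not SUTTA's implementation: it assumes error-free reads on a single strand, extends a contig only to the right, and uses the total overlap length as a hypothetical stand-in for SUTTA's score functions (which combine transitivity, coverage, mate-pair and map constraints). The names `suffix_prefix_overlap` and `beam_extend` and the parameters `beam_width` and `min_overlap` are illustrative only.

```python
from typing import List, Tuple

# (contig sequence, indices of reads used so far, score)
State = Tuple[str, Tuple[int, ...], int]


def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    The overlap is kept shorter than both strings so that appending `b`
    always adds at least one base. Returns 0 if no overlap reaches `min_len`.
    """
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def beam_extend(reads: List[str], seed: int,
                beam_width: int = 2, min_overlap: int = 3) -> State:
    """Extend the seed read to the right, keeping the best `beam_width` partial contigs."""
    beam: List[State] = [(reads[seed], (seed,), 0)]
    best = beam[0]
    while beam:
        candidates: List[State] = []
        for contig, used, score in beam:
            for i, read in enumerate(reads):
                if i in used:
                    continue
                # Overlaps are computed on demand; no global overlap graph is built.
                k = suffix_prefix_overlap(contig, read, min_overlap)
                if k:
                    # Hypothetical score: total overlap length accumulated so far.
                    candidates.append((contig + read[k:], used + (i,), score + k))
        # Trim the search tree: keep only the highest-scoring partial contigs.
        candidates.sort(key=lambda s: s[2], reverse=True)
        beam = candidates[:beam_width]
        for state in beam:
            if state[2] > best[2]:
                best = state
    return best


reads = ["ACGTAC", "GTACGG", "ACGGTT", "TTTTTT"]
print(beam_extend(reads, seed=0))
# ('ACGTACGGTT', (0, 1, 2), 8)
```

Because overlaps are evaluated only for the partial contigs currently in the beam, no overlap graph over all reads is needed up front; `beam_width` trades search breadth for running time.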


Illustration of contig construction: (i) a "double-tree" is constructed by generating LEFT and RIGHT trees for the root node; (ii) the best left and right paths are selected and joined together; (iii) the read layout is computed for the set of reads in the full path.
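Steps (ii) and (iii) of the illustration can be sketched as follows, assuming the left and right searches have already returned their best paths as lists of (read, overlap) pairs ordered outward from the root. The function `join_and_layout` is hypothetical; it assumes exact, same-strand overlaps, and the naive base-by-base fill at the end is only a placeholder for the consensus computation that SUTTA delegates to AMOS.

```python
from typing import List, Tuple

# A path is a list of (read sequence, overlap in bp with the previous read on the path),
# ordered outward from the root read.
ReadPath = List[Tuple[str, int]]


def join_and_layout(root: str, left: ReadPath,
                    right: ReadPath) -> Tuple[List[Tuple[str, int]], str]:
    """Join the best left and right paths at the root and compute read offsets."""
    placed = [(root, 0)]
    # Right path: each read starts at the previous read's end minus the overlap.
    prev_seq, prev_off = root, 0
    for seq, ov in right:
        off = prev_off + len(prev_seq) - ov
        placed.append((seq, off))
        prev_seq, prev_off = seq, off
    # Left path: each read ends `ov` bases after the start of its right neighbor.
    prev_off = 0
    for seq, ov in left:
        off = prev_off + ov - len(seq)
        placed.append((seq, off))
        prev_off = off
    # Shift all offsets so the leftmost read starts at position 0.
    shift = -min(off for _, off in placed)
    layout = sorted(((seq, off + shift) for seq, off in placed), key=lambda p: p[1])
    # Naive placeholder for consensus: later reads overwrite earlier ones,
    # which is only safe because overlaps are assumed to be exact.
    bases = [""] * max(off + len(seq) for seq, off in layout)
    for seq, off in layout:
        bases[off:off + len(seq)] = seq
    return layout, "".join(bases)


layout, contig = join_and_layout("GTACGG", [("ACGTAC", 4)], [("ACGGTT", 4)])
print(layout)  # [('ACGTAC', 0), ('GTACGG', 2), ('ACGGTT', 4)]
print(contig)  # ACGTACGGTT
```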


Examples

Dot plot for Brucella suis:

Genome length = 3,315,173 bp; number of reads = 36,276; average read length = 895.8 bp; coverage = 9.8X
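As a quick consistency check, the reported coverage follows from the other figures via the standard estimate C = N × L / G (number of reads times average read length, divided by genome length). The helper name `coverage` is illustrative.

```python
def coverage(num_reads: int, avg_read_length: float, genome_length: int) -> float:
    """Average sequencing depth: total sequenced bases divided by genome length."""
    return num_reads * avg_read_length / genome_length


c = coverage(36_276, 895.8, 3_315_173)
print(f"{c:.1f}X")  # 9.8X
```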

Implementation

The current version of the SUTTA assembler is prototyped on top of the AMOS (A Modular Open-Source Assembler) framework. AMOS provides a central data repository in which various genomic objects (reads, inserts, maps, overlaps, contigs, scaffolds, etc.) can be easily collected and indexed. The AMOS framework also provides several algorithms for standard steps of the assembly pipeline (e.g., trimming, overlapping, error correction, scaffolding, validation). SUTTA's pipeline is composed of three modules: (1) overlapper, (2) contigger, and (3) multi-aligner. We developed our own tools for the first two modules and relied on the "make-consensus" module available in AMOS to compute the final consensus sequence.
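To make the role of module (1) concrete, the sketch below shows what an overlapper computes in its simplest form: exact suffix-prefix overlaps between every ordered pair of reads. It is a naive, quadratic stand-in, not SUTTA's overlapper; it ignores reverse complements and sequencing errors, both of which a real overlapper must handle, and `find_overlaps` and `Overlap` are hypothetical names.

```python
from itertools import permutations
from typing import List, NamedTuple


class Overlap(NamedTuple):
    a: int       # index of the read whose suffix overlaps
    b: int       # index of the read whose prefix is overlapped
    length: int  # overlap length in bp


def find_overlaps(reads: List[str], min_len: int = 3) -> List[Overlap]:
    """Naive all-pairs exact suffix-prefix overlapper (O(n^2) read pairs)."""
    result = []
    for i, j in permutations(range(len(reads)), 2):
        a, b = reads[i], reads[j]
        for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
            if a.endswith(b[:k]):
                result.append(Overlap(i, j, k))
                break  # keep only the longest overlap for this ordered pair
    return result


print(find_overlaps(["ACGTAC", "GTACGG", "ACGGTT"]))
# [Overlap(a=0, b=1, length=4), Overlap(a=1, b=2, length=4)]
```

In the actual pipeline, overlap records of this kind are among the objects stored in the AMOS repository and consumed by the contigger.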

Consequently, using SUTTA requires first installing the AMOS package, available for download at http://sourceforge.net/projects/amos/files/. SUTTA has been tested with AMOS version 2.0.8.

Downloads

SUTTA is supplied for free to scientists at non-commercial institutions for educational and research purposes.
Please complete the following form to receive a copy of the software.

References

  • Narzisi G. and Mishra B.:
    Scoring-and-Unfolding Trimmed Tree Assembler: Concepts, Constructs and Comparisons.
    Bioinformatics, Oxford Journals, 2010 (DOI: 10.1093/bioinformatics/btq646).
    [Revision, Feb 10 2011]

  • Narzisi G.:
    SUTTA: Scoring-and-Unfolding Trimmed Tree Assembler. Workshop on Genomic Signal Processing and Statistics (GENSIPS), Cold Spring Harbor Laboratory (CSHL), November 10-12, 2010 (poster).

  • Narzisi G. and Mishra B.:
    A novel Technologically Agnostic De Novo Sequence Assembler. Systems Biology and New Sequencing Technologies (SBNST), Centre for Genomic Regulation (CRG), Barcelona, June 16-18, 2010 (poster).

Acknowledgement

Research reported here was supported by grants from the NSF CDI program and Abraxis BioScience, LLC.