Software Packages for Whole Genome Alignment

More software and information will be added (last updated 28/02/2011).

Several tools are available for aligning whole genomes.

WABA (Kent and Zahler 2000) Wobble Aware Bulk Aligner for cross-species whole genome alignment

LASTZ or BLASTZ *recommended (Schwartz et al. 2003) LASTZ is a pairwise aligner for DNA sequences. Originally designed to handle sequences the size of human chromosomes and from different species, it is also useful for sequences produced by next-generation sequencing (NGS) technologies such as Roche 454.

LAGAN (Brudno et al. 2003) The LAGAN Toolkit is a set of alignment programs for comparative genomics. The three main components are a pairwise aligner (LAGAN), a multiple aligner (M-LAGAN), and a glocal aligner (Shuffle-LAGAN). All three are based on the CHAOS local alignment tool and combine speed (regions up to several megabases can be aligned in minutes) with high accuracy. The alignment results can be visualized using the VISTA visualization tool.

MUMmer *recommended (3 papers for 3 versions) MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form. For example, MUMmer 3.0 can find all 20-basepair or longer exact matches between a pair of 5-megabase genomes in 13.7 seconds, using 78 MB of memory, on a 2.4 GHz Linux desktop computer. MUMmer can also align incomplete genomes; it can easily handle the hundreds or thousands of contigs from a shotgun sequencing project, and will align them to another set of contigs or a genome using the NUCmer program included with the system. If the species are too divergent for a DNA sequence alignment to detect similarity, the PROmer program can generate alignments based on the six-frame translations of both input sequences.
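To make the idea of six-frame translation concrete, here is a minimal Python sketch. It is not PROmer's code; the function names are made up for illustration, and it assumes the standard genetic code and plain A/C/G/T input (any other codon becomes `X`).

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Hypothetical helper: map frame labels (+1..+3, -1..-3) to proteins."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


frames = six_frame_translations("ATGGCCTAA")
print(frames["+1"])  # MA*
```

Aligning the six protein strings of one genome against those of another is what lets a translated search detect similarity that has eroded at the DNA level.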

AVID (Bray et al. 2002) AVID is designed to be fast, memory efficient, and practical for sequence alignments of large genomic regions up to megabases long.

Cgaln (Nakato and Gotoh 2010) Cgaln (Coarse grained alignment) is a program designed to align a pair of whole genomic sequences, from bacterial genomes up to entire vertebrate chromosomes, on a nominal desktop computer. Cgaln performs an alignment job in two steps, at the block level and then at the nucleotide level. The former “coarse-grained” alignment can explore genomic rearrangements and reduce the regions to be analyzed in the next step. The latter is devoted to detailed alignment within the limited regions found in the first stage. The output of Cgaln is ‘glocal’ in the sense that rearrangements are taken into consideration while each alignable region is extended as long as possible. Thus, Cgaln is not only fast and memory-efficient, but can also filter noisy outputs without missing the most important homologous segment pairs.
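The two-step idea (block level first, nucleotide level second) can be illustrated with a toy Python sketch. This is not Cgaln's algorithm or scoring: the block size, the shared k-mer threshold, and the use of `difflib` for the fine step are all assumptions chosen to keep the example short.

```python
from difflib import SequenceMatcher


def kmers(seq, k):
    """Set of all overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def coarse_block_pairs(a, b, block_size, k, min_shared):
    """Step 1 (toy): keep block pairs that share at least min_shared k-mers."""
    blocks_a = [(i, kmers(a[i:i + block_size], k)) for i in range(0, len(a), block_size)]
    blocks_b = [(j, kmers(b[j:j + block_size], k)) for j in range(0, len(b), block_size)]
    return [(i, j) for i, ka in blocks_a for j, kb in blocks_b
            if len(ka & kb) >= min_shared]


def refine(a, b, pairs, block_size):
    """Step 2 (toy): longest exact match inside each retained block pair."""
    hits = []
    for i, j in pairs:
        block_a, block_b = a[i:i + block_size], b[j:j + block_size]
        matcher = SequenceMatcher(None, block_a, block_b, autojunk=False)
        m = matcher.find_longest_match(0, len(block_a), 0, len(block_b))
        hits.append((i + m.a, j + m.b, m.size))  # (start in a, start in b, length)
    return hits


a = "AAAAAAAA" + "ACGTGCAT"
b = "ACGTGCAT" + "CCCCCCCC"
pairs = coarse_block_pairs(a, b, block_size=8, k=3, min_shared=4)
print(pairs)                                  # [(8, 0)]
print(refine(a, b, pairs, block_size=8))      # [(8, 0, 8)]
```

Only one of the four block pairs survives the coarse step, so the more expensive fine step runs on a small fraction of the search space, and the surviving pair sits at different positions in the two sequences, as a rearranged segment would.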

LAST **highly recommended (Kiełbasa et al. 2011)

LAST can:

  • Handle big sequence data, e.g.:
    • Compare two vertebrate genomes
    • Align billions of DNA reads to a genome
  • Indicate the reliability of each aligned column.
  • Use sequence quality data properly.
  • Compare DNA to proteins, with frameshifts.
  • Compare PSSMs to sequences.
  • Calculate the likelihood of chance similarities between random sequences.

Alfresco (Dalca and Brudno 2008) A key feature of the program is to use available analysis programs relevant to comparative genome sequence analysis, combine the results of these, and graphically present them in an intuitive way, thereby facilitating the analysis of large genomic regions.

Software for quickly finding nearly identical regions in whole genomes

BLAT (see Wikipedia or the UCSC Genome Browser and its FAQ) (Kent 2002) BLAT (the BLAST-Like Alignment Tool) is a software program developed by Jim Kent at UCSC to identify similarities between DNA sequences and protein sequences. BLAT is much faster than older tools such as BLAST for nucleotide and protein alignments, and it can also perform spliced alignments of RNA to DNA.

BLAST (megablast) OK! Everyone knows it! Please keep in mind that this entry refers to the original BLAST, not BLAST+, which is a separate package.

SSAHA2 (Ning et al. 2001, the paper describing SSAHA) SSAHA2 (Sequence Search and Alignment by Hashing Algorithm) is a pairwise sequence alignment program designed for the efficient mapping of sequencing reads onto genomic reference sequences. Reads from most sequencing platforms (ABI-Sanger, Roche 454, Illumina-Solexa) and a range of output formats (SAM, CIGAR, PSL, etc.) are supported. A pile-up pipeline for analysis and genotype calling is available as a separate package.
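The hashing idea behind this family of tools can be sketched in a few lines of Python. This is a simplified illustration, not SSAHA2's implementation: the function names are hypothetical, the reference is indexed by non-overlapping k-mers, and a read is placed by letting its k-mer hits vote for a reference offset.

```python
from collections import Counter, defaultdict


def build_index(reference, k):
    """Hash table from each non-overlapping k-mer to its reference positions."""
    index = defaultdict(list)
    for pos in range(0, len(reference) - k + 1, k):
        index[reference[pos:pos + k]].append(pos)
    return index


def best_offset(index, read, k):
    """Look up every overlapping k-mer of the read and vote for an offset.

    Returns (offset, votes) for the best-supported placement, or None
    if no k-mer of the read occurs in the index.
    """
    votes = Counter()
    for q in range(len(read) - k + 1):
        for pos in index.get(read[q:q + k], ()):
            votes[pos - q] += 1  # hits on the same diagonal agree on pos - q
    if not votes:
        return None
    return votes.most_common(1)[0]


reference = "GATTACAGGCTTAACGTCCA"
index = build_index(reference, k=4)
read = reference[7:17]               # a 10-base read taken from position 7
print(best_offset(index, read, k=4))  # (7, 2)
```

Indexing the reference once and probing it with each read is what makes this kind of search fast; a real mapper would follow the lookup with a proper alignment of the read around the candidate offset.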