Software Packages for Discovering Structural Variation With Next-generation Sequencing

More software and information will be added (last updated 28/02/2011).

Highly recommended reading: Mapping copy number variation by population-scale genome sequencing (Mills et al. 2011).

PEMer (Korbel et al. 2009) It comprises an analysis pipeline, compatible with several next-generation sequencing platforms; simulation-based error models, yielding confidence-values for each structural variant; and a back-end database.

SegSeq (Chiang et al. 2009) Detects and localizes copy-number alterations from massively parallel sequence data. A simple approach would be to partition the genome into windows of fixed size, estimate the tumor-normal ratios for each window and use standard segmentation algorithms to decompose the genome into regions of equivalent copy number.
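
The fixed-window idea described above can be sketched in a few lines. This is only an illustration of the "simple approach", not SegSeq's own algorithm; the function name, the single-chromosome input (plain lists of read start positions) and the library-size scaling are assumptions made for the example, and the segmentation step is left out.

```python
import math
from collections import Counter


def window_log2_ratios(tumor_positions, normal_positions, window_size):
    """Return {window_index: log2(tumor/normal)} for fixed-size windows.

    Hypothetical helper: inputs are read start positions on one chromosome.
    """
    tumor_counts = Counter(pos // window_size for pos in tumor_positions)
    normal_counts = Counter(pos // window_size for pos in normal_positions)
    # Scale so that equal copy number gives a ratio of 1 even when the two
    # libraries were sequenced to different depths.
    scale = len(normal_positions) / len(tumor_positions)
    ratios = {}
    for window, n_normal in normal_counts.items():
        n_tumor = tumor_counts.get(window, 0)
        if n_tumor == 0:
            continue  # log ratio undefined; a real tool would treat this case
        ratios[window] = math.log2(n_tumor * scale / n_normal)
    return ratios


tumor = [10, 20, 30, 40, 50, 60, 110, 120]
normal = [15, 25, 35, 45, 115, 125, 135, 145]
ratios = window_log2_ratios(tumor, normal, window_size=100)
```

Here window 0 holds more tumor than normal reads (positive log ratio) and window 1 holds fewer (negative log ratio); a segmentation algorithm would then merge neighbouring windows with similar ratios.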

VariationHunter (Hormozdiari et al. 2009) Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes. It is based on maximum parsimony.

MoDIL (Lee et al. 2009) MoDIL, or Mixture of Distributions Indel Locator, is a novel method for finding medium-sized indels from high-throughput sequencing datasets. The MoDIL algorithm compares the distribution of insert sizes in the sequenced library to the distribution of the observed mapped distances at a particular genomic location.
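
To make the distribution comparison concrete, here is a rough sketch that contrasts library insert sizes with the mapped distances seen at one locus. It swaps in a plain two-sample Kolmogorov-Smirnov statistic and a difference of means; it is not MoDIL's mixture model, and both function names are hypothetical.

```python
from bisect import bisect_right
from statistics import mean


def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical cumulative distributions."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


def estimated_size_shift(library_inserts, local_distances):
    """Mean mapped distance at the locus minus mean library insert size.

    A positive shift hints at a deletion in the sample (pairs map farther
    apart on the reference); a negative shift hints at an insertion.
    """
    return mean(local_distances) - mean(library_inserts)


library = [200, 210, 190, 205, 195]
local = [300, 310, 290, 305, 295]
gap = ks_statistic(library, local)
shift = estimated_size_shift(library, local)
```

In this toy input every local distance exceeds every library insert size, so the two distributions do not overlap and the shift points to a deletion of roughly 100 bp.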

Pindel (Ye et al. 2009) A pattern growth approach to detect breakpoints of large deletions and medium-sized insertions from paired-end short reads.

BreakDancer (Chen et al. 2009) BreakDancerMax predicts five types of structural variants: insertions, deletions, inversions, inter- and intra-chromosomal translocations from next-generation short paired-end sequencing reads using read pairs that are mapped with unexpected separation distances or orientation.
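
As a simplified illustration of using "unexpected separation distances or orientation", the sketch below labels a single read pair. It assumes a library in which concordant pairs map forward-reverse on the same chromosome within a given insert-size range; the function name, the labels and the thresholds are assumptions for the example and do not reproduce BreakDancer's actual classification or its confidence scoring.

```python
def classify_pair(chrom1, pos1, strand1, chrom2, pos2, strand2,
                  min_insert, max_insert):
    """Label one read pair from its mapping (hypothetical helper).

    Positions are leftmost mapping coordinates; strands are "+" or "-".
    """
    if chrom1 != chrom2:
        return "inter-chromosomal translocation"
    if strand1 == strand2:
        return "inversion"
    # Order the two reads by position so the leftmost read comes first.
    (left_pos, left_strand), (right_pos, _) = sorted(
        [(pos1, strand1), (pos2, strand2)]
    )
    if left_strand == "-":
        # Reverse-forward ("everted") pairs are often associated with tandem
        # duplications or other intra-chromosomal rearrangements.
        return "everted"
    distance = right_pos - left_pos
    if distance > max_insert:
        return "deletion"
    if distance < min_insert:
        return "insertion"
    return "concordant"


label = classify_pair("chr1", 100, "+", "chr1", 5000, "-", 200, 500)
```

A real caller would aggregate many such pairs per region before reporting a variant, since a single discordant pair is weak evidence.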

cnvHMM Copy number analysis using a hidden Markov model.

Geometric Analysis of Structural Variants (GASV) (Sindi et al. 2009) A geometric approach for identification, classification and comparison of structural variants. The software is for analysis of structural variation from paired-end sequencing and/or array-CGH data.

Sequence Variant Analyzer (SVA) SVA is a computer software project designed to annotate, visualize, and analyze the genetic variants identified through next-generation sequencing studies, including whole-genome sequencing (WGS) and exome sequencing studies.

SWT A collection of R functions for statistical analysis of genome-wide data, by Qunyuan Zhang, DSG.

VarScan (Koboldt et al. 2009) VarScan is a platform-independent, technology-independent software tool for identifying SNPs and indels in massively parallel sequencing of individual and pooled samples. (not actually structural variation)

CNV-seq (Xie and Tammi 2009) The method is based on a robust statistical model that describes the complete analysis procedure and allows the computation of essential confidence values for detection of CNV.

BreakSeq (Lam et al. 2010) A pipeline for annotation, classification and analysis of SVs at single-nucleotide resolution.

CopyMap (Zöllner 2010) The program package CopyMap identifies copy number variation from oligo-hybridization and CGH data. Using a time-dependent hidden Markov model to combine evidence of copy number variants (CNVs) across multiple carriers, CopyMap is substantially more accurate than standard hidden Markov methods in identifying CNVs and calling CNV-carriers. Moreover, CopyMap provides more precise estimates of CNV-boundaries.

SLOPE (Abel et al. 2010) A quick and accurate method for locating non-SNP structural variation from targeted next-generation sequence data.
By focusing on a small (a few kb to a few Mb) target reference sequence, SLOPE can perform fast and flexible split-read alignments and determine ‘chimeric’ sequences with single-base resolution.
SLOPE aims to detect sequence breakpoints from only one side of a split read, and therefore does not rely on the insert size for detection.
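
The idea of locating a breakpoint from one side of a split read can be illustrated with a toy exact-match extension. This is a deliberately naive sketch, not SLOPE's alignment procedure: the function name is hypothetical, the anchor position is assumed to be known already, and sequencing errors are ignored.

```python
def split_point(reference, read, anchor):
    """Extend an exact match of `read` along `reference` from `anchor`.

    Returns None if the whole read matches, otherwise a tuple of
    (reference coordinate where the match stops, unaligned read suffix).
    """
    matched = 0
    while (matched < len(read)
           and anchor + matched < len(reference)
           and reference[anchor + matched] == read[matched]):
        matched += 1
    if matched == len(read):
        return None  # read aligns end to end; no split
    return anchor + matched, read[matched:]


reference = "ACGTACGTTTGACCA"
read = "ACGTTTGGGCAT"
result = split_point(reference, read, anchor=4)
```

Only the aligned side of the read is used, so no insert-size information is needed. Note that if the first unaligned base happens to equal the next reference base, the reported coordinate shifts by one, which is one reason real tools align more carefully.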

HYDRA (Quinlan et al. 2010) Hydra detects structural variation (SV) breakpoints by clustering discordant paired-end alignments whose “signatures” corroborate the same putative breakpoint. Hydra can detect breakpoints caused by all classes of structural variation. Moreover, it was designed to detect variation in both unique and duplicated genomic regions; therefore, it will examine paired-end reads having multiple discordant alignments.
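
A minimal sketch of clustering discordant pairs that support the same putative breakpoint is shown below. It is a greedy grouping by chromosome pair, orientation and proximity, written only to illustrate the idea; the tuple layout, the `max_gap` tolerance and the function name are assumptions, and Hydra's handling of multiple alignments per read is not modelled.

```python
from collections import defaultdict


def cluster_discordant_pairs(pairs, max_gap):
    """Group discordant pairs with the same signature and nearby ends.

    `pairs` is a list of (chrom1, pos1, chrom2, pos2, orientation) tuples.
    Returns a list of (signature, [(pos1, pos2), ...]) clusters.
    """
    groups = defaultdict(list)
    for chrom1, pos1, chrom2, pos2, orientation in pairs:
        groups[(chrom1, chrom2, orientation)].append((pos1, pos2))

    clusters = []
    for signature, members in groups.items():
        members.sort()
        current = [members[0]]
        for pos1, pos2 in members[1:]:
            prev1, prev2 = current[-1]
            # Both ends must lie close to the previous pair to join its cluster.
            if pos1 - prev1 <= max_gap and abs(pos2 - prev2) <= max_gap:
                current.append((pos1, pos2))
            else:
                clusters.append((signature, current))
                current = [(pos1, pos2)]
        clusters.append((signature, current))
    return clusters


pairs = [
    ("chr1", 1000, "chr1", 9000, "+-"),
    ("chr1", 1050, "chr1", 9040, "+-"),
    ("chr1", 50000, "chr1", 70000, "+-"),
    ("chr1", 1020, "chr2", 300, "++"),
]
clusters = cluster_discordant_pairs(pairs, max_gap=200)
```

Clusters supported by several pairs are stronger breakpoint candidates than singletons, so a caller would typically filter on cluster size.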

cnD (Simpson et al. 2010) A copy number variant caller for inbred strains.
The target organism is assumed to be inbred, and therefore homozygous, so regions of apparent heterozygous SNPs (as called by MAQ) can be used to detect copy number gains. cnD uses both the rate of these paralogous sequence variants and the raw sequence depth to call copy number gains and losses using a hidden Markov model.
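
To show how a hidden Markov model can turn per-window depth into gain/loss calls, here is a small Viterbi sketch. It uses depth only (not the paralogous-variant rate that cnD also uses), Poisson emissions, and made-up state multipliers and transition probabilities; every name and parameter in it is an assumption for illustration, not cnD's model.

```python
import math


def poisson_log_pmf(k, lam):
    """Log probability of observing count k under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_copy_number(depths, mean_depth, multipliers, stay_prob=0.99):
    """Most likely state path; each state scales the expected depth."""
    states = list(multipliers)
    stay_log = math.log(stay_prob)
    switch_log = math.log((1 - stay_prob) / (len(states) - 1))

    def transition(prev, cur):
        return stay_log if prev == cur else switch_log

    start_log = math.log(1 / len(states))
    score = {s: start_log + poisson_log_pmf(depths[0], mean_depth * multipliers[s])
             for s in states}
    backpointers = []
    for depth in depths[1:]:
        new_score, pointers = {}, {}
        for cur in states:
            best_prev = max(states, key=lambda p: score[p] + transition(p, cur))
            pointers[cur] = best_prev
            new_score[cur] = (score[best_prev] + transition(best_prev, cur)
                              + poisson_log_pmf(depth, mean_depth * multipliers[cur]))
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=score.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


multipliers = {"loss": 0.1, "normal": 1.0, "gain": 2.0}  # illustrative values
depths = [30, 31, 29, 62, 58, 61, 30, 28]
path = viterbi_copy_number(depths, mean_depth=30, multipliers=multipliers)
```

The high `stay_prob` discourages switching states on a single noisy window, so only a sustained change in depth, such as the three windows near 60x here, is called as a gain.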

AGE (Abyzov and Gerstein 2011) Defines breakpoints of genomic structural variants at single-nucleotide resolution through optimal alignments with gap excision.

CNVnator (Abyzov et al. 2011) An approach to discover, genotype and characterize typical and atypical CNVs from family and population genome sequencing.

Other relevant software

SAMtools SAMtools provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.

Maq Maq stands for Mapping and Assembly with Quality. It builds an assembly by mapping short reads to reference sequences.