Structural Variation Detection by Whole Genome De Novo Assembly


Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly

Here we use whole-genome de novo assembly of second-generation sequencing reads to map structural variation (SV) in an Asian genome and an African genome. Our approach identifies small- and intermediate-size homozygous variants (1–50 kb) including insertions, deletions, inversions and their precise breakpoints, and in contrast to other methods, can resolve complex rearrangements. In total, we identified 277,243 SVs ranging in length from 1–23 kb. Validation using computational and experimental methods suggests that we achieve overall <6% false-positive rate and <10% false-negative rate in genomic regions that can be assembled, which outperforms other methods. Analysis of the SVs in the genomes of 106 individuals sequenced as part of the 1000 Genomes Project suggests that SVs account for a greater fraction of the diversity between individuals than do single-nucleotide polymorphisms (SNPs). These findings demonstrate that whole-genome de novo assembly is a feasible approach to deriving more comprehensive maps of genetic variation.


Determining structural variation (SV) through whole genome de novo assembly.

The assembled scaffolds are aligned to the human reference genome. BGI built an SV detection pipeline, called SOAPsv, from BLAT and LASTZ plus some in-house scripts. This paper is essentially the first application of that pipeline.
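To make the scaffold-versus-reference comparison concrete, here is a minimal sketch of how insertions and deletions can be read off the gaps between consecutive aligned blocks of one scaffold. It is not SOAPsv's code, which I have not seen. It assumes a single plus-strand alignment given as PSL-style `blockSizes`/`qStarts`/`tStarts` lists (the layout BLAT reports); the names `call_gaps`, `SV` and `min_len` are hypothetical. Inversions and complex rearrangements would need strand changes across separate alignments and are not handled here.

```python
from dataclasses import dataclass


@dataclass
class SV:
    kind: str      # "deletion" or "insertion", relative to the reference
    ref_pos: int   # 0-based reference coordinate where the gap starts
    length: int    # net number of bases deleted or inserted


def call_gaps(block_sizes, q_starts, t_starts, min_len=1):
    """Hypothetical helper (not SOAPsv): infer indel-type SVs from the gaps
    between consecutive gapless blocks of one plus-strand alignment.

    q_* refers to the assembled scaffold (query), t_* to the reference (target).
    """
    calls = []
    for i in range(len(block_sizes) - 1):
        q_end = q_starts[i] + block_sizes[i]
        t_end = t_starts[i] + block_sizes[i]
        q_gap = q_starts[i + 1] - q_end  # scaffold bases skipped between blocks
        t_gap = t_starts[i + 1] - t_end  # reference bases skipped between blocks
        # Only the net length difference is reported; a gap on both sides
        # may really be a more complex replacement event.
        diff = t_gap - q_gap
        if diff >= min_len:
            calls.append(SV("deletion", t_end, diff))
        elif -diff >= min_len:
            calls.append(SV("insertion", t_end, -diff))
    return calls
```

For example, `call_gaps([100, 200, 50], [0, 100, 400], [1000, 1600, 1800])` reports a 500 bp deletion at reference position 1100 (the reference skips 500 bases the scaffold lacks) and a 100 bp insertion at 1800 (the scaffold carries 100 extra bases). Because the breakpoints come straight from the base-level alignment, this is where the "single-nucleotide resolution" of an assembly-based approach comes from.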

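Validation is a necessary step. Unlike the 1000 Genomes Project (paper), which relied on experimental (wet-lab) validation, they used both experimental (wet) and computational (dry) methods. For the computational validation, the raw reads were aligned to the reference genome with BWA or SOAPaligner. Why not use those alignments for SV detection directly? That is what nearly all other tools do. To "innovate" and take a different route, they used de novo assembly, yet they still need to align to the reference; isn't that a detour? Another open question: what happens when the alignment results disagree with the de novo assembled sequence?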
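One simple way raw-read alignments can check an assembly-based call (I don't know whether the paper used exactly this test) is read depth: if reads are mapped to the reference, a homozygous deletion should leave the deleted interval with almost no coverage compared with its flanks. The sketch below assumes a per-base depth list along the reference; the function name, the 100 bp flank and the 0.1 ratio threshold are all hypothetical choices. A call that fails this test is one concrete form of the "alignment disagrees with the assembly" case raised above.

```python
from statistics import mean


def deletion_supported(depth, start, end, flank=100, max_ratio=0.1):
    """Hypothetical check of a homozygous deletion call [start, end) on the
    reference, using per-base read depth from reads aligned to the reference.

    Returns True if depth inside the interval is at most max_ratio times the
    mean depth of the flanking regions.
    """
    inside = depth[start:end]
    flanks = depth[max(0, start - flank):start] + depth[end:end + flank]
    if not inside or not flanks:
        return False  # not enough data to judge
    flank_mean = mean(flanks)
    if flank_mean == 0:
        return False  # no coverage around the call, so the region is uninformative
    return mean(inside) / flank_mean <= max_ratio
```

With `depth = [30] * 100 + [0] * 50 + [30] * 100`, `deletion_supported(depth, 100, 150)` returns `True`, while uniform coverage of 30 across the same interval returns `False`. Heterozygous deletions would need a threshold near 0.5 instead, which is one reason the paper restricts itself to homozygous variants.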

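They also compared their results with two other tools, BreakDancer (paper) and Pindel (paper), and reportedly achieve higher precision than both ("similar sensitivity but improved precision"). But why not compare against other tools as well?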
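"Similar sensitivity but improved precision" refers to two numbers computed against a validated call set. The sketch below shows one common way to get them. It does not reproduce the paper's own matching rules, which I have not checked. Each SV is assumed to be a `(chrom, start, end, kind)` tuple, and a call counts as a true positive if both breakpoints fall within a hypothetical `tol` of an unused validated SV of the same type. If "false-positive rate" in the abstract means the fraction of calls that fail validation, it equals 1 − precision, and the false-negative rate is 1 − sensitivity.

```python
def match_calls(calls, truth, tol=10):
    """Greedy one-to-one matching of SV calls against a validated set.

    Each SV is (chrom, start, end, kind). Returns (precision, sensitivity).
    The breakpoint tolerance `tol` (in bp) is a hypothetical parameter.
    """
    used = set()  # indices of validated SVs already matched
    tp = 0
    for c in calls:
        for j, t in enumerate(truth):
            if j in used:
                continue
            if (c[0] == t[0] and c[3] == t[3]
                    and abs(c[1] - t[1]) <= tol and abs(c[2] - t[2]) <= tol):
                used.add(j)
                tp += 1
                break
    precision = tp / len(calls) if calls else 0.0    # fraction of calls that are real
    sensitivity = tp / len(truth) if truth else 0.0  # fraction of real SVs recovered
    return precision, sensitivity
```

For instance, with three validated SVs and four calls of which two match within 10 bp, this gives precision 0.5 and sensitivity 2/3. A tool that finds the same true SVs as another but makes fewer spurious calls shows the "similar sensitivity, improved precision" pattern.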

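Finally, another heavyweight paper deserves mention: Mapping copy number variation by population-scale genome sequencing. It used nearly every available SV detection tool, validated the calls experimentally, and explored some of the mechanisms by which SVs form. It is a comprehensive piece of work that deserves careful study; I'll write a blog post introducing it when I have time.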