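Assemblathon is a competition in de novo genome assembly for next-generation sequencing.
All participating research groups start from the same sequencing data, each group runs its own assembly, and the results are then compared.
The data for the first competition came from simulated HiSeq 2000 sequencing, which makes assembly errors easier to analyze.
The results of the first competition were presented at the CSHL Biology of Genomes meeting (which I will attend next year) and were recently published in Genome Research.
The second competition uses real sequencing data. Its submission deadline has passed, and the follow-up comparative analysis is under way.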
Assemblathon 1: A competitive assessment of de novo short read assembly methods
Abstract
Low cost short read sequencing technology has revolutionised genomics, though it is only just becoming practical for the high quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort teams were asked to assemble a simulated Illumina HiSeq dataset of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype aware assessments of coverage, contiguity, structure, base calling and copy number were made. We establish that within this benchmark (1) it is possible to assemble the genome to a high level of coverage and accuracy, and that (2) large differences exist between the assemblies, suggesting room for further improvements in current methods.
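Contiguity is commonly summarized with N50: sort the contigs (or scaffolds) from longest to shortest and report the length of the contig at which the running total first reaches half of the total assembly length. A variant, NG50, uses the genome size as the reference total instead, so assemblies of the same genome with different total sizes can be compared fairly. The Python below is a minimal sketch, assuming contig lengths are supplied as a plain list of integers. The function name `n50`, its `genome_size` parameter, and the toy lengths are illustrative only, and this is not necessarily how the Assemblathon 1 evaluation computed its contiguity metrics.

```python
def n50(contig_lengths, genome_size=None):
    """Return N50 (or NG50 when genome_size is given) for a list of lengths.

    Illustrative sketch: contig_lengths is a hypothetical list of ints.
    Returns 0 if the contigs never reach half of the reference total,
    e.g. an assembly covering less than half of genome_size.
    """
    # N50 measures against the assembly's own total; NG50 against genome size.
    total = genome_size if genome_size is not None else sum(contig_lengths)
    half = total / 2
    running = 0
    # Walk from the longest contig down, accumulating length.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


# Toy example, not data from the competition.
contigs = [100, 80, 50, 30, 20]
print(n50(contigs))                   # N50 against the 280 bp assembly total
print(n50(contigs, genome_size=500))  # NG50 against a 500 bp genome
```

For these toy lengths the N50 is 80 (100 + 80 = 180 reaches half of 280), while the NG50 against a 500 bp genome drops to 30, which shows why the choice of reference total matters when ranking assemblies.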