Next-generation Sequencing Quality Control Tools

FASTX-Toolkit: The FASTX-Toolkit is a collection of command-line tools for preprocessing short-read FASTA/FASTQ files, including:

  • FASTQ-to-FASTA converter
    Convert FASTQ files to FASTA files.
  • FASTQ Information
    Charts quality statistics and nucleotide distribution.
  • FASTQ/A Collapser
    Collapses identical sequences in a FASTQ/A file into a single sequence (while maintaining read counts).
  • FASTQ/A Trimmer
    Shortens reads in a FASTA or FASTQ file (removing barcodes or noise).
  • FASTQ/A Renamer
    Renames the sequence identifiers in FASTQ/A file.
  • FASTQ/A Clipper
    Removes sequencing adapters/linkers.
  • FASTQ/A Reverse-Complement
    Produces the reverse complement of each sequence in a FASTQ/FASTA file.
  • FASTQ/A Barcode splitter
    Splits a FASTQ/FASTA file containing multiple samples.
  • FASTA Formatter
    Changes the width of sequence lines in a FASTA file.
  • FASTA Nucleotide Changer
    Converts FASTA sequences from/to RNA/DNA.
  • FASTQ Quality Filter
    Filters sequences based on quality
  • FASTQ Quality Trimmer
    Trims (cuts) sequences based on quality
  • FASTQ Masker
    Masks nucleotides with ‘N’ (or other character) based on quality
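
Several of the operations listed above are simple enough to sketch in a few lines. The Python below is an illustration only and is not the FASTX-Toolkit's own code. It assumes well-formed four-line FASTQ records and Phred+33 quality encoding, and every name in it (Read, parse_fastq, to_fasta, reverse_complement, passes_quality, collapse) is hypothetical. The filter criterion, keeping a read when a minimum percentage of its bases reaches a minimum quality, is likewise an assumption chosen for the example.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    name: str
    seq: str
    qual: str  # quality string, assumed to be Phred+33 encoded


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield reads from well-formed four-line FASTQ records (no line wrapping)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, ignored
        qual = next(it).rstrip("\n")
        yield Read(header[1:], seq, qual)  # drop the leading '@'


def to_fasta(read: Read) -> str:
    """FASTQ-to-FASTA conversion: keep name and sequence, drop qualities."""
    return f">{read.name}\n{read.seq}\n"


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(read: Read) -> Read:
    """Reverse-complement the sequence and reverse the qualities to match."""
    return Read(read.name, read.seq.translate(_COMPLEMENT)[::-1], read.qual[::-1])


def passes_quality(read: Read, min_qual: int = 20,
                   min_percent: float = 80.0, offset: int = 33) -> bool:
    """True if at least min_percent of bases have quality >= min_qual."""
    if not read.qual:
        return False
    good = sum(1 for ch in read.qual if ord(ch) - offset >= min_qual)
    return 100.0 * good / len(read.qual) >= min_percent


def collapse(reads: Iterable[Read]) -> Counter:
    """Collapse identical sequences, keeping a count for each."""
    return Counter(read.seq for read in reads)


if __name__ == "__main__":
    text = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!!!\n@r3\nGGCA\n+\nII!I\n"
    reads = list(parse_fastq(text.splitlines()))
    print(to_fasta(reads[0]), end="")
    print([r.name for r in reads if passes_quality(r)])
    print(collapse(reads))
```

Keeping each operation as a small function on a single read mirrors the way the toolkit's tools can be chained one after another, although the real tools work on files and streams rather than in-memory objects.
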
Galaxy NGS QC and manipulation tools (citation): Galaxy provides a tool suite that works with all of the commonly known FASTQ format variants and provides a pipeline for manipulating next-generation sequencing data, from the output of the sequencing machine through the quality-filtering steps.

NGSQC (citation): The NGSQC pipeline provides a set of novel quality control measures for quickly detecting a wide variety of quality issues in deep sequencing data derived from two-dimensional surfaces, regardless of the assay technology used. It also enables researchers to determine whether their most interesting biological findings are artifacts of sequencing quality issues. NGSQC can help to ensure that biological conclusions, in particular those based on relatively rare sequences, are not caused by low-quality sequencing.

NGS QC Toolkit (citation): A toolkit for the quality control (QC) of next-generation sequencing (NGS) data. The toolkit comprises user-friendly standalone tools for quality control of sequence data generated on the Illumina and Roche 454 platforms, with detailed results in the form of tables and graphs, and for filtering high-quality sequence data. It also includes a few other tools that are helpful in NGS data quality control and analysis.

PRINSEQ (citation): PRINSEQ can be used to filter, reformat, or trim your genomic and metagenomic sequence data. It generates summary statistics of your sequences in graphical and tabular format. It is easily configurable and provides a user-friendly interface.
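
To give a sense of what tabular summary statistics can look like, here is a minimal Python sketch. It is not PRINSEQ's implementation and does not reproduce its output. The function name summarize and the particular statistics (read count, length range, mean length, GC percentage) are assumptions chosen for the example, and the input is assumed to be a non-empty list of sequences.

```python
from statistics import mean
from typing import Dict, Sequence


def summarize(seqs: Sequence[str]) -> Dict[str, float]:
    """Basic summary statistics for a non-empty collection of sequences."""
    lengths = [len(s) for s in seqs]
    total_bases = sum(lengths)
    # Count G and C case-insensitively across all sequences.
    gc_bases = sum(s.upper().count("G") + s.upper().count("C") for s in seqs)
    return {
        "count": len(seqs),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "mean_length": mean(lengths),
        "gc_percent": 100.0 * gc_bases / total_bases if total_bases else 0.0,
    }


if __name__ == "__main__":
    for key, value in summarize(["ACGT", "GGCC", "AT"]).items():
        print(f"{key}\t{value}")
```
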

SolexaQA (citation): SolexaQA is a Perl-based software package that calculates quality statistics and creates visual representations of data quality from FASTQ files generated by Illumina second-generation sequencing technology (“Solexa”).
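
A common quality statistic computed from FASTQ files is the mean quality score at each read position. The Python sketch below illustrates that idea only. It is not SolexaQA's algorithm (SolexaQA itself is written in Perl), the function name mean_quality_per_position is hypothetical, and Phred+33 quality encoding is assumed.

```python
from typing import Iterable, List


def mean_quality_per_position(quals: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position, over reads of any length."""
    totals: List[int] = []
    counts: List[int] = []
    for qual in quals:
        for i, ch in enumerate(qual):
            if i == len(totals):
                # First read long enough to reach this position.
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


if __name__ == "__main__":
    # Two quality strings of different lengths ('I' is Q40, '!' is Q0).
    print(mean_quality_per_position(["II", "!I!"]))
```

Tracking a separate count per position lets reads of different lengths contribute only to the positions they actually cover.
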