FASTX-Toolkit: The FASTX-Toolkit is a collection of command-line tools for preprocessing short-read FASTA/FASTQ files, including:
- FASTQ-to-FASTA converter
Convert FASTQ files to FASTA files.
- FASTQ Information
Chart Quality Statistics and Nucleotide Distribution
- FASTQ/A Collapser
Collapsing identical sequences in a FASTQ/A file into a single sequence (while maintaining read counts)
- FASTQ/A Trimmer
Shortening reads in a FASTQ or FASTA file (removing barcodes or noise).
- FASTQ/A Renamer
Renames the sequence identifiers in a FASTQ/A file.
- FASTQ/A Clipper
Removing sequencing adapters / linkers
- FASTQ/A Reverse-Complement
Producing the reverse-complement of each sequence in a FASTQ/FASTA file.
- FASTQ/A Barcode splitter
Splitting a FASTQ/FASTA file containing multiple samples
- FASTA Formatter
Changes the width of sequence lines in a FASTA file
- FASTA Nucleotide Changer
Converts FASTA sequences from/to RNA/DNA
- FASTQ Quality Filter
Filters sequences based on quality
- FASTQ Quality Trimmer
Trims (cuts) sequences based on quality
- FASTQ Masker
Masks nucleotides with ‘N’ (or other character) based on quality
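To make a few of the operations listed above concrete, here is a minimal Python sketch of three of them: FASTQ-to-FASTA conversion, collapsing identical sequences while keeping counts, and reverse-complementing. It is not the FASTX-Toolkit's implementation. The function names (`parse_fastq`, `fastq_to_fasta`, `collapse`, `reverse_complement`) are hypothetical, and the parser assumes a simple FASTQ layout with exactly four lines per record.

```python
from collections import Counter
import io

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_fastq(handle):
    """Yield (identifier, sequence, quality) tuples.

    Assumption: each record occupies exactly four lines
    (header, sequence, '+' separator, quality).
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'


def fastq_to_fasta(records):
    """Render records as FASTA text, dropping the quality strings."""
    return "".join(f">{name}\n{seq}\n" for name, seq, _ in records)


def collapse(records):
    """Count identical sequences; most frequent sequence first."""
    counts = Counter(seq for _, seq, _ in records)
    return counts.most_common()


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    example = io.StringIO(
        "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nGGCA\n+\nIIII\n"
    )
    records = list(parse_fastq(example))
    print(fastq_to_fasta(records), end="")
    print(collapse(records))
    print(reverse_complement("GGCA"))
```

In this toy input the reads `r1` and `r2` share the sequence `ACGT`, so `collapse` reports it with a count of 2, which is the "maintaining read counts" behaviour described for the collapser.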
NGSQC (citation): The NGSQC pipeline provides a set of novel quality control measures for quickly detecting a wide variety of quality issues in deep sequencing data derived from two-dimensional surfaces, regardless of the assay technology used. It also enables researchers to determine whether sequencing data related to their most interesting biological discoveries are caused by sequencing quality issues. NGSQC can help to ensure that biological conclusions, in particular those based on relatively rare sequences, are not caused by low-quality sequencing.
NGS QC Toolkit (citation): A toolkit for the quality control (QC) of next generation sequencing (NGS) data. The toolkit comprises user-friendly standalone tools for quality control of the sequence data generated using Illumina and Roche 454 platforms, with detailed results in the form of tables and graphs, and for filtering of high-quality sequence data. It also includes a few other tools that are helpful in NGS data quality control and analysis.
PRINSEQ (citation): PRINSEQ can be used to filter, reformat, or trim your genomic and metagenomic sequence data. It generates summary statistics of your sequences in graphical and tabular format. It is easily configurable and provides a user-friendly interface.
SolexaQA (citation): SolexaQA is a Perl-based software package that calculates quality statistics and creates visual representations of data quality from FASTQ files generated by Illumina second-generation sequencing technology (“Solexa”).
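The quality statistics and quality-based filtering that the tools above provide can be illustrated with a short Python sketch. It does not reproduce the algorithm of SolexaQA or of any other listed package. It assumes Phred+33 quality encoding (older Solexa/Illumina pipelines used different offsets and scales), and the function names and default thresholds are hypothetical choices for illustration only.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer scores.

    Assumption: Phred+33 encoding unless another offset is given.
    """
    return [ord(ch) - offset for ch in quality]


def mean_quality_per_position(quality_strings, offset=33):
    """Mean score at each read position; reads may differ in length."""
    totals, counts = [], []
    for quality in quality_strings:
        for i, score in enumerate(phred_scores(quality, offset)):
            if i == len(totals):  # first read that reaches this position
                totals.append(0)
                counts.append(0)
            totals[i] += score
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def passes_filter(quality, min_score=20, min_fraction=0.8, offset=33):
    """Keep a read if enough of its bases reach min_score.

    The thresholds are illustrative defaults, not taken from any tool.
    """
    scores = phred_scores(quality, offset)
    if not scores:
        return False
    good = sum(1 for s in scores if s >= min_score)
    return good / len(scores) >= min_fraction


if __name__ == "__main__":
    quals = ["II", "I#", "I"]
    print(mean_quality_per_position(quals))
    print([passes_filter(q) for q in ("IIIII", "II###")])
```

Per-position means of this kind are what quality charts typically plot along the read, while the per-read filter shows one simple way a "filter sequences based on quality" rule can be expressed.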