bio-read-qc-quality-reports


Generate and interpret quality reports from FASTQ files using FastQC and MultiQC. Assess per-base quality, adapter content, GC bias, duplication levels, and overrepresented sequences. Use when performing initial QC on raw sequencing data or validating preprocessing results.

Updated 2/14/2026

SKILL.md

Version Compatibility

Reference examples tested with: pandas 2.2+

Before using code patterns, verify installed versions match. If versions differ:

  • Python: pip show <package> then help(module.function) to check signatures
  • CLI: <tool> --version then <tool> --help to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

Quality Reports

Generate quality reports for FASTQ files using FastQC and aggregate multiple reports with MultiQC.

"Run quality control on FASTQ files" → Generate per-base quality, adapter content, and duplication plots, then aggregate across samples.

  • CLI: fastqc *.fastq.gz then multiqc .

FastQC - Single Sample Reports

Basic Usage

# Single file
fastqc sample.fastq.gz

# Multiple files
fastqc *.fastq.gz

# Specify output directory (FastQC does not create it, so make it first)
mkdir -p qc_reports/
fastqc -o qc_reports/ sample_R1.fastq.gz sample_R2.fastq.gz

# Process up to 4 files at once (one thread per file, ~250 MB memory each)
fastqc -t 4 *.fastq.gz

Output Files

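FastQC produces two files per input, named after the input with its FASTQ and compression extensions removed (sample.fastq.gz → sample_fastqc.*):

  • sample_fastqc.html - HTML report with a plot and a pass/warn/fail flag for each module
  • sample_fastqc.zip - Raw module data (fastqc_data.txt), a per-module status summary (summary.txt) and plot images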
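The pass/warn/fail flags shown in the HTML report are also stored as plain text in summary.txt inside the ZIP. A minimal sketch that reads them without unzipping, assuming the usual layout (a single top-level `<sample>_fastqc/` folder whose summary.txt has tab-separated status, module and filename columns); `read_fastqc_summary` is a hypothetical helper name.

```python
import zipfile


def read_fastqc_summary(zip_path):
    """Return {module_name: status} from the summary.txt inside a FastQC ZIP.

    Assumption: summary.txt sits under a "<sample>_fastqc/" folder and each
    line is STATUS<TAB>Module name<TAB>input filename.
    """
    with zipfile.ZipFile(zip_path) as zf:
        names = [n for n in zf.namelist() if n.endswith("/summary.txt")]
        if not names:
            raise KeyError(f"no summary.txt found in {zip_path}")
        text = zf.read(names[0]).decode("utf-8")

    statuses = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        status, module, _filename = line.split("\t", 2)
        statuses[module] = status
    return statuses
```

For example, `{m: s for m, s in read_fastqc_summary("qc_reports/sample_fastqc.zip").items() if s != "PASS"}` keeps only the flagged modules.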

Key Modules

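| Module | What It Shows | Warning Signs |
|---|---|---|
| Per base sequence quality | Quality scores along the read | Drop below Q20 at 3' end |
| Per sequence quality scores | Distribution of mean quality per read | Bimodal distribution |
| Per base sequence content | Nucleotide composition at each position | Imbalance along the read (some bias in the first bases is normal for random-primed libraries) |
| Per sequence GC content | GC distribution across reads | Secondary peak (contamination) |
| Per base N content | Unknown (N) base calls per position | High N content |
| Sequence length distribution | Read lengths | Unexpected variation |
| Sequence duplication levels | Duplicate reads | High duplication (PCR) |
| Overrepresented sequences | Sequences making up more than 0.1% of reads | Adapter contamination |
| Adapter content | Cumulative adapter sequence along the read | Visible adapter curves |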
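To make the "Per base sequence content" row concrete: the module tabulates the percentage of A, C, G and T at each read position, and FastQC's documentation describes a warning when A and T (or G and C) differ by more than 10 percentage points at any position. The sketch below reproduces that idea on a few in-memory reads. It is not FastQC's implementation (FastQC also bins positions for long reads), and `per_base_content` and `imbalanced_positions` are hypothetical names.

```python
from collections import Counter


def per_base_content(seqs):
    """Percentage of A/C/G/T at each position; N and other symbols are excluded."""
    max_len = max(len(s) for s in seqs)
    content = []
    for pos in range(max_len):
        counts = Counter(s[pos].upper() for s in seqs if pos < len(s))
        total = sum(counts[b] for b in "ACGT")
        content.append(
            {b: (100.0 * counts[b] / total if total else 0.0) for b in "ACGT"}
        )
    return content


def imbalanced_positions(content, max_diff=10.0):
    """1-based positions where |A - T| or |G - C| exceeds max_diff percentage points.

    The 10-point default mirrors the warning rule described in FastQC's docs.
    """
    return [
        i + 1
        for i, pct in enumerate(content)
        if abs(pct["A"] - pct["T"]) > max_diff or abs(pct["G"] - pct["C"]) > max_diff
    ]


reads = ["AC", "AG", "AC", "TG"]
print(imbalanced_positions(per_base_content(reads)))  # [1]
```

Position 1 is 75% A and 25% T, so it is flagged; position 2 is an even C/G split and passes.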

Extract Data from ZIP

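# Unzip to access raw data (or run fastqc with --extract to skip this step)
unzip sample_fastqc.zip

# View the pass/warn/fail status of each module
cat sample_fastqc/summary.txt

# Print the per-base quality module, from its header line to >>END_MODULE
sed -n '/>>Per base sequence quality/,/>>END_MODULE/p' sample_fastqc/fastqc_data.txt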
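The same module can be loaded into Python for downstream checks. A minimal sketch, assuming the fastqc_data.txt layout where each module starts with `>>Module name<TAB>status`, ends with `>>END_MODULE`, and has a `#`-prefixed column header; `parse_fastqc_module` is a hypothetical helper, not part of FastQC.

```python
def parse_fastqc_module(path, module_name):
    """Return (status, header, rows) for one module of a fastqc_data.txt file.

    Assumes ">>Name<TAB>status" ... ">>END_MODULE" blocks. The last "#"-prefixed
    line inside a module is used as the header, which skips extra lines such as
    "#Total Deduplicated Percentage" in the duplication module.
    """
    status, header, rows, inside = None, [], [], False
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.rstrip("\n")
            if line.startswith(">>END_MODULE"):
                if inside:
                    break  # finished the requested module
            elif line.startswith(">>"):
                name, _, module_status = line[2:].partition("\t")
                inside = name == module_name
                if inside:
                    status = module_status
            elif inside and line.startswith("#"):
                header = line[1:].split("\t")
            elif inside and line:
                rows.append(line.split("\t"))
    if status is None:
        raise ValueError(f"module {module_name!r} not found in {path}")
    return status, header, rows
```

Values come back as strings; for the per-base quality module, `[float(r[1]) for r in rows]` gives the mean quality per base (or base range).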

MultiQC - Aggregate Reports

Basic Usage

# Aggregate all FastQC reports found in the current directory (searched recursively)
multiqc .

# Specify input and output
multiqc qc_reports/ -o multiqc_output/

# Custom report name
multiqc . -n my_project_qc

# Force overwrite
multiqc . -f

Common Options

# Static (flat image) plots instead of interactive ones
multiqc --flat .

# Also export each plot as a static image file
multiqc . --export

# Only specific modules
multiqc . -m fastqc

# Exclude files/directories by path pattern
multiqc . --ignore '*_trimmed*'

# Exclude samples by name pattern
multiqc . --ignore-samples '*negative*'
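Both --ignore and --ignore-samples take glob patterns, so it can help to preview which names a pattern would catch before a long run. This sketch uses Python's `fnmatchcase`, assuming MultiQC's matching behaves like standard shell-style globbing; confirm against the sample list in the finished report. `preview_ignored` and the sample names are hypothetical.

```python
from fnmatch import fnmatchcase


def preview_ignored(sample_names, patterns):
    """Split sample names into (kept, ignored) using shell-style glob patterns.

    Assumption: MultiQC's --ignore-samples matching is equivalent to fnmatch
    globbing; fnmatchcase is used so the result is case-sensitive on every OS.
    """
    kept, ignored = [], []
    for name in sample_names:
        if any(fnmatchcase(name, pattern) for pattern in patterns):
            ignored.append(name)
        else:
            kept.append(name)
    return kept, ignored


kept, ignored = preview_ignored(
    ["S1_negative_ctrl", "S2_liver", "S3_kidney"], ["*negative*"]
)
print(ignored)  # ['S1_negative_ctrl']
```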

Output Files

  • multiqc_report.html - Interactive HTML report
  • multiqc_data/ - Directory with data tables
    • multiqc_fastqc.txt - FastQC metrics
    • multiqc_general_stats.txt - Summary statistics
    • multiqc_sources.txt - Source files used

Extract Data Programmatically

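import pandas as pd

# MultiQC data tables are tab-separated, one row per sample ("Sample" column)
general_stats = pd.read_csv('multiqc_data/multiqc_general_stats.txt', sep='\t', index_col='Sample')
print(general_stats.columns)  # column names depend on MultiQC version and modules run

# Per-sample FastQC metrics and per-module pass/warn/fail status
fastqc_data = pd.read_csv('multiqc_data/multiqc_fastqc.txt', sep='\t', index_col='Sample')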
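multiqc_fastqc.txt also records each module's pass/warn/fail status as a column, which makes it easy to list failing samples. Rather than hard-coding column names, which can change between MultiQC versions, this sketch treats any column holding only pass/warn/fail values as a status column; that detection rule is an assumption, and `failed_modules` is a hypothetical helper.

```python
import pandas as pd

STATUSES = {"pass", "warn", "fail"}


def failed_modules(fastqc_df: pd.DataFrame) -> dict:
    """Map each sample (index) to the list of FastQC modules it failed.

    Assumption: status columns are exactly those whose non-missing values are
    all "pass", "warn" or "fail".
    """
    status_cols = [
        col
        for col in fastqc_df.columns
        if fastqc_df[col].notna().any()
        and fastqc_df[col].dropna().isin(STATUSES).all()
    ]
    failures = {}
    for sample, row in fastqc_df[status_cols].iterrows():
        failed = [col for col in status_cols if row[col] == "fail"]
        if failed:
            failures[sample] = failed
    return failures
```

Usage: load the table with `pd.read_csv('multiqc_data/multiqc_fastqc.txt', sep='\t', index_col='Sample')` and pass it to `failed_modules`.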

Batch Processing

Process Multiple Samples

# Run FastQC on all FASTQ files, up to 8 at a time (create the output directory first)
mkdir -p qc_reports/
fastqc -t 8 -o qc_reports/ raw_data/*.fastq.gz

# Then aggregate
multiqc qc_reports/ -o multiqc_output/
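Before aggregating, it can be worth checking that every input produced a report, since a failed FastQC run simply leaves a sample out of the MultiQC output. A minimal sketch, assuming FastQC names reports `<stem>_fastqc.zip` after stripping the common .fastq/.fq (optionally .gz/.bz2) extensions; `fastqc_stem` and `missing_reports` are hypothetical helpers.

```python
from pathlib import Path

# Assumed FASTQ naming; longer compound suffixes are listed first
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq.bz2", ".fq.bz2", ".fastq", ".fq")


def fastqc_stem(fastq_name):
    """Strip compression and FASTQ extensions, as FastQC does when naming reports."""
    for suffix in FASTQ_SUFFIXES:
        if fastq_name.endswith(suffix):
            return fastq_name[: -len(suffix)]
    return fastq_name


def missing_reports(fastq_dir, report_dir):
    """Names of FASTQ files in fastq_dir with no <stem>_fastqc.zip in report_dir."""
    reports = {p.name for p in Path(report_dir).glob("*_fastqc.zip")}
    return sorted(
        p.name
        for p in Path(fastq_dir).iterdir()
        if p.name.endswith(FASTQ_SUFFIXES)
        and f"{fastqc_stem(p.name)}_fastqc.zip" not in reports
    )
```

For the commands above, `missing_reports("raw_data", "qc_reports")` should return an empty list once every file has been processed.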

Before and After Trimming

# Create separate directories
mkdir -p qc_reports/raw qc_reports/trimmed

# QC raw reads
fastqc -o qc_reports/raw/ raw_data/*.fastq.gz

# After trimming (using fastp, cutadapt, etc.)
fastqc -o qc_reports/trimmed/ trimmed_data/*.fastq.gz

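# Compare with MultiQC; -d prefixes each sample name with its directory
# (raw/ or trimmed/), so reports that share a file name stay separate
multiqc -d qc_reports/ -o qc_comparison/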

Interpretation Guide

Quality Scores

| Phred Score | Error Rate | Interpretation |
|---|---|---|
| Q40 | 0.0001 | Excellent |
| Q30 | 0.001 | Good (Illumina target) |
| Q20 | 0.01 | Acceptable |
| Q10 | 0.1 | Poor |
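The table follows the Phred definition Q = -10 log10(P), where P is the probability that a base call is wrong. In FASTQ files from current Illumina pipelines (Phred+33, also called Sanger encoding) each score is stored as the ASCII character with code Q + 33. A small sketch of both conversions; the function names are illustrative, and the offset of 33 is an assumption that does not hold for old Phred+64 files.

```python
import math


def phred_to_error(q):
    """Error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def decode_quality(qual_string, offset=33):
    """Decode a FASTQ quality line to integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual_string]


print(decode_quality("I5+"))  # [40, 20, 10]
```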

Common Issues

| Issue | Likely Cause | Action |
|---|---|---|
| Low quality at 3' end | Normal degradation | Trim 3' end |
| Adapter contamination | Short inserts | Trim adapters |
| GC bias | Library prep | Consider correction |
| High duplication | Low complexity, PCR | Mark/remove duplicates |
| Overrepresented sequences | Adapters, primers | Check sequences |
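For the "Trim 3' end" action, trimmers usually cut by a quality cutoff rather than a fixed length. Cutadapt's documentation describes its -q algorithm as the one used by BWA: subtract the cutoff from each quality, sum from the 3' end, and cut where that running sum is largest, stopping once it turns negative. A sketch of that idea on already-decoded integer scores; `quality_trim_index` is a hypothetical name, and real tools layer further options on top.

```python
def quality_trim_index(quals, cutoff=20):
    """Number of leading bases to keep after BWA-style 3' quality trimming.

    quals: integer Phred scores for one read, ordered 5' to 3'.
    """
    running = 0  # partial sum of (cutoff - quality) from the 3' end
    best = 0  # largest partial sum seen so far
    keep = len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += cutoff - quals[i]
        if running < 0:
            break  # reached a high-quality stretch; stop scanning
        if running > best:
            best = running
            keep = i
    return keep


print(quality_trim_index([38, 37, 36, 12, 8], cutoff=20))  # 3
```

Here the last two bases (Q12 and Q8) are removed and the first three are kept.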

Configuration

Custom Adapters

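Create a tab-separated file of adapter name and sequence pairs (lines starting with # are ignored), such as adapter_list.txt:

Custom_Adapter_Name    ACGTACGTACGT

Then pass it with --adapters (-a): fastqc --adapters adapter_list.txt sample.fastq.gz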

Custom Limits

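Copy the default limits.txt from the Configuration/ folder of your FastQC installation, edit the thresholds (tab-separated metric, warn/error, value), and pass the copy with --limits (-l), e.g. fastqc --limits my_limits.txt sample.fastq.gz:

# Warn if the most common per-read mean quality is below 25; fail below 20
quality_sequence    warn    25
quality_sequence    error   20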
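Editing a copy of the default file keeps every other module's thresholds in place, and writing it programmatically avoids tabs being turned into spaces by an editor. A minimal sketch, assuming the default file's layout of `#` comment lines plus metric, level and value columns; `update_limits` is a hypothetical helper and the override shown is the example threshold above.

```python
from pathlib import Path


def update_limits(src, dst, overrides):
    """Copy a FastQC limits file, replacing thresholds for (metric, level) keys.

    overrides: e.g. {("quality_sequence", "warn"): 25}
    Assumes whitespace-separated metric/level/value lines; comment lines and
    entries not in overrides are copied unchanged. Rewritten lines use tabs.
    """
    out_lines = []
    for line in Path(src).read_text().splitlines():
        fields = line.split()
        if len(fields) == 3 and not line.startswith("#"):
            key = (fields[0], fields[1])
            if key in overrides:
                line = f"{fields[0]}\t{fields[1]}\t{overrides[key]}"
        out_lines.append(line)
    Path(dst).write_text("\n".join(out_lines) + "\n")
```

For example, `update_limits("limits.txt", "my_limits.txt", {("quality_sequence", "warn"): 25})` produces a file ready for --limits.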

Related Skills

  • adapter-trimming - Remove adapters detected by FastQC
  • fastp-workflow - All-in-one QC and trimming
  • sequence-io/read-sequences - FASTQ file reading/writing

Install

Requires askill CLI v1.0+

AI Quality Score

88/100

High-quality bioinformatics skill for FASTQ quality control. Provides comprehensive coverage of FastQC and MultiQC with clear CLI examples, output interpretation tables, and batch processing guidance. Well-structured reference that can be reused across sequencing projects. Includes when-to-use trigger, structured commands, and proper skills folder organization. Minor issue with mismatched tags (api, github-actions, observability don't fit this bioinformatics QC skill).


Metadata

License: unknown
Version: -
Updated: 2/14/2026
Publisher: GPTomics

Tags

api, github-actions, observability