askill
bio-read-qc-fastp-workflow

bio-read-qc-fastp-workflowSafety 100Repository

All-in-one read preprocessing with fastp including adapter trimming, quality filtering, deduplication, base correction, and HTML report generation. Use when preprocessing Illumina data and wanting a single fast tool instead of separate Cutadapt, Trimmomatic, and FastQC steps.

251 stars
5k downloads
Updated 2/14/2026

Package Files

Loading files...
SKILL.md

Version Compatibility

Reference examples tested with: FastQC 0.12+, fastp 0.23+

Before using code patterns, verify installed versions match. If versions differ:

  • Python: pip show <package> then help(module.function) to check signatures
  • CLI: <tool> --version then <tool> --help to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

fastp Workflow

All-in-one preprocessing tool that handles adapter trimming, quality filtering, deduplication, and report generation in a single pass.

"Preprocess FASTQ reads with fastp" → Run adapter trimming, quality filtering, and QC reporting in a single pass.

  • CLI: fastp -i R1.fq -I R2.fq -o clean_R1.fq -O clean_R2.fq --html report.html

Basic Usage

Single-End

fastp -i input.fastq.gz -o output.fastq.gz

Paired-End

fastp -i R1.fastq.gz -I R2.fastq.gz -o R1_clean.fastq.gz -O R2_clean.fastq.gz

With Custom HTML/JSON Reports

fastp -i R1.fq.gz -I R2.fq.gz \
      -o R1_clean.fq.gz -O R2_clean.fq.gz \
      -h sample_report.html \
      -j sample_report.json

Adapter Trimming

fastp auto-detects Illumina adapters by default.

# Auto-detect (default)
fastp -i in.fq -o out.fq

# Specify adapters manually
fastp -i in.fq -o out.fq \
      --adapter_sequence AGATCGGAAGAGCACACGTCTGAACTCCAGTCA

# Paired-end with manual adapters
fastp -i R1.fq -I R2.fq -o R1.out.fq -O R2.out.fq \
      --adapter_sequence AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
      --adapter_sequence_r2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

# Disable adapter trimming
fastp -i in.fq -o out.fq --disable_adapter_trimming

# Adapter FASTA file
fastp -i in.fq -o out.fq --adapter_fasta adapters.fa

Quality Filtering

# Per-base quality threshold (default Q15)
fastp -i in.fq -o out.fq -q 20

# Mean read quality threshold
fastp -i in.fq -o out.fq -e 25

# Max unqualified bases percent (default 40)
fastp -i in.fq -o out.fq -q 20 --unqualified_percent_limit 30

# Disable quality filtering
fastp -i in.fq -o out.fq --disable_quality_filtering

Quality Trimming

# Sliding window from 3' end (recommended)
fastp -i in.fq -o out.fq \
      --cut_right \
      --cut_right_window_size 4 \
      --cut_right_mean_quality 20

# Sliding window from 5' end
fastp -i in.fq -o out.fq \
      --cut_front \
      --cut_front_window_size 4 \
      --cut_front_mean_quality 20

# Both ends
fastp -i in.fq -o out.fq \
      --cut_front --cut_tail \
      --cut_front_window_size 4 \
      --cut_front_mean_quality 20 \
      --cut_tail_window_size 4 \
      --cut_tail_mean_quality 20

Length Filtering

# Minimum length (default 15)
fastp -i in.fq -o out.fq -l 36

# Maximum length
fastp -i in.fq -o out.fq --length_limit 150

# Required length (discard shorter AND longer)
fastp -i in.fq -o out.fq -l 100 --length_limit 100

Poly-X Trimming

# Trim poly-G (NovaSeq/NextSeq artifacts) - auto-enabled for these platforms
fastp -i in.fq -o out.fq --trim_poly_g

# Disable poly-G trimming
fastp -i in.fq -o out.fq --disable_trim_poly_g

# Trim poly-X (any homopolymer)
fastp -i in.fq -o out.fq --trim_poly_x

# Custom poly-G minimum length (default 10)
fastp -i in.fq -o out.fq --trim_poly_g --poly_g_min_len 5

N Base Handling

# Max N bases (default 5)
fastp -i in.fq -o out.fq -n 3

# Disable N filtering
fastp -i in.fq -o out.fq --n_base_limit 50

Deduplication

# Enable deduplication
fastp -i in.fq -o out.fq --dedup

# Accuracy level (1-6, higher = more memory, default 3)
fastp -i in.fq -o out.fq --dedup --dup_calc_accuracy 4

Base Correction (Paired-End Only)

# Enable overlap-based correction
fastp -i R1.fq -I R2.fq -o R1.out.fq -O R2.out.fq --correction

# Required overlap length (default 30)
fastp -i R1.fq -I R2.fq -o R1.out.fq -O R2.out.fq \
      --correction --overlap_len_require 20

Paired-End Merge

# Merge overlapping paired reads
fastp -i R1.fq -I R2.fq \
      --merge --merged_out merged.fq \
      -o R1_unmerged.fq -O R2_unmerged.fq

UMI Processing

# UMI in read (extract to header)
fastp -i in.fq -o out.fq \
      --umi --umi_loc read1 --umi_len 8

# UMI in separate read
fastp -i R1.fq -I R2.fq -o R1.out.fq -O R2.out.fq \
      --umi --umi_loc index1

# UMI locations: index1, index2, read1, read2, per_index, per_read

Complete Workflow Example

Standard Illumina Pipeline

fastp \
    -i raw_R1.fastq.gz -I raw_R2.fastq.gz \
    -o clean_R1.fastq.gz -O clean_R2.fastq.gz \
    --detect_adapter_for_pe \
    --cut_right --cut_right_window_size 4 --cut_right_mean_quality 20 \
    -q 20 -l 36 \
    --thread 8 \
    -h sample_fastp.html -j sample_fastp.json

NovaSeq/NextSeq Pipeline

fastp \
    -i raw_R1.fastq.gz -I raw_R2.fastq.gz \
    -o clean_R1.fastq.gz -O clean_R2.fastq.gz \
    --detect_adapter_for_pe \
    --trim_poly_g \
    --cut_right --cut_right_window_size 4 --cut_right_mean_quality 20 \
    -q 20 -l 36 \
    --thread 8 \
    -h sample_fastp.html -j sample_fastp.json

RNA-seq Pipeline

fastp \
    -i raw_R1.fastq.gz -I raw_R2.fastq.gz \
    -o clean_R1.fastq.gz -O clean_R2.fastq.gz \
    --detect_adapter_for_pe \
    --cut_right --cut_right_window_size 4 --cut_right_mean_quality 20 \
    -q 20 -l 50 \
    --thread 8 \
    -h sample_fastp.html -j sample_fastp.json

Output Files

FileDescription
*.htmlInteractive HTML report
*.jsonMachine-readable statistics
Output FASTQProcessed reads

JSON Report Structure

import json

with open('sample_fastp.json') as f:
    report = json.load(f)

summary = report['summary']
print(f"Total reads: {summary['before_filtering']['total_reads']}")
print(f"Passed reads: {summary['after_filtering']['total_reads']}")
print(f"Q20 rate: {summary['after_filtering']['q20_rate']:.2%}")
print(f"Q30 rate: {summary['after_filtering']['q30_rate']:.2%}")

Performance

# Set threads (default 3)
fastp -i in.fq -o out.fq --thread 8

# Disable HTML report (faster)
fastp -i in.fq -o out.fq --html /dev/null

# Process from stdin
zcat in.fq.gz | fastp --stdin -o out.fq

Related Skills

  • quality-reports - MultiQC can aggregate fastp JSON reports
  • adapter-trimming - Cutadapt for complex adapter scenarios
  • quality-filtering - Trimmomatic alternative

Install

Download ZIP
Requires askill CLI v1.0+

AI Quality Score

94/100Analyzed 2/19/2026

Highly comprehensive bioinformatics skill for fastp read preprocessing. Excellent coverage of all major features with clear CLI examples, multiple workflow templates for different platforms (Illumina, NovaSeq, RNA-seq), and useful related skills links. Includes version compatibility notes, JSON parsing examples, and performance tips. The description clearly explains when to use this tool vs alternatives. Tags improve discoverability. Minor gap: could include error handling/troubleshooting section. No safety concerns - pure read processing tool.

100
95
90
95
95

Metadata

Licenseunknown
Version-
Updated2/14/2026
PublisherGPTomics

Tags

apici-cdgithub-actions