bio-rna-quantification-alignment-free-quant

Quantify transcript expression using pseudo-alignment with Salmon or kallisto. Use when quantifying transcripts with Salmon or kallisto.

Updated 2/16/2026

Alignment-Free Quantification

Quantify transcript abundance directly from FASTQ reads using pseudo-alignment (kallisto) or selective alignment (Salmon).

Salmon Workflow

Build Index

# Download transcriptome FASTA
# Ensembl: Homo_sapiens.GRCh38.cdna.all.fa.gz

# Basic index (fast, less accurate)
salmon index -t transcripts.fa -i salmon_index

# Decoy-aware index (recommended for accuracy)
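# First, list the genome sequence names to use as decoys
grep "^>" genome.fa | cut -d " " -f 1 | sed 's/>//g' > decoys.txt
# Concatenate transcripts first, then the genome (decoy sequences must come last)
cat transcripts.fa genome.fa > gentrome.fa
salmon index -t gentrome.fa -d decoys.txt -i salmon_index -p 8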
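
The grep/cut/sed pipeline above keeps only the first space-delimited token of each genome FASTA header. Where a shell pipeline is inconvenient, the same extraction can be sketched in Python. The function name `decoy_names` and the example headers are illustrative assumptions, not part of Salmon:

```python
from typing import Iterable, List


def decoy_names(fasta_lines: Iterable[str]) -> List[str]:
    """Return the sequence name from every FASTA header line.

    Mirrors: grep "^>" genome.fa | cut -d " " -f 1 | sed 's/>//g'
    """
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            # Keep the text before the first space, then strip every ">".
            token = line.rstrip("\n").split(" ", 1)[0]
            names.append(token.replace(">", ""))
    return names


# Hypothetical Ensembl-style genome headers, for illustration only.
example = [
    ">1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF\n",
    "ACGTACGT\n",
    ">MT dna:chromosome chromosome:GRCh38:MT:1:16569:1 REF\n",
    "GATTACA\n",
]
print("\n".join(decoy_names(example)))
```

In practice the lines would come from an open file handle on genome.fa, and the result would be written one name per line to decoys.txt.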

Quantify Samples

# Paired-end reads
salmon quant -i salmon_index -l A \
    -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz \
    -o sample_quant -p 8

# Single-end reads
salmon quant -i salmon_index -l A \
    -r sample.fastq.gz \
    -o sample_quant -p 8

Key flags:

  • -l A - Automatically detect library type
  • -p - Number of threads
  • --validateMappings - Enables selective alignment for more accurate mapping; on by default since Salmon 1.0, so the flag is no longer needed there
  • --gcBias - Correct for GC bias
  • --seqBias - Correct for sequence-specific bias

Library Types

Code    Description
A       Automatic detection (recommended)
ISR     Inward, stranded, read 1 from reverse
ISF     Inward, stranded, read 1 from forward
IU      Inward, unstranded
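
The codes above are built from single letters, so they can be decoded mechanically. The sketch below assumes the letter scheme as described in the Salmon documentation (orientation I/O/M for paired-end reads, then S/U, then F/R when stranded); `describe_libtype` is a hypothetical helper, not a Salmon function:

```python
ORIENTATION = {"I": "inward", "O": "outward", "M": "matching"}
STRANDEDNESS = {"S": "stranded", "U": "unstranded"}
STRAND = {"F": "read 1 from forward", "R": "read 1 from reverse"}


def describe_libtype(code: str) -> str:
    """Spell out a Salmon library type code such as ISR, IU, or SF."""
    if code == "A":
        return "automatic detection"
    parts = []
    rest = code
    # Paired-end codes start with an orientation letter; single-end codes do not.
    if rest[:1] in ORIENTATION:
        parts.append(ORIENTATION[rest[0]])
        rest = rest[1:]
    if rest[:1] not in STRANDEDNESS:
        raise ValueError(f"unrecognised library type: {code!r}")
    parts.append(STRANDEDNESS[rest[0]])
    strand = rest[1:]
    if rest[0] == "S":
        # Stranded libraries need exactly one strand letter.
        if strand not in STRAND:
            raise ValueError(f"unrecognised library type: {code!r}")
        parts.append(STRAND[strand])
    elif strand:
        # Unstranded libraries take no strand letter.
        raise ValueError(f"unrecognised library type: {code!r}")
    return ", ".join(parts)


for code in ["A", "ISR", "ISF", "IU"]:
    print(code, "->", describe_libtype(code))
```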

Batch Processing

for sample in sample1 sample2 sample3; do
    salmon quant -i salmon_index -l A \
        -1 "${sample}_R1.fastq.gz" -2 "${sample}_R2.fastq.gz" \
        -o "${sample}_quant" -p 8
done
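
When the sample list is long, the names can be derived from the FASTQ files instead of being typed out. The sketch below pairs files by the `_R1`/`_R2` suffixes used in this guide and prints one `salmon quant` command per complete pair. It only builds the commands and does not run them; the helper names are illustrative assumptions:

```python
import shlex
from typing import Dict, Iterable, List, Tuple

R1_SUFFIX = "_R1.fastq.gz"
R2_SUFFIX = "_R2.fastq.gz"


def pair_fastqs(filenames: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map each sample name to its (R1, R2) files; skip samples missing a mate."""
    names = set(filenames)
    pairs = {}
    for name in sorted(names):
        if name.endswith(R1_SUFFIX):
            sample = name[: -len(R1_SUFFIX)]
            mate = sample + R2_SUFFIX
            if mate in names:
                pairs[sample] = (name, mate)
    return pairs


def salmon_quant_cmd(sample: str, r1: str, r2: str,
                     index: str = "salmon_index", threads: int = 8) -> List[str]:
    """Build the same paired-end command as the shell loop, as an argument list."""
    return ["salmon", "quant", "-i", index, "-l", "A",
            "-1", r1, "-2", r2, "-o", f"{sample}_quant", "-p", str(threads)]


# Hypothetical directory listing; sample3 has no R2 file and is skipped.
files = ["sample1_R1.fastq.gz", "sample1_R2.fastq.gz",
         "sample2_R1.fastq.gz", "sample2_R2.fastq.gz",
         "sample3_R1.fastq.gz"]
for sample, (r1, r2) in pair_fastqs(files).items():
    print(shlex.join(salmon_quant_cmd(sample, r1, r2)))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` to execute it once Salmon is on the PATH.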

Output Files

sample_quant/
├── quant.sf           # Main quantification file
├── aux_info/          # Auxiliary information
├── cmd_info.json      # Command used
├── lib_format_counts.json  # Library format detection
└── logs/              # Log files

quant.sf format:

Name                    Length  EffectiveLength TPM         NumReads
ENST00000456328.2       1657    1477.000        0.000000    0.000
ENST00000450305.2       632     452.000         12.345678   156.789
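
The TPM column is tied to NumReads and EffectiveLength: each transcript's reads per effective base are rescaled so that all TPM values sum to one million. The sketch below applies that standard TPM definition to made-up numbers. It is not Salmon's internal code, and the values in a real quant.sf come from Salmon's own estimates:

```python
from typing import List, Sequence


def tpm_from_counts(num_reads: Sequence[float],
                    eff_lengths: Sequence[float]) -> List[float]:
    """Compute TPM as (reads / effective length), scaled to sum to 1e6."""
    rates = [n / length for n, length in zip(num_reads, eff_lengths)]
    total = sum(rates)
    if total == 0:
        # No reads assigned anywhere: every transcript gets TPM 0.
        return [0.0 for _ in rates]
    return [rate / total * 1e6 for rate in rates]


# Toy example: equal effective lengths, so TPM is proportional to read count.
toy_tpm = tpm_from_counts([10.0, 20.0, 0.0], [100.0, 100.0, 100.0])
print([round(value, 2) for value in toy_tpm])
```

Because TPM is a proportion, it is comparable within a sample, while NumReads is the column that count-based downstream tools expect.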

kallisto Workflow

Build Index

kallisto index -i kallisto_index transcripts.fa

Quantify Samples

# Paired-end
kallisto quant -i kallisto_index -o sample_quant \
    sample_R1.fastq.gz sample_R2.fastq.gz

# Single-end (must specify fragment length mean and standard deviation)
kallisto quant -i kallisto_index -o sample_quant \
    --single -l 200 -s 20 sample.fastq.gz

# With bootstraps (for sleuth)
kallisto quant -i kallisto_index -o sample_quant -b 100 \
    sample_R1.fastq.gz sample_R2.fastq.gz

Key flags:

  • -b - Number of bootstrap samples
  • -t - Number of threads
  • --single - Single-end mode
  • -l - Estimated mean fragment length (required with --single)
  • -s - Fragment length standard deviation (required with --single)
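
Because single-end runs need both -l and -s, a wrapper script can refuse to build a command that omits them. A minimal sketch using only the flags listed above; `kallisto_quant_cmd` is a hypothetical helper, and it handles one single-end file or one read pair only:

```python
from typing import List, Optional, Sequence


def kallisto_quant_cmd(index: str, out_dir: str, reads: Sequence[str],
                       fragment_length: Optional[float] = None,
                       fragment_sd: Optional[float] = None,
                       bootstraps: int = 0, threads: int = 1) -> List[str]:
    """Build a kallisto quant argument list for one sample."""
    cmd = ["kallisto", "quant", "-i", index, "-o", out_dir, "-t", str(threads)]
    if bootstraps > 0:
        cmd += ["-b", str(bootstraps)]
    if len(reads) == 1:
        # Single-end: the fragment length cannot be inferred from read pairs.
        if fragment_length is None or fragment_sd is None:
            raise ValueError("single-end mode needs both -l and -s")
        cmd += ["--single", "-l", str(fragment_length), "-s", str(fragment_sd)]
    elif len(reads) != 2:
        raise ValueError("expected one single-end file or one read pair")
    return cmd + list(reads)


print(kallisto_quant_cmd("kallisto_index", "sample_quant", ["sample.fastq.gz"],
                         fragment_length=200, fragment_sd=20))
```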

Output Files

sample_quant/
├── abundance.tsv      # Main quantification (text)
├── abundance.h5       # HDF5 format (for sleuth)
└── run_info.json      # Run information

abundance.tsv format:

target_id               length  eff_length  est_counts  tpm
ENST00000456328.2       1657    1477.00     0.00        0.000000
ENST00000450305.2       632     452.00      156.79      12.345678

Salmon vs kallisto

Feature               Salmon      kallisto
Speed                 Fast        Fastest
Accuracy              Higher      Good
GC bias correction    Yes         No
Decoy sequences       Yes         No
Memory usage          Moderate    Low

Recommendation: Use Salmon for production, kallisto for quick exploratory analysis.

Combining Results

# Salmon: use tximport in R
# kallisto: use tximport or sleuth

# Quick Python combination
python << 'EOF'
import pandas as pd
from pathlib import Path

samples = ['sample1', 'sample2', 'sample3']
tpm_data = {}
counts_data = {}

for sample in samples:
    quant_file = Path(f'{sample}_quant/quant.sf')  # Salmon
    # quant_file = Path(f'{sample}_quant/abundance.tsv')  # kallisto
    df = pd.read_csv(quant_file, sep='\t', index_col=0)
    tpm_data[sample] = df['TPM']  # 'tpm' for kallisto
    counts_data[sample] = df['NumReads']  # 'est_counts' for kallisto

tpm_matrix = pd.DataFrame(tpm_data)
counts_matrix = pd.DataFrame(counts_data)
tpm_matrix.to_csv('tpm_matrix.csv')
counts_matrix.to_csv('counts_matrix.csv')
EOF
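
Salmon and kallisto name their columns differently (Name/TPM/NumReads versus target_id/tpm/est_counts), so a combined script has to know which tool wrote each file. One possible approach is to detect the tool from the header row. The sketch below uses only the standard library, and its helper names are assumptions for illustration:

```python
import csv
import io
from typing import Dict, Iterable, Sequence, Tuple

COLUMNS = {
    "salmon": {"id": "Name", "tpm": "TPM", "counts": "NumReads"},
    "kallisto": {"id": "target_id", "tpm": "tpm", "counts": "est_counts"},
}


def detect_tool(header: Sequence[str]) -> str:
    """Guess the quantifier from the column names in the header row."""
    if "NumReads" in header:
        return "salmon"
    if "est_counts" in header:
        return "kallisto"
    raise ValueError(f"unrecognised header: {list(header)}")


def read_quant(handle: Iterable[str]) -> Tuple[str, Dict[str, float], Dict[str, float]]:
    """Return (tool, {transcript: TPM}, {transcript: counts}) for one sample."""
    reader = csv.DictReader(handle, delimiter="\t")
    tool = detect_tool(reader.fieldnames or [])
    cols = COLUMNS[tool]
    tpm, counts = {}, {}
    for row in reader:
        tpm[row[cols["id"]]] = float(row[cols["tpm"]])
        counts[row[cols["id"]]] = float(row[cols["counts"]])
    return tool, tpm, counts


# The example rows from this guide, written tab-separated.
SALMON_EXAMPLE = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "ENST00000456328.2\t1657\t1477.000\t0.000000\t0.000\n"
    "ENST00000450305.2\t632\t452.000\t12.345678\t156.789\n"
)
KALLISTO_EXAMPLE = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "ENST00000456328.2\t1657\t1477.00\t0.00\t0.000000\n"
    "ENST00000450305.2\t632\t452.00\t156.79\t12.345678\n"
)
for text in (SALMON_EXAMPLE, KALLISTO_EXAMPLE):
    tool, tpm, counts = read_quant(io.StringIO(text))
    print(tool, tpm["ENST00000450305.2"], counts["ENST00000450305.2"])
```

For real files, `read_quant(open(path, newline=""))` takes the place of the `io.StringIO` wrapper.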

Quality Checks

# Check mapping rate from Salmon logs
grep "Mapping rate" sample_quant/logs/salmon_quant.log

# Check library type detection
cat sample_quant/lib_format_counts.json

Good metrics:

  • Mapping rate > 70%
  • Consistent library type across samples
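
The mapping-rate check can be scripted across samples. The sketch below assumes the log contains a line of the form `Mapping rate = 85.2%`, which is what the grep above looks for; the surrounding log prefix, the sample names, and the rates are made up for illustration:

```python
import re
from typing import Dict, List, Optional

MAPPING_RATE = re.compile(r"Mapping rate\s*=\s*([0-9.]+)%")


def mapping_rate(log_text: str) -> Optional[float]:
    """Extract the mapping rate (percent) from Salmon log text, if present."""
    match = MAPPING_RATE.search(log_text)
    return float(match.group(1)) if match else None


def low_mapping_samples(rates: Dict[str, Optional[float]],
                        threshold: float = 70.0) -> List[str]:
    """List samples whose rate is missing or below the threshold."""
    return sorted(sample for sample, rate in rates.items()
                  if rate is None or rate < threshold)


# Hypothetical log excerpts keyed by sample name.
logs = {
    "sample1": "[jointLog] [info] Mapping rate = 88.4%\n",
    "sample2": "[jointLog] [info] Mapping rate = 41.7%\n",
    "sample3": "[jointLog] [info] no rate reported\n",
}
rates = {sample: mapping_rate(text) for sample, text in logs.items()}
print(low_mapping_samples(rates))
```

The 70% default mirrors the guideline above and can be changed through the `threshold` argument.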

Common Issues

Low mapping rate:

  • Wrong transcriptome version
  • Contamination in samples
  • Wrong library type

Inconsistent library types:

  • Mixed library preparations
  • Sample swap
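
Inconsistent library types are easy to miss when checking one file at a time. The sketch below compares the `expected_format` field of each sample's lib_format_counts.json against the most common value and reports the samples that differ. The field name reflects my understanding of Salmon's output, and the sample data is invented:

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def libtype_outliers(lib_formats: Dict[str, dict]) -> Tuple[Optional[str], Dict[str, Optional[str]]]:
    """Return (majority library type, {sample: type} for samples that differ)."""
    types = {sample: info.get("expected_format")
             for sample, info in lib_formats.items()}
    if not types:
        return None, {}
    majority, _ = Counter(types.values()).most_common(1)[0]
    outliers = {sample: libtype for sample, libtype in types.items()
                if libtype != majority}
    return majority, outliers


# Hypothetical parsed lib_format_counts.json contents, one dict per sample.
parsed = {
    "sample1": {"expected_format": "ISR"},
    "sample2": {"expected_format": "ISR"},
    "sample3": {"expected_format": "IU"},
}
majority, outliers = libtype_outliers(parsed)
print("majority:", majority, "outliers:", outliers)
```

Each dict would normally come from `json.load` on `{sample}_quant/lib_format_counts.json`.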

Related Skills

  • read-qc/fastp-workflow - Upstream preprocessing
  • rna-quantification/tximport-workflow - Import results to R
  • rna-quantification/count-matrix-qc - QC of quantification
  • differential-expression/deseq2-basics - Downstream analysis

AI Quality Score

95/100, analyzed 2/12/2026

An excellent, highly actionable technical guide for RNA-seq quantification using Salmon and kallisto. It provides comprehensive workflows, comparison tables, and scripts for downstream processing, though the metadata tags are mismatched.

Metadata

License: unknown
Version: -
Updated: 2/16/2026
Publisher: mdbabumiamssm

Tags

ci-cd, github-actions, observability