askill
bio-rna-quantification-featurecounts-counting

bio-rna-quantification-featurecounts-countingSafety 95Repository

Count reads per gene from aligned BAM files using Subread featureCounts. Use when processing BAM files from STAR/HISAT2 to generate gene-level counts for DESeq2/edgeR.

251 stars
5k downloads
Updated 2/14/2026

Package Files

Loading files...
SKILL.md

Version Compatibility

Reference examples tested with: DESeq2 1.42+, HISAT2 2.2.1+, STAR 2.7.11+, Subread 2.0+, edgeR 4.0+, pandas 2.2+, scanpy 1.10+

Before using code patterns, verify installed versions match. If versions differ:

  • Python: pip show <package> then help(module.function) to check signatures
  • R: packageVersion('<pkg>') then ?function_name to verify parameters
  • CLI: <tool> --version then <tool> --help to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

featureCounts Counting

"Count reads per gene from my BAM files" → Assign aligned reads to genomic features using a GTF annotation to produce a gene-by-sample count matrix for DE analysis.

  • CLI: featureCounts -a genes.gtf -o counts.txt sample1.bam sample2.bam

Count reads mapping to genomic features (genes, exons) from BAM files.

Basic Usage

# Single sample
featureCounts -a annotation.gtf -o counts.txt aligned.bam

# Multiple samples (recommended - single matrix output)
featureCounts -a annotation.gtf -o counts.txt sample1.bam sample2.bam sample3.bam

# All BAMs in directory
featureCounts -a annotation.gtf -o counts.txt *.bam

Paired-End Data

# Count fragments, not reads (required for paired-end)
featureCounts -p --countReadPairs -a annotation.gtf -o counts.txt *.bam

# Check proper pairs only
featureCounts -p --countReadPairs -B -C -a annotation.gtf -o counts.txt *.bam

Flags:

  • -p - Input is paired-end
  • --countReadPairs - Count fragments instead of reads
  • -B - Only count properly paired reads
  • -C - Don't count chimeric fragments

Strand-Specific Libraries

# Unstranded (default)
featureCounts -s 0 -a annotation.gtf -o counts.txt *.bam

# Forward stranded (e.g., dUTP, NSR)
featureCounts -s 1 -a annotation.gtf -o counts.txt *.bam

# Reverse stranded (e.g., Illumina TruSeq, most common)
featureCounts -s 2 -a annotation.gtf -o counts.txt *.bam

Determining strandedness: Use infer_experiment.py from RSeQC or check library prep protocol.

Feature Types

# Count at gene level (default)
featureCounts -t exon -g gene_id -a annotation.gtf -o counts.txt *.bam

# Count at transcript level
featureCounts -t exon -g transcript_id -a annotation.gtf -o counts.txt *.bam

# Count CDS only
featureCounts -t CDS -g gene_id -a annotation.gtf -o counts.txt *.bam

Flags:

  • -t - Feature type in GTF (default: exon)
  • -g - Meta-feature attribute (default: gene_id)

Multi-Mapping Reads

# Discard multi-mappers (default, recommended for DE)
featureCounts -a annotation.gtf -o counts.txt *.bam

# Count multi-mappers (fractional)
featureCounts -M --fraction -a annotation.gtf -o counts.txt *.bam

# Count multi-mappers (full count to each location)
featureCounts -M -a annotation.gtf -o counts.txt *.bam

Overlapping Features

# Discard reads overlapping multiple features (default)
featureCounts -a annotation.gtf -o counts.txt *.bam

# Count reads overlapping multiple features
featureCounts -O -a annotation.gtf -o counts.txt *.bam

# Fractional count for overlaps
featureCounts -O --fraction -a annotation.gtf -o counts.txt *.bam

Performance Options

# Use multiple threads
featureCounts -T 8 -a annotation.gtf -o counts.txt *.bam

# Use less memory (slower)
featureCounts --largeBAM -a annotation.gtf -o counts.txt *.bam

Output Files

featureCounts produces two files:

  1. counts.txt - Main count matrix
Geneid  Chr     Start   End     Strand  Length  sample1.bam  sample2.bam
GENE1   chr1    100     500     +       400     1523         1891
GENE2   chr1    1000    2000    -       1000    892          756
  1. counts.txt.summary - Assignment statistics
Status                  sample1.bam  sample2.bam
Assigned                1523456      1678234
Unassigned_Unmapped     12345        11234
Unassigned_NoFeatures   234567       245678

Extract Count Matrix

# Remove first 6 columns (metadata) to get just counts
cut -f1,7- counts.txt | tail -n +2 > count_matrix.txt

Python Processing

import pandas as pd

counts = pd.read_csv('counts.txt', sep='\t', comment='#')
count_matrix = counts.set_index('Geneid').iloc[:, 5:]  # Skip metadata columns
count_matrix.columns = [c.replace('.bam', '') for c in count_matrix.columns]
count_matrix.to_csv('count_matrix.csv')

R Processing

counts <- read.table('counts.txt', header=TRUE, row.names=1, skip=1)
count_matrix <- counts[, 6:ncol(counts)]  # Skip metadata columns
colnames(count_matrix) <- gsub('.bam', '', colnames(count_matrix))

Common Issues

Low assignment rate:

  • Check strandedness setting (-s)
  • Verify GTF matches reference genome version
  • Check BAM alignment quality

Zero counts for known expressed genes:

  • Ensure feature type matches GTF (-t exon vs -t gene)
  • Check gene_id attribute name in GTF

Related Skills

  • alignment-files/sam-bam-basics - Input BAM file handling
  • genome-intervals/gtf-gff-handling - GTF annotation files
  • differential-expression/deseq2-basics - Downstream analysis with counts
  • rna-quantification/count-matrix-qc - QC of count data

Install

Download ZIP
Requires askill CLI v1.0+

AI Quality Score

95/100Analyzed 2/15/2026

Metadata

Licenseunknown
Version-
Updated2/14/2026
PublisherGPTomics

Tags

api