Version Compatibility
Reference examples tested with: bcftools 1.19+, pysam 0.22+, samtools 1.19+
Before using code patterns, verify installed versions match. If versions differ:
- Python:
pip show <package>thenhelp(module.function)to check signatures - CLI:
<tool> --versionthen<tool> --helpto confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
Pileup Generation
Generate pileup data for variant calling and position-level analysis.
"Generate pileup from BAM" → Produce per-position read summaries showing depth, bases, and qualities.
- CLI:
samtools mpileup -f ref.fa input.bam - Python:
bam.pileup(chrom, start, end)(pysam)
"Count alleles at a position" → Extract per-base read support at a specific genomic coordinate.
- Python: iterate
pileup_column.pileupsand count bases (pysam)
What is Pileup?
Pileup shows all reads covering each position in the reference, used for:
- Variant calling (with bcftools)
- Coverage analysis
- Allele frequency calculation
- SNP/indel detection
samtools mpileup
Basic Pileup
samtools mpileup -f reference.fa input.bam > pileup.txt
Pileup for Variant Calling (Output BCF)
samtools mpileup -f reference.fa -g input.bam -o output.bcf
Pileup Specific Region
samtools mpileup -f reference.fa -r chr1:1000000-2000000 input.bam
Regions from BED
samtools mpileup -f reference.fa -l targets.bed input.bam
Multiple BAM Files
samtools mpileup -f reference.fa sample1.bam sample2.bam sample3.bam > pileup.txt
Output Format
Text pileup format (6 columns per sample):
chr1 1000 A 15 ............... FFFFFFFFFFF
chr1 1001 T 12 ............ FFFFFFFFFFFF
| Column | Description |
|---|---|
| 1 | Chromosome |
| 2 | Position (1-based) |
| 3 | Reference base |
| 4 | Read depth |
| 5 | Read bases |
| 6 | Base qualities |
Read Bases Encoding
| Symbol | Meaning |
|---|---|
. | Match on forward strand |
, | Match on reverse strand |
ACGT | Mismatch (uppercase = forward) |
acgt | Mismatch (lowercase = reverse) |
^Q | Start of read (Q = MAPQ as ASCII) |
$ | End of read |
+NNN | Insertion of N bases |
-NNN | Deletion of N bases |
* | Deleted base |
> / < | Reference skip (intron) |
Quality Filtering Options
Minimum Mapping Quality
samtools mpileup -f reference.fa -q 20 input.bam
Minimum Base Quality
samtools mpileup -f reference.fa -Q 20 input.bam
Combined Quality Filters
samtools mpileup -f reference.fa -q 20 -Q 20 input.bam
Maximum Depth
# Prevent memory issues with high coverage
samtools mpileup -f reference.fa -d 1000 input.bam
Variant Calling Pipeline
Goal: Call variants from alignment data using the pileup-based approach.
Approach: Pipe samtools mpileup output directly into bcftools call for variant detection, applying quality filters at the pileup stage.
mpileup to bcftools call
samtools mpileup -f reference.fa input.bam | bcftools call -mv -o variants.vcf
Direct BCF Output
samtools mpileup -f reference.fa -g -o output.bcf input.bam
bcftools call -mv output.bcf -o variants.vcf
Full Pipeline
samtools mpileup -f reference.fa -q 20 -Q 20 input.bam | \
bcftools call -mv -Oz -o variants.vcf.gz
bcftools index variants.vcf.gz
pysam Python Alternative
Basic Pileup
import pysam
with pysam.AlignmentFile('input.bam', 'rb') as bam:
for pileup_column in bam.pileup('chr1', 1000000, 1001000):
print(f'{pileup_column.reference_name}:{pileup_column.pos} depth={pileup_column.n}')
Access Reads at Position
import pysam
with pysam.AlignmentFile('input.bam', 'rb') as bam:
for pileup_column in bam.pileup('chr1', 1000000, 1000001, truncate=True):
print(f'Position: {pileup_column.pos}')
print(f'Depth: {pileup_column.n}')
for pileup_read in pileup_column.pileups:
if pileup_read.is_del:
print(' Deletion')
elif pileup_read.is_refskip:
print(' Reference skip')
else:
qpos = pileup_read.query_position
base = pileup_read.alignment.query_sequence[qpos]
qual = pileup_read.alignment.query_qualities[qpos]
print(f' {base} (Q{qual})')
Count Alleles at Position
import pysam
from collections import Counter
def allele_counts(bam_path, chrom, pos):
counts = Counter()
with pysam.AlignmentFile(bam_path, 'rb') as bam:
for pileup_column in bam.pileup(chrom, pos, pos + 1, truncate=True):
if pileup_column.pos != pos:
continue
for pileup_read in pileup_column.pileups:
if pileup_read.is_del:
counts['DEL'] += 1
elif pileup_read.is_refskip:
continue
else:
qpos = pileup_read.query_position
base = pileup_read.alignment.query_sequence[qpos]
counts[base.upper()] += 1
return dict(counts)
counts = allele_counts('input.bam', 'chr1', 1000000)
print(counts) # {'A': 45, 'G': 5}
Calculate Allele Frequency
import pysam
from collections import Counter
def allele_frequency(bam_path, chrom, pos, min_qual=20):
counts = Counter()
with pysam.AlignmentFile(bam_path, 'rb') as bam:
for pileup_column in bam.pileup(chrom, pos, pos + 1, truncate=True,
min_base_quality=min_qual):
if pileup_column.pos != pos:
continue
for pileup_read in pileup_column.pileups:
if pileup_read.is_del or pileup_read.is_refskip:
continue
qpos = pileup_read.query_position
base = pileup_read.alignment.query_sequence[qpos]
counts[base.upper()] += 1
total = sum(counts.values())
if total == 0:
return {}
return {base: count / total for base, count in counts.items()}
freq = allele_frequency('input.bam', 'chr1', 1000000)
for base, f in sorted(freq.items(), key=lambda x: -x[1]):
print(f'{base}: {f:.1%}')
Pileup with Quality Filtering
import pysam
with pysam.AlignmentFile('input.bam', 'rb') as bam:
for pileup_column in bam.pileup('chr1', 1000000, 1001000,
truncate=True,
min_mapping_quality=20,
min_base_quality=20):
print(f'{pileup_column.pos}: {pileup_column.n}')
Generate Pileup Text
import pysam
def pileup_text(bam_path, ref_path, chrom, start, end):
with pysam.AlignmentFile(bam_path, 'rb') as bam:
with pysam.FastaFile(ref_path) as ref:
for pileup_column in bam.pileup(chrom, start, end, truncate=True):
pos = pileup_column.pos
ref_base = ref.fetch(chrom, pos, pos + 1)
depth = pileup_column.n
bases = []
for pileup_read in pileup_column.pileups:
if pileup_read.is_del:
bases.append('*')
elif pileup_read.is_refskip:
bases.append('>')
else:
qpos = pileup_read.query_position
base = pileup_read.alignment.query_sequence[qpos]
if base.upper() == ref_base.upper():
bases.append('.' if not pileup_read.alignment.is_reverse else ',')
else:
bases.append(base.upper() if not pileup_read.alignment.is_reverse else base.lower())
print(f'{chrom}\t{pos+1}\t{ref_base}\t{depth}\t{"".join(bases)}')
pileup_text('input.bam', 'reference.fa', 'chr1', 1000000, 1000100)
Pileup Options Summary
| Option | Description |
|---|---|
-f FILE | Reference FASTA (required) |
-r REGION | Restrict to region |
-l FILE | BED file of regions |
-q INT | Min mapping quality |
-Q INT | Min base quality |
-d INT | Max depth (default 8000) |
-g | Output BCF format |
-u | Uncompressed BCF output |
Quick Reference
| Task | Command |
|---|---|
| Basic pileup | samtools mpileup -f ref.fa in.bam |
| Quality filter | samtools mpileup -f ref.fa -q 20 -Q 20 in.bam |
| Region | samtools mpileup -f ref.fa -r chr1:1-1000 in.bam |
| BCF output | samtools mpileup -f ref.fa -g in.bam -o out.bcf |
| To bcftools | samtools mpileup -f ref.fa in.bam | bcftools call -mv |
Common Errors
| Error | Cause | Solution |
|---|---|---|
No FASTA reference | Missing -f option | Add -f reference.fa |
Reference mismatch | Wrong reference | Use same reference as alignment |
| Out of memory | High coverage region | Use -d to cap depth |
Related Skills
- alignment-filtering - Filter BAM before pileup
- reference-operations - Index reference for pileup
- bam-statistics - depth command for coverage
- variant-calling - bcftools call from pileup
