askill
pysam

pysamSafety 90Repository

Python module for reading, manipulating and writing genomic alignment formats (SAM/BAM/CRAM) and variant files (VCF/BCF). Wrapper for htslib.

0 stars
1.2k downloads
Updated 2/1/2026

Package Files

Loading files...
SKILL.md

Pysam - Genomic Alignments

Used for high-throughput sequencing pipelines. It allows efficient access to billions of DNA fragments aligned to a reference genome.

When to Use

  • Processing next-generation sequencing (NGS) data.
  • Analyzing genomic variants (SNPs, indels).
  • Extracting reads from specific genomic regions.
  • Building custom bioinformatics pipelines.
  • Quality control of sequencing data.

Core Principles

Indexed Access

BAM files must be indexed (.bai) for efficient random access to genomic regions.

Coordinate System

Genomic coordinates are 0-based (Python-style) for positions, but 1-based for ranges in some contexts.

Read Attributes

Each read contains sequence, quality scores, alignment position, and flags.

Quick Reference

Standard Imports

import pysam

Basic Patterns

# 1. Open BAM file
samfile = pysam.AlignmentFile("aligned_reads.bam", "rb")

# 2. Iterate over reads in a specific genomic region
for read in samfile.fetch("chr1", 10000, 10100):
    print(f"Read: {read.query_name}, Quality: {read.mapping_quality}")
    print(f"Sequence: {read.query_sequence}")
    print(f"Position: {read.reference_start}")

# 3. Variant analysis (VCF)
vcf = pysam.VariantFile("mutations.vcf")
for rec in vcf.fetch("chr1", 10000, 10100):
    print(f"Pos: {rec.pos}, Ref: {rec.ref}, Alt: {rec.alts}")
    print(f"Genotype: {rec.samples['sample1']['GT']}")

# 4. Writing aligned reads
outfile = pysam.AlignmentFile("output.bam", "wb", template=samfile)
for read in samfile:
    if read.mapping_quality > 30:
        outfile.write(read)
outfile.close()

Critical Rules

✅ DO

  • Always use indexed files - Create index with pysam.index("file.bam") for fast access.
  • Check read flags - Use read.is_paired, read.is_unmapped to filter reads.
  • Handle unmapped reads - Unmapped reads have reference_start = -1.
  • Close files explicitly - Use context managers or .close() to avoid resource leaks.

❌ DON'T

  • Don't iterate over entire BAM - Use fetch() with regions for efficiency.
  • Don't ignore quality scores - Low-quality bases can cause false variants.
  • Don't mix coordinate systems - Be consistent with 0-based vs 1-based indexing.

Advanced Patterns

Counting Reads per Gene

# Using a gene annotation file
genes = {}  # gene_name -> (chr, start, end)
for read in samfile.fetch():
    # Check if read overlaps any gene
    for gene, (chr, start, end) in genes.items():
        if read.reference_name == chr and start <= read.reference_start < end:
            genes[gene]['count'] += 1

Variant Filtering

# Filter variants by quality and depth
for rec in vcf.fetch():
    depth = rec.samples['sample1']['DP']
    qual = rec.qual
    if depth > 10 and qual > 20:
        # Process high-quality variant
        pass

Pysam provides the low-level access needed for genomic data processing, enabling researchers to work directly with the raw data of life itself.

Install

Download ZIP
Requires askill CLI v1.0+

AI Quality Score

95/100Analyzed 2/13/2026

An excellent, high-quality reference for the pysam library. It provides clear context on when to use the tool, actionable code snippets for common tasks, and critical best practices.

90
95
95
90
95

Metadata

Licenseunknown
Version-
Updated2/1/2026
Publishertondevrel

Tags

No tags yet.