bio-alignment-sorting

Sort alignment files by coordinate or read name using samtools and pysam. Use when preparing BAM files for indexing, variant calling, or paired-end analysis.

Updated 2/16/2026

Alignment Sorting

Sort alignment files by coordinate or read name using samtools and pysam.

Sort Orders

| Order | Flag | Use Case |
|-------|------|----------|
| Coordinate | default | Indexing, visualization, variant calling, markdup |
| Name | -n | Paired-end processing, fixmate (the step before markdup) |
| Tag | -t TAG | Sort by specific tag value |
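
The flag mapping in the table can be captured in a small helper. This is a minimal sketch: `build_sort_args` is a hypothetical name used for illustration and is not part of samtools or pysam.

```python
def build_sort_args(input_bam, output_bam, order="coordinate", tag=None):
    """Return the argument list for `samtools sort` for a given sort order.

    Hypothetical helper for illustration only.
    """
    args = ["sort"]
    if order == "name":
        args.append("-n")            # sort by read name
    elif order == "tag":
        if not tag:
            raise ValueError("order='tag' needs a tag name such as 'CB'")
        args += ["-t", tag]          # sort by the value of this tag
    elif order != "coordinate":      # coordinate is the default, no flag
        raise ValueError(f"unknown sort order: {order!r}")
    args += ["-o", output_bam, input_bam]
    return args


print(build_sort_args("input.bam", "namesorted.bam", order="name"))
```

The list can be prefixed with `samtools` for `subprocess.run`, or passed without its leading `"sort"` element to `pysam.sort(*args[1:])`.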

samtools sort

Sort by Coordinate (Default)

samtools sort -o sorted.bam input.bam

Sort by Read Name

samtools sort -n -o namesorted.bam input.bam

Multi-threaded Sorting

samtools sort -@ 8 -o sorted.bam input.bam

Control Memory Usage

samtools sort -m 4G -@ 4 -o sorted.bam input.bam

Set Temporary Directory

samtools sort -T /tmp/sort_tmp -o sorted.bam input.bam

Specify Output Format

# Output as BAM (default)
samtools sort -O bam -o sorted.bam input.bam

# Output as CRAM
samtools sort -O cram --reference ref.fa -o sorted.cram input.bam

Sort by Tag

# Sort by cell barcode (10x Genomics)
samtools sort -t CB -o sorted_by_barcode.bam input.bam

Pipe from Aligner

bwa mem ref.fa reads.fq | samtools sort -o aligned.bam

samtools collate

Group paired reads together without full sorting (faster than name sort for some workflows):

# Collate paired reads
samtools collate -o collated.bam input.bam

# With output prefix for temp files
samtools collate -O input.bam /tmp/collate > collated.bam

# Uncompressed output (-u) to stdout (-O), piped directly into samtools fastq
samtools collate -u -O input.bam /tmp/collate | samtools fastq -1 R1.fq -2 R2.fq -

Check Sort Order

From Header

samtools view -H input.bam | grep "^@HD"
# SO:coordinate = coordinate sorted
# SO:queryname = name sorted
# SO:unsorted = not sorted
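
When the header text is already in hand (for example, captured from `samtools view -H`), the `SO` field can be pulled out programmatically. A minimal sketch, assuming tab-separated header fields as in the SAM format; `sort_order_from_hd` is a hypothetical helper name.

```python
def sort_order_from_hd(header_text):
    """Return the SO value from the @HD line of SAM header text, or 'unknown'."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Fields after the record type look like KEY:VALUE
            for field in line.split("\t")[1:]:
                key, _, value = field.partition(":")
                if key == "SO":
                    return value
    return "unknown"


example = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422"
print(sort_order_from_hd(example))  # coordinate
```

A header with no `@HD` line, or an `@HD` line without `SO`, is reported as `unknown`.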

Verify Sorted

# Check if coordinate sorted (exit status 0 if positions never decrease within a reference)
samtools view input.bam | awk '$3 == ref && $4 + 0 < prev {exit 1} {ref = $3; prev = $4 + 0}'
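
The same check can be written in Python so that reference order is verified as well as positions. This is a sketch under stated assumptions: `is_coordinate_sorted` is a hypothetical helper, it takes SAM text lines (such as the output of `samtools view`), and the caller supplies the reference names in `@SQ` header order.

```python
def is_coordinate_sorted(sam_lines, reference_order):
    """Check that SAM records are ordered by (reference index, position).

    `reference_order` lists reference names in @SQ header order. Records
    whose RNAME is '*' (unmapped) or not listed are expected at the end.
    """
    rank = {name: i for i, name in enumerate(reference_order)}
    previous = (-1, -1)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        rname, pos = fields[2], int(fields[3])
        current = (rank.get(rname, len(rank)), pos)
        if current < previous:
            return False
        previous = current
    return True


records = [
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF",
    "r2\t0\tchr1\t250\t60\t4M\t*\t0\t0\tACGT\tFFFF",
    "r3\t0\tchr2\t50\t60\t4M\t*\t0\t0\tACGT\tFFFF",
]
print(is_coordinate_sorted(records, ["chr1", "chr2"]))  # True
```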

pysam Python Alternative

Sort with pysam

import pysam

pysam.sort('-o', 'sorted.bam', 'input.bam')

Sort by Name

pysam.sort('-n', '-o', 'namesorted.bam', 'input.bam')

Sort with Options

pysam.sort('-@', '4', '-m', '2G', '-o', 'sorted.bam', 'input.bam')

Manual Sorting in Python

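import pysam

with pysam.AlignmentFile('input.bam', 'rb') as infile:
    header = infile.header.to_dict()
    reads = list(infile)  # loads every read into memory; small files only

# Unmapped reads (reference_id == -1) go last, matching samtools sort
reads.sort(key=lambda r: (r.reference_id < 0, r.reference_id, r.reference_start))

# Record the new order in the header so downstream tools can rely on it
header.setdefault('HD', {'VN': '1.6'})['SO'] = 'coordinate'

with pysam.AlignmentFile('sorted.bam', 'wb', header=header) as outfile:
    for read in reads:
        outfile.write(read)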

Check Sort Order in pysam

import pysam

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    hd = bam.header.get('HD', {})
    sort_order = hd.get('SO', 'unknown')
    print(f'Sort order: {sort_order}')

Stream Sort from Aligner

For streaming from aligners, use shell pipes (simpler and more reliable):

import subprocess

subprocess.run(
    'bwa mem ref.fa reads.fq | samtools sort -o aligned.bam',
    shell=True, check=True
)
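
With `shell=True` and no `pipefail`, only the exit status of the last command in the pipe is checked, so a failing aligner can go unnoticed. One way to catch both sides is to connect two `Popen` processes directly. A minimal sketch: `run_pipe` is a hypothetical helper, and the demo uses stand-in Python commands so it runs without bwa or samtools installed.

```python
import subprocess
import sys


def run_pipe(producer_cmd, consumer_cmd):
    """Run `producer | consumer` without a shell; raise if either side fails."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout)
    # Close our copy so the producer gets SIGPIPE if the consumer exits early
    producer.stdout.close()
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    if producer_rc != 0 or consumer_rc != 0:
        raise RuntimeError(
            f"pipeline failed: producer={producer_rc}, consumer={consumer_rc}"
        )


# Stand-in commands; for alignment one would pass something like
# ['bwa', 'mem', 'ref.fa', 'reads.fq'] and
# ['samtools', 'sort', '-o', 'aligned.bam', '-'].
run_pipe(
    [sys.executable, "-c", "print('ACGT')"],
    [sys.executable, "-c", "import sys; print(sys.stdin.read().strip())"],
)
```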

Or use pysam with a named pipe:

import os
import pysam
import subprocess

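os.mkfifo('aligner.pipe')
try:
    # Let the child shell open the FIFO for writing. Opening it in this
    # process would block until a reader appears and deadlock the script.
    aligner = subprocess.Popen('bwa mem ref.fa reads.fq > aligner.pipe', shell=True)
    pysam.sort('-o', 'aligned.bam', 'aligner.pipe')
    if aligner.wait() != 0:
        raise RuntimeError('bwa mem failed')
finally:
    os.unlink('aligner.pipe')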

samtools merge

Combine multiple BAM files into one.

Basic Merge

samtools merge merged.bam sample1.bam sample2.bam sample3.bam

Merge with Threads

samtools merge -@ 4 merged.bam sample1.bam sample2.bam sample3.bam

Merge from File List

# files.txt contains one BAM path per line
samtools merge -b files.txt merged.bam
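
The list file can be generated from Python when the BAM paths come from a script. A minimal sketch; `write_bam_list` is a hypothetical helper and the sample names are placeholders.

```python
import tempfile
from pathlib import Path


def write_bam_list(bam_paths, list_path):
    """Write one BAM path per line, the layout `samtools merge -b` reads."""
    list_path = Path(list_path)
    list_path.write_text("".join(f"{path}\n" for path in bam_paths))
    return list_path


# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    listing = write_bam_list(["sample1.bam", "sample2.bam"], Path(tmp) / "files.txt")
    print(listing.read_text(), end="")
```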

Force Overwrite

samtools merge -f merged.bam sample1.bam sample2.bam

Merge Specific Region

samtools merge -R chr1:1000000-2000000 merged_region.bam sample1.bam sample2.bam

pysam Merge

import pysam

pysam.merge('-f', 'merged.bam', 'sample1.bam', 'sample2.bam', 'sample3.bam')

Common Workflows

Align and Sort

bwa mem -t 8 ref.fa R1.fq R2.fq | samtools sort -@ 4 -o aligned.bam
samtools index aligned.bam

Re-sort by Name for Duplicate Marking

# Full workflow: sort by name, fixmate, sort by coord, markdup
samtools sort -n -o namesorted.bam input.bam
samtools fixmate -m namesorted.bam fixmate.bam
samtools sort -o sorted.bam fixmate.bam
samtools markdup sorted.bam marked.bam
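
The four steps can also be driven from Python. This is a sketch, not a definitive pipeline: `markdup_commands` and `run_markdup` are hypothetical helpers, and the intermediate file names simply mirror the shell commands above.

```python
import subprocess


def markdup_commands(input_bam, output_bam, workdir="."):
    """Build the four samtools commands for the duplicate-marking workflow."""
    name_sorted = f"{workdir}/namesorted.bam"
    fixmate = f"{workdir}/fixmate.bam"
    coord_sorted = f"{workdir}/sorted.bam"
    return [
        ["samtools", "sort", "-n", "-o", name_sorted, input_bam],  # name sort
        ["samtools", "fixmate", "-m", name_sorted, fixmate],       # add mate tags
        ["samtools", "sort", "-o", coord_sorted, fixmate],         # coordinate sort
        ["samtools", "markdup", coord_sorted, output_bam],         # mark duplicates
    ]


def run_markdup(input_bam, output_bam, workdir="."):
    """Run each step in order, stopping at the first failure."""
    for command in markdup_commands(input_bam, output_bam, workdir):
        subprocess.run(command, check=True)


# Print the plan without running anything
for command in markdup_commands("input.bam", "marked.bam"):
    print(" ".join(command))
```

Calling `run_markdup("input.bam", "marked.bam")` would execute the steps and requires samtools on the `PATH`.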

Convert Name-sorted to Coordinate-sorted

samtools sort -o coord_sorted.bam name_sorted.bam
samtools index coord_sorted.bam

Extract FASTQ from Sorted BAM

# Collate first to group pairs
samtools collate -u -O input.bam /tmp/collate | \
    samtools fastq -1 R1.fq -2 R2.fq -0 /dev/null -s /dev/null -

Performance Tips

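| Parameter | Effect |
|-----------|--------|
| -@ N | Use N additional threads |
| -m SIZE | Maximum memory per thread (e.g., 4G; default 768M) |
| -T PREFIX | Temp file prefix (use a fast disk) |
| -l LEVEL | Compression level (0-9, where 0 is uncompressed; default 6) |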

Optimal Settings for Large Files

# 8 threads at 4 GB each (roughly 32 GB of RAM in total), low compression for speed
samtools sort -@ 8 -m 4G -l 1 -o sorted.bam input.bam
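
Because `-m` applies per thread, it helps to estimate the total before launching a sort. A rough sketch, assuming memory scales as threads times the per-thread limit; real usage can be somewhat higher because of I/O buffers. Both function names are hypothetical.

```python
def parse_size(size):
    """Convert a size such as '768M' or '4G' to bytes (K, M, G suffixes)."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    suffix = size[-1].upper()
    if suffix in units:
        return int(size[:-1]) * units[suffix]
    return int(size)  # plain byte count


def estimated_sort_memory(threads, per_thread):
    """Rough upper bound: per-thread memory times the number of sort threads."""
    return max(threads, 1) * parse_size(per_thread)


total = estimated_sort_memory(8, "4G")
print(f"{total / 1024 ** 3:.0f} GiB")  # 32 GiB
```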

Quick Reference

| Task | Command |
|------|---------|
| Sort by coordinate | `samtools sort -o out.bam in.bam` |
| Sort by name | `samtools sort -n -o out.bam in.bam` |
| Sort with threads | `samtools sort -@ 8 -o out.bam in.bam` |
| Collate pairs | `samtools collate -o out.bam in.bam` |
| Merge BAMs | `samtools merge out.bam in1.bam in2.bam` |
| Check sort order | `samtools view -H in.bam \| grep "^@HD"` |
| Sort + index | `samtools sort -o out.bam in.bam && samtools index out.bam` |

Common Errors

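| Error | Cause | Solution |
|-------|-------|----------|
| out of memory | Insufficient RAM | Lower -m or -@; total memory is roughly -m times the thread count |
| disk full | Temp files filling disk | Use -T to point at a location with more free space |
| truncated file | Interrupted sort | Re-run the sort from the original input |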
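
A "disk full" failure can sometimes be anticipated by checking free space in the temp location first. This is only a sketch: `has_temp_space` is a hypothetical helper, and the default factor of 2 is an assumed rule of thumb, not a figure from samtools.

```python
import os
import shutil
import tempfile


def has_temp_space(bam_path, temp_dir, factor=2.0):
    """Return True if temp_dir has at least `factor` times the input size free.

    The default factor is an assumption for illustration.
    """
    needed = os.path.getsize(bam_path) * factor
    return shutil.disk_usage(temp_dir).free >= needed


# Demo on a throwaway file so the sketch runs without a real BAM
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.bam")
    with open(demo, "wb") as handle:
        handle.write(b"\0" * 1024)
    print(has_temp_space(demo, tmp))
```

If the check fails, `-T` can be pointed at a different directory before starting the sort.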

Related Skills

  • sam-bam-basics - View and convert alignment files
  • alignment-indexing - Index after coordinate sorting
  • duplicate-handling - Requires name-sorted input for fixmate
  • alignment-filtering - Filter before or after sorting

Metadata

License: unknown
Version: -
Updated: 2/16/2026
Publisher: mdbabumiamssm

Tags

github-actions