askill
bio-phylo-distance-calculations

bio-phylo-distance-calculationsSafety 100Repository

Compute evolutionary distances and build phylogenetic trees using Biopython Bio.Phylo.TreeConstruction. Use when creating distance matrices from alignments, building NJ/UPGMA trees, or generating bootstrap consensus trees.

1 stars
1.2k downloads
Updated 2/22/2026

Package Files

Loading files...
SKILL.md

Distance Calculations and Tree Building

Compute distances from alignments and construct phylogenetic trees.

Required Import

from Bio import Phylo, AlignIO
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor
from Bio.Phylo.TreeConstruction import DistanceMatrix
from Bio.Phylo.TreeConstruction import ParsimonyScorer, ParsimonyTreeConstructor, NNITreeSearcher
from Bio.Phylo.Consensus import strict_consensus, majority_consensus, bootstrap_trees, bootstrap_consensus

Distance Matrix from Alignment

from Bio import AlignIO
from Bio.Phylo.TreeConstruction import DistanceCalculator

alignment = AlignIO.read('alignment.fasta', 'fasta')

# Create calculator with distance model
calculator = DistanceCalculator('identity')  # Simple identity-based distance
dm = calculator.get_distance(alignment)
print(dm)

# Available models for DNA
calculator = DistanceCalculator('blastn')  # BLASTN-style distance

# Available models for protein
calculator = DistanceCalculator('blosum62')  # BLOSUM62-based distance

Available Distance Models

ModelTypeDescription
identityDNA/Protein1 - (identical positions / total)
blastnDNABLASTN scoring distance
transDNATransition/transversion weighted
blosum62ProteinBLOSUM62 matrix distance
blosum45ProteinBLOSUM45 matrix distance
blosum80ProteinBLOSUM80 matrix distance
pam250ProteinPAM250 matrix distance
pam30ProteinPAM30 matrix distance

Building Trees with Distance Methods

Neighbor Joining (NJ)

from Bio import AlignIO
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor

alignment = AlignIO.read('alignment.fasta', 'fasta')
calculator = DistanceCalculator('identity')
dm = calculator.get_distance(alignment)

constructor = DistanceTreeConstructor()
nj_tree = constructor.nj(dm)
Phylo.draw_ascii(nj_tree)

UPGMA

constructor = DistanceTreeConstructor()
upgma_tree = constructor.upgma(dm)
Phylo.draw_ascii(upgma_tree)

One-Step Tree Building

# Build tree directly from alignment
constructor = DistanceTreeConstructor(calculator, 'nj')
tree = constructor.build_tree(alignment)

# Or with UPGMA
constructor = DistanceTreeConstructor(calculator, 'upgma')
tree = constructor.build_tree(alignment)

Pairwise Distances Between Taxa

from Bio import Phylo

tree = Phylo.read('tree.nwk', 'newick')

# Distance between two taxa (sum of branch lengths)
taxon1 = tree.find_any(name='Human')
taxon2 = tree.find_any(name='Mouse')
dist = tree.distance(taxon1, taxon2)
print(f'Distance Human-Mouse: {dist:.4f}')

# All pairwise distances
terminals = tree.get_terminals()
for i, t1 in enumerate(terminals):
    for t2 in terminals[i+1:]:
        d = tree.distance(t1, t2)
        print(f'{t1.name}-{t2.name}: {d:.4f}')

Creating Distance Matrix Manually

from Bio.Phylo.TreeConstruction import DistanceMatrix

names = ['A', 'B', 'C', 'D']
# Lower triangular matrix (including diagonal)
matrix = [
    [0],
    [0.1, 0],
    [0.2, 0.15, 0],
    [0.3, 0.25, 0.2, 0]
]
dm = DistanceMatrix(names, matrix)
print(dm)

# Build tree from custom matrix
constructor = DistanceTreeConstructor()
tree = constructor.nj(dm)

Parsimony Tree Construction

from Bio import AlignIO, Phylo
from Bio.Phylo.TreeConstruction import ParsimonyScorer, NNITreeSearcher, ParsimonyTreeConstructor

alignment = AlignIO.read('alignment.fasta', 'fasta')

# Create scorer and searcher
scorer = ParsimonyScorer()
searcher = NNITreeSearcher(scorer)

# Build parsimony tree (needs starting tree)
constructor = DistanceTreeConstructor(DistanceCalculator('identity'), 'nj')
starting_tree = constructor.build_tree(alignment)

pars_constructor = ParsimonyTreeConstructor(searcher, starting_tree)
pars_tree = pars_constructor.build_tree(alignment)

print(f'Parsimony score: {scorer.get_score(pars_tree, alignment)}')
Phylo.draw_ascii(pars_tree)

Bootstrap Analysis

from Bio import AlignIO
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor
from Bio.Phylo.Consensus import bootstrap_trees, bootstrap_consensus, majority_consensus

alignment = AlignIO.read('alignment.fasta', 'fasta')
calculator = DistanceCalculator('identity')
constructor = DistanceTreeConstructor(calculator, 'nj')

# Generate bootstrap trees
boot_trees = list(bootstrap_trees(alignment, 100, constructor))
print(f'Generated {len(boot_trees)} bootstrap trees')

# Get bootstrap consensus
consensus = bootstrap_consensus(alignment, 100, constructor, majority_consensus)
Phylo.draw_ascii(consensus)

Consensus Tree Methods

from Bio.Phylo.Consensus import strict_consensus, majority_consensus, adam_consensus

trees = list(Phylo.parse('bootstrap.nwk', 'newick'))

# Strict consensus (only clades in ALL trees)
strict = strict_consensus(trees)

# Majority rule consensus (clades in >50% of trees)
majority = majority_consensus(trees, cutoff=0.5)

# Adam consensus
adam = adam_consensus(trees)

Phylo.draw_ascii(majority)

Tree Depths and Total Length

tree = Phylo.read('tree.nwk', 'newick')

# Total branch length
total = tree.total_branch_length()
print(f'Total branch length: {total:.4f}')

# Depths from root to each node
depths = tree.depths()
for clade, depth in depths.items():
    if clade.is_terminal():
        print(f'{clade.name}: {depth:.4f}')

# Maximum depth (tree height)
tree_height = max(depths.values())
print(f'Tree height: {tree_height:.4f}')

Comparing Tree Distances

tree1 = Phylo.read('tree1.nwk', 'newick')
tree2 = Phylo.read('tree2.nwk', 'newick')

# Compare total branch lengths
len1 = tree1.total_branch_length()
len2 = tree2.total_branch_length()
print(f'Tree 1 total: {len1:.4f}')
print(f'Tree 2 total: {len2:.4f}')

# Compare specific pairwise distances
taxa = ['Human', 'Mouse']
t1 = [tree1.find_any(name=t) for t in taxa]
t2 = [tree2.find_any(name=t) for t in taxa]

d1 = tree1.distance(t1[0], t1[1])
d2 = tree2.distance(t2[0], t2[1])
print(f'Human-Mouse distance: Tree1={d1:.4f}, Tree2={d2:.4f}')

Complete Pipeline: Alignment to Bootstrapped Tree

from Bio import AlignIO, Phylo
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor
from Bio.Phylo.Consensus import bootstrap_consensus, majority_consensus

alignment = AlignIO.read('sequences.aln', 'clustal')
print(f'Alignment: {len(alignment)} sequences, {alignment.get_alignment_length()} positions')

calculator = DistanceCalculator('identity')
constructor = DistanceTreeConstructor(calculator, 'nj')

# Build simple tree
simple_tree = constructor.build_tree(alignment)
simple_tree.ladderize()

# Build bootstrap consensus (100 replicates)
consensus_tree = bootstrap_consensus(alignment, 100, constructor, majority_consensus)
consensus_tree.ladderize()

Phylo.write(simple_tree, 'nj_tree.nwk', 'newick')
Phylo.write(consensus_tree, 'bootstrap_consensus.nwk', 'newick')

Quick Reference: Distance Models

DNA Models

ModelDescription
identitySimple mismatch counting
blastnBLASTN-style scoring
transWeights transitions vs transversions

Protein Models

ModelDescription
blosum62General proteins
blosum45Divergent proteins
blosum80Similar proteins
pam250Distant homologs
pam30Close homologs

Related Skills

  • tree-io - Save constructed trees to files
  • tree-visualization - Draw resulting trees
  • tree-manipulation - Root and process built trees
  • alignment/alignment-io - Read alignments for tree building
  • alignment/msa-statistics - Alignment quality before tree building

Install

Download ZIP
Requires askill CLI v1.0+

AI Quality Score

95/100Analyzed 2/23/2026

High-quality bioinformatics skill for computing evolutionary distances and building phylogenetic trees using Biopython. Comprehensive coverage of distance matrix creation (multiple models for DNA/protein), tree construction methods (NJ, UPGMA, Parsimony), bootstrap analysis, and consensus methods. Well-structured with clear code examples, tables for quick reference, and a complete pipeline example. Includes 'when to use' trigger in description. The ci-cd tag seems misplaced for a bioinformatics skill, but the technical content is accurate and actionable.

100
95
90
95
95

Metadata

Licenseunknown
Version-
Updated2/22/2026
Publishermajiayu000

Tags

ci-cd