Secondary Structure Prediction
Predict RNA secondary structures using thermodynamic models. ViennaRNA provides MFE folding, partition function analysis, consensus structure prediction from alignments, and RNA-RNA interaction prediction.
RNAfold: Single Sequence Folding
MFE Structure
# Basic MFE folding (reads sequence from stdin or file)
echo "GGGAAACCC" | RNAfold
# With partition function (-p) and base-pair probabilities
echo "GGGAAACCC" | RNAfold -p
# Output PostScript dot plot and structure plot
echo ">myRNA" > input.fa
echo "GGGCUAUUAGCUCAGUUGGUUAGAGCGCACCCCUGAUAAGGGUGAGGUCGCUGAUUCGAAUUCAGCAUAGCCCA" >> input.fa
RNAfold -p --noPS < input.fa # Suppress PostScript files
Key RNAfold Options
| Option | Description |
|---|---|
-p | Compute partition function and base-pair probabilities |
--MEA | Compute maximum expected accuracy structure |
-d2 | Dangling end energies on both sides of helices (default) |
-T 37 | Temperature in Celsius (default: 37) |
--noLP | No lonely pairs (isolated base pairs) |
--noPS | Suppress PostScript output files |
-C | Read structure constraints from input |
--shape | Incorporate SHAPE reactivity data |
Constrained Folding
# Force specific positions paired/unpaired
# Constraint notation: '.' = unconstrained, 'x' = unpaired, '(' ')' = forced pair
echo -e ">constrained\nGGGCUAUUAGCUCAGUUGGUUAGAGCGCACC\n...xxxx.........................." | RNAfold -C
RNAalifold: Consensus Structure from Alignment
Predicts a consensus structure from a multiple sequence alignment, combining thermodynamic stability with covariation evidence.
# Input: Stockholm or ClustalW alignment format
RNAalifold --aln alignment.sto
# With covariation weighting and partition function
RNAalifold --cfactor 0.6 --nfactor 0.5 -p alignment.sto
# RIBOSUM scoring for better covariation detection
RNAalifold --ribosum_scoring alignment.sto
| Option | Description |
|---|---|
--cfactor | Covariation weight (default: 1.0, lower = more thermodynamic) |
--nfactor | Non-compatible penalty (default: 1.0) |
--ribosum_scoring | Use RIBOSUM matrices for covariation |
-p | Partition function for consensus |
RNAcofold: RNA-RNA Interaction
Predicts the hybridization structure of two RNA molecules.
# Two sequences separated by '&'
echo "GCGCGC&GCGCGC" | RNAcofold
# With partition function
echo "GCGCGC&GCGCGC" | RNAcofold -p
LinearFold: Fast Folding for Long Sequences
For sequences longer than ~5,000 nt, LinearFold provides O(n) time complexity instead of O(n^3).
# LinearFold (if installed separately)
echo "GGGAAACCC" | linearfold
# ViennaRNA also supports --maxBPspan for long sequences
RNAfold --maxBPspan 300 < long_sequence.fa
ViennaRNA Python API
import RNA
sequence = 'GGGCUAUUAGCUCAGUUGGUUAGAGCGCACCCCUGAUAAGGGUGAGGUCGCUGAUUCGAAUUCAGCAUAGCCCA'
# MFE folding
structure, mfe = RNA.fold(sequence)
print(f'Structure: {structure}')
print(f'MFE: {mfe:.2f} kcal/mol')
# Partition function and base-pair probabilities
fc = RNA.fold_compound(sequence)
structure_pf, pf_energy = fc.pf()
print(f'Ensemble energy: {pf_energy:.2f} kcal/mol')
# Base-pair probability matrix
bpp = fc.bpp()
# Centroid structure (most representative of the ensemble)
centroid, centroid_dist = fc.centroid()
print(f'Centroid: {centroid}')
print(f'Distance to ensemble: {centroid_dist:.2f}')
# MEA structure (maximum expected accuracy)
mea_struct, mea_val = fc.MEA()
print(f'MEA structure: {mea_struct}')
print(f'MEA value: {mea_val:.2f}')
Folding with Constraints (Python)
import RNA
sequence = 'GGGCUAUUAGCUCAGUUGGUUAGAGCGCACCCCUGAUAAGGGUGAGGUCGCUGAUUCGAAUUCAGCAUAGCCCA'
md = RNA.md()
md.uniq_ML = 1 # Unique multiloop decomposition
fc = RNA.fold_compound(sequence, md)
# Force position 10 unpaired (0-indexed)
fc.hc_add_up(10, RNA.CONSTRAINT_CONTEXT_ALL_LOOPS)
# Force positions 1-3 paired with 70-72
fc.hc_add_bp(1, 72, RNA.CONSTRAINT_CONTEXT_ALL_LOOPS)
structure, mfe = fc.mfe()
print(f'Constrained: {structure} ({mfe:.2f} kcal/mol)')
SHAPE-Constrained Folding (Python)
import RNA
sequence = 'GGGCUAUUAGCUCAGUUGGUUAGAGCGCACCCCUGAUAAGGGUGAGGUCGCUGAUUCGAAUUCAGCAUAGCCCA'
fc = RNA.fold_compound(sequence)
# SHAPE reactivities: negative values = no data
# Deigan et al. (2009) parameters: m=1.8, b=-0.6 (default for SHAPE)
reactivities = [-999] + [0.1, 0.05, 0.8, 0.9, 0.2, 0.1, 0.3] # 1-indexed, -999 = missing
fc.sc_add_SHAPE_deigan(reactivities, 1.8, -0.6)
structure, mfe = fc.mfe()
print(f'SHAPE-guided: {structure} ({mfe:.2f} kcal/mol)')
Structure Comparison
import RNA
struct1 = '(((....)))'
struct2 = '(((....).))'
# Base-pair distance
bp_dist = RNA.bp_distance(struct1, struct2)
print(f'Base-pair distance: {bp_dist}')
# Tree edit distance (more sophisticated)
tree1 = RNA.make_tree(RNA.expand_Full(struct1))
tree2 = RNA.make_tree(RNA.expand_Full(struct2))
tree_dist = RNA.tree_edit_distance(tree1, tree2)
print(f'Tree edit distance: {tree_dist}')
Structure Formats
| Format | Description | Example |
|---|---|---|
| Dot-bracket | Parentheses for pairs, dots for unpaired | (((...))) |
| CT (connect) | Tab-delimited: index, base, prev, next, pair, index | Standard for mfold |
| BPSEQ | Three columns: position, nucleotide, pair partner (0=unpaired) | Used by comparative databases |
| WUSS | Extended dot-bracket with pseudoknot notation | <<..AA..>>..aa |
Format Conversion
import RNA
sequence = 'GGGAAACCC'
structure = '(((...)))'
# Dot-bracket to base-pair list
pt = RNA.ptable(structure)
pairs = [(i, pt[i]) for i in range(1, len(pt)) if pt[i] > i]
print(f'Base pairs: {pairs}')
# Dot-bracket to BPSEQ
for i in range(1, len(sequence) + 1):
print(f'{i} {sequence[i-1]} {pt[i]}')
Visualization
Forna (web-based)
# Generate JSON for forna viewer (http://rna.tbi.univie.ac.at/forna/)
import json
forna_data = {
'sequence': 'GGGCUAUUAGCUCAGUUGGUUAGAGCGCACC',
'structure': '((((....((((......))))....))))'
}
print(json.dumps(forna_data))
R2DT (standardized 2D layouts)
# R2DT provides template-based 2D layouts for known RNA families
# Requires Docker
docker run -v $(pwd):/data rnacentral/r2dt draw /data/input.fa /data/output/
Matplotlib Dot Plot
import RNA
import matplotlib.pyplot as plt
import numpy as np
sequence = 'GGGCUAUUAGCUCAGUUGGUUAGAGCGCACCCCUGAUAAGGGUGAGGUCGCUGAUUCGAAUUCAGCAUAGCCCA'
fc = RNA.fold_compound(sequence)
fc.pf()
bpp = fc.bpp()
n = len(sequence)
matrix = np.zeros((n, n))
for i in range(1, n + 1):
for j in range(i + 1, n + 1):
matrix[i-1][j-1] = bpp[i][j]
fig, ax = plt.subplots(figsize=(8, 8))
ax.imshow(matrix, cmap='YlOrRd', origin='lower', vmin=0, vmax=1)
ax.set_xlabel('Position')
ax.set_ylabel('Position')
ax.set_title('Base-pair probability matrix')
plt.tight_layout()
plt.savefig('bpp_dotplot.png', dpi=150)
Quality Thresholds
| Metric | Threshold | Rationale |
|---|---|---|
| MFE z-score | < -2.0 | Sequence folds significantly better than shuffled controls |
| Ensemble diversity | < 5.0 | Low diversity indicates a well-defined structure |
| Base-pair probability | > 0.9 | High confidence for individual pairs |
| Covariation score | > 0.0 | Positive covariation supports predicted pair |
Related Skills
- ncrna-search - Classify structured RNAs by family using Infernal/Rfam
- structure-probing - Use experimental SHAPE/DMS data to constrain predictions
- genome-annotation/ncrna-annotation - Genome-wide ncRNA annotation
- sequence-manipulation/sequence-properties - Sequence composition analysis
