askill
bio-rna-structure-secondary-structure-prediction

bio-rna-structure-secondary-structure-predictionSafety 95Repository

Predicts RNA secondary structures using minimum free energy folding and partition function analysis with ViennaRNA (RNAfold, RNAalifold, RNAcofold). Computes base-pair probabilities, centroid structures, and consensus structures from alignments. Use when predicting RNA folding, evaluating structural stability, or comparing structures across homologs.

270 stars
5.4k downloads
Updated 2/17/2026

Package Files

Loading files...
SKILL.md

Secondary Structure Prediction

Predict RNA secondary structures using thermodynamic models. ViennaRNA provides MFE folding, partition function analysis, consensus structure prediction from alignments, and RNA-RNA interaction prediction.

RNAfold: Single Sequence Folding

MFE Structure

# Basic MFE folding (reads sequence from stdin or file)
echo "GGGAAACCC" | RNAfold

# With partition function (-p) and base-pair probabilities
echo "GGGAAACCC" | RNAfold -p

# Output PostScript dot plot and structure plot
echo ">myRNA" > input.fa
echo "GGGCUAUUAGCUCAGUUGGUUAGAGCGCACCCCUGAUAAGGGUGAGGUCGCUGAUUCGAAUUCAGCAUAGCCCA" >> input.fa
RNAfold -p --noPS < input.fa  # Suppress PostScript files

Key RNAfold Options

OptionDescription
-pCompute partition function and base-pair probabilities
--MEACompute maximum expected accuracy structure
-d2Dangling end energies on both sides of helices (default)
-T 37Temperature in Celsius (default: 37)
--noLPNo lonely pairs (isolated base pairs)
--noPSSuppress PostScript output files
-CRead structure constraints from input
--shapeIncorporate SHAPE reactivity data

Constrained Folding

# Force specific positions paired/unpaired
# Constraint notation: '.' = unconstrained, 'x' = unpaired, '(' ')' = forced pair
echo -e ">constrained\nGGGCUAUUAGCUCAGUUGGUUAGAGCGCACC\n...xxxx.........................." | RNAfold -C

RNAalifold: Consensus Structure from Alignment

Predicts a consensus structure from a multiple sequence alignment, combining thermodynamic stability with covariation evidence.

# Input: Stockholm or ClustalW alignment format
RNAalifold --aln alignment.sto

# With covariation weighting and partition function
RNAalifold --cfactor 0.6 --nfactor 0.5 -p alignment.sto

# RIBOSUM scoring for better covariation detection
RNAalifold --ribosum_scoring alignment.sto
OptionDescription
--cfactorCovariation weight (default: 1.0, lower = more thermodynamic)
--nfactorNon-compatible penalty (default: 1.0)
--ribosum_scoringUse RIBOSUM matrices for covariation
-pPartition function for consensus

RNAcofold: RNA-RNA Interaction

Predicts the hybridization structure of two RNA molecules.

# Two sequences separated by '&'
echo "GCGCGC&GCGCGC" | RNAcofold

# With partition function
echo "GCGCGC&GCGCGC" | RNAcofold -p

LinearFold: Fast Folding for Long Sequences

For sequences longer than ~5,000 nt, LinearFold provides O(n) time complexity instead of O(n^3).

# LinearFold (if installed separately)
echo "GGGAAACCC" | linearfold

# ViennaRNA also supports --maxBPspan for long sequences
RNAfold --maxBPspan 300 < long_sequence.fa

ViennaRNA Python API

import RNA

sequence = 'GGGCUAUUAGCUCAGUUGGUUAGAGCGCACCCCUGAUAAGGGUGAGGUCGCUGAUUCGAAUUCAGCAUAGCCCA'

# MFE folding
structure, mfe = RNA.fold(sequence)
print(f'Structure: {structure}')
print(f'MFE: {mfe:.2f} kcal/mol')

# Partition function and base-pair probabilities
fc = RNA.fold_compound(sequence)
structure_pf, pf_energy = fc.pf()
print(f'Ensemble energy: {pf_energy:.2f} kcal/mol')

# Base-pair probability matrix
bpp = fc.bpp()

# Centroid structure (most representative of the ensemble)
centroid, centroid_dist = fc.centroid()
print(f'Centroid: {centroid}')
print(f'Distance to ensemble: {centroid_dist:.2f}')

# MEA structure (maximum expected accuracy)
mea_struct, mea_val = fc.MEA()
print(f'MEA structure: {mea_struct}')
print(f'MEA value: {mea_val:.2f}')

Folding with Constraints (Python)

import RNA

sequence = 'GGGCUAUUAGCUCAGUUGGUUAGAGCGCACCCCUGAUAAGGGUGAGGUCGCUGAUUCGAAUUCAGCAUAGCCCA'

md = RNA.md()
md.uniq_ML = 1  # Unique multiloop decomposition

fc = RNA.fold_compound(sequence, md)

# Force position 10 unpaired (0-indexed)
fc.hc_add_up(10, RNA.CONSTRAINT_CONTEXT_ALL_LOOPS)

# Force positions 1-3 paired with 70-72
fc.hc_add_bp(1, 72, RNA.CONSTRAINT_CONTEXT_ALL_LOOPS)

structure, mfe = fc.mfe()
print(f'Constrained: {structure} ({mfe:.2f} kcal/mol)')

SHAPE-Constrained Folding (Python)

import RNA

sequence = 'GGGCUAUUAGCUCAGUUGGUUAGAGCGCACCCCUGAUAAGGGUGAGGUCGCUGAUUCGAAUUCAGCAUAGCCCA'

fc = RNA.fold_compound(sequence)

# SHAPE reactivities: negative values = no data
# Deigan et al. (2009) parameters: m=1.8, b=-0.6 (default for SHAPE)
reactivities = [-999] + [0.1, 0.05, 0.8, 0.9, 0.2, 0.1, 0.3]  # 1-indexed, -999 = missing
fc.sc_add_SHAPE_deigan(reactivities, 1.8, -0.6)

structure, mfe = fc.mfe()
print(f'SHAPE-guided: {structure} ({mfe:.2f} kcal/mol)')

Structure Comparison

import RNA

struct1 = '(((....)))'
struct2 = '(((....).))'

# Base-pair distance
bp_dist = RNA.bp_distance(struct1, struct2)
print(f'Base-pair distance: {bp_dist}')

# Tree edit distance (more sophisticated)
tree1 = RNA.make_tree(RNA.expand_Full(struct1))
tree2 = RNA.make_tree(RNA.expand_Full(struct2))
tree_dist = RNA.tree_edit_distance(tree1, tree2)
print(f'Tree edit distance: {tree_dist}')

Structure Formats

FormatDescriptionExample
Dot-bracketParentheses for pairs, dots for unpaired(((...)))
CT (connect)Tab-delimited: index, base, prev, next, pair, indexStandard for mfold
BPSEQThree columns: position, nucleotide, pair partner (0=unpaired)Used by comparative databases
WUSSExtended dot-bracket with pseudoknot notation<<..AA..>>..aa

Format Conversion

import RNA

sequence = 'GGGAAACCC'
structure = '(((...)))'

# Dot-bracket to base-pair list
pt = RNA.ptable(structure)
pairs = [(i, pt[i]) for i in range(1, len(pt)) if pt[i] > i]
print(f'Base pairs: {pairs}')

# Dot-bracket to BPSEQ
for i in range(1, len(sequence) + 1):
    print(f'{i} {sequence[i-1]} {pt[i]}')

Visualization

Forna (web-based)

# Generate JSON for forna viewer (http://rna.tbi.univie.ac.at/forna/)
import json

forna_data = {
    'sequence': 'GGGCUAUUAGCUCAGUUGGUUAGAGCGCACC',
    'structure': '((((....((((......))))....))))'
}
print(json.dumps(forna_data))

R2DT (standardized 2D layouts)

# R2DT provides template-based 2D layouts for known RNA families
# Requires Docker
docker run -v $(pwd):/data rnacentral/r2dt draw /data/input.fa /data/output/

Matplotlib Dot Plot

import RNA
import matplotlib.pyplot as plt
import numpy as np

sequence = 'GGGCUAUUAGCUCAGUUGGUUAGAGCGCACCCCUGAUAAGGGUGAGGUCGCUGAUUCGAAUUCAGCAUAGCCCA'
fc = RNA.fold_compound(sequence)
fc.pf()
bpp = fc.bpp()

n = len(sequence)
matrix = np.zeros((n, n))
for i in range(1, n + 1):
    for j in range(i + 1, n + 1):
        matrix[i-1][j-1] = bpp[i][j]

fig, ax = plt.subplots(figsize=(8, 8))
ax.imshow(matrix, cmap='YlOrRd', origin='lower', vmin=0, vmax=1)
ax.set_xlabel('Position')
ax.set_ylabel('Position')
ax.set_title('Base-pair probability matrix')
plt.tight_layout()
plt.savefig('bpp_dotplot.png', dpi=150)

Quality Thresholds

MetricThresholdRationale
MFE z-score< -2.0Sequence folds significantly better than shuffled controls
Ensemble diversity< 5.0Low diversity indicates a well-defined structure
Base-pair probability> 0.9High confidence for individual pairs
Covariation score> 0.0Positive covariation supports predicted pair

Related Skills

  • ncrna-search - Classify structured RNAs by family using Infernal/Rfam
  • structure-probing - Use experimental SHAPE/DMS data to constrain predictions
  • genome-annotation/ncrna-annotation - Genome-wide ncRNA annotation
  • sequence-manipulation/sequence-properties - Sequence composition analysis

Install

Download ZIP
Requires askill CLI v1.0+

AI Quality Score

91/100Analyzed 7 hours ago

High-quality, comprehensive skill for RNA secondary structure prediction using ViennaRNA. Covers CLI tools (RNAfold, RNAalifold, RNAcofold), Python API, constraints, visualization, and quality metrics. Well-structured with practical examples, tables, and code blocks. Tags don't match content but skill is broadly reusable.

95
90
88
92
90

Metadata

Licenseunknown
Version-
Updated2/17/2026
PublisherGPTomics

Tags

apiobservability