scfoundation-model-agent


Unified agent for leveraging single-cell foundation models (scGPT, scBERT, Geneformer, scFoundation) for cross-species annotation, perturbation prediction, and gene network inference.

Updated 2/16/2026


scFoundation Model Agent

The scFoundation Model Agent provides a unified interface to single-cell foundation models for downstream analysis tasks. It integrates scGPT, scBERT, Geneformer, scFoundation, scTab, and UCE to support cross-species cell annotation, in silico perturbation prediction, gene regulatory network inference, and batch integration.

When to Use This Skill

  • When annotating cell types within a species or across species (e.g., human and mouse).
  • For predicting perturbation effects (knockouts, drug treatments) in silico.
  • To infer gene regulatory networks from single-cell data.
  • When integrating batches without losing biological signal.
  • For generating cell embeddings for downstream analysis.

Core Capabilities

  1. Cross-Species Cell Annotation: Transfer cell type labels across species using unified embeddings.

  2. In Silico Perturbation: Predict gene expression changes from knockouts/treatments.

  3. Gene Regulatory Network Inference: Discover TF-target relationships from attention patterns.

  4. Batch Integration: Remove technical variation while preserving biology.

  5. Cell Embedding Generation: Generate universal cell representations for any downstream task.

  6. Multi-Model Ensemble: Combine predictions from multiple foundation models.

Supported Foundation Models

| Model | Parameters | Training Data | Strengths |
|---|---|---|---|
| scGPT | 50M | 33M human cells | General purpose, perturbations |
| Geneformer | 10M | 30M cells | Chromatin, gene networks |
| scBERT | 20M | 1.2M cells | Cell type annotation |
| scFoundation | 100M | 50M cells | Large-scale, multi-species |
| scTab | 15M | 22M cells | Tabular prediction |
| UCE (Universal Cell Embeddings) | 100M | 36M cells | Cross-species transfer |

Workflow

  1. Input: Single-cell RNA-seq data (AnnData format).

  2. Model Selection: Choose the appropriate model(s) for the task.

  3. Preprocessing: Tokenize genes, normalize expression.

  4. Inference: Generate embeddings or predictions.

  5. Task Execution: Annotation, perturbation, or network inference.

  6. Ensemble (Optional): Combine multi-model predictions.

  7. Output: Annotated data, predictions, networks.
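
The preprocessing step (step 3) can be illustrated with a minimal, standard-library-only sketch. It assumes library-size normalization followed by log1p, then a rank-based encoding in which genes are ordered by expression and mapped to token ids. Rank-based encoding is the style usually associated with Geneformer, while other models tokenize differently, so each model's own tokenizer should be used in practice. The function names, the `target_sum` default, and the toy vocabulary are hypothetical.

```python
import math

def normalize_log1p(counts, target_sum=1e4):
    """Scale one cell's raw counts to target_sum, then apply log1p."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: math.log1p(c / total * target_sum) for gene, c in counts.items()}

def rank_tokenize(expr, vocab, max_len=2048):
    """Order expressed, in-vocabulary genes by descending expression
    and return their token ids (ties broken by gene name)."""
    expressed = [(g, v) for g, v in expr.items() if v > 0 and g in vocab]
    expressed.sort(key=lambda gv: (-gv[1], gv[0]))
    return [vocab[g] for g, _ in expressed[:max_len]]

# Hypothetical toy vocabulary and cell; real vocabularies ship with each model.
vocab = {"TP53": 0, "MYC": 1, "GAPDH": 2}
cell = {"GAPDH": 90, "MYC": 10, "TP53": 0, "UNKNOWN1": 5}
tokens = rank_tokenize(normalize_log1p(cell), vocab)
print(tokens)
```

Genes with zero expression or outside the vocabulary are dropped, which is why the gene-overlap check under Special Considerations matters before inference.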

Example Usage

User: "Use scGPT to predict the effect of CRISPR knockout of TP53 on these cancer cells."

Agent Action:

python3 Skills/Genomics/scFoundation_Model_Agent/foundation_predict.py \
    --input cancer_cells.h5ad \
    --model scgpt \
    --task perturbation \
    --perturbation "TP53 knockout" \
    --model_checkpoint scgpt_human_gene_v1.pt \
    --output tp53_ko_predictions.h5ad

Task-Specific Usage

Cell Type Annotation

python3 foundation_predict.py \
    --input query_cells.h5ad \
    --model geneformer \
    --task annotation \
    --reference tabula_sapiens.h5ad \
    --output annotated_cells.h5ad

Gene Network Inference

python3 foundation_predict.py \
    --input cells.h5ad \
    --model scgpt \
    --task grn_inference \
    --transcription_factors tf_list.txt \
    --output gene_network.csv

Batch Integration

python3 foundation_predict.py \
    --input multi_batch.h5ad \
    --model scfoundation \
    --task integration \
    --batch_key batch \
    --output integrated.h5ad
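
The task-specific invocations above share the `--input`, `--model`, `--task`, and `--output` flags and differ only in a few extra options. A small helper that assembles the argument list can be sketched as follows; `build_command` is a hypothetical name, and the sketch only builds the list without running anything.

```python
import shlex

def build_command(input_path, model, task, output_path,
                  script="foundation_predict.py", **extra):
    """Assemble the argv list for one foundation_predict.py run.

    Extra keyword arguments become --flag value pairs, e.g.
    batch_key="batch" -> ["--batch_key", "batch"].
    """
    argv = ["python3", script, "--input", input_path,
            "--model", model, "--task", task]
    for flag, value in extra.items():
        argv += [f"--{flag}", str(value)]
    argv += ["--output", output_path]
    return argv

argv = build_command("multi_batch.h5ad", "scfoundation", "integration",
                     "integrated.h5ad", batch_key="batch")
print(shlex.join(argv))
```

Passing the resulting list to `subprocess.run(argv, check=True)` would execute the command without shell quoting issues.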

Output Formats

| Task | Output | Format |
|---|---|---|
| Annotation | Cell type labels | .h5ad obs column |
| Perturbation | Predicted expression | .h5ad layer |
| GRN | TF-target edges | .csv, .graphml |
| Integration | Corrected embeddings | .h5ad obsm |
| Embeddings | Cell representations | .h5ad obsm |
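
As an illustration of the GRN output, the sketch below keeps the top-scoring targets for each transcription factor and serializes them as a CSV edge list. The column names (`tf`, `target`, `score`), the score dictionary, and both function names are assumptions made for this example; the actual layout of `gene_network.csv` is not specified here.

```python
import csv
import io

def top_tf_edges(scores, tfs, k=1):
    """Keep the k highest-scoring targets for each transcription factor.

    scores maps (source_gene, target_gene) -> score (e.g. an attention weight).
    """
    edges = []
    for tf in sorted(tfs):
        targets = [(t, s) for (src, t), s in scores.items() if src == tf and t != tf]
        targets.sort(key=lambda ts: (-ts[1], ts[0]))
        edges += [(tf, t, s) for t, s in targets[:k]]
    return edges

def edges_to_csv(edges):
    """Serialize (tf, target, score) triples as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["tf", "target", "score"])
    for tf, target, score in edges:
        writer.writerow([tf, target, f"{score:.2f}"])
    return buf.getvalue()

# Hypothetical scores; the GAPDH row is ignored because GAPDH is not in the TF list.
scores = {("TP53", "CDKN1A"): 0.9, ("TP53", "MDM2"): 0.7,
          ("MYC", "CCND1"): 0.6, ("GAPDH", "MYC"): 0.8}
edges = top_tf_edges(scores, tfs={"TP53", "MYC"}, k=1)
csv_text = edges_to_csv(edges)
print(csv_text)
```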

Performance Benchmarks

| Task | Model | Dataset | Performance |
|---|---|---|---|
| Annotation | scGPT | Tabula Sapiens | 93% accuracy |
| Annotation | Geneformer | HLCA | 91% accuracy |
| Perturbation (R²) | scGPT | Norman 2019 | 0.87 |
| Integration (kBET) | scFoundation | Multi-atlas | 0.92 |
| Cross-species | UCE | Human→Mouse | 85% F1 |
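
For reference, the accuracy, F1, and R² metrics named in the table can be computed as in the sketch below. The table does not state which F1 averaging or which R² target (raw expression versus expression change) was used, so macro-averaged F1 and the standard coefficient of determination are assumptions here.

```python
def accuracy(y_true, y_pred):
    """Fraction of cells whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores."""
    f1_scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

y_true = ["T cell", "T cell", "B cell", "B cell"]
y_pred = ["T cell", "B cell", "B cell", "B cell"]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```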

AI/ML Architecture

Transformer Backbone:

  • Gene-level tokenization
  • Attention-based gene interactions
  • Masked expression prediction pretraining
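
Masked expression prediction can be sketched without any deep learning framework: hide a fraction of a cell's expression values, ask the model to reconstruct them, and score only the hidden positions. The 15% mask fraction, the `MASK` sentinel, and the function names are illustrative assumptions, not the settings of any specific model.

```python
import random

MASK = -1.0  # hypothetical sentinel standing in for a masked expression value

def mask_expression(values, mask_fraction=0.15, seed=0):
    """Replace a random subset of values with MASK; return the masked copy
    and the sorted indices that were hidden."""
    rng = random.Random(seed)
    n_mask = max(1, round(len(values) * mask_fraction))
    masked_idx = sorted(rng.sample(range(len(values)), n_mask))
    masked = list(values)
    for i in masked_idx:
        masked[i] = MASK
    return masked, masked_idx

def masked_mse(targets, predictions, masked_idx):
    """Mean squared error computed only over the masked positions."""
    return sum((targets[i] - predictions[i]) ** 2 for i in masked_idx) / len(masked_idx)

values = [0.0, 1.5, 2.0, 0.5, 3.0, 1.0, 0.0, 2.5]
masked, idx = mask_expression(values, mask_fraction=0.25)
# A perfect reconstruction gives zero loss on the masked positions.
print(masked_mse(values, values, idx))
```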

Perturbation Module:

  • Conditional generation
  • Counterfactual prediction
  • Dose-response modeling
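
Counterfactual prediction for a knockout can be framed as running the same model on the original and the perturbed input and taking the difference. The sketch below assumes the knockout is represented by setting the gene's input expression to zero; `toy_model` is a hypothetical stand-in for a trained foundation model and encodes no real biology.

```python
def knock_out(expr, gene):
    """Return a copy of the expression profile with one gene set to zero."""
    if gene not in expr:
        raise KeyError(f"{gene} is not in the expression profile")
    perturbed = dict(expr)
    perturbed[gene] = 0.0
    return perturbed

def predicted_shift(model, expr, gene):
    """Per-gene difference between perturbed and baseline model outputs."""
    baseline = model(expr)
    perturbed = model(knock_out(expr, gene))
    return {g: perturbed[g] - baseline[g] for g in baseline}

def toy_model(expr):
    """Hypothetical stand-in model: CDKN1A output tracks TP53 input,
    every other gene is passed through unchanged."""
    out = dict(expr)
    out["CDKN1A"] = 0.5 * expr["TP53"]
    return out

cell = {"TP53": 4.0, "CDKN1A": 2.0, "GAPDH": 8.0}
shift = predicted_shift(toy_model, cell, "TP53")
print(shift)
```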

Transfer Learning:

  • Zero-shot annotation
  • Few-shot fine-tuning
  • Domain adaptation
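
One common way to annotate without fine-tuning is to embed reference and query cells with the same model and assign each query cell the label of the nearest reference centroid. The sketch below assumes embeddings are already available as plain lists of floats and uses cosine similarity; it illustrates the idea only and is not necessarily the procedure this agent implements.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroids(embeddings, labels):
    """Mean embedding per label."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}

def annotate(query, ref_centroids):
    """Label of the centroid most similar to the query embedding."""
    return max(sorted(ref_centroids), key=lambda lab: cosine(query, ref_centroids[lab]))

# Hypothetical 2-D embeddings for a labeled reference.
reference = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
ref_labels = ["T cell", "T cell", "B cell", "B cell"]
ref_centroids = centroids(reference, ref_labels)
print(annotate([0.8, 0.2], ref_centroids))
```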

Prerequisites

  • Python 3.10+
  • PyTorch 2.0+
  • transformers, flash-attn
  • Scanpy, AnnData
  • Model-specific weights
  • GPU with 16GB+ VRAM
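
A quick way to confirm the Python-side prerequisites is to check the interpreter version and whether each package can be located. The sketch below uses only the standard library and assumes the usual import names (`torch` for PyTorch, `flash_attn` for flash-attn). It does not verify package versions, GPU memory, or model weights.

```python
import importlib.util
import sys

REQUIRED_MODULES = ["torch", "transformers", "flash_attn", "scanpy", "anndata"]

def check_environment(min_python=(3, 10), modules=REQUIRED_MODULES):
    """Report whether the interpreter is new enough and which modules are importable."""
    report = {"python": sys.version_info >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report

for item, ok in check_environment().items():
    print(f"{item}: {'ok' if ok else 'missing'}")
```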

Related Skills

  • Nicheformer_Spatial_Agent - For spatial foundation models
  • scGPT_Agent - Dedicated scGPT workflows
  • Cell_Type_Annotation - Traditional annotation methods
  • Pathway_Analysis - Gene set enrichment

Model Selection Guide

| Use Case | Recommended Model | Reason |
|---|---|---|
| General annotation | scGPT | Broad training, robust |
| Cross-species | UCE | Species-agnostic embeddings |
| Perturbation | scGPT | Best perturbation performance |
| GRN inference | Geneformer | Attention → regulatory links |
| Large-scale | scFoundation | Efficient, scalable |
| Tabular prediction | scTab | Optimized for classification |
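
The guide above can be encoded as a simple lookup so that a caller gets a default model for a use case. The keys `annotation`, `perturbation`, and `grn_inference` and the values `scgpt`, `geneformer`, and `scfoundation` follow the CLI examples in this document; the remaining keys and the `uce` and `sctab` identifiers are assumed for illustration.

```python
# Use case -> recommended model identifier, mirroring the Model Selection Guide.
RECOMMENDED_MODEL = {
    "annotation": "scgpt",
    "cross_species": "uce",          # assumed identifier
    "perturbation": "scgpt",
    "grn_inference": "geneformer",
    "large_scale": "scfoundation",
    "tabular": "sctab",              # assumed identifier
}

def recommend_model(use_case):
    """Return the recommended model for a use case, or raise with the valid options."""
    try:
        return RECOMMENDED_MODEL[use_case]
    except KeyError:
        options = ", ".join(sorted(RECOMMENDED_MODEL))
        raise ValueError(f"unknown use case {use_case!r}; expected one of: {options}") from None

print(recommend_model("grn_inference"))
```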

Special Considerations

  1. Gene Coverage: Models are trained on different gene sets; check the overlap with your dataset's genes before inference.
  2. Species: Some models are human-only; use UCE for cross-species work.
  3. Compute: Large models need significant GPU memory (see Prerequisites).
  4. Fine-Tuning: Task-specific fine-tuning typically improves performance over zero-shot use.
  5. Versioning: Model weights are updated frequently; record the checkpoint version used for each run.
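
The gene coverage check in item 1 amounts to a set intersection between the dataset's gene names and the model's vocabulary. A minimal sketch, with a hypothetical function name and toy gene lists:

```python
def gene_overlap(dataset_genes, model_vocab):
    """Return the fraction of dataset genes present in the model vocabulary
    and the sorted list of genes that are missing from it."""
    dataset = set(dataset_genes)
    shared = dataset & set(model_vocab)
    fraction = len(shared) / len(dataset) if dataset else 0.0
    return fraction, sorted(dataset - shared)

# Hypothetical gene lists; a real check would compare adata.var_names
# against the vocabulary file shipped with the chosen model.
fraction, missing = gene_overlap(
    ["TP53", "MYC", "Gm1234", "GAPDH"],
    {"TP53", "MYC", "GAPDH", "ACTB"},
)
print(f"{fraction:.0%} of genes covered; missing: {missing}")
```

A low fraction often points to mismatched identifiers (symbols versus Ensembl IDs) or a species mismatch rather than genuinely absent genes.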

Ensemble Strategies

| Strategy | Method | Benefit |
|---|---|---|
| Majority Vote | Mode of predictions | Robust to outliers |
| Weighted Average | Confidence-weighted | Leverages uncertainty |
| Stacking | Meta-model | Learns model strengths |
| Attention Fusion | Cross-model attention | Deep integration |
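
The first two strategies are simple enough to sketch directly. The example assumes each model returns either a single label (majority vote) or a label-to-probability mapping (weighted average), with one weight per model; how weights or confidences are actually derived is not specified in this document, and the function names are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Most frequent label across models; ties go to the alphabetically first label."""
    counts = Counter(predictions)
    best = max(counts.values())
    return sorted(label for label, c in counts.items() if c == best)[0]

def weighted_average(probs_by_model, weights):
    """Combine per-model label probabilities using normalized model weights."""
    total = sum(weights[m] for m in probs_by_model)
    combined = {}
    for model, probs in probs_by_model.items():
        for label, p in probs.items():
            combined[label] = combined.get(label, 0.0) + weights[model] * p / total
    best = max(sorted(combined), key=combined.get)
    return best, combined

print(majority_vote(["T cell", "B cell", "T cell"]))

probs = {
    "scgpt": {"T cell": 0.6, "B cell": 0.4},
    "geneformer": {"T cell": 0.2, "B cell": 0.8},
}
label_equal, _ = weighted_average(probs, {"scgpt": 1.0, "geneformer": 1.0})
label_skewed, combined = weighted_average(probs, {"scgpt": 4.0, "geneformer": 1.0})
print(label_equal, label_skewed)
```

With equal weights the second model's stronger confidence wins; increasing the first model's weight flips the call, which shows how sensitive the result is to the weighting scheme.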

Author

AI Group - Biomedical AI Platform

Install

Requires askill CLI v1.0+

Metadata

License: unknown
Version: -
Updated: 2/16/2026
Publisher: mdbabumiamssm

Tags

github-actions, prompting