askill
wandb

wandbSafety --Repository

Track ML experiments, version artifacts, and manage models with Weights & Biases (W&B). Use when: (1) Setting up wandb experiment tracking (init, config, logging metrics/media), (2) Integrating with PyTorch, HuggingFace, Lightning, or Ray Train, (3) Versioning datasets and models with Artifacts, (4) Using the Model Registry for staging and deployment, (5) Running hyperparameter sweeps (Bayesian, grid, random), (6) Querying runs and exporting data via the Public API, (7) Advanced operations (alerts, custom charts, Tables, reports, run grouping), (8) Debugging tracking issues and run recovery.

3 stars
1.2k downloads
Updated 3/7/2026

Package Files

Loading files...
SKILL.md

Weights & Biases (wandb)

W&B is the ML experiment tracking, artifact versioning, and model registry platform. Assumes a dedicated W&B deployment with BYOB (Bring Your Own Bucket). SDK version: 0.18.x+.

Setup

Add wandb to your container image dependencies. Configure via env vars in your pod spec:

env:
- name: WANDB_API_KEY
  valueFrom:
    secretKeyRef:
      name: wandb-secret
      key: api-key
- name: WANDB_BASE_URL
  value: "https://wandb.example.com"    # your dedicated deployment
- name: WANDB_PROJECT
  value: "my-project"
- name: WANDB_ENTITY
  value: "my-team"

Environment Variables Reference

VariablePurposeDefault
WANDB_API_KEYAuthentication keyrequired
WANDB_BASE_URLW&B server URLhttps://api.wandb.ai
WANDB_PROJECTDefault project name"uncategorized"
WANDB_ENTITYTeam or user entityUser default
WANDB_RUN_GROUPGroup related runsNone
WANDB_JOB_TYPERun type label (train, eval, etc.)None
WANDB_TAGSComma-separated tagsNone
WANDB_NOTESRun descriptionNone
WANDB_NAMERun display nameAuto-generated
WANDB_DIRLocal directory for run files./wandb
WANDB_SILENTSuppress console outputfalse
WANDB_RESUMEResume behavior"never"
WANDB_LOG_MODELAuto-log model checkpointsfalse
WANDB_WATCHAuto-log model gradients/parametersfalse
WANDB_DISABLEDDisable wandb entirelyfalse

Experiment Tracking

wandb.init() Settings

import wandb

run = wandb.init(
    project="my-project",
    entity="my-team",
    name="llama-8b-sft-lr1e4",
    group="llama-8b-sft",                # group related runs
    job_type="train",                     # train, eval, preprocess, etc.
    tags=["sft", "llama", "8b"],
    notes="SFT with learning rate 1e-4",
    config={                              # hyperparameters and settings
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "learning_rate": 1e-4,
        "batch_size": 32,
        "epochs": 3,
    },
    resume="allow",                       # resume behavior (see below)
    reinit=True,                          # allow multiple init() in same process
    save_code=True,                       # save the main script
    settings=wandb.Settings(
        start_method="thread",            # thread, fork, forkserver
    ),
)

Resume Behavior

ValueBehavior
"never" (default)Always create new run
"allow"Resume if run ID exists, else create new
"must"Must resume existing run (error if not found)
"auto"Auto-resume from environment (uses WANDB_RUN_ID)

For crash recovery in training jobs, set resume="allow" and provide a deterministic id:

run = wandb.init(
    id=f"training-{model_name}-{experiment_id}",  # deterministic ID
    resume="allow",
    project="my-project",
)

Logging Metrics

# Basic logging
wandb.log({"train/loss": 0.5, "train/accuracy": 0.85, "epoch": 1})

# Step-based logging (explicit step)
wandb.log({"train/loss": 0.3}, step=100)

# Commit control (batch multiple calls into one step)
wandb.log({"train/loss": 0.3}, commit=False)
wandb.log({"train/accuracy": 0.9}, commit=True)  # both logged at same step

# Summary metrics (final values shown in run table)
wandb.run.summary["best_accuracy"] = 0.95
wandb.run.summary["best_epoch"] = 7

# Define metric properties (x-axis, summary, goal)
wandb.define_metric("train/*", step_metric="global_step")
wandb.define_metric("eval/*", step_metric="epoch")
wandb.define_metric("eval/accuracy", summary="max", goal="maximize")
wandb.define_metric("eval/loss", summary="min", goal="minimize")

define_metric Reference

SettingPurposeValues
step_metricX-axis for this metricAny logged metric name
summaryHow to summarize in run table"min", "max", "mean", "last", "best", "copy", "none"
goalOptimization direction"minimize", "maximize"
hiddenHide from default chartsTrue/False
overwriteAllow redefiningTrue/False

Logging Media

# Images
wandb.log({"samples": [wandb.Image(img, caption=f"Sample {i}") for i, img in enumerate(images)]})

# Tables (structured data)
table = wandb.Table(columns=["input", "prediction", "label", "correct"])
for inp, pred, label in results:
    table.add_data(inp, pred, label, pred == label)
wandb.log({"predictions": table})

# Histograms
wandb.log({"weight_dist": wandb.Histogram(model.fc.weight.data.cpu())})

# Audio
wandb.log({"audio": wandb.Audio(audio_array, sample_rate=16000)})

# Point clouds, 3D objects, HTML
wandb.log({"scene": wandb.Object3D(point_cloud)})
wandb.log({"report": wandb.Html(html_string)})

Custom Charts

Built-in chart functions: wandb.plot.line(), wandb.plot.scatter(), wandb.plot.bar(), wandb.plot.confusion_matrix(), wandb.plot.pr_curve(), wandb.plot.roc_curve(). All take a wandb.Table and column names.

Alerts

# Send alert when metric crosses threshold
if val_loss > 2.0:
    wandb.alert(
        title="Training diverging",
        text=f"val_loss={val_loss:.4f} exceeded threshold at step {step}",
        level=wandb.AlertLevel.WARN,  # INFO, WARN, ERROR
        wait_duration=300,            # don't re-alert for 5 minutes
    )

Framework Integrations

PyTorch (Manual)

wandb.watch(model, log="all", log_freq=100)  # log gradients + parameters

for step, batch in enumerate(dataloader):
    loss = train_step(model, batch)
    wandb.log({"train/loss": loss, "global_step": step})

wandb.unwatch(model)

wandb.watch() options:

SettingPurposeDefault
logWhat to log: "gradients", "parameters", "all", None"gradients"
log_freqLog every N steps1000
log_graphLog computation graphFalse

HuggingFace Transformers

from transformers import TrainingArguments

training_args = TrainingArguments(
    report_to="wandb",
    run_name="llama-sft",
    logging_steps=10,
    # WANDB_PROJECT, WANDB_ENTITY set via env vars in pod spec
)

Set WANDB_LOG_MODEL="checkpoint" to auto-log checkpoints as artifacts, or "end" to log only the final model.

PyTorch Lightning

from lightning.pytorch.loggers import WandbLogger

logger = WandbLogger(
    project="my-project",
    name="my-run",
    log_model="all",           # log checkpoints as artifacts
    save_dir="/tmp/wandb",
)
trainer = pl.Trainer(logger=logger)

Ray Train

from ray.train import RunConfig
from ray.air.integrations.wandb import WandbLoggerCallback

run_config = RunConfig(
    callbacks=[WandbLoggerCallback(
        project="my-project",
        group="ray-experiment",
        log_config=True,
    )],
)

lm-evaluation-harness

import lm_eval
results = lm_eval.simple_evaluate(
    model="vllm",
    model_args="pretrained=my-model",
    tasks=["mmlu"],
    wandb_args="project=eval,name=llama-8b-mmlu,job_type=eval",
)

Artifacts

Artifacts version datasets, models, and other files with automatic lineage tracking.

Artifact Types

TypeConventionUse Case
datasetVersioned training/eval datamy-dataset:v3
modelModel checkpointsllama-sft:latest
codeSource code snapshotstraining-code:v1
Custom stringAny file collectionconfigs:v2

Creating and Logging Artifacts

# Log a model checkpoint
artifact = wandb.Artifact(
    name="llama-sft-model",
    type="model",
    description="SFT fine-tuned Llama 3.1 8B",
    metadata={
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "epochs": 3,
        "eval_loss": 0.42,
    },
)
artifact.add_dir("./checkpoint-best")           # add entire directory
# or: artifact.add_file("model.safetensors")    # add single file
# or: artifact.add_reference("s3://bucket/model/")  # reference (no copy)
wandb.log_artifact(artifact)

Artifact References (BYOB)

With BYOB, artifacts can reference files in your own bucket without copying:

artifact = wandb.Artifact("training-data", type="dataset")
artifact.add_reference("s3://my-bucket/datasets/openorca/", max_objects=100000)
wandb.log_artifact(artifact)

# Later, download (fetches from your bucket)
artifact = wandb.use_artifact("training-data:latest")
artifact.download(root="/data/openorca")

Using Artifacts (Input)

# Declare dependency and download
artifact = wandb.use_artifact("my-team/my-project/training-data:v3")
data_dir = artifact.download(root="/data/training")

# Get metadata without downloading
artifact = wandb.use_artifact("my-team/my-project/llama-sft-model:latest")
print(artifact.metadata)  # {"model": "...", "epochs": 3, ...}

Artifact Aliases

AliasPurpose
latestMost recently logged version
bestCustom alias for best-performing version
productionCustom alias for deployed version
# Set aliases
artifact.aliases = ["best", "production"]
artifact.save()

# Or via API
api = wandb.Api()
artifact = api.artifact("my-team/my-project/llama-sft-model:v5")
artifact.aliases.append("production")
artifact.save()

Model Registry

The Model Registry organizes model artifacts with lifecycle management:

# Link artifact to registry
run.link_artifact(
    artifact,
    target_path="my-team/model-registry/LLama-3.1-SFT",
    aliases=["staging"],
)

Registry Lifecycle

StageMeaning
stagingReady for validation
productionDeployed/serving
Custom aliasesAny workflow-specific labels

Querying the Registry

api = wandb.Api()

# List all registered models
collections = api.artifact_type("model", project="model-registry").collections()
for collection in collections:
    print(collection.name, collection.aliases)

# Get production model
model = api.artifact("my-team/model-registry/LLama-3.1-SFT:production")
model.download(root="/models/production")

Sweeps (Hyperparameter Optimization)

Sweep Config

sweep_config = {
    "method": "bayes",                    # bayes, grid, random
    "metric": {"name": "eval/accuracy", "goal": "maximize"},
    "parameters": {
        "learning_rate": {
            "distribution": "log_uniform_values",
            "min": 1e-6,
            "max": 1e-3,
        },
        "batch_size": {"values": [16, 32, 64]},
        "epochs": {"value": 10},          # fixed parameter
        "optimizer": {"values": ["adam", "adamw"]},
        "weight_decay": {"min": 0.0, "max": 0.1},
    },
    "early_terminate": {
        "type": "hyperband",
        "min_iter": 3,
        "eta": 2,
    },
}

Sweep Parameter Distributions

DistributionFieldsFor
constantvalueFixed value
categoricalvalues (list)Discrete choices
uniformmin, maxContinuous uniform
log_uniform_valuesmin, maxLog-uniform (for learning rates)
normalmu, sigmaNormal distribution
int_uniformmin, maxInteger range
q_uniformmin, max, qQuantized uniform

Running Sweeps

sweep_id = wandb.sweep(sweep_config, project="hp-search")
wandb.agent(sweep_id, function=train_func, count=50)

Sweep Methods

MethodDescriptionBest For
bayesBayesian optimization with Gaussian processesSmall-medium search spaces
gridExhaustive grid searchSmall discrete spaces
randomRandom samplingLarge spaces, baseline

Public API (Programmatic Access)

api = wandb.Api()

# Query runs with filters
runs = api.runs("my-team/my-project", filters={
    "tags": {"$in": ["production"]},
    "summary_metrics.eval/accuracy": {"$gt": 0.9},
    "state": "finished",
})

# Export to DataFrame
import pandas as pd
data = [{
    "name": r.name,
    "lr": r.config.get("learning_rate"),
    "accuracy": r.summary.get("eval/accuracy"),
    "duration": r.summary.get("_wandb", {}).get("runtime"),
} for r in runs]
df = pd.DataFrame(data)

# Download run history
run = api.run("my-team/my-project/run-id")
history = run.history(samples=1000)       # pandas DataFrame
system_metrics = run.history(stream="events")  # system metrics (GPU, CPU)

# Delete old runs
for run in api.runs("my-team/my-project", filters={"created_at": {"$lt": "2024-01-01"}}):
    run.delete()

API Filter Operators

OperatorExample
$eq{"config.model": {"$eq": "llama"}}
$ne{"state": {"$ne": "crashed"}}
$gt, $gte, $lt, $lte{"summary_metrics.loss": {"$lt": 0.5}}
$in{"tags": {"$in": ["prod"]}}
$nin{"tags": {"$nin": ["debug"]}}
$regex{"name": {"$regex": "llama.*"}}

Run Grouping and Comparison

Groups

Group related runs (e.g., distributed training workers, experiment variants):

wandb.init(group="llama-8b-ablation", job_type="train")

All runs in a group appear together in the UI with aggregated metrics.

Tags

# Set at init
wandb.init(tags=["sft", "llama-8b", "production"])

# Add/remove during run
wandb.run.tags = wandb.run.tags + ("best-model",)

Debugging

See references/troubleshooting.md for:

  • Sync failures and recovery
  • Large artifact handling
  • Rate limiting
  • Run resume and crash recovery
  • Common integration pitfalls

Cross-References

Reference

Install

Download ZIP
Requires askill CLI v1.0+

AI Quality Score

AI review pending.

Metadata

Licenseunknown
Version-
Updated3/7/2026
Publishertylertitsworth

Tags

apigithubgithub-actionsllmobservabilitysecurity