---
name: hugging-face-jobs
description: This skill should be used when users want to run any workload on Hugging Face Jobs infrastructure. Covers UV scripts, Docker-based jobs, hardware selection, cost estimation, authentication with tokens, secrets management, timeout configuration, and result persistence. Designed for general-purpose compute workloads including data processing, inference, experiments, batch jobs, and any Python-based tasks. Should be invoked for tasks involving cloud compute, GPU workloads, or when users mention running jobs on Hugging Face infrastructure without local setup.
license: Complete terms in LICENSE.txt
---
# Running Workloads on Hugging Face Jobs
## Overview
Run any workload on fully managed Hugging Face infrastructure. No local setup required—jobs run on cloud CPUs, GPUs, or TPUs and can persist results to the Hugging Face Hub.
**Common use cases:**
- **Data Processing** - Transform, filter, or analyze large datasets
- **Batch Inference** - Run inference on thousands of samples
- **Experiments & Benchmarks** - Reproducible ML experiments
- **Model Training** - Fine-tune models (see `model-trainer` skill for TRL-specific training)
- **Synthetic Data Generation** - Generate datasets using LLMs
- **Development & Testing** - Test code without local GPU setup
- **Scheduled Jobs** - Automate recurring tasks
**For model training specifically:** See the `model-trainer` skill for TRL-based training workflows.
## When to Use This Skill
Use this skill when users want to:
- Run Python workloads on cloud infrastructure
- Execute jobs without local GPU/TPU setup
- Process data at scale
- Run batch inference or experiments
- Schedule recurring tasks
- Use GPUs/TPUs for any workload
- Persist results to the Hugging Face Hub
## Key Directives
When assisting with jobs:
1. **ALWAYS use `hf_jobs()` MCP tool** - Submit jobs using `hf_jobs("uv", {...})` or `hf_jobs("run", {...})`. The `script` parameter accepts Python code directly. Do NOT save to local files unless the user explicitly requests it. Pass the script content as a string to `hf_jobs()`.
2. **Always handle authentication** - Jobs that interact with the Hub require `HF_TOKEN` via secrets. See Token Usage section below.
3. **Provide job details after submission** - After submitting, provide job ID, monitoring URL, estimated time, and note that the user can request status checks later.
4. **Set appropriate timeouts** - Default 30min may be insufficient for long-running tasks.
## Prerequisites Checklist
Before starting any job, verify:
### ✅ **Account & Authentication**
- Hugging Face Account with [Pro](https://hf.co/pro), [Team](https://hf.co/enterprise), or [Enterprise](https://hf.co/enterprise) plan (Jobs require a paid plan)
- Authenticated login: Check with `hf_whoami()`
- **HF_TOKEN for Hub Access** ⚠️ CRITICAL - Required for any Hub operations (push models/datasets, download private repos, etc.)
- Token must have appropriate permissions (read for downloads, write for uploads)
### ✅ **Token Usage** (See Token Usage section for details)
**When tokens are required:**
- Pushing models/datasets to Hub
- Accessing private repositories
- Using Hub APIs in scripts
- Any authenticated Hub operations
**How to provide tokens:**
```python
{
"secrets": {"HF_TOKEN": "$HF_TOKEN"} # Recommended: automatic token
}
```
**⚠️ CRITICAL:** The `$HF_TOKEN` placeholder is automatically replaced with your logged-in token. Never hardcode tokens in scripts.
## Token Usage Guide
### Understanding Tokens
**What are HF Tokens?**
- Authentication credentials for Hugging Face Hub
- Required for authenticated operations (push, private repos, API access)
- Stored securely on your machine after `hf auth login`
**Token Types:**
- **Read Token** - Can download models/datasets, read private repos
- **Write Token** - Can push models/datasets, create repos, modify content
- **Organization Token** - Can act on behalf of an organization
### When Tokens Are Required
**Always Required:**
- Pushing models/datasets to Hub
- Accessing private repositories
- Creating new repositories
- Modifying existing repositories
- Using Hub APIs programmatically
**Not Required:**
- Downloading public models/datasets
- Running jobs that don't interact with Hub
- Reading public repository information
### How to Provide Tokens to Jobs
#### Method 1: Automatic Token (Recommended)
```python
hf_jobs("uv", {
"script": "your_script.py",
"secrets": {"HF_TOKEN": "$HF_TOKEN"} # ✅ Automatic replacement
})
```
**How it works:**
- `$HF_TOKEN` is a placeholder that gets replaced with your actual token
- Uses the token from your logged-in session (`hf auth login`)
- Most secure and convenient method
- Token is encrypted server-side when passed as a secret
**Benefits:**
- No token exposure in code
- Uses your current login session
- Automatically updated if you re-login
- Works seamlessly with MCP tools
#### Method 2: Explicit Token (Not Recommended)
```python
hf_jobs("uv", {
"script": "your_script.py",
"secrets": {"HF_TOKEN": "hf_abc123..."} # ⚠️ Hardcoded token
})
```
**When to use:**
- Only if automatic token doesn't work
- Testing with a specific token
- Organization tokens (use with caution)
**Security concerns:**
- Token visible in code/logs
- Must manually update if token rotates
- Risk of token exposure
#### Method 3: Environment Variable (Less Secure)
```python
hf_jobs("uv", {
"script": "your_script.py",
"env": {"HF_TOKEN": "hf_abc123..."} # ⚠️ Less secure than secrets
})
```
**Difference from secrets:**
- `env` variables are visible in job logs
- `secrets` are encrypted server-side
- Always prefer `secrets` for tokens
### Using Tokens in Scripts
**In your Python script, tokens are available as environment variables:**
```python
# /// script
# dependencies = ["huggingface-hub"]
# ///
import os
from huggingface_hub import HfApi
# Token is automatically available if passed via secrets
token = os.environ.get("HF_TOKEN")
# Use with Hub API
api = HfApi(token=token)
# Or let huggingface_hub auto-detect
api = HfApi() # Automatically uses HF_TOKEN env var
```
**Best practices:**
- Don't hardcode tokens in scripts
- Use `os.environ.get("HF_TOKEN")` to access
- Let `huggingface_hub` auto-detect when possible
- Verify token exists before Hub operations
### Token Verification
**Check if you're logged in:**
```python
from huggingface_hub import whoami
user_info = whoami()  # Returns account info (including your username) if authenticated; raises if not
```
**Verify token in job:**
```python
import os
assert "HF_TOKEN" in os.environ, "HF_TOKEN not found!"
token = os.environ["HF_TOKEN"]
print(f"Token starts with: {token[:7]}...") # Should start with "hf_"
```
### Common Token Issues
**Error: 401 Unauthorized**
- **Cause:** Token missing or invalid
- **Fix:** Add `secrets={"HF_TOKEN": "$HF_TOKEN"}` to job config
- **Verify:** Check `hf_whoami()` works locally
**Error: 403 Forbidden**
- **Cause:** Token lacks required permissions
- **Fix:** Ensure token has write permissions for push operations
- **Check:** Token type at https://huggingface.co/settings/tokens
**Error: Token not found in environment**
- **Cause:** `secrets` not passed or wrong key name
- **Fix:** Use `secrets={"HF_TOKEN": "$HF_TOKEN"}` (not `env`)
- **Verify:** Script checks `os.environ.get("HF_TOKEN")`
**Error: Repository access denied**
- **Cause:** Token doesn't have access to private repo
- **Fix:** Use token from account with access
- **Check:** Verify repo visibility and your permissions
### Token Security Best Practices
1. **Never commit tokens** - Use `$HF_TOKEN` placeholder or environment variables
2. **Use secrets, not env** - Secrets are encrypted server-side
3. **Rotate tokens regularly** - Generate new tokens periodically
4. **Use minimal permissions** - Create tokens with only needed permissions
5. **Don't share tokens** - Each user should use their own token
6. **Monitor token usage** - Check token activity in Hub settings
### Complete Token Example
```python
# Example: Push results to Hub
hf_jobs("uv", {
"script": """
# /// script
# dependencies = ["huggingface-hub", "datasets"]
# ///
import os
from huggingface_hub import HfApi
from datasets import Dataset
# Verify token is available
assert "HF_TOKEN" in os.environ, "HF_TOKEN required!"
# Use token for Hub operations
api = HfApi(token=os.environ["HF_TOKEN"])
# Create and push dataset
data = {"text": ["Hello", "World"]}
dataset = Dataset.from_dict(data)
dataset.push_to_hub("username/my-dataset", token=os.environ["HF_TOKEN"])
print("✅ Dataset pushed successfully!")
""",
"flavor": "cpu-basic",
"timeout": "30m",
"secrets": {"HF_TOKEN": "$HF_TOKEN"} # ✅ Token provided securely
})
```
## Quick Start: Two Approaches
### Approach 1: UV Scripts (Recommended)
UV scripts use PEP 723 inline dependencies for clean, self-contained workloads.
**MCP Tool:**
```python
hf_jobs("uv", {
"script": """
# /// script
# dependencies = ["transformers", "torch"]
# ///
from transformers import pipeline
import torch
# Your workload here
classifier = pipeline("sentiment-analysis")
result = classifier("I love Hugging Face!")
print(result)
""",
"flavor": "cpu-basic",
"timeout": "30m"
})
```
**CLI Equivalent:**
```bash
hf jobs uv run my_script.py --flavor cpu-basic --timeout 30m
```
**Python API:**
```python
from huggingface_hub import run_uv_job
run_uv_job("my_script.py", flavor="cpu-basic", timeout="30m")
```
**Benefits:** Direct MCP tool usage, clean code, dependencies declared inline, no file saving required
**When to use:** Default choice for all workloads, custom logic, any scenario requiring `hf_jobs()`
#### Custom Docker Images for UV Scripts
By default, UV scripts use `ghcr.io/astral-sh/uv:python3.12-bookworm-slim`. For ML workloads with complex dependencies, use pre-built images:
```python
hf_jobs("uv", {
"script": "inference.py",
"image": "vllm/vllm-openai:latest", # Pre-built image with vLLM
"flavor": "a10g-large"
})
```
**CLI:**
```bash
hf jobs uv run --image vllm/vllm-openai:latest --flavor a10g-large inference.py
```
**Benefits:** Faster startup, pre-installed dependencies, optimized for specific frameworks
#### Python Version
By default, UV scripts use Python 3.12. Specify a different version:
```python
hf_jobs("uv", {
"script": "my_script.py",
"python": "3.11", # Use Python 3.11
"flavor": "cpu-basic"
})
```
**Python API:**
```python
from huggingface_hub import run_uv_job
run_uv_job("my_script.py", python="3.11")
```
#### Working with Scripts
⚠️ **Important:** There are *two* "script path" stories depending on how you run Jobs:
- **Using the `hf_jobs()` MCP tool (recommended in this repo)**: the `script` value must be **inline code** (a string) or a **URL**. A local filesystem path (like `"./scripts/foo.py"`) won't exist inside the remote container.
- **Using the `hf jobs uv run` CLI**: local file paths **do work** (the CLI uploads your script).
**Common mistake with `hf_jobs()` MCP tool:**
```python
# ❌ Will fail (remote container can't see your local path)
hf_jobs("uv", {"script": "./scripts/foo.py"})
```
**Correct patterns with `hf_jobs()` MCP tool:**
```python
# ✅ Inline: read the local script file and pass its *contents*
from pathlib import Path
script = Path("hf-jobs/scripts/foo.py").read_text()
hf_jobs("uv", {"script": script})
# ✅ URL: host the script somewhere reachable
hf_jobs("uv", {"script": "https://huggingface.co/datasets/uv-scripts/.../raw/main/foo.py"})
# ✅ URL from GitHub
hf_jobs("uv", {"script": "https://raw.githubusercontent.com/huggingface/trl/main/trl/scripts/sft.py"})
```
**CLI equivalent (local paths supported):**
```bash
hf jobs uv run ./scripts/foo.py -- --your --args
```
#### Adding Dependencies at Runtime
Add extra dependencies beyond what's in the PEP 723 header:
```python
hf_jobs("uv", {
"script": "inference.py",
"dependencies": ["transformers", "torch>=2.0"], # Extra deps
"flavor": "a10g-small"
})
```
**Python API:**
```python
from huggingface_hub import run_uv_job
run_uv_job("inference.py", dependencies=["transformers", "torch>=2.0"])
```
### Approach 2: Docker-Based Jobs
Run jobs with custom Docker images and commands.
**MCP Tool:**
```python
hf_jobs("run", {
"image": "python:3.12",
"command": ["python", "-c", "print('Hello from HF Jobs!')"],
"flavor": "cpu-basic",
"timeout": "30m"
})
```
**CLI Equivalent:**
```bash
hf jobs run python:3.12 python -c "print('Hello from HF Jobs!')"
```
**Python API:**
```python
from huggingface_hub import run_job
run_job(image="python:3.12", command=["python", "-c", "print('Hello!')"], flavor="cpu-basic")
```
**Benefits:** Full Docker control, use pre-built images, run any command
**When to use:** Need specific Docker images, non-Python workloads, complex environments
**Example with GPU:**
```python
hf_jobs("run", {
"image": "pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel",
"command": ["python", "-c", "import torch; print(torch.cuda.get_device_name())"],
"flavor": "a10g-small",
"timeout": "1h"
})
```
**Using Hugging Face Spaces as Images:**
You can use Docker images from HF Spaces:
```python
hf_jobs("run", {
"image": "hf.co/spaces/lhoestq/duckdb", # Space as Docker image
"command": ["duckdb", "-c", "SELECT 'Hello from DuckDB!'"],
"flavor": "cpu-basic"
})
```
**CLI:**
```bash
hf jobs run hf.co/spaces/lhoestq/duckdb duckdb -c "SELECT 'Hello!'"
```
### Finding More UV Scripts on Hub
The `uv-scripts` organization provides ready-to-use UV scripts stored as datasets on Hugging Face Hub:
```python
# Discover available UV script collections
dataset_search({"author": "uv-scripts", "sort": "downloads", "limit": 20})
# Explore a specific collection
hub_repo_details(["uv-scripts/classification"], repo_type="dataset", include_readme=True)
```
**Popular collections:** OCR, classification, synthetic-data, vLLM, dataset-creation
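Once a collection is identified, a script can be run directly from its raw URL without downloading it first. A minimal sketch (the filename `classify.py` is an assumption; check the collection's file listing for actual script names):
```python
hf_jobs("uv", {
    "script": "https://huggingface.co/datasets/uv-scripts/classification/raw/main/classify.py",
    "flavor": "cpu-basic",
    "timeout": "15m",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
})
```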
## Hardware Selection
> **Reference:** [HF Jobs Hardware Docs](https://huggingface.co/docs/hub/en/spaces-config-reference) (updated 07/2025)
| Workload Type | Recommended Hardware | Use Case |
|---------------|---------------------|----------|
| Data processing, testing | `cpu-basic`, `cpu-upgrade` | Lightweight tasks |
| Small models, demos | `t4-small` | <1B models, quick tests |
| Medium models | `t4-medium`, `l4x1` | 1-7B models |
| Large models, production | `a10g-small`, `a10g-large` | 7-13B models |
| Very large models | `a100-large` | 13B+ models |
| Batch inference | `a10g-large`, `a100-large` | High-throughput |
| Multi-GPU workloads | `l4x4`, `a10g-largex2`, `a10g-largex4` | Parallel/large models |
| TPU workloads | `v5e-1x1`, `v5e-2x2`, `v5e-2x4` | JAX/Flax, TPU-optimized |
**All Available Flavors:**
- **CPU:** `cpu-basic`, `cpu-upgrade`
- **GPU:** `t4-small`, `t4-medium`, `l4x1`, `l4x4`, `a10g-small`, `a10g-large`, `a10g-largex2`, `a10g-largex4`, `a100-large`
- **TPU:** `v5e-1x1`, `v5e-2x2`, `v5e-2x4`
**Guidelines:**
- Start with smaller hardware for testing
- Scale up based on actual needs
- Use multi-GPU for parallel workloads or large models
- Use TPUs for JAX/Flax workloads
- See `references/hardware_guide.md` for detailed specifications
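A cheap way to follow the "start small" guideline is a short probe job that reports what the chosen flavor actually provides before committing to a long run. A minimal sketch:
```python
hf_jobs("uv", {
    "script": """
# /// script
# dependencies = ["torch"]
# ///
import torch

# Report what this flavor actually provides
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"Device: {props.name}, memory: {props.total_memory / 1e9:.1f} GB")
""",
    "flavor": "t4-small",
    "timeout": "10m",
})
```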
## Critical: Saving Results
**⚠️ EPHEMERAL ENVIRONMENT—MUST PERSIST RESULTS**
The Jobs environment is temporary. All files are deleted when the job ends. If results aren't persisted, **ALL WORK IS LOST**.
### Persistence Options
**1. Push to Hugging Face Hub (Recommended)**
```python
# Push models
model.push_to_hub("username/model-name", token=os.environ["HF_TOKEN"])
# Push datasets
dataset.push_to_hub("username/dataset-name", token=os.environ["HF_TOKEN"])
# Push artifacts
api.upload_file(
path_or_fileobj="results.json",
path_in_repo="results.json",
repo_id="username/results",
token=os.environ["HF_TOKEN"]
)
```
**2. Use External Storage**
```python
# Upload to S3, GCS, etc.
import boto3
s3 = boto3.client('s3')
s3.upload_file('results.json', 'my-bucket', 'results.json')
```
**3. Send Results via API**
```python
# POST results to your API
import requests
requests.post("https://your-api.com/results", json=results)
```
### Required Configuration for Hub Push
**In job submission:**
```python
{
"secrets": {"HF_TOKEN": "$HF_TOKEN"} # Enables authentication
}
```
**In script:**
```python
import os
from huggingface_hub import HfApi
# Token automatically available from secrets
api = HfApi(token=os.environ.get("HF_TOKEN"))
# Push your results
api.upload_file(...)
```
### Verification Checklist
Before submitting:
- [ ] Results persistence method chosen
- [ ] `secrets={"HF_TOKEN": "$HF_TOKEN"}` if using Hub
- [ ] Script handles missing token gracefully
- [ ] Test persistence path works
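For the "handles missing token gracefully" item, a minimal sketch is to fail fast at the top of the script with an actionable message rather than crashing mid-upload:
```python
import os
import sys

# Fail early if the job was submitted without secrets={"HF_TOKEN": "$HF_TOKEN"}
if "HF_TOKEN" not in os.environ:
    sys.exit("HF_TOKEN not set; submit the job with secrets={'HF_TOKEN': '$HF_TOKEN'}")
```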
**See:** `references/hub_saving.md` for detailed Hub persistence guide
## Timeout Management
**⚠️ DEFAULT: 30 MINUTES**
Jobs automatically stop after the timeout. For long-running tasks like training, always set a custom timeout.
### Setting Timeouts
**MCP Tool:**
```python
{
"timeout": "2h" # 2 hours
}
```
**Supported formats:**
- Integer/float: seconds (e.g., `300` = 5 minutes)
- String with suffix: `"5m"` (minutes), `"2h"` (hours), `"1d"` (days)
- Examples: `"90m"`, `"2h"`, `"1.5h"`, `300`, `"1d"`
**Python API:**
```python
from huggingface_hub import run_job, run_uv_job
run_job(image="python:3.12", command=[...], timeout="2h")
run_uv_job("script.py", timeout=7200) # 2 hours in seconds
```
### Timeout Guidelines
| Scenario | Recommended | Notes |
|----------|-------------|-------|
| Quick test | 10-30 min | Verify setup |
| Data processing | 1-2 hours | Depends on data size |
| Batch inference | 2-4 hours | Large batches |
| Experiments | 4-8 hours | Multiple runs |
| Long-running | 8-24 hours | Production workloads |
**Always add 20-30% buffer** for setup, network delays, and cleanup.
**On timeout:** The job is killed immediately and all unsaved progress is lost.
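A tiny helper for the buffer rule (hypothetical, not part of any Jobs API): convert an estimated runtime into a timeout string with 25% headroom.
```python
import math

def buffered_timeout(estimated_minutes: float, buffer: float = 0.25) -> str:
    # buffered_timeout(90) -> "113m": 90 minutes of work plus 25% headroom
    return f"{math.ceil(estimated_minutes * (1 + buffer))}m"
```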
## Cost Estimation
**General guidelines:**
```
Total Cost = (Hours of runtime) × (Cost per hour)
```
**Example calculations:**
**Quick test:**
- Hardware: cpu-basic ($0.10/hour)
- Time: 15 minutes (0.25 hours)
- Cost: $0.03
**Data processing:**
- Hardware: l4x1 ($2.50/hour)
- Time: 2 hours
- Cost: $5.00
**Batch inference:**
- Hardware: a10g-large ($5/hour)
- Time: 4 hours
- Cost: $20.00
**Cost optimization tips:**
1. Start small - Test on cpu-basic or t4-small
2. Monitor runtime - Set appropriate timeouts
3. Use checkpoints - Resume if job fails
4. Optimize code - Reduce unnecessary compute
5. Choose right hardware - Don't over-provision
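The formula is simple enough to script. A sketch of an estimator (the rates mirror the illustrative numbers above and are not authoritative pricing; check current Hugging Face pricing before relying on them):
```python
# Illustrative per-hour rates from the examples above, NOT authoritative pricing
RATES_PER_HOUR = {"cpu-basic": 0.10, "l4x1": 2.50, "a10g-large": 5.00}

def estimate_cost(flavor: str, hours: float) -> float:
    return RATES_PER_HOUR[flavor] * hours

print(f"${estimate_cost('a10g-large', 4):.2f}")  # $20.00
```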
## Monitoring and Tracking
### Check Job Status
**MCP Tool:**
```python
# List all jobs
hf_jobs("ps")
# Inspect specific job
hf_jobs("inspect", {"job_id": "your-job-id"})
# View logs
hf_jobs("logs", {"job_id": "your-job-id"})
# Cancel a job
hf_jobs("cancel", {"job_id": "your-job-id"})
```
**Python API:**
```python
from huggingface_hub import list_jobs, inspect_job, fetch_job_logs, cancel_job
# List your jobs
jobs = list_jobs()
# List running jobs only
running = [j for j in list_jobs() if j.status.stage == "RUNNING"]
# Inspect specific job
job_info = inspect_job(job_id="your-job-id")
# View logs
for log in fetch_job_logs(job_id="your-job-id"):
    print(log)
# Cancel a job
cancel_job(job_id="your-job-id")
```
**CLI:**
```bash
hf jobs ps # List jobs
hf jobs logs <job-id> # View logs
hf jobs cancel <job-id> # Cancel job
```
**Remember:** Wait for user to request status checks. Avoid polling repeatedly.
### Job URLs
After submission, jobs have monitoring URLs:
```
https://huggingface.co/jobs/username/job-id
```
View logs, status, and details in the browser.
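If the URL is needed programmatically, it can be assembled from your username and the job id (a sketch, assuming the pattern above):
```python
from huggingface_hub import run_uv_job, whoami

job = run_uv_job("my_script.py", flavor="cpu-basic")
print(f"https://huggingface.co/jobs/{whoami()['name']}/{job.id}")
```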
### Wait for Multiple Jobs
```python
import time
from huggingface_hub import inspect_job, run_job
# Run multiple jobs
jobs = [run_job(image=img, command=cmd) for img, cmd in workloads]
# Wait for all to complete
for job in jobs:
    while inspect_job(job_id=job.id).status.stage not in ("COMPLETED", "ERROR"):
        time.sleep(10)
```
## Scheduled Jobs
Run jobs on a schedule using CRON expressions or predefined schedules.
**MCP Tool:**
```python
# Schedule a UV script that runs every hour
hf_jobs("scheduled uv", {
"script": "your_script.py",
"schedule": "@hourly",
"flavor": "cpu-basic"
})
# Schedule with CRON syntax
hf_jobs("scheduled uv", {
"script": "your_script.py",
"schedule": "0 9 * * 1", # 9 AM every Monday
"flavor": "cpu-basic"
})
# Schedule a Docker-based job
hf_jobs("scheduled run", {
"image": "python:3.12",
"command": ["python", "-c", "print('Scheduled!')"],
"schedule": "@daily",
"flavor": "cpu-basic"
})
```
**Python API:**
```python
from huggingface_hub import create_scheduled_job, create_scheduled_uv_job
# Schedule a Docker job
create_scheduled_job(
image="python:3.12",
command=["python", "-c", "print('Running on schedule!')"],
schedule="@hourly"
)
# Schedule a UV script
create_scheduled_uv_job("my_script.py", schedule="@daily", flavor="cpu-basic")
# Schedule with GPU
create_scheduled_uv_job(
"ml_inference.py",
schedule="0 */6 * * *", # Every 6 hours
flavor="a10g-small"
)
```
**Available schedules:**
- `@annually`, `@yearly` - Once per year
- `@monthly` - Once per month
- `@weekly` - Once per week
- `@daily` - Once per day
- `@hourly` - Once per hour
- CRON expression - Custom schedule (e.g., `"*/5 * * * *"` for every 5 minutes)
**Manage scheduled jobs:**
```python
# MCP Tool
hf_jobs("scheduled ps") # List scheduled jobs
hf_jobs("scheduled inspect", {"job_id": "..."}) # Inspect details
hf_jobs("scheduled suspend", {"job_id": "..."}) # Pause
hf_jobs("scheduled resume", {"job_id": "..."}) # Resume
hf_jobs("scheduled delete", {"job_id": "..."}) # Delete
```
**Python API for management:**
```python
from huggingface_hub import (
list_scheduled_jobs,
inspect_scheduled_job,
suspend_scheduled_job,
resume_scheduled_job,
delete_scheduled_job
)
# List all scheduled jobs
scheduled = list_scheduled_jobs()
# Inspect a scheduled job
info = inspect_scheduled_job(scheduled_job_id)
# Suspend (pause) a scheduled job
suspend_scheduled_job(scheduled_job_id)
# Resume a scheduled job
resume_scheduled_job(scheduled_job_id)
# Delete a scheduled job
delete_scheduled_job(scheduled_job_id)
```
## Webhooks: Trigger Jobs on Events
Trigger jobs automatically when changes happen in Hugging Face repositories.
**Python API:**
```python
from huggingface_hub import create_webhook
# Create webhook that triggers a job when a repo changes
webhook = create_webhook(
job_id=job.id,
watched=[
{"type": "user", "name": "your-username"},
{"type": "org", "name": "your-org-name"}
],
domains=["repo", "discussion"],
secret="your-secret"
)
```
**How it works:**
1. Webhook listens for changes in watched repositories
2. When triggered, the job runs with `WEBHOOK_PAYLOAD` environment variable
3. Your script can parse the payload to understand what changed
**Use cases:**
- Auto-process new datasets when uploaded
- Trigger inference when models are updated
- Run tests when code changes
- Generate reports on repository activity
**Access webhook payload in script:**
```python
import os
import json
payload = json.loads(os.environ.get("WEBHOOK_PAYLOAD", "{}"))
print(f"Event type: {payload.get('event', {}).get('action')}")
```
See [Webhooks Documentation](https://huggingface.co/docs/huggingface_hub/guides/webhooks) for more details.
## Common Workload Patterns
This repository ships ready-to-run UV scripts in `hf-jobs/scripts/`. Prefer using them instead of inventing new templates.
### Pattern 1: Dataset → Model Responses (vLLM) — `scripts/generate-responses.py`
**What it does:** loads a Hub dataset (chat `messages` or a `prompt` column), applies a model chat template, generates responses with vLLM, and **pushes** the output dataset + dataset card back to the Hub.
**Requires:** GPU + **write** token (it pushes a dataset).
```python
from pathlib import Path
script = Path("hf-jobs/scripts/generate-responses.py").read_text()
hf_jobs("uv", {
"script": script,
"script_args": [
"username/input-dataset",
"username/output-dataset",
"--messages-column", "messages",
"--model-id", "Qwen/Qwen3-30B-A3B-Instruct-2507",
"--temperature", "0.7",
"--top-p", "0.8",
"--max-tokens", "2048",
],
"flavor": "a10g-large",
"timeout": "4h",
"secrets": {"HF_TOKEN": "$HF_TOKEN"},
})
```
### Pattern 2: CoT Self-Instruct Synthetic Data — `scripts/cot-self-instruct.py`
**What it does:** generates synthetic prompts/answers via CoT Self-Instruct, optionally filters outputs (answer-consistency / RIP), then **pushes** the generated dataset + dataset card to the Hub.
**Requires:** GPU + **write** token (it pushes a dataset).
```python
from pathlib import Path
script = Path("hf-jobs/scripts/cot-self-instruct.py").read_text()
hf_jobs("uv", {
"script": script,
"script_args": [
"--seed-dataset", "davanstrien/s1k-reasoning",
"--output-dataset", "username/synthetic-math",
"--task-type", "reasoning",
"--num-samples", "5000",
"--filter-method", "answer-consistency",
],
"flavor": "l4x4",
"timeout": "8h",
"secrets": {"HF_TOKEN": "$HF_TOKEN"},
})
```
### Pattern 3: Streaming Dataset Stats (Polars + HF Hub) — `scripts/finepdfs-stats.py`
**What it does:** scans parquet directly from Hub (no 300GB download), computes temporal stats, and (optionally) uploads results to a Hub dataset repo.
**Requires:** CPU is often enough; token needed **only** if you pass `--output-repo` (upload).
```python
from pathlib import Path
script = Path("hf-jobs/scripts/finepdfs-stats.py").read_text()
hf_jobs("uv", {
"script": script,
"script_args": [
"--limit", "10000",
"--show-plan",
"--output-repo", "username/finepdfs-temporal-stats",
],
"flavor": "cpu-upgrade",
"timeout": "2h",
"env": {"HF_XET_HIGH_PERFORMANCE": "1"},
"secrets": {"HF_TOKEN": "$HF_TOKEN"},
})
```
## Common Failure Modes
### Out of Memory (OOM)
**Fix:**
1. Reduce batch size or data chunk size
2. Process data in smaller batches
3. Upgrade hardware: cpu → t4 → a10g → a100
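A common shape for fixes 1 and 2 is to stream the work in fixed-size chunks instead of materializing everything at once. A minimal sketch (`samples` and `run_batch` are placeholders for your data and per-batch inference call):
```python
def chunked(items, size):
    # Yield successive fixed-size slices so peak memory stays bounded
    for start in range(0, len(items), size):
        yield items[start:start + size]

results = []
for batch in chunked(samples, 32):  # halve the chunk size if OOM persists
    results.extend(run_batch(batch))
```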
### Job Timeout
**Fix:**
1. Check logs for actual runtime
2. Increase timeout with buffer: `"timeout": "3h"`
3. Optimize code for faster execution
4. Process data in chunks
### Hub Push Failures
**Fix:**
1. Add to job: `secrets={"HF_TOKEN": "$HF_TOKEN"}`
2. Verify token in script: `assert "HF_TOKEN" in os.environ`
3. Check token permissions
4. Verify repo exists or can be created
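A pre-flight sketch that covers fixes 1, 2, and 4 inside the script (`username/results` is a placeholder):
```python
import os
from huggingface_hub import HfApi

assert "HF_TOKEN" in os.environ, "Add secrets={'HF_TOKEN': '$HF_TOKEN'} to the job config"
api = HfApi(token=os.environ["HF_TOKEN"])
# exist_ok=True covers both "repo already exists" and "repo can be created"
api.create_repo("username/results", repo_type="dataset", exist_ok=True)
```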
### Missing Dependencies
**Fix:**
Add to PEP 723 header:
```python
# /// script
# dependencies = ["package1", "package2>=1.0.0"]
# ///
```
### Authentication Errors
**Fix:**
1. Check `hf_whoami()` works locally
2. Verify `secrets={"HF_TOKEN": "$HF_TOKEN"}` in job config
3. Re-login: `hf auth login`
4. Check token has required permissions
## Troubleshooting
**Common issues:**
- Job times out → Increase timeout, optimize code
- Results not saved → Check persistence method, verify HF_TOKEN
- Out of Memory → Reduce batch size, upgrade hardware
- Import errors → Add dependencies to PEP 723 header
- Authentication errors → Check token, verify secrets parameter
**See:** `references/troubleshooting.md` for complete troubleshooting guide
## Resources
### References (In This Skill)
- `references/token_usage.md` - Complete token usage guide
- `references/hardware_guide.md` - Hardware specs and selection
- `references/hub_saving.md` - Hub persistence guide
- `references/troubleshooting.md` - Common issues and solutions
### Scripts (In This Skill)
- `scripts/generate-responses.py` - vLLM batch generation: dataset → responses → push to Hub
- `scripts/cot-self-instruct.py` - CoT Self-Instruct synthetic data generation + filtering → push to Hub
- `scripts/finepdfs-stats.py` - Polars streaming stats over `finepdfs-edu` parquet on Hub (optional push)
### External Links
**Official Documentation:**
- [HF Jobs Guide](https://huggingface.co/docs/huggingface_hub/guides/jobs) - Main documentation
- [HF Jobs CLI Reference](https://huggingface.co/docs/huggingface_hub/guides/cli#hf-jobs) - Command line interface
- [HF Jobs API Reference](https://huggingface.co/docs/huggingface_hub/package_reference/hf_api) - Python API details
- [Hardware Flavors Reference](https://huggingface.co/docs/hub/en/spaces-config-reference) - Available hardware
**Related Tools:**
- [UV Scripts Guide](https://docs.astral.sh/uv/guides/scripts/) - PEP 723 inline dependencies
- [UV Scripts Organization](https://huggingface.co/uv-scripts) - Community UV script collection
- [HF Hub Authentication](https://huggingface.co/docs/huggingface_hub/quick-start#authentication) - Token setup
- [Webhooks Documentation](https://huggingface.co/docs/huggingface_hub/guides/webhooks) - Event triggers
## Key Takeaways
1. **Submit scripts inline** - The `script` parameter accepts Python code directly; no file saving required unless user requests
2. **Jobs are asynchronous** - Don't wait/poll; let user check when ready
3. **Always set timeout** - Default 30 min may be insufficient; set appropriate timeout
4. **Always persist results** - Environment is ephemeral; without persistence, all work is lost
5. **Use tokens securely** - Always use `secrets={"HF_TOKEN": "$HF_TOKEN"}` for Hub operations
6. **Choose appropriate hardware** - Start small, scale up based on needs (see hardware guide)
7. **Use UV scripts** - Default to `hf_jobs("uv", {...})` with inline scripts for Python workloads
8. **Handle authentication** - Verify tokens are available before Hub operations
9. **Monitor jobs** - Provide job URLs and status check commands
10. **Optimize costs** - Choose right hardware, set appropriate timeouts
## Quick Reference: MCP Tool vs CLI vs Python API
| Operation | MCP Tool | CLI | Python API |
|-----------|----------|-----|------------|
| Run UV script | `hf_jobs("uv", {...})` | `hf jobs uv run script.py` | `run_uv_job("script.py")` |
| Run Docker job | `hf_jobs("run", {...})` | `hf jobs run image cmd` | `run_job(image, command)` |
| List jobs | `hf_jobs("ps")` | `hf jobs ps` | `list_jobs()` |
| View logs | `hf_jobs("logs", {...})` | `hf jobs logs <id>` | `fetch_job_logs(job_id)` |
| Cancel job | `hf_jobs("cancel", {...})` | `hf jobs cancel <id>` | `cancel_job(job_id)` |
| Schedule UV | `hf_jobs("scheduled uv", {...})` | - | `create_scheduled_uv_job()` |
| Schedule Docker | `hf_jobs("scheduled run", {...})` | - | `create_scheduled_job()` |