askill
promptfoo

promptfooSafety 100Repository

Promptfoo evaluation framework for testing and comparing LLM outputs. Use when writing eval configs, creating test cases, debugging eval runs, or working with assertions.

0 stars
1.2k downloads
Updated 2/5/2026

Package Files

Loading files...
SKILL.md

Promptfoo

Promptfoo is a CLI tool for testing and comparing LLM outputs.

Config File

The CLI auto-discovers promptfooconfig.yaml in the current directory. Use -c path for other locations.

Supported extensions: .yaml, .json, .js

Configuration

# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: "What this eval tests"

prompts:
  - file://prompt.txt
  - |
    Inline prompt with {{variable}} substitution

providers:
  - anthropic:messages:claude-sonnet-4-5-20250929

defaultTest:
  options:
    provider:
      config:
        temperature: 0.0
        max_tokens: 4096

tests:
  - description: "What this case tests"
    vars:
      variable: "value"
      from_file: file://data/input.txt
    assert:
      - type: contains
        value: "expected substring"

# Or load tests from files
tests: file://cases/all.yaml

outputPath: ./results.json

evaluateOptions:
  maxConcurrency: 4

Provider IDs

ModelID
Opus 4.5anthropic:messages:claude-opus-4-5-20251101
Sonnet 4.5anthropic:messages:claude-sonnet-4-5-20250929
Haiku 4.5anthropic:messages:claude-haiku-4-5-20251001

Provider config: temperature, max_tokens, top_p, top_k, tools, tool_choice

Prompts

  • file://path.txt — load from file (path relative to config)
  • Inline string with {{variable}} Nunjucks substitution
  • Chat format via JSON: [{"role": "system", "content": "..."}, {"role": "user", "content": "{{input}}"}]

Assertion Types

TypeUseValue
containsSubstring match"expected text"
icontainsCase-insensitive substring"expected text"
equalsExact match"exact value"
regexPattern match"\\d{4}-\\d{2}-\\d{2}"
is-jsonValid JSON output
contains-jsonOutput contains JSON
starts-withPrefix match"prefix"
costMax costthreshold: 0.01
latencyMax response time (ms)threshold: 5000
javascriptCustom JS expressionoutput.includes('x')
pythonCustom Pythonfile://check.py:fn_name
llm-rubricLLM-as-judgerubric text
similarSemantic similarityvalue: "text", threshold: 0.8
model-graded-factualityFact checking

Prefix any assertion with not- to negate (e.g., not-contains).

llm-rubric

Uses an LLM to grade output against a rubric:

assert:
  - type: llm-rubric
    value: |
      The response should:
      - Mention at least 3 factors
      - Include specific examples
    threshold: 0.7
    provider: anthropic:messages:claude-sonnet-4-5-20250929

javascript

Inline expressions or functions. Access output (string) and context (with vars, prompt):

assert:
  - type: javascript
    value: output.length > 100 && output.includes('route')
  - type: javascript
    value: |
      const data = JSON.parse(output);
      return data.calories >= 200 && data.calories <= 300;

Test Organization

Split cases into separate files and reference them:

tests:
  - file://cases/basic.yaml
  - file://cases/edge-cases.yaml

Each case file contains a YAML array of test objects.

CLI

npx promptfoo eval                         # Run with auto-discovered config
npx promptfoo eval -c path/to/config.yaml  # Specific config
npx promptfoo eval --filter-metadata key=v # Filter tests
npx promptfoo view                         # Web UI for results
npx promptfoo cache clear                  # Clear result cache

References

Consult the configuration reference and Anthropic provider docs for full details.

Install

Download ZIP
Requires askill CLI v1.0+

AI Quality Score

95/100Analyzed 2/13/2026

A comprehensive and high-density reference for the Promptfoo evaluation framework. It provides detailed configuration schemas, assertion types, provider IDs, and CLI commands, making it highly effective for configuring and running LLM evaluations.

100
95
100
95
95

Metadata

Licenseunknown
Version-
Updated2/5/2026
Publishermajiayu000

Tags

llmpromptingtesting