---
name: evaluate-model
description: "Measure model performance on test datasets. Use when assessing accuracy, precision, recall, and other metrics."
mcp_fallback: none
category: ml
tier: 2
user-invocable: false
---
# Evaluate Model
Measure machine learning model performance using appropriate metrics for the task (classification, regression, etc.).
## When to Use
- Comparing different model architectures
- Assessing performance on test/validation datasets
- Detecting overfitting or underfitting
- Reporting model accuracy for papers and documentation
## Quick Reference
```mojo
# Mojo model evaluation pattern
struct ModelEvaluator:
    fn evaluate_classification(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32, Float32]:
        # Returns accuracy, precision, recall
        ...

    fn evaluate_regression(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32]:
        # Returns MSE, MAE
        ...
```
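The classification metrics returned above can be sketched in plain Python for clarity (the skill itself targets Mojo; the function name and list-based inputs here are illustrative, not part of the `ModelEvaluator` API). For binary labels, accuracy, precision, and recall reduce to counts of true/false positives and negatives:

```python
def evaluate_classification(predictions, ground_truth):
    """Binary-classification metrics; inputs are equal-length lists of 0/1 labels.

    Returns (accuracy, precision, recall), mirroring the Mojo signature above.
    """
    pairs = list(zip(predictions, ground_truth))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)  # true positives
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)  # false positives
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)  # false negatives
    correct = sum(1 for p, t in pairs if p == t)

    accuracy = correct / len(pairs)
    # Guard against division by zero when there are no positive predictions/labels
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall
```

Note that precision and recall are ill-defined when the model predicts no positives (or the test set contains none); returning 0.0 in that case is one common convention, but report it explicitly.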
## Workflow
1. **Load test data**: Prepare test/validation dataset
2. **Generate predictions**: Run model inference on test set
3. **Select metrics**: Choose appropriate metrics (accuracy, precision, recall, F1, AUC, MSE, etc.)
4. **Calculate metrics**: Compute the selected metrics over the full test set
5. **Analyze results**: Compare against a baseline and identify strengths/weaknesses
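For the regression path, steps 2–4 reduce to a pair of error averages. A minimal Python sketch (illustrative; the skill's Mojo version operates on `ExTensor` rather than lists):

```python
def evaluate_regression(predictions, ground_truth):
    """Returns (MSE, MAE) for equal-length lists of floats."""
    n = len(predictions)
    # Mean squared error: penalizes large errors quadratically
    mse = sum((p - t) ** 2 for p, t in zip(predictions, ground_truth)) / n
    # Mean absolute error: robust to outliers, same units as the target
    mae = sum(abs(p - t) for p, t in zip(predictions, ground_truth)) / n
    return mse, mae
```

Reporting both is useful: a large MSE/MAE ratio suggests a few large outlier errors rather than uniformly poor predictions.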
## Output Format
The evaluation report should include:
- Task type (classification, regression, etc.)
- Metrics (accuracy, precision, recall, F1, AUC, etc.)
- Per-class breakdown (if applicable)
- Comparison to baseline model
- Confusion matrix (classification)
- Error analysis
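The confusion matrix and per-class breakdown in the report can be derived from a single pass over predictions, sketched here in plain Python (function name and list inputs are illustrative):

```python
from collections import Counter

def confusion_matrix(predictions, ground_truth, labels):
    """Rows index the true label, columns the predicted label.

    `labels` fixes the row/column order, e.g. [0, 1, 2].
    """
    counts = Counter(zip(ground_truth, predictions))
    return [[counts[(t, p)] for p in labels] for t in labels]
```

Per-class precision for class `c` is then the diagonal entry divided by its column sum, and per-class recall is the diagonal entry divided by its row sum.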
## References
- See CLAUDE.md > Language Preference (Mojo for ML models)
- See `train-model` skill for model training
- See `/notes/review/mojo-ml-patterns.md` for Mojo tensor operations