Skillsdeploying-monitoring-stacks

deploying-monitoring-stacks

Monitor use when deploying monitoring stacks including Prometheus, Grafana, and Datadog. Trigger with phrases like "deploy monitoring stack", "setup prometheus", "configure grafana", or "install datadog agent". Generates production-ready configurations with metric collection, visualization dashboards, and alerting rules.

jeremylongshore

1.2k stars

24.6k downloads

Updated 2d ago

Repository

Readme

deploying-monitoring-stacks follows the SKILL.md standard. Use the install command to add it to your agent stack.

---
name: deploying-monitoring-stacks
description: |
  Monitor use when deploying monitoring stacks including Prometheus, Grafana, and Datadog. Trigger with phrases like "deploy monitoring stack", "setup prometheus", "configure grafana", or "install datadog agent". Generates production-ready configurations with metric collection, visualization dashboards, and alerting rules.
allowed-tools: Read, Write, Edit, Grep, Glob, Bash(docker:*), Bash(kubectl:*)
version: 1.0.0
author: Jeremy Longshore <jeremy@intentsolutions.io>
license: MIT
---
# Monitoring Stack Deployer

This skill provides automated assistance for monitoring stack deployer tasks.

## Overview

Deploys monitoring stacks (Prometheus/Grafana/Datadog) including collectors, scraping config, dashboards, and alerting rules for production systems.

## Prerequisites

Before using this skill, ensure:
- Target infrastructure is identified (Kubernetes, Docker, bare metal)
- Metric endpoints are accessible from monitoring platform
- Storage backend is configured for time-series data
- Alert notification channels are defined (email, Slack, PagerDuty)
- Resource requirements are calculated based on scale

## Instructions

1. **Select Platform**: Choose Prometheus/Grafana, Datadog, or hybrid approach
2. **Deploy Collectors**: Install exporters and agents on monitored systems
3. **Configure Scraping**: Define metric collection endpoints and intervals
4. **Set Up Storage**: Configure retention policies and data compaction
5. **Create Dashboards**: Build visualization panels for key metrics
6. **Define Alerts**: Create alerting rules with appropriate thresholds
7. **Test Monitoring**: Verify metrics flow and alert triggering

## Output

**Prometheus + Grafana (Kubernetes):**
```yaml
# {baseDir}/monitoring/prometheus.yaml


## Overview

This skill provides automated assistance for the described functionality.

## Examples

Example usage patterns will be demonstrated in context.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  template:
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus:latest
        args:
          - '--config.file=/etc/prometheus/prometheus.yml'
          - '--storage.tsdb.retention.time=30d'
        ports:
        - containerPort: 9090
```

**Grafana Dashboard Configuration:**
```json
{
  "dashboard": {
    "title": "Application Metrics",
    "panels": [
      {
        "title": "CPU Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(container_cpu_usage_seconds_total[5m])"
          }
        ]
      }
    ]
  }
}
```

## Error Handling

**Metrics Not Appearing**
- Error: "No data points"
- Solution: Verify scrape targets are accessible and returning metrics

**High Cardinality**
- Error: "Too many time series"
- Solution: Reduce label combinations or increase Prometheus resources

**Alert Not Firing**
- Error: "Alert condition met but no notification"
- Solution: Check Alertmanager configuration and notification channels

**Dashboard Load Failure**
- Error: "Failed to load dashboard"
- Solution: Verify Grafana datasource configuration and permissions

## Examples

- "Deploy Prometheus + Grafana on Kubernetes and add alerts for high error rate and latency."
- "Install Datadog agents across hosts and configure a dashboard for CPU/memory saturation."

## Resources

- Prometheus documentation: https://prometheus.io/docs/
- Grafana documentation: https://grafana.com/docs/
- Example dashboards in {baseDir}/monitoring-examples/

Install

Requires askill CLI v1.0+

Metadata

LicenseUnknown

Version-

Updated2d ago

Publisherjeremylongshore

deploying-monitoring-stacks

Readme

Install

Metadata

Tags