ssmd-dq-run


How to run ssmd DQ checks locally and in-cluster, interpret scores, trigger email reports, and verify results. Use when running data quality checks, re-sending DQ emails, or verifying pipeline health after deployments or backfills.

Updated 2/20/2026



Procedures for running ssmd Data Quality checks and interpreting results.

Source Files

| File | Purpose |
| --- | --- |
| data/dq.py | DQRunner engine: 13 checks, scoring, CLI |
| data/dq_email.py | Email report wrapper: runs all feeds, HTML output |
| data/Dockerfile | DQ image: python:3.12-slim + duckdb + gcloud monitoring |

Running DQ Locally

Requires GCS access: run gcloud auth application-default login first to set up Application Default Credentials.

# Single feed
uv run data/dq.py --date 2026-02-17 --feed kalshi --stream crypto

# With verbose progress
uv run data/dq.py --date 2026-02-17 --feed kalshi --stream crypto --verbose

# JSON output (for programmatic use)
uv run data/dq.py --date 2026-02-17 --feed kalshi --stream crypto --json

# Non-default prefix (pass --prefix explicitly for feeds other than kalshi; the default prefix is kalshi)
uv run data/dq.py --date 2026-02-17 --feed kraken-futures --stream futures --prefix kraken-futures
uv run data/dq.py --date 2026-02-17 --feed polymarket --stream markets --prefix polymarket

All Three Feeds

Run all three feeds for full pipeline verification. The commands are independent, so they can run in parallel (for example, in separate terminals):

uv run data/dq.py --date 2026-02-17 --feed kalshi --stream crypto
uv run data/dq.py --date 2026-02-17 --feed kraken-futures --stream futures --prefix kraken-futures
uv run data/dq.py --date 2026-02-17 --feed polymarket --stream markets --prefix polymarket
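
The three commands above can also be launched concurrently from one script. The following is a minimal sketch, not part of the ssmd repository: FEEDS, build_command, and run_all are hypothetical names, and the feed/stream/prefix values are copied from the commands above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# (feed, stream, prefix) triples copied from the commands above.
# prefix=None means "omit --prefix and rely on the default".
FEEDS = [
    ("kalshi", "crypto", None),
    ("kraken-futures", "futures", "kraken-futures"),
    ("polymarket", "markets", "polymarket"),
]


def build_command(date, feed, stream, prefix=None):
    """Return the argv list for one dq.py invocation."""
    cmd = ["uv", "run", "data/dq.py", "--date", date,
           "--feed", feed, "--stream", stream]
    if prefix is not None:
        cmd += ["--prefix", prefix]
    return cmd


def run_all(date, run=subprocess.run):
    """Run every feed concurrently and return {feed: exit code}.

    `run` is injectable so the function can be exercised without uv or GCS.
    """
    def one(spec):
        feed, stream, prefix = spec
        completed = run(build_command(date, feed, stream, prefix))
        return feed, completed.returncode

    with ThreadPoolExecutor(max_workers=len(FEEDS)) as pool:
        return dict(pool.map(one, FEEDS))
```

Calling run_all("2026-02-17") from the repository root returns one exit code per feed. Output from the three processes will interleave on the terminal, so adding --json and capturing each process's stdout may be preferable for programmatic use.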

Feed Parameters

| Feed | --feed | --stream | --prefix |
| --- | --- | --- | --- |
| Kalshi | kalshi | crypto | (default: kalshi) |
| Kraken Futures | kraken-futures | futures | kraken-futures |
| Polymarket | polymarket | markets | polymarket |

Running DQ In-Cluster

The DQ CronJob runs at 03:30 UTC daily (after parquet-gen at 02:00 UTC).

Manifest: clusters/gke-prod/apps/ssmd/cronjobs/dq-daily.yaml

Trigger a manual DQ email run

kubectl create job --from=cronjob/ssmd-dq-daily ssmd-dq-manual-MMDD -n ssmd

Watch progress

kubectl logs -n ssmd job/ssmd-dq-manual-MMDD -f
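
MMDD in the job names above is a placeholder. A small sketch of building both commands with a date-based suffix follows; it assumes the suffix is simply the month and day of the run, and the helper name manual_job_commands is illustrative rather than part of the ssmd tooling.

```python
from datetime import date


def manual_job_commands(day, kind="manual", namespace="ssmd"):
    """Return (create, logs) kubectl argv lists for a one-off DQ job.

    `kind` is "manual" or "rerun", matching the job names used in this
    document. The %m%d suffix convention is an assumption.
    """
    name = f"ssmd-dq-{kind}-{day:%m%d}"
    create = ["kubectl", "create", "job",
              "--from=cronjob/ssmd-dq-daily", name, "-n", namespace]
    logs = ["kubectl", "logs", "-n", namespace, f"job/{name}", "-f"]
    return create, logs
```

For example, manual_job_commands(date(2026, 2, 18)) names the job ssmd-dq-manual-0218. Job names must be unique within a namespace, so a second run on the same day needs a different suffix.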

Re-run for a specific date

The CronJob checks yesterday's data by default. To override the date, patch the generated Job manifest before applying it:

kubectl create job --from=cronjob/ssmd-dq-daily ssmd-dq-rerun-MMDD -n ssmd --dry-run=client -o yaml | \
  sed 's|dq_email.py|dq_email.py --date 2026-02-17|' | \
  kubectl apply -f -

Interpreting Scores

Grades

| Grade | Score Range | Meaning |
| --- | --- | --- |
| GREEN | >= 98 | Pipeline healthy, all checks passing |
| YELLOW | >= 85 and < 98 | Minor issues, investigate when convenient |
| RED | < 85 | Significant issues, investigate promptly |

Check Statuses

| Status | Weight | Meaning |
| --- | --- | --- |
| pass | 1.0 | Check passed |
| warn | 0.7 | Threshold exceeded but not critical |
| fail | 0.0 | Check failed |
| skip | excluded | Not enough data to run, excluded from score |

Score = mean weight of all non-skipped checks * 100.
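
A minimal sketch of the scoring rule, assuming skipped checks are dropped before averaging and the grade boundaries are inclusive at 98 and 85 as in the tables above. This is not the code in data/dq.py; the function names and the None result for an all-skip run are assumptions made for illustration.

```python
# Weights from the Check Statuses table; "skip" is excluded from the score.
WEIGHTS = {"pass": 1.0, "warn": 0.7, "fail": 0.0}


def score(statuses):
    """Return the 0-100 score for a list of check statuses.

    Returns None when every check was skipped (assumed behavior).
    """
    weights = [WEIGHTS[s] for s in statuses if s != "skip"]
    if not weights:
        return None
    return 100.0 * sum(weights) / len(weights)


def grade(value):
    """Map a score to the grade names from the Grades table."""
    if value >= 98:
        return "GREEN"
    if value >= 85:
        return "YELLOW"
    return "RED"
```

With 13 checks and none skipped, a single fail scores 12/13 * 100, about 92.3, which is YELLOW; two fails score about 84.6, which is RED. A YELLOW grade can therefore coincide with a failed check.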

Exit Codes

  • dq.py exits 1 if any check has status fail
  • dq_email.py always exits 0 (email is the alert mechanism)

Notebook / Programmatic Usage

# Assumes data/ is on sys.path (for example, the notebook is started from data/)
from dq import DQRunner

runner = DQRunner(bucket="ssmd-data", feed="kalshi", stream="crypto")
results = runner.run("2026-02-12")
results.summary()       # print human-readable report
results.score()         # float 0-100
results.to_json()       # JSON string

# Ad-hoc queries via the shared DuckDB connection
runner.con.execute(
    "SELECT * FROM read_parquet('gcs://ssmd-data/kalshi/crypto/2026-02-12/ticker_*.parquet') LIMIT 10"
).fetchdf()

# Date range
all_results = runner.run_range("2026-02-10", "2026-02-17")

Email Report

dq_email.py runs all 3 feeds, generates an HTML email with per-feed grades and check details, and sends it via SMTP.

Required env vars: SMTP_USER, SMTP_PASS, SMTP_TO
Optional env vars: SMTP_HOST (default: smtp.gmail.com), SMTP_PORT (default: 587)

These are provided in-cluster via the ssmd-smtp-credentials Secret.
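
When running dq_email.py outside the cluster, a pre-flight check of these variables can catch a missing credential before any feed is processed. The sketch below is illustrative only: load_smtp_config is a hypothetical helper, and it does not describe how dq_email.py itself reads its configuration.

```python
import os

# Variable names and defaults as listed in this section.
REQUIRED = ("SMTP_USER", "SMTP_PASS", "SMTP_TO")
DEFAULTS = {"SMTP_HOST": "smtp.gmail.com", "SMTP_PORT": "587"}


def load_smtp_config(env=None):
    """Validate required SMTP variables and apply the documented defaults."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required env vars: " + ", ".join(missing))
    config = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        config[name] = env.get(name, default)
    config["SMTP_PORT"] = int(config["SMTP_PORT"])  # ports are integers
    return config
```

Passing a plain dict as env makes the check easy to test; calling it with no argument reads the real process environment.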

Post-Deploy / Post-Backfill Verification

After deploying a new DQ version or backfilling parquet data:

  1. Run DQ locally for all 3 feeds (see commands above)
  2. Verify target checks show PASS
  3. Optionally trigger in-cluster email: kubectl create job --from=cronjob/ssmd-dq-daily ...
  4. Verify email arrives with corrected scores

Image Build

The DQ image is built from data/Dockerfile. Builds are triggered by pushing dq-v* tags to the 899bushwick repo (not the ssmd repo).

See the ssmd-deploy skill for full deployment procedure.



Metadata

License: unknown
Version: -
Updated: 2/20/2026
Publisher: aaronwald

Tags

ci-cd, observability, security