
Publisher on askill
Microsoft DeepSpeed — ZeRO stages 1/2/3, CPU/NVMe offloading, HuggingFace integration, communication tuning, and DeepSpeed-Inference. Use when configuring distributed training or inference with DeepSp...
PromQL (Prometheus Query Language) reference — selectors, operators, functions, aggregations, and query patterns. Use when writing or debugging PromQL queries for Prometheus metrics, Grafana dashboard...
External Secrets Operator (ESO) — SecretStore, ClusterSecretStore, ExternalSecret, PushSecret, templating, and multi-tenant patterns. Use when syncing secrets from AWS Secrets Manager or Vault into K8...
Prometheus + Grafana on Kubernetes via kube-prometheus-stack — ServiceMonitors, PrometheusRules, dashboards, DCGM GPU metrics, alerting, and remote storage. Use when setting up monitoring. For PromQL...
GitHub Actions CI — workflows, triggers, matrix, reusable workflows, composite actions, self-hosted runners (ARC), caching, secrets, artifacts, containers, security, best practices, recommended action...
Tailscale Kubernetes operator — Service exposure, Ingress, egress, Connectors, ProxyClass, ProxyGroup, DNSConfig, and Funnel. Use when integrating Tailscale with K8s. NOT for Tailscale CLI or non-K8s...
Ray Data — Dataset creation, preprocessing pipelines, GPU-accelerated transforms, streaming into Train/Serve, and performance tuning. Use when building data pipelines for Ray ML workloads.
PyTorch FSDP — sharding strategies, mixed precision, activation checkpointing, auto_wrap_policy, checkpointing, and HuggingFace integration. Use when training models too large for a single GPU with FS...