PR

Details

/claim #41

  • Reproducible test setup: Uses kind cluster with CoreDNS scaling reproduction
  • Test data in a test.log file: Real CoreDNS failure timeline with early warning signals
  • CRE Rule: CRE-2025-0071 - CoreDNS unavailable detection with severity 1 (high)
  • Local Testing: ✅ Verified with preq CLI - rule fires correctly on all patterns

CRE Details

  • CRE ID: CRE-2025-0071
  • Severity: 1 (High)
  • Impact Score: 9/10
  • Title: CoreDNS unavailable
  • Category: kubernetes-problem

Problem Detection

This rule provides early warning for CoreDNS failures by detecting:

  1. Scaling to zero: Scaled down replica set coredns-.+ from [1-9]+ to 0
  2. Container termination: Stopping container coredns
  3. Readiness probe failures: Readiness probe failed.+connection refused

Why This Matters

CoreDNS unavailability is a critical failure that can break all service discovery in the cluster, cause cascading readiness probe failures, lead to complete cluster outage, and is commonly cited in production post-mortems.

Commands for Sample Data

A full reproduction repo is available here with a README and script that automates the below simplified commands

The test.log file associated with CRE-2025-0071 was generated by reproducing CoreDNS failure scenarios. The core commands executed to produce the log entries are:

1. To reproduce CoreDNS scaling failure:

# Scale CoreDNS to zero (triggers immediate detection)
kubectl -n kube-system scale deployment/coredns --replicas=0

# Capture the scaling event
kubectl -n kube-system get events --sort-by=.lastTimestamp | grep coredns

2. To capture readiness probe failures:

# Monitor pod termination and readiness failures
kubectl -n kube-system get events --watch --field-selector reason=Killing,reason=Unhealthy | grep coredns

3. To collect timeline for test.log:

# Collect events with timestamps
kubectl -n kube-system get events --sort-by=.lastTimestamp -o custom-columns=TIME:.lastTimestamp,TYPE:.type,REASON:.reason,OBJECT:.involvedObject.name,MESSAGE:.message | grep coredns

The test.log contains real failure events that demonstrate how this rule detects CoreDNS unavailability before DNS queries start timing out, providing critical early warning for cluster-wide DNS outages.

Video

https://github.com/user-attachments/assets/e07ed205-2354-4124-8c7d-7d01d56a0ade

Claim

Total prize pool $250
Total paid $250
Status Approved
Submitted June 02, 2025
Last updated June 02, 2025

Contributors

NI

Nicolas Yarosz

@yarosz

100%

Sponsors

PR

Prequel

@prequel-dev

$250 paid