This PR adds a detection rule (CRE-2025-0130) for CUDA out-of-memory failures in Stable Diffusion WebUI, a common and high-impact failure mode in AUTOMATIC1111 deployments. The rule identifies CUDA memory exhaustion that escalates to complete WebUI service failure requiring manual intervention.
CRE-2025-0130 Playground: Test Rule
High-Severity Issue: Stable Diffusion WebUI CUDA failures cause:
Why This Matters: Stable Diffusion CUDA failures are particularly dangerous because:
| # | Issue Type | Example Error Pattern |
|---|---|---|
| 1 | CUDA Memory Exhaustion | torch.cuda.OutOfMemoryError: CUDA out of memory |
| 2 | Model Loading Failures | Failed to allocate tensor on device |
| 3 | Generation Process Crashes | Fatal error during image generation |
| 4 | WebUI Unresponsiveness | Gradio interface becoming unresponsive |
| 5 | Recovery Failures | Recovery failed - WebUI requires restart |
| 6 | CUDA Context Corruption | CUDA context may be corrupted |
| 7 | Complete Service Failure | Complete service failure - manual intervention required |
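To make the table concrete, here is a minimal Python sketch of how these patterns could be matched against a log. It is an illustration only, not the CRE rule itself and not how `preq` evaluates rules. The names `PATTERNS`, `classify_lines`, and `is_unrecoverable` are hypothetical, and the "OOM followed by a recovery or service failure" check is an assumed heuristic rather than the rule's actual logic.

```python
# Substrings copied from the "Example Error Pattern" column above.
PATTERNS = {
    "CUDA Memory Exhaustion": "torch.cuda.OutOfMemoryError: CUDA out of memory",
    "Model Loading Failures": "Failed to allocate tensor on device",
    "Generation Process Crashes": "Fatal error during image generation",
    "WebUI Unresponsiveness": "Gradio interface becoming unresponsive",
    "Recovery Failures": "Recovery failed - WebUI requires restart",
    "CUDA Context Corruption": "CUDA context may be corrupted",
    "Complete Service Failure": "Complete service failure - manual intervention required",
}

# Issue types treated as "needs manual intervention" (assumption for this sketch).
TERMINAL_ISSUES = {"Recovery Failures", "Complete Service Failure"}


def classify_lines(lines):
    """Return a list of (line_number, issue_type) for lines containing a known pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for issue, needle in PATTERNS.items():
            if needle in line:
                hits.append((number, issue))
    return hits


def is_unrecoverable(hits):
    """Hypothetical heuristic: an OOM hit followed later by a terminal failure."""
    oom_lines = [n for n, issue in hits if issue == "CUDA Memory Exhaustion"]
    if not oom_lines:
        return False
    first_oom = min(oom_lines)
    return any(n > first_oom and issue in TERMINAL_ISSUES for n, issue in hits)
```

For example, passing the lines of a log file to `classify_lines` and the result to `is_unrecoverable` reports whether memory exhaustion was followed by a failed recovery or a complete service failure.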
cd stable-diffusion-demo
cat logs/sd-webui-cuda-oom.log | preq -r ../rules/cre-2025-0130/stable-diffusion-cuda-oom.yaml -d
Test Results:
Repo link (private invitation already sent): https://github.com/MAVRICK-1/cuda-oom
https://github.com/user-attachments/assets/321e1dd6-49da-4139-8c3c-9bf9f2164f89
./start-demo.sh
cat logs/roop-cuda-oom.log | preq -r stable-diffusion-cuda-oom.yaml -d
Fixes #130 /claim #130
Rishi Mondal
@MAVRICK-1
Prequel
@prequel-dev