🔧 Fix Summary

This PR resolves the sporadic SSH ‘Permission denied (publickey)’ errors reported in #7724.

🎯 Root Cause Analysis

The issue was caused by three interconnected problems in SSH multiplexing:

  1. Incomplete SSH Key Validation - Only checked if file exists, not content validity or permissions
  2. Missing Retry Logic - Failed connections weren’t retried, leaving stale multiplexed sockets
  3. No Connection Recovery - Health checks detected failures but didn’t auto-recover

✅ Fixes Implemented

Fix 1: Enhanced SSH Key Validation

  • ✅ Verify file permissions (0600; OpenSSH rejects private keys that are accessible by group or others)
  • ✅ Validate PEM format with regex check
  • ✅ Test key accessibility with ssh-keygen
  • ✅ Detailed logging for diagnostics
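
A minimal sketch of the validation steps above, not the PR's actual code. The function names and the regex are hypothetical. It assumes an unencrypted key, and uses `ssh-keygen -y` (derive the public key) as one plausible way to "test key accessibility":

```python
import os
import re
import stat
import subprocess

# Hypothetical pattern: matches "BEGIN ... PRIVATE KEY" with a matching END line.
PEM_RE = re.compile(
    r"-----BEGIN (?P<label>(?:[A-Z0-9]+ )*PRIVATE KEY)-----.+?-----END (?P=label)-----",
    re.DOTALL,
)

def key_permissions_ok(path):
    """True if neither group nor others have any access bits (e.g. 0600 or 0400)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0

def looks_like_pem(text):
    """True if the text contains a PEM-style private key block."""
    return PEM_RE.search(text) is not None

def key_loadable(path):
    """Ask ssh-keygen to derive the public key; assumes an empty passphrase."""
    try:
        result = subprocess.run(
            ["ssh-keygen", "-y", "-P", "", "-f", path],
            stdin=subprocess.DEVNULL, capture_output=True, text=True, timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0

def validate_key(path):
    """Return a list of problems found; an empty list means the key passed."""
    if not os.path.isfile(path):
        return ["key file does not exist"]
    problems = []
    if not key_permissions_ok(path):
        problems.append("key file is accessible by group or others")
    with open(path, "r", errors="replace") as fh:
        if not looks_like_pem(fh.read()):
            problems.append("key file is not in PEM format")
    # Only invoke ssh-keygen once the cheaper checks have passed.
    if not problems and not key_loadable(path):
        problems.append("ssh-keygen could not load the key")
    return problems
```

Returning a list of problems rather than a boolean keeps the diagnostics available for logging.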

Fix 2: Automatic Retry with Recovery

  • ✅ Up to 3 retry attempts on connection failures
  • ✅ Re-validate key before each retry
  • ✅ Clean up stale mux sockets between attempts
  • ✅ Back off between attempts, starting with a 1s sleep
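
The retry loop described above could look roughly like the following sketch. `connect_with_retry` and its parameters are hypothetical names, the `connect` and `validate_key` callables stand in for the project's real functions, and "clean up" is assumed to mean deleting the socket file. A fixed 1s delay is used for simplicity; a growing delay could be substituted.

```python
import os
import time

def connect_with_retry(connect, validate_key, socket_path,
                       attempts=3, delay=1.0, sleep=time.sleep):
    """Try connect() up to `attempts` times, cleaning up between attempts.

    connect:      callable returning True on success (hypothetical)
    validate_key: callable returning True if the key is still valid (hypothetical)
    socket_path:  path of the multiplexing control socket
    """
    for attempt in range(1, attempts + 1):
        # Re-validate the key before every retry (not before the first attempt).
        if attempt > 1 and not validate_key():
            return False
        if connect():
            return True
        # Remove a possibly stale mux socket so the next attempt starts clean.
        try:
            os.remove(socket_path)
        except FileNotFoundError:
            pass
        # Wait before the next attempt, but not after the last one.
        if attempt < attempts:
            sleep(delay)
    return False
```

Passing `sleep` as a parameter keeps the loop testable without real delays.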

Fix 3: Self-Healing Health Checks

  • ✅ Auto-recover connections on health check failures
  • ✅ Rebuild multiplexed connection when unhealthy
  • ✅ Log all recovery attempts
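
As an illustration of the self-healing behaviour above, here is a sketch under stated assumptions: the function names are hypothetical, the health probe is assumed to be OpenSSH's `ssh -O check` against the control socket, and `rebuild` stands in for whatever re-establishes the multiplexed connection in the real code.

```python
import logging
import subprocess

log = logging.getLogger("ssh_health")  # hypothetical logger name

def mux_is_healthy(host, socket_path):
    """Ask the mux master whether it is still running, via `ssh -O check`."""
    try:
        result = subprocess.run(
            ["ssh", "-S", socket_path, "-O", "check", host],
            stdin=subprocess.DEVNULL, capture_output=True, text=True, timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0

def ensure_healthy(check, rebuild):
    """Run the health check; on failure, rebuild the connection and log the outcome."""
    if check():
        return True
    log.warning("health check failed, rebuilding multiplexed connection")
    recovered = bool(rebuild())
    if recovered:
        log.info("multiplexed connection recovered")
    else:
        log.error("recovery attempt failed")
    return recovered
```

A caller might wire the two together as `ensure_healthy(lambda: mux_is_healthy(host, sock), rebuild_connection)`, where `rebuild_connection` is again a placeholder.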

📊 Testing Recommendations

  1. Monitor SSH connection logs for retry attempts
  2. Force a key permission change and verify auto-fix
  3. Test with network interruptions to verify recovery
  4. Verify that no ‘Permission denied (publickey)’ errors occur in the next deployment

💡 Impact

This should eliminate the sporadic authentication failures by:

  • Verifying that SSH keys are valid and accessible before connecting
  • Automatically recovering from temporary connection issues
  • Providing detailed diagnostics for any remaining issues

Fixes: #7724

/claim #7724

Claim

Total prize pool: $250
Total paid: $0
Status: Pending
Submitted: February 02, 2026
Last updated: February 02, 2026

Contributors: andynewtw (@andynewtw), 100%

Sponsors: Zach Latta (@zachlatta), $250