/claim #2112

feat: SNMP Provider — SNMPv1/v2c/v3 Trap Receiver + OID Polling + 25 Unit Tests

Closes #2112


Why this PR

SNMP is the industry-standard protocol for network device monitoring — routers, switches, firewalls, UPS units, and servers all emit SNMP traps when something goes wrong. Without an SNMP provider, Keep users running on-prem or hybrid infrastructure have no way to ingest these alerts.

I reviewed all five existing SNMP bounty PRs (#5525, #5552, #5599, #5637, #6107) to understand what each one got right, where each one fell short, and what a production-grade implementation actually needs. This PR is the result of that analysis.


What’s in this PR

Files changed

| File | Lines | Purpose |
|---|---|---|
| keep/providers/snmp_provider/__init__.py | 3 | Module export |
| keep/providers/snmp_provider/snmp_provider.py | 525 | Provider implementation |
| keep/providers/snmp_provider/test_snmp_provider.py | 351 | 25 unit tests |

Feature comparison with competing PRs

Feature This PR #5525 #5552 #5599 #5637 #6107
SNMPv1 traps partial partial partial
SNMPv2c traps
SNMPv3 (auth+priv) partial partial partial
Trap listener daemon thread
Clean dispose() lifecycle
Thread-safe alert cache + lock
Optional OID polling
JSON-configurable OID→alert mapping
Longest-prefix OID matching
Built-in enterprise severity defaults
Graceful fallback (no pysnmp)
Bad JSON config handled safely
Unit tests 25 ✅ 4 0 0 0 0

Design decisions and why

1. Longest-prefix OID matching

All five competing PRs use exact-match OID lookups. In practice, enterprise SNMP implementations send trap OIDs with trailing instance identifiers (e.g. 1.3.6.1.4.1.9.9.13.3.0.1 instead of exactly 1.3.6.1.4.1.9.9.13). Exact match silently drops these traps.

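This PR implements longest-prefix matching: the configured OID prefixes are sorted by length (descending) and the first prefix that matches on a whole-arc boundary wins, so 1.3.6.1.4.1.9 matches 1.3.6.1.4.1.9.9.13 but not 1.3.6.1.4.1.99. Routing by OID subtree rather than by exact OID is also how NMS tools such as Nagios, Zabbix, and PRTG are typically configured.

    def _map_oid_to_alert(self, oid: str) -> dict:
        # Sort by prefix length descending so the longest (most specific) match wins
        for prefix in sorted(self._oids_mapping.keys(), key=len, reverse=True):
            # Match on arc boundaries so "1.3.6.1.4.1.9" does not claim "1.3.6.1.4.1.99"
            if oid == prefix or oid.startswith(prefix + "."):
                return self._oids_mapping[prefix]
        return {}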
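The matching rule can be exercised in isolation. The following is a minimal sketch, assuming the oids_mapping setting is a JSON object keyed by OID prefix; the entry contents and the name `match_oid` are hypothetical and only illustrate how a more specific prefix overrides a broader one.

```python
import json
from typing import Dict

# Hypothetical oids_mapping value; the provider's real schema may differ.
RAW_MAPPING = """
{
  "1.3.6.1.4.1.9": {"name": "cisco-generic", "severity": "high"},
  "1.3.6.1.4.1.9.9.13": {"name": "cisco-9.9.13", "severity": "critical"}
}
"""
OIDS_MAPPING: Dict[str, dict] = json.loads(RAW_MAPPING)


def match_oid(oid: str, mapping: Dict[str, dict]) -> dict:
    """Return the config of the longest configured prefix of `oid`, or {}."""
    oid = oid.strip(".")  # tolerate a leading dot such as ".1.3.6..."
    # Longest prefixes first, so the most specific entry wins.
    for prefix in sorted(mapping, key=len, reverse=True):
        # Compare whole arcs: "1.3.6.1.4.1.9" must not match "1.3.6.1.4.1.99".
        if oid == prefix or oid.startswith(prefix + "."):
            return mapping[prefix]
    return {}


specific = match_oid("1.3.6.1.4.1.9.9.13.3.0.1", OIDS_MAPPING)  # 9.9.13 entry
generic = match_oid("1.3.6.1.4.1.9.1.1", OIDS_MAPPING)          # broad Cisco entry
unrelated = match_oid("1.3.6.1.4.1.99.1", OIDS_MAPPING)         # no entry
```

Sorting by string length is enough here because every prefix that matches a given OID is itself a prefix of the longer matching ones.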

2. Built-in enterprise severity defaults

When no OID mapping is configured, the provider infers severity from well-known IETF and enterprise OID prefixes. This means zero-config works out of the box for common network events:

| OID prefix | Trap type | Inferred severity |
|---|---|---|
| 1.3.6.1.6.3.1.1.5.3 | linkDown | critical |
| 1.3.6.1.6.3.1.1.5.5 | authenticationFailure | critical |
| 1.3.6.1.6.3.1.1.5.2 | warmStart | warning |
| 1.3.6.1.6.3.1.1.5.1 | coldStart | info |
| 1.3.6.1.6.3.1.1.5.4 | linkUp | info |
| 1.3.6.1.4.1.9.* | Cisco enterprise | high |
| 1.3.6.1.4.1.2636.* | Juniper enterprise | high |
| 1.3.6.1.4.1.11.* | HP/HPE enterprise | high |
| 1.3.6.1.4.1.2011.* | Huawei enterprise | medium |
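A rough sketch of how such defaults could be applied is shown below. The prefix values are taken from the table, and the fallback to info for unknown OIDs follows the test list in this PR; the names `DEFAULT_SEVERITIES` and `infer_severity` are hypothetical, and the provider's own implementation may be structured differently.

```python
# Defaults copied from the table; trailing ".*" is expressed as a subtree match.
DEFAULT_SEVERITIES = {
    "1.3.6.1.6.3.1.1.5.3": "critical",  # linkDown
    "1.3.6.1.6.3.1.1.5.5": "critical",  # authenticationFailure
    "1.3.6.1.6.3.1.1.5.2": "warning",   # warmStart
    "1.3.6.1.6.3.1.1.5.1": "info",      # coldStart
    "1.3.6.1.6.3.1.1.5.4": "info",      # linkUp
    "1.3.6.1.4.1.9": "high",            # Cisco enterprise subtree
    "1.3.6.1.4.1.2636": "high",         # Juniper enterprise subtree
    "1.3.6.1.4.1.11": "high",           # HP/HPE enterprise subtree
    "1.3.6.1.4.1.2011": "medium",       # Huawei enterprise subtree
}


def infer_severity(oid: str, default: str = "info") -> str:
    """Return the severity of the longest matching default prefix."""
    oid = oid.strip(".")
    for prefix in sorted(DEFAULT_SEVERITIES, key=len, reverse=True):
        # Whole-arc comparison keeps enterprise 11 from matching enterprise 111.
        if oid == prefix or oid.startswith(prefix + "."):
            return DEFAULT_SEVERITIES[prefix]
    return default  # unknown OIDs fall back to info
```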

3. Thread-safe alert caching with copy-on-read

The trap listener thread appends to self._alerts under a threading.Lock, and get_alerts() takes the same lock and returns a shallow copy, so callers cannot mutate the internal list (the AlertDto objects themselves are still shared). All competing PRs that have a cache skip the lock entirely.

    def get_alerts(self, ...) -> list[AlertDto]:
        if not self._listener_running:
            self._start_trap_listener()
        with self._lock:
            return list(self._alerts)  # return copy, not reference
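The copy-on-read idea can be shown without any SNMP machinery. This is a small sketch with a hypothetical `AlertCache` helper (not part of Keep or of this PR) in which several writer threads stand in for the trap listener.

```python
import threading
from typing import List


class AlertCache:
    """Lock-protected list with copy-on-read (hypothetical helper)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._alerts: List[dict] = []

    def add(self, alert: dict) -> None:
        # Called from the writer (listener) thread.
        with self._lock:
            self._alerts.append(alert)

    def snapshot(self) -> List[dict]:
        # Shallow copy: the caller owns the list, the items are still shared.
        with self._lock:
            return list(self._alerts)


cache = AlertCache()


def writer(n: int) -> None:
    for i in range(100):
        cache.add({"writer": n, "seq": i})


threads = [threading.Thread(target=writer, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

snap = cache.snapshot()
snap.clear()  # mutating the copy leaves the cache untouched
```

If callers also need protection against changes to individual alerts, a deep copy would be required, at a higher cost per read.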

4. Graceful degradation without pysnmp

pysnmp-lextudio is an optional dependency. If it is not installed, the provider logs a warning and get_alerts() returns an empty list rather than raising an ImportError. This avoids crashing the entire Keep process on deployments that do not have the optional dependency installed.
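One common way to get this behaviour is to attempt the import once and remember the result. The sketch below assumes that approach; `load_optional` and `SketchProvider` are hypothetical names, and the provider in this PR may guard the import differently.

```python
import importlib
import logging
from types import ModuleType
from typing import List, Optional

logger = logging.getLogger(__name__)


def load_optional(module_name: str) -> Optional[ModuleType]:
    """Import a module if it is available; otherwise warn and return None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        logger.warning("%s is not installed; SNMP support is disabled", module_name)
        return None


class SketchProvider:
    """Hypothetical stand-in for the provider, showing only the fallback path."""

    def __init__(self, module_name: str = "pysnmp") -> None:
        self._snmp = load_optional(module_name)
        self._alerts: List[dict] = []

    @property
    def available(self) -> bool:
        return self._snmp is not None

    def get_alerts(self) -> List[dict]:
        # Without the library there is nothing to listen with: report no alerts.
        if not self.available:
            return []
        return list(self._alerts)
```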

5. SNMPv3 auth+priv support

Full USM (User-based Security Model) support with configurable auth protocol (MD5/SHA) and privacy protocol (DES/AES). Credentials are marked sensitive: True so they are redacted in Keep’s UI and logs.
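As an illustration of the validation and redaction rules described here, the following sketch uses a plain dataclass. The field names, the version strings "1", "2c" and "3", and the helpers `validate` and `redacted` are assumptions made for this example; the provider's actual config class may differ.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass
class SnmpAuthSketch:
    """Hypothetical config object, not the provider's real class."""

    version: str = "2c"
    community_string: str = "public"
    username: Optional[str] = None
    auth_protocol: str = "SHA"
    priv_protocol: str = "AES"
    auth_key: Optional[str] = dataclasses.field(
        default=None, metadata={"sensitive": True}
    )
    priv_key: Optional[str] = dataclasses.field(
        default=None, metadata={"sensitive": True}
    )


def validate(cfg: SnmpAuthSketch) -> None:
    """Raise ValueError for unsupported versions or incomplete v3 settings."""
    if cfg.version not in {"1", "2c", "3"}:
        raise ValueError(f"unsupported SNMP version: {cfg.version!r}")
    if cfg.version == "3":
        if not cfg.username:
            raise ValueError("SNMPv3 requires a username")
        if cfg.auth_protocol not in {"MD5", "SHA"}:
            raise ValueError(f"unsupported auth protocol: {cfg.auth_protocol!r}")
        if cfg.priv_protocol not in {"DES", "AES"}:
            raise ValueError(f"unsupported privacy protocol: {cfg.priv_protocol!r}")


def redacted(cfg: SnmpAuthSketch) -> dict:
    """Return the config as a dict with sensitive values masked."""
    out = {}
    for f in dataclasses.fields(cfg):
        value = getattr(cfg, f.name)
        out[f.name] = "****" if f.metadata.get("sensitive") and value else value
    return out
```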

6. Safe JSON config handling

If oids_mapping or poll_targets contains invalid JSON, the provider logs a warning and falls back to empty mapping/list instead of raising at startup. None of the competing PRs handle this.
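A fallback of this kind might look like the sketch below. The helper name `parse_json_or_default` is hypothetical, and the extra type check (rejecting valid JSON of the wrong shape) is an assumption that goes slightly beyond what the PR text states.

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_json_or_default(raw, expected_type, field_name):
    """Parse `raw` as JSON; on any problem, warn and return an empty container."""
    default = expected_type()  # {} for dict, [] for list
    if not raw:
        return default
    try:
        value = json.loads(raw)
    except (TypeError, ValueError) as exc:  # JSONDecodeError subclasses ValueError
        logger.warning("Invalid JSON in %s, using empty default: %s", field_name, exc)
        return default
    if not isinstance(value, expected_type):
        logger.warning("%s must be a JSON %s, using empty default",
                       field_name, expected_type.__name__)
        return default
    return value


mapping = parse_json_or_default("{not json", dict, "oids_mapping")   # falls back
targets = parse_json_or_default('["10.0.0.1"]', list, "poll_targets")  # parses
```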


Test coverage

$ cd keep/providers/snmp_provider && python3 -m unittest test_snmp_provider -v
test_dispose_joins_running_threads ... ok
test_dispose_sets_stop_event ... ok
test_dispose_with_no_threads_does_not_raise ... ok
test_calls_start_listener_when_not_running ... ok
test_returns_copy_not_reference ... ok
test_returns_list ... ok
test_bad_oids_mapping_uses_empty ... ok
test_bad_poll_targets_uses_empty ... ok
test_exact_oid_returns_config ... ok
test_longest_prefix_wins ... ok
test_no_match_returns_empty ... ok
test_prefix_match ... ok
test_case_insensitive ... ok
test_critical ... ok
test_empty_returns_none ... ok
test_unknown_returns_none ... ok
test_cisco_oid_is_high ... ok
test_cold_start_is_info ... ok
test_link_down_is_critical ... ok
test_unknown_defaults_to_info ... ok
test_invalid_version_raises ... ok
test_v3_without_username_raises ... ok
test_valid_v1 ... ok
test_valid_v2c ... ok
test_valid_v3_with_username ... ok
----------------------------------------------------------------------
Ran 25 tests in 0.007s
OK

All 25 tests pass without pysnmp installed: pysnmp is replaced with mocks in sys.modules before the provider module is imported, so the test suite is self-contained and CI-friendly.
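For readers unfamiliar with the technique, this is roughly what stubbing a package through sys.modules looks like. The sketch uses `unittest.mock.patch.dict` so the stubs are removed afterwards; the submodule name `pysnmp.hlapi` and the imported name `getCmd` are only examples, and the test file in this PR may install its mocks differently.

```python
import importlib
import sys
from unittest.mock import MagicMock, patch

fake_pysnmp = MagicMock(name="pysnmp")
stubs = {
    "pysnmp": fake_pysnmp,
    "pysnmp.hlapi": fake_pysnmp.hlapi,  # child mock doubles as the submodule
}

# While the patch is active, imports resolve to the stubs instead of hitting
# the file system, so the real package does not need to be installed.
with patch.dict(sys.modules, stubs):
    hlapi = importlib.import_module("pysnmp.hlapi")
    from pysnmp.hlapi import getCmd  # any attribute of a MagicMock is a MagicMock

    resolved_to_stub = hlapi is fake_pysnmp.hlapi
    get_cmd_is_mock = isinstance(getCmd, MagicMock)

# patch.dict restores sys.modules to its previous contents on exit.
stub_removed = sys.modules.get("pysnmp") is not fake_pysnmp
```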

Test classes

| Class | Tests | What is covered |
|---|---|---|
| TestValidateConfig | 5 | v1/v2c/v3 valid; invalid version raises; v3 no username raises |
| TestOidMapping | 4 | exact match; prefix match; longest prefix wins; no match returns empty |
| TestSeverityInference | 4 | linkDown→critical; coldStart→info; Cisco→high; unknown→info |
| TestParseSeverity | 4 | critical; case-insensitive; empty→None; unknown→None |
| TestDispose | 3 | stop event set; threads joined; no threads is safe |
| TestGetAlerts | 3 | returns list; returns copy; starts listener on first call |
| TestInvalidJsonConfig | 2 | bad oids_mapping falls back; bad poll_targets falls back |
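The lifecycle exercised by TestDispose (stop event set, threads joined, safe with no threads) can be sketched as follows. `ListenerSketch` and its loop are hypothetical stand-ins; the real listener would block on the SNMP transport rather than on a timer.

```python
import threading
from typing import List


class ListenerSketch:
    """Hypothetical listener showing a stop-event-and-join shutdown."""

    def __init__(self) -> None:
        self._stop_event = threading.Event()
        self._threads: List[threading.Thread] = []

    def start(self) -> None:
        t = threading.Thread(target=self._run, daemon=True)
        self._threads.append(t)
        t.start()

    def _run(self) -> None:
        # Stand-in for the receive loop: wake up regularly and check the flag.
        # Event.wait returns True as soon as the event is set.
        while not self._stop_event.wait(timeout=0.05):
            pass

    def dispose(self, timeout: float = 2.0) -> None:
        # Signal first, then join, so the loop can exit before we wait on it.
        self._stop_event.set()
        for t in self._threads:
            if t.is_alive():
                t.join(timeout=timeout)
        self._threads.clear()
```

Marking the thread as a daemon keeps a missed dispose() from blocking interpreter shutdown, while the explicit join gives a clean stop in the normal case.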

Manual testing

Send a test trap (requires the snmptrap command from Net-SNMP, e.g. the net-snmp-utils package on RHEL/Fedora or the snmp package on Debian/Ubuntu):

    # Start listener on port 1620 (no root required)
    # Configure the provider with port=1620, version=2c, community_string=public

    # Send a linkDown trap
    snmptrap -v 2c -c public localhost:1620 "" 1.3.6.1.6.3.1.1.5.3 \
        1.3.6.1.2.1.2.2.1.1 i 2

    # Send a Cisco enterprise trap
    snmptrap -v 2c -c public localhost:1620 "" 1.3.6.1.4.1.9.9.13.3.0.1 \
        1.3.6.1.2.1.1.5 s "router-01.example.com"

The resulting AlertDto will have:

  • name: from oids_mapping config or the OID string
  • severity: AlertSeverity.CRITICAL for linkDown (from built-in defaults)
  • source: ["snmp"]
  • description: formatted varbind list
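To make the field list concrete, here is a small sketch that assembles those four fields as a plain dict for the linkDown trap sent above. The helper names and the "oid = value" description format are assumptions for illustration; the provider builds a real AlertDto and may format varbinds differently.

```python
from typing import Iterable, Optional, Tuple


def format_varbinds(varbinds: Iterable[Tuple[str, object]]) -> str:
    """Render varbinds one per line as 'oid = value' (assumed format)."""
    return "\n".join(f"{oid} = {value}" for oid, value in varbinds)


def build_alert_fields(
    trap_oid: str,
    varbinds: Iterable[Tuple[str, object]],
    name: Optional[str] = None,
    severity: str = "info",
) -> dict:
    """Assemble the alert fields listed above as a plain dict."""
    return {
        "name": name or trap_oid,  # fall back to the OID string
        "severity": severity,
        "source": ["snmp"],
        "description": format_varbinds(varbinds),
    }


fields = build_alert_fields(
    "1.3.6.1.6.3.1.1.5.3",            # linkDown
    [("1.3.6.1.2.1.2.2.1.1", 2)],     # varbind from the snmptrap command
    severity="critical",
)
```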

Checklist

  • Code follows Keep’s provider pattern (BaseProvider, AuthConfig pydantic dataclass, AlertDto mapping)
  • Optional dependency handled gracefully (no crash if pysnmp not installed)
  • Thread-safe implementation with proper dispose()
  • 25 unit tests, all passing, no external dependencies required
  • SNMPv1, v2c, v3 all supported
  • Sensitive fields (auth_key, priv_key) marked sensitive: True

Claim

Total prize pool: $200
Total paid: $0
Status: Pending
Submitted: March 25, 2026
Last updated: March 25, 2026

Contributors: CharlesWong (@CharlesWong), 100%

Sponsors: Keep (YC W23) (@keephq), $200