/claim #2112
Closes #2112
SNMP is the industry-standard protocol for network device monitoring — routers, switches, firewalls, UPS units, and servers all emit SNMP traps when something goes wrong. Without an SNMP provider, Keep users running on-prem or hybrid infrastructure have no way to ingest these alerts.
I reviewed all five existing SNMP bounty PRs (#5525, #5552, #5599, #5637, #6107) to understand what each one got right, where each one fell short, and what a production-grade implementation actually needs. This PR is the result of that analysis.
| File | Lines | Purpose |
|---|---|---|
| `keep/providers/snmp_provider/__init__.py` | 3 | Module export |
| `keep/providers/snmp_provider/snmp_provider.py` | 525 | Provider implementation |
| `keep/providers/snmp_provider/test_snmp_provider.py` | 351 | 25 unit tests |
| Feature | This PR | #5525 | #5552 | #5599 | #5637 | #6107 |
|---|---|---|---|---|---|---|
| SNMPv1 traps | ✅ | partial | ✅ | partial | partial | ✅ |
| SNMPv2c traps | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| SNMPv3 (auth+priv) | ✅ | partial | ✅ | partial | partial | ✅ |
| Trap listener daemon thread | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Clean `dispose()` lifecycle | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ |
| Thread-safe alert cache + lock | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ |
| Optional OID polling | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ |
| JSON-configurable OID→alert mapping | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Longest-prefix OID matching | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Built-in enterprise severity defaults | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Graceful fallback (no pysnmp) | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Bad JSON config handled safely | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Unit tests | 25 ✅ | 4 | 0 | 0 | 0 | 0 |
All five competing PRs use exact-match OID lookups. In practice, enterprise SNMP implementations send trap OIDs with trailing instance identifiers (e.g. 1.3.6.1.4.1.9.9.13.3.0.1 instead of exactly 1.3.6.1.4.1.9.9.13). Exact match silently drops these traps.
This PR implements longest-prefix matching: all configured OID prefixes are sorted by length (descending) and the first match wins. This mirrors how real NMS tools (Nagios, Zabbix, PRTG) handle OID-based routing.
```python
def _map_oid_to_alert(self, oid: str) -> dict:
    # Sort by prefix length descending so the longest match wins
    for prefix in sorted(self._oids_mapping.keys(), key=len, reverse=True):
        if oid.startswith(prefix):
            return self._oids_mapping[prefix]
    return {}
```
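One caveat with plain `str.startswith`: a configured prefix `1.3.6.1.4.1.9` would also match an unrelated enterprise such as `1.3.6.1.4.1.99`. The following is a minimal sketch of a component-boundary variant, not the code in this PR; `match_oid_prefix` and the sample mapping are hypothetical names used only for illustration.

```python
from typing import Dict


def match_oid_prefix(oid: str, mapping: Dict[str, dict]) -> dict:
    """Return the config of the longest configured prefix matching `oid`.

    Hypothetical helper: a prefix only matches on whole OID components,
    so "1.3.6.1.4.1.9" matches "1.3.6.1.4.1.9.9.13" but not "1.3.6.1.4.1.99".
    """
    oid = oid.strip(".")
    # Longest prefix first, so the most specific entry wins.
    for prefix in sorted(mapping, key=len, reverse=True):
        clean = prefix.strip(".")
        if oid == clean or oid.startswith(clean + "."):
            return mapping[prefix]
    return {}


# Illustrative mapping (assumed shape: prefix -> alert config dict).
mapping = {
    "1.3.6.1.4.1.9": {"name": "Cisco generic"},
    "1.3.6.1.4.1.9.9.13": {"name": "Cisco envmon"},
}
print(match_oid_prefix("1.3.6.1.4.1.9.9.13.3.0.1", mapping))
print(match_oid_prefix("1.3.6.1.4.1.99.1", mapping))
```

The first call resolves to the more specific `1.3.6.1.4.1.9.9.13` entry; the second returns an empty dict because `99` is a different enterprise number.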
When no OID mapping is configured, the provider infers severity from well-known IETF and enterprise OID prefixes. This means zero-config works out of the box for common network events:
| OID prefix | Trap type | Inferred severity |
|---|---|---|
| `1.3.6.1.6.3.1.1.5.3` | linkDown | critical |
| `1.3.6.1.6.3.1.1.5.5` | authenticationFailure | critical |
| `1.3.6.1.6.3.1.1.5.2` | warmStart | warning |
| `1.3.6.1.6.3.1.1.5.1` | coldStart | info |
| `1.3.6.1.6.3.1.1.5.4` | linkUp | info |
| `1.3.6.1.4.1.9.*` | Cisco enterprise | high |
| `1.3.6.1.4.1.2636.*` | Juniper enterprise | high |
| `1.3.6.1.4.1.11.*` | HP/HPE enterprise | high |
| `1.3.6.1.4.1.2011.*` | Huawei enterprise | medium |
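The table can be read as a prefix lookup with a fallback. Below is a rough sketch of how such inference might look, assuming severities are plain strings and that unknown OIDs fall back to `info` (as the test name `test_unknown_defaults_to_info` suggests). `DEFAULT_SEVERITIES` and `infer_severity` are hypothetical names, not necessarily those used in the provider.

```python
# Built-in defaults taken from the table above (prefix -> severity).
DEFAULT_SEVERITIES = {
    "1.3.6.1.6.3.1.1.5.3": "critical",  # linkDown
    "1.3.6.1.6.3.1.1.5.5": "critical",  # authenticationFailure
    "1.3.6.1.6.3.1.1.5.2": "warning",   # warmStart
    "1.3.6.1.6.3.1.1.5.1": "info",      # coldStart
    "1.3.6.1.6.3.1.1.5.4": "info",      # linkUp
    "1.3.6.1.4.1.9": "high",            # Cisco enterprise
    "1.3.6.1.4.1.2636": "high",         # Juniper enterprise
    "1.3.6.1.4.1.11": "high",           # HP/HPE enterprise
    "1.3.6.1.4.1.2011": "medium",       # Huawei enterprise
}


def infer_severity(oid: str, default: str = "info") -> str:
    """Return the severity of the longest matching default prefix."""
    oid = oid.strip(".")
    for prefix in sorted(DEFAULT_SEVERITIES, key=len, reverse=True):
        # Match on whole OID components only.
        if oid == prefix or oid.startswith(prefix + "."):
            return DEFAULT_SEVERITIES[prefix]
    return default


print(infer_severity("1.3.6.1.6.3.1.1.5.3"))       # linkDown
print(infer_severity("1.3.6.1.4.1.9.9.13.3.0.1"))  # Cisco enterprise trap
```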
The trap listener thread writes to `self._alerts` under a `threading.Lock`, and `get_alerts()` returns a shallow copy so callers cannot mutate the internal list. Per the comparison table above, only #5552 and #6107 among the competing PRs also guard their cache with a lock.
```python
def get_alerts(self, ...) -> list[AlertDto]:
    if not self._listener_running:
        self._start_trap_listener()
    with self._lock:
        return list(self._alerts)  # return copy, not reference
```
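To make the locking and `dispose()` lifecycle concrete, here is a self-contained sketch of the pattern using only the standard library. `AlertCache`, `start_worker`, and `fake_listener` are hypothetical stand-ins; the real provider wires this into its trap listener rather than a fake worker.

```python
import threading


class AlertCache:
    """Hypothetical lock-protected cache with a stop event for clean shutdown."""

    def __init__(self):
        self._alerts = []
        self._lock = threading.Lock()
        self._stop_event = threading.Event()
        self._threads = []

    def add_alert(self, alert):
        with self._lock:
            self._alerts.append(alert)

    def get_alerts(self):
        with self._lock:
            return list(self._alerts)  # snapshot, not the internal list

    def start_worker(self, target):
        thread = threading.Thread(target=target, args=(self._stop_event,), daemon=True)
        thread.start()
        self._threads.append(thread)

    def dispose(self, timeout=2.0):
        self._stop_event.set()           # ask workers to exit
        for thread in self._threads:
            thread.join(timeout=timeout) # wait for them to finish
        self._threads.clear()


def fake_listener(cache):
    """Stand-in for a trap listener: records one alert, then waits for shutdown."""
    def run(stop_event):
        cache.add_alert({"name": "linkDown"})
        stop_event.wait()
    return run


cache = AlertCache()
cache.start_worker(fake_listener(cache))
cache.dispose()

snapshot = cache.get_alerts()
snapshot.append({"name": "bogus"})  # mutating the copy leaves the cache untouched
print(len(cache.get_alerts()))
```

Because the copy is shallow, callers still share the alert objects themselves; only the list is protected.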
pysnmp-lextudio is an optional dependency. If it is not installed, the provider logs a warning and `get_alerts()` returns an empty list rather than raising an ImportError. This avoids crashing the entire Keep process in deployments that do not have the optional dependency installed.
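One way to implement this kind of fallback is to probe for the module before using it. The sketch below uses `importlib.util.find_spec` from the standard library; `module_available` and `get_alerts_or_empty` are hypothetical helpers, and the provider may well use a plain `try`/`except ImportError` instead.

```python
import importlib.util
import logging

logger = logging.getLogger(__name__)


def module_available(name: str) -> bool:
    """Return True if `name` is importable, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def get_alerts_or_empty(fetch, module_name: str = "pysnmp"):
    """Call `fetch()` only when the optional dependency is present."""
    if not module_available(module_name):
        logger.warning("%s is not installed; returning no alerts", module_name)
        return []
    return fetch()


# `json` is always available, so the fetch callable runs.
print(get_alerts_or_empty(lambda: ["alert"], module_name="json"))
# A module that does not exist yields an empty list instead of an ImportError.
print(get_alerts_or_empty(lambda: ["alert"], module_name="no_such_module_xyz"))
```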
SNMPv3 is supported through the full USM (User-based Security Model), with a configurable auth protocol (MD5/SHA) and privacy protocol (DES/AES). Credential fields are marked `sensitive: True` so they are redacted in Keep’s UI and logs.
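As an illustration only, the validation rules and the `sensitive` marker could be modelled with a standard-library dataclass like the one below. The class name, the field names (`auth_key`, `priv_key`, and so on), the version strings `"1"`, `"2c"`, `"3"`, and the `redacted` helper are all assumptions made for this sketch; the provider's actual config class may differ.

```python
from dataclasses import dataclass, field, fields


@dataclass
class SnmpAuthSketch:
    """Hypothetical config: secrets carry `sensitive: True` in field metadata."""

    version: str = "2c"
    username: str = ""
    auth_protocol: str = "SHA"
    priv_protocol: str = "AES"
    auth_key: str = field(default="", metadata={"sensitive": True})
    priv_key: str = field(default="", metadata={"sensitive": True})

    def validate(self) -> None:
        if self.version not in {"1", "2c", "3"}:
            raise ValueError(f"unsupported SNMP version: {self.version!r}")
        if self.version != "3":
            return
        if not self.username:
            raise ValueError("SNMPv3 requires a username")
        if self.auth_protocol.upper() not in {"MD5", "SHA"}:
            raise ValueError(f"unsupported auth protocol: {self.auth_protocol!r}")
        if self.priv_protocol.upper() not in {"DES", "AES"}:
            raise ValueError(f"unsupported privacy protocol: {self.priv_protocol!r}")


def redacted(config) -> dict:
    """Return the config as a dict with non-empty sensitive values masked."""
    result = {}
    for f in fields(config):
        value = getattr(config, f.name)
        result[f.name] = "***" if f.metadata.get("sensitive") and value else value
    return result


cfg = SnmpAuthSketch(version="3", username="monitor", auth_key="s3cret")
cfg.validate()
print(redacted(cfg))
```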
If oids_mapping or poll_targets contains invalid JSON, the provider logs a warning and falls back to empty mapping/list instead of raising at startup. None of the competing PRs handle this.
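A minimal sketch of this kind of defensive parsing, assuming both options arrive as JSON strings. `parse_json_option` is a hypothetical helper name; it also rejects valid JSON of the wrong top-level type, which may or may not match what the provider does.

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_json_option(raw, expected_type, option_name):
    """Parse a JSON config string, falling back to an empty dict/list on error."""
    if not raw:
        return expected_type()
    try:
        value = json.loads(raw)
    except (TypeError, ValueError) as exc:  # JSONDecodeError is a ValueError
        logger.warning("Invalid JSON in %s (%s); using empty default", option_name, exc)
        return expected_type()
    if not isinstance(value, expected_type):
        logger.warning("%s must be a JSON %s; using empty default",
                       option_name, expected_type.__name__)
        return expected_type()
    return value


print(parse_json_option('{"1.3.6.1.4.1.9": {"severity": "high"}}', dict, "oids_mapping"))
print(parse_json_option("{not valid json", dict, "oids_mapping"))
print(parse_json_option("[1, 2,", list, "poll_targets"))
```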
```text
$ cd keep/providers/snmp_provider && python3 -m unittest test_snmp_provider -v
test_dispose_joins_running_threads ... ok
test_dispose_sets_stop_event ... ok
test_dispose_with_no_threads_does_not_raise ... ok
test_calls_start_listener_when_not_running ... ok
test_returns_copy_not_reference ... ok
test_returns_list ... ok
test_bad_oids_mapping_uses_empty ... ok
test_bad_poll_targets_uses_empty ... ok
test_exact_oid_returns_config ... ok
test_longest_prefix_wins ... ok
test_no_match_returns_empty ... ok
test_prefix_match ... ok
test_case_insensitive ... ok
test_critical ... ok
test_empty_returns_none ... ok
test_unknown_returns_none ... ok
test_cisco_oid_is_high ... ok
test_cold_start_is_info ... ok
test_link_down_is_critical ... ok
test_unknown_defaults_to_info ... ok
test_invalid_version_raises ... ok
test_v3_without_username_raises ... ok
test_valid_v1 ... ok
test_valid_v2c ... ok
test_valid_v3_with_username ... ok
----------------------------------------------------------------------
Ran 25 tests in 0.007s
OK
```
All 25 tests pass without pysnmp installed — pysnmp is fully mocked at the sys.modules level before any imports so the test suite is self-contained and CI-friendly.
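For readers unfamiliar with the technique, mocking at the `sys.modules` level generally looks like the sketch below. This is an illustration of the pattern, not the PR's test file; the `pysnmp.hlapi` submodule and the imported name `getCmd` are placeholders, since any attribute of a `MagicMock` resolves to another mock.

```python
import sys
from unittest.mock import MagicMock

# Register stand-ins BEFORE any code under test imports pysnmp.
fake_pysnmp = MagicMock(name="pysnmp")
sys.modules["pysnmp"] = fake_pysnmp
sys.modules["pysnmp.hlapi"] = fake_pysnmp.hlapi

# This import now resolves against the mocks, so no real package is needed.
from pysnmp.hlapi import getCmd  # placeholder name; resolves to a MagicMock

# Tests can then script return values and assert on calls.
getCmd.return_value = "stubbed response"
print(getCmd("any", "arguments"))
```

The ordering matters: the entries must be in `sys.modules` before the provider module is imported, otherwise its top-level imports run against the real (or missing) package.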
| Class | Tests | What is covered |
|---|---|---|
| `TestValidateConfig` | 5 | v1/v2c/v3 valid; invalid version raises; v3 no username raises |
| `TestOidMapping` | 4 | exact match; prefix match; longest prefix wins; no match returns empty |
| `TestSeverityInference` | 4 | linkDown→critical; coldStart→info; Cisco→high; unknown→info |
| `TestParseSeverity` | 4 | critical; case-insensitive; empty→None; unknown→None |
| `TestDispose` | 3 | stop event set; threads joined; no threads is safe |
| `TestGetAlerts` | 3 | returns list; returns copy; starts listener on first call |
| `TestInvalidJsonConfig` | 2 | bad oids_mapping falls back; bad poll_targets falls back |
Send a test trap (requires snmp-utils or net-snmp):
```shell
# Start listener on port 1620 (no root required)
# Configure the provider with port=1620, version=2c, community_string=public

# Send a linkDown trap
snmptrap -v 2c -c public localhost:1620 "" 1.3.6.1.6.3.1.1.5.3 \
  1.3.6.1.2.1.2.2.1.1 i 2

# Send a Cisco enterprise trap
snmptrap -v 2c -c public localhost:1620 "" 1.3.6.1.4.1.9.9.13.3.0.1 \
  1.3.6.1.2.1.1.5 s "router-01.example.com"
```
The resulting AlertDto will have:

- `name`: from oids_mapping config or the OID string
- `severity`: AlertSeverity.CRITICAL for linkDown (from built-in defaults)
- `source`: ["snmp"]
- `description`: formatted varbind list
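As a rough illustration of how those fields could be assembled from a received trap, the sketch below builds a plain dict rather than a real AlertDto. `build_alert_fields`, its parameters, and the `oid = value` description format are assumptions for this example; the provider's actual formatting is not shown in this description.

```python
def build_alert_fields(trap_oid, varbinds, oids_mapping=None, default_severity="info"):
    """Hypothetical helper: derive alert fields from a trap OID and its varbinds."""
    config = (oids_mapping or {}).get(trap_oid, {})
    # One "oid = value" line per varbind (assumed format).
    description = "\n".join(f"{oid} = {value}" for oid, value in varbinds)
    return {
        "name": config.get("name", trap_oid),  # fall back to the OID string
        "severity": config.get("severity", default_severity),
        "source": ["snmp"],
        "description": description,
    }


fields_for_link_down = build_alert_fields(
    "1.3.6.1.6.3.1.1.5.3",
    [("1.3.6.1.2.1.2.2.1.1", 2)],
    default_severity="critical",  # linkDown per the built-in defaults table
)
print(fields_for_link_down)
```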
@CharlesWong