All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren’t other open Pull Requests for the same update/change?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you’d like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?

Implement per-collection request metrics for REST and gRPC

Description

This PR introduces granular tracking of request metrics (latency, counts, errors) per collection. Previously, request metrics were global, making it difficult to identify which specific collection was causing load or experiencing latency in a multi-tenant environment.

With this change, metrics such as rest_responses_total and grpc_responses_total now include a collection label (e.g., collection="my_collection").

Key Changes

  • Per-Collection Metrics: Added collection label to standard request metrics.
  • Configuration:
    • telemetry_config.per_collection_metrics: Enable/disable feature (default: true).
    • telemetry_config.max_collections_in_metrics: Limits the number of tracked collections to prevent memory exhaustion (default: 1000).
  • REST Support:
    • Automatically extracts collection names from request paths (/collections/{name}/...).
    • Includes fallback manual path parsing to handle cases where middleware executes before routing.
  • gRPC Support:
    • Instruments key handlers in Collections, Points, and Snapshots services using a task_local! context to propagate collection names.
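
The fallback path parsing for REST could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this PR: the helper name `collection_from_path` is hypothetical, and it only handles the `/collections/{name}/...` shape described above (no percent-decoding, and no exclusion of routes whose second segment is not a collection name).

```rust
/// Hypothetical helper: pull the collection name out of a request path such
/// as `/collections/my_collection/points/search`.
/// Returns `None` when the path does not have the `/collections/{name}` shape.
fn collection_from_path(path: &str) -> Option<&str> {
    // Drop any query string before splitting into segments.
    let path = path.split('?').next().unwrap_or(path);

    // Ignore empty segments produced by leading or doubled slashes.
    let mut segments = path.split('/').filter(|s| !s.is_empty());

    match (segments.next(), segments.next()) {
        (Some("collections"), Some(name)) => Some(name),
        _ => None,
    }
}
```

For example, `collection_from_path("/collections/my_collection/points/search")` yields `Some("my_collection")`, while `/collections` and `/metrics` both yield `None`.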

Implementation Details

  1. Safety: Uses an LruCache to store per-collection metrics, ensuring that systems with high collection churn or cardinality do not suffer from unbounded memory growth.
  2. Context Propagation:
    • For REST: Leverages Actix match_info and fallback path parsing.
    • For gRPC: Uses tokio::task_local! to pass the collection name from the handler to the telemetry collector without significantly changing method signatures.
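
To illustrate the bounded-memory idea from point 1, here is a minimal, std-only sketch of an LRU-capped per-collection store. It is not the PR's implementation (which uses an LruCache); the type names `RequestStats` and `BoundedCollectionMetrics` and their fields are hypothetical, and the O(n) recency update is chosen for brevity rather than performance.

```rust
use std::collections::{HashMap, VecDeque};

/// Per-collection counters (field names are illustrative assumptions).
#[derive(Debug, Default, Clone, PartialEq)]
struct RequestStats {
    responses_total: u64,
    errors_total: u64,
}

/// Minimal LRU-bounded map from collection name to its stats.
struct BoundedCollectionMetrics {
    capacity: usize,
    stats: HashMap<String, RequestStats>,
    // Front = least recently used, back = most recently used.
    recency: VecDeque<String>,
}

impl BoundedCollectionMetrics {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { capacity, stats: HashMap::new(), recency: VecDeque::new() }
    }

    /// Record one response for `collection`, evicting the least recently
    /// used collection if a new entry would exceed the capacity.
    fn record(&mut self, collection: &str, is_error: bool) {
        if self.stats.contains_key(collection) {
            // Known collection: remove its old recency slot (O(n) scan).
            if let Some(pos) = self.recency.iter().position(|c| c.as_str() == collection) {
                let _ = self.recency.remove(pos);
            }
        } else if self.stats.len() >= self.capacity {
            // New collection and the store is full: evict the oldest entry.
            if let Some(evicted) = self.recency.pop_front() {
                self.stats.remove(&evicted);
            }
        }
        self.recency.push_back(collection.to_string());

        let entry = self.stats.entry(collection.to_string()).or_default();
        entry.responses_total += 1;
        if is_error {
            entry.errors_total += 1;
        }
    }

    fn get(&self, collection: &str) -> Option<&RequestStats> {
        self.stats.get(collection)
    }

    fn len(&self) -> usize {
        self.stats.len()
    }
}
```

With a capacity of 2, recording requests for `a`, `b`, `a`, then `c` evicts `b` (the least recently used), so the store never holds more than two collections regardless of churn.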

Verification

A new end-to-end test script has been added: tests/per_collection_metrics_test.sh.

How to run

./tests/per_collection_metrics_test.sh

Coverage

  • Creates a collection and performs upsert/search operations via REST.
  • Verifies rest_responses_total contains the correct collection label.
  • Performs gRPC search operations (via Dockerized grpcurl).
  • Verifies grpc_responses_total contains the correct collection label.
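
The label checks in the list above amount to scanning the Prometheus text output for a sample of the given metric that carries the expected collection label. The sketch below shows that check in Rust purely as an illustration; the test script itself is a shell script, the function name `has_collection_label` is hypothetical, and this is a simple substring check rather than a full exposition-format parser.

```rust
/// Hypothetical check: does `metrics` (Prometheus text format) contain a
/// sample of `metric` labelled with `collection="<collection>"`?
fn has_collection_label(metrics: &str, metric: &str, collection: &str) -> bool {
    // The label may be first in the label set or follow another label.
    let first = format!("{{collection=\"{}\"", collection);
    let later = format!(",collection=\"{}\"", collection);

    metrics.lines().any(|line| {
        // Skip `# HELP` and `# TYPE` comment lines.
        if line.starts_with('#') {
            return false;
        }
        match line.strip_prefix(metric) {
            // Require `{` right after the name so that a shorter metric
            // name does not match a longer one sharing the same prefix.
            Some(rest) if rest.starts_with('{') => {
                // Use the last `}` on the line: label values such as
                // endpoint paths may themselves contain braces.
                match rest.rfind('}') {
                    Some(end) => {
                        let labels = &rest[..end];
                        labels.contains(&first) || labels.contains(&later)
                    }
                    None => false,
                }
            }
            _ => false,
        }
    })
}
```

Given a line such as `rest_responses_total{status="200",collection="test_collection"} 3` (label names other than `collection` are assumed here), the function returns `true` for `test_collection` and `false` for any other collection name.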

Known Limitations

  • Prometheus Cardinality: Enabling this feature multiplies the number of request-metric time series by roughly the number of tracked collections. Users with more than 1000 active collections should monitor their Prometheus ingestion.
  • gRPC Instrumentation: Requires manual set_collection_context() calls in handlers; new endpoints must be explicitly instrumented.

Metrics Dump

metrics_dump

Demo video

https://github.com/user-attachments/assets/3e53218b-b3e0-42d9-923f-6869582117ec

/claim #3322

Claim

Total prize pool: $150
Total paid: $0
Status: Pending
Submitted: December 07, 2025
Last updated: December 07, 2025

Contributors

Markster (@itsmarkster): 100%

Sponsors

Qdrant (@Qdrant): $150