RSPEED-2943: add observability metrics for auth, authorization, and quota by major · Pull Request #1640 · lightspeed-core/lightspeed-stack

major · 2026-04-29T23:24:38Z

Description

Adds bounded Prometheus metrics across the three pre-request middleware layers on /v1/responses and /v1/infer:

Authentication - attempt/result/reason counters and latency histogram across all auth modes (noop, token, JWK, Kubernetes, API key, rh-identity). No user identifiers in labels.
Authorization - success/denial/error counters and latency histogram by protected action. No user or request identifiers in labels.
Quota - pre-request quota check counters by endpoint, quota type, and result. No actual quota subject identifiers in labels.

Each commit is self-contained and can be reviewed independently.

Type of change

Tools used to create PR

Assisted-by: OpenCode
Generated by: N/A

Related Tickets & Documents

Related Issue # RSPEED-2943
Supersedes RSPEED-2943: add auth monitoring metrics #1638 and RSPEED-2943: add authorization monitoring metrics #1639

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

uv run pytest tests/unit/metrics/test_recording.py \
  tests/unit/authentication/test_utils.py \
  tests/unit/authentication/test_noop.py \
  tests/unit/authentication/test_noop_with_token.py \
  tests/unit/authentication/test_api_key_token.py \
  tests/unit/authentication/test_jwk_token.py \
  tests/unit/authentication/test_k8s.py \
  tests/unit/authorization/test_middleware.py \
  tests/unit/app/endpoints/test_responses.py \
  tests/unit/app/endpoints/test_rlsapi_v1.py
uv run make verify

Summary by CodeRabbit

Chores
- Enhanced observability infrastructure with new metrics for authentication attempts, authorization checks, and quota availability monitoring.
- Added bounded-duration histogram tracking for auth, authorization, and quota operations to improve performance insights.
- Improved error handling and logging across authentication modules with standardized metrics recording and graceful failure modes.

coderabbitai · 2026-04-29T23:24:44Z

Warning

Rate limit exceeded

@major has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 28 minutes and 34 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f235b66a-044e-4a1d-8522-cc7765db6f08

📥 Commits

Reviewing files that changed from the base of the PR and between f71930b and 0eccec0.

📒 Files selected for processing (20)

src/app/endpoints/responses.py
src/app/endpoints/rlsapi_v1.py
src/authentication/api_key_token.py
src/authentication/jwk_token.py
src/authentication/k8s.py
src/authentication/noop.py
src/authentication/noop_with_token.py
src/authentication/rh_identity.py
src/authentication/utils.py
src/authorization/middleware.py
src/metrics/__init__.py
src/metrics/recording.py
tests/unit/app/endpoints/test_responses.py
tests/unit/app/endpoints/test_rlsapi_v1.py
tests/unit/authentication/test_jwk_token.py
tests/unit/authentication/test_k8s.py
tests/unit/authentication/test_rh_identity.py
tests/unit/authentication/test_utils.py
tests/unit/authorization/test_middleware.py
tests/unit/metrics/test_recording.py

Walkthrough

Adds bounded-duration Prometheus metrics and normalization helpers, records auth/authorization/quota check metrics with timing, wires these helpers into authentication/authorization flows and response/infer endpoints, and extends unit tests to cover metric recording and label-bounding behavior.

Changes

Cohort / File(s)	Summary
Endpoint Quota Instrumentation `src/app/endpoints/responses.py`, `src/app/endpoints/rlsapi_v1.py`	Adds `_check_response_quota` and `_check_infer_quota` helpers, replaces inline quota checks in handlers with these helpers, and labels metrics with a fixed `endpoint_path`. Helpers record success/failure/error outcomes with elapsed time and re-raise original exceptions.
Metrics Infrastructure `src/metrics/__init__.py`, `src/metrics/recording.py`	Defines sub-second histogram bucket constants and new Counter/Histogram metrics for auth, authorization, and quota checks. Adds normalization constants and helper functions plus recording helpers (`record_auth_`, `record_authorization_`, `record_quota_check`) that clamp labels and log on metric backend errors.
Authentication Instrumentation `src/authentication/...` (`utils.py`, `jwk_token.py`, `api_key_token.py`, `k8s.py`, `rh_identity.py`, `noop*.py`)	Introduces per-request timing and calls to `record_auth_metrics` in multiple auth dependencies, centralizes JWK handling and validation helpers, strengthens RH Identity payload validation, and records bounded failure/success reasons before re-raising existing HTTP exceptions.
Authorization Middleware `src/authorization/middleware.py`	Adds timing and recording of authorization checks and durations, moving metric emission into a finally block so metrics are attempted regardless of outcome.
Tests `tests/unit/...`	Expands unit tests to assert metric recording on success, label normalization to bounded fallbacks, and that metric-recording failures log warnings with `exc_info=True` without masking original errors; adds tests for quota unexpected errors and auth-path instrumentation.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: adding observability metrics for authentication, authorization, and quota mechanisms across the codebase.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

✨ Simplify code

Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/endpoints/rlsapi_v1.py`:
- Around line 535-562: The _check_infer_quota function currently logs raw
strings ("skipped","failure","success") and only catches HTTPException; change
it to use the exported recording constants (e.g.,
recording.QUOTA_RESULT_SKIPPED, recording.QUOTA_RESULT_FAILURE,
recording.QUOTA_RESULT_SUCCESS) in the three record_quota_check calls, and add a
broad except Exception block around check_tokens_available that records
recording.QUOTA_RESULT_ERROR with the elapsed time before re-raising the
exception; keep the existing HTTPException except as-is but use the constant
there too. Use the existing symbols _check_infer_quota, check_tokens_available,
and recording.record_quota_check to locate and modify the code.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e9a8ec74-e971-491a-8e93-5dd57f734cd2

📥 Commits

Reviewing files that changed from the base of the PR and between ca125c4 and 7668c4b.

📒 Files selected for processing (6)

src/app/endpoints/responses.py
src/app/endpoints/rlsapi_v1.py
src/metrics/__init__.py
src/metrics/recording.py
tests/unit/app/endpoints/test_responses.py
tests/unit/metrics/test_recording.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)

GitHub Check: Pylinter
GitHub Check: mypy
GitHub Check: unit_tests (3.13)
GitHub Check: build-pr
GitHub Check: unit_tests (3.12)
GitHub Check: integration_tests (3.12)
GitHub Check: E2E: server mode / ci / group 2
GitHub Check: E2E: library mode / ci / group 1
GitHub Check: E2E: server mode / ci / group 3
GitHub Check: E2E: library mode / ci / group 3
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E Tests for Lightspeed Evaluation job

🧰 Additional context used

📓 Path-based instructions (6)

tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/unit/**/*.py: Use pytest for all unit and integration tests
Use pytest-mock for AsyncMock objects in unit tests
Use marker pytest.mark.asyncio for async tests

Files:

tests/unit/app/endpoints/test_responses.py
tests/unit/metrics/test_recording.py

tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Do not use unittest - pytest is the standard testing framework for this project

Files:

tests/unit/app/endpoints/test_responses.py
tests/unit/metrics/test_recording.py

src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types
Use Union types with modern syntax: str | int
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes use ABC with @abstractmethod decorators
Complete type annotations for all class attributes, use specific types, not Any
Follow Google Python docstring conventions for all modules, classes, and functions
Docstring Parameters section documents function parameters
Docstring Returns section documents function return values
Docstring Raises section documents exceptions that may be raised
Use black for code formatting
Use pylint for static analysis with source-roots configuration set to "src"
Use pyright for type checking
Use ruff for fast linting
Use pydocstyle for docstring style validation
Use mypy for additional type checking
Use bandit for security issue detection

Files:

src/app/endpoints/rlsapi_v1.py
src/metrics/__init__.py
src/app/endpoints/responses.py
src/metrics/recording.py

src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI dependencies: from fastapi import APIRouter, HTTPException, Request, status, Depends

Files:

src/app/endpoints/rlsapi_v1.py
src/app/endpoints/responses.py

src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

src/app/endpoints/rlsapi_v1.py
src/app/endpoints/responses.py

src/**/__init__.py

📄 CodeRabbit inference engine (AGENTS.md)

Package __init__.py files contain brief package descriptions

Files:

src/metrics/__init__.py

🧠 Learnings (2)

📚 Learning: 2026-04-07T14:44:42.022Z

Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.

Applied to files:

src/app/endpoints/rlsapi_v1.py

📚 Learning: 2026-04-06T20:18:07.852Z

Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

src/app/endpoints/rlsapi_v1.py
src/app/endpoints/responses.py

🔇 Additional comments (8)

tests/unit/app/endpoints/test_responses.py (1)

223-244: LGTM!

The test correctly verifies that unexpected quota backend failures (RuntimeError) are re-raised after recording metrics with the appropriate arguments (endpoint_path="/v1/responses", quota_type="user_id", result="error", and a non-negative duration).

src/app/endpoints/responses.py (2)

141-170: LGTM!

The quota check wrapper correctly:

Times the quota check using time.monotonic()

Records metrics with bounded labels before re-raising exceptions

Uses the exported constants for type safety (QUOTA_TYPE_USER_ID, QUOTA_RESULT_*)

Handles both expected (HTTPException) and unexpected (Exception) quota backend failures

311-316: LGTM!

Moving endpoint_path assignment earlier and using it consistently for quota metrics is clean.

src/metrics/__init__.py (2)

11-24: LGTM!

The bucket distribution covers sub-second latencies well (1ms to 5s), which is appropriate for quota backend checks. Including float("inf") as the final bucket follows Prometheus histogram best practices.

76-92: LGTM!

The quota metrics are well-designed:

Labels are documented as bounded for cardinality safety

Counter and Histogram share the same label set for consistent querying

Descriptive metric names follow the existing ls_ prefix convention

tests/unit/metrics/test_recording.py (1)

165-218: LGTM!

Excellent test coverage for the new record_quota_check function:

The fixture cleanly patches the logger for failure assertions

Parametrized test efficiently covers both counter and histogram failure paths

The bounds test verifies that unexpected inputs are normalized to safe fallback values ("customer-123" → "user_id", "timeout" → "error")

src/metrics/recording.py (2)

17-55: LGTM!

Well-designed label bounding:

Constants provide a single source of truth for valid label values

frozenset ensures immutability of the allowed sets

Normalization functions guarantee bounded cardinality for Prometheus metrics

Fallback values ("user_id" and "error") are sensible defaults

155-177: LGTM!

The record_quota_check helper follows the established pattern from other recording functions:

Normalizes inputs before recording to prevent cardinality explosion

Updates both counter and histogram atomically

Catches expected metric update exceptions and logs them without disrupting the request

tisnik

Please ignore docs/demos directory completely - the sources are incorrect there on purpose.

coderabbitai

Actionable comments posted: 9

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/app/endpoints/responses.py`:
- Around line 392-397: Replace the hard-coded string "/v1/responses" used for
quota labeling with the shared constant ENDPOINT_PATH_RESPONSES: update the
local variable endpoint_path (and any other occurrences around
_check_response_quota calls, e.g., the similar block at lines matching the
second occurrence) to reference ENDPOINT_PATH_RESPONSES instead of the literal;
ensure _check_response_quota(user_id, endpoint_path) is passed the constant so
metrics use the centralized constant name and stay in sync with constants.py.

In `@src/app/endpoints/rlsapi_v1.py`:
- Around line 578-615: The _check_infer_quota function currently has only a
single-line docstring; update it to a Google-style docstring including
Parameters (Args) for request: Request, auth: AuthTuple, and endpoint_path: str
with brief descriptions, a Returns section stating Optional[str] and what a
returned quota_id or None means, and a Raises section documenting that it may
raise HTTPException (when quota check fails) and other Exception types
(caught/re-raised during quota checking); also mention side effects (it records
quota metrics via recording.record_quota_check and calls check_tokens_available)
so readers understand behavior.

In `@src/authentication/api_key_token.py`:
- Around line 67-77: Change the extract_user_token exception handling in the
api_key_token flow to (1) compare the failure cause against well-defined
constants instead of a fragile hardcoded substring: import and use the cause
constants defined in utils (the same ones used by utils.extract_user_token) when
deciding the reason label, and (2) broaden the try/except to mirror jwk_token.py
by catching any Exception in addition to HTTPException so you can call
record_auth_metrics(AUTH_MOD_APIKEY_TOKEN, "failure", "unexpected_error",
start_time) for non-HTTP errors before re-raising; keep the existing pathway
that maps the specific HTTPException to "missing_header" vs "missing_token"
using the utils constants and then re-raise the original exception.

In `@src/authentication/k8s.py`:
- Around line 393-424: The helper _populate_kube_admin_uid currently mutates the
V1UserInfo parameter (user.uid); change it to return the resolved UID (str)
instead of setting user.uid in-place, and update the call site in __call__ to
assign the returned value to user.uid (or to a local user_uid variable) where it
is consumed; preserve the existing exception handling and metrics inside
_populate_kube_admin_uid but remove any assignment to user, and ensure all
callers (e.g., __call__) use the returned uid and perform any casting/assignment
there.

In `@src/authentication/noop_with_token.py`:
- Around line 72-80: The current except block derives the failure reason by
stringifying exc.detail; instead, derive it from structured data when available:
in the except HTTPException as exc handler, inspect exc.detail—if it's a dict
prefer exc.detail.get("reason") or exc.detail.get("code") (or
exc.detail.get("error")) to set reason, otherwise fall back to the previous
string-based mapping; then call record_auth_metrics(AUTH_MOD_NOOP_WITH_TOKEN,
"failure", reason, start_time). Update the logic around extract_user_token,
HTTPException, and record_auth_metrics to use the structured-datum first
approach so metric labels don't depend on exception wording.

In `@src/metrics/recording.py`:
- Around line 19-64: ALLOWED_AUTH_REASONS is a manually maintained frozenset and
can get out of sync with reason strings used elsewhere; add a clear comment
above ALLOWED_AUTH_REASONS documenting that it must mirror all auth reason
literals and reference AUTH_REASON_UNKNOWN as the fallback, then add a CI/test
assertion (e.g., a unit test or lint rule) that scans the codebase for all
occurrences of auth-reason string literals and fails if any literal is not
present in ALLOWED_AUTH_REASONS (or conversely, build ALLOWED_AUTH_REASONS from
a single enum/source-of-truth like an Action-style enum and update code to
import that enum) so future additions are either auto-included or rejected by
CI.
- Around line 409-417: The current record_quota_check implementation wraps both
metrics updates in one try/except so if the counter succeeds but the histogram
observe() raises, the counter is incremented while the histogram is silently
skipped and the log wrongly implies both failed; change this by splitting into
two independent try/except blocks around
metrics.quota_checks_total.labels(...).inc() and
metrics.quota_check_duration_seconds.labels(...).observe(...) respectively, and
on each except catch the same exceptions but log a metric-specific warning
(e.g., "Failed to increment quota_checks_total" and "Failed to observe
quota_check_duration_seconds") including exc_info=True; update the tests
(test_record_quota_check_updates_metrics_and_logs_errors) to assert the specific
warning messages for each failure path.
- Around line 160-164: The fallback in normalize_quota_type currently returns
QUOTA_TYPE_USER_ID for unknown inputs which conflates genuine user_id quotas
with unexpected types; change the fallback to a dedicated QUOTA_TYPE_UNKNOWN
constant (or the string "unknown") and update the function's docstring, any
tests that assert behavior for unknown quota types, and any callers/tests that
expect the old user_id fallback; ensure you reference ALLOWED_QUOTA_TYPES in the
check and replace the return of QUOTA_TYPE_USER_ID with QUOTA_TYPE_UNKNOWN
throughout the tests and docs so unknown quota types are bucketed explicitly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 18b1cea9-9e4e-49a6-8058-063575246370

📥 Commits

Reviewing files that changed from the base of the PR and between 7668c4b and f71930b.

📒 Files selected for processing (20)

src/app/endpoints/responses.py
src/app/endpoints/rlsapi_v1.py
src/authentication/api_key_token.py
src/authentication/jwk_token.py
src/authentication/k8s.py
src/authentication/noop.py
src/authentication/noop_with_token.py
src/authentication/rh_identity.py
src/authentication/utils.py
src/authorization/middleware.py
src/metrics/__init__.py
src/metrics/recording.py
tests/unit/app/endpoints/test_responses.py
tests/unit/app/endpoints/test_rlsapi_v1.py
tests/unit/authentication/test_jwk_token.py
tests/unit/authentication/test_k8s.py
tests/unit/authentication/test_rh_identity.py
tests/unit/authentication/test_utils.py
tests/unit/authorization/test_middleware.py
tests/unit/metrics/test_recording.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: build-pr
GitHub Check: E2E: library mode / ci / group 1
GitHub Check: E2E Tests for Lightspeed Evaluation job
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E: server mode / ci / group 3
GitHub Check: E2E: server mode / ci / group 2

🧰 Additional context used

📓 Path-based instructions (4)

tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

tests/unit/authorization/test_middleware.py
tests/unit/authentication/test_utils.py
tests/unit/app/endpoints/test_responses.py
tests/unit/authentication/test_jwk_token.py
tests/unit/app/endpoints/test_rlsapi_v1.py
tests/unit/authentication/test_rh_identity.py
tests/unit/authentication/test_k8s.py
tests/unit/metrics/test_recording.py

src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

src/authentication/noop.py
src/authentication/api_key_token.py
src/authentication/utils.py
src/authentication/jwk_token.py
src/authentication/noop_with_token.py
src/app/endpoints/responses.py
src/authorization/middleware.py
src/authentication/k8s.py
src/app/endpoints/rlsapi_v1.py
src/metrics/recording.py
src/metrics/__init__.py
src/authentication/rh_identity.py

src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/app/**/*.py: FastAPI dependencies: Import from fastapi module for APIRouter, HTTPException, Request, status, Depends
Use FastAPI HTTPException with appropriate status codes for API endpoints and handle APIConnectionError from Llama Stack

Files:

src/app/endpoints/responses.py
src/app/endpoints/rlsapi_v1.py

src/**/__init__.py

📄 CodeRabbit inference engine (AGENTS.md)

Package __init__.py files must contain brief package descriptions

Files:

src/metrics/__init__.py

🧠 Learnings (1)

📚 Learning: 2026-04-06T20:18:07.852Z

Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

src/app/endpoints/responses.py
src/app/endpoints/rlsapi_v1.py

🔇 Additional comments (12)

src/metrics/__init__.py (1)

25-179: LGTM — metric declarations, label sets, and bucket constants are well-formed.

All six new metrics carry only bounded, non-PII labels (auth_module, result, reason, action, endpoint, quota_type), bucket selections are appropriately sub-second for fast pre-request paths, and the added Final[Histogram] annotation on llm_inference_duration_seconds is a good tightening.

tests/unit/authorization/test_middleware.py (1)

325-354: LGTM — resilience test is well-structured and covers the right failure mode.

src/authentication/noop.py (1)

54-67: LGTM — timing and metric recording integrate cleanly with the existing flow.

src/authentication/utils.py (1)

44-71: LGTM — record_auth_metrics correctly centralises auth observability with proper error isolation.

tests/unit/app/endpoints/test_responses.py (1)

282-302: LGTM — cleanly exercises the unexpected-error path and asserts all relevant call arguments.

tests/unit/authentication/test_jwk_token.py (2)

507-527: LGTM — time.monotonic patch is correctly scoped to authentication.jwk_token and the start-time assertion properly validates what's forwarded to record_auth_metrics.

559-609: LGTM — invalid_user_id_token fixture and exception-chain assertion cover a previously untested validation path cleanly.

src/authentication/jwk_token.py (2)

146-230: New auth helpers are well-structured — LGTM.

Error handling is thorough and exception ordering is correct:

_get_jwk_set_for_auth correctly orders aiohttp.ClientError → json.JSONDecodeError → JoseError with no masking

_validate_jwk_claims correctly places ExpiredTokenError before JoseError in the hierarchy

_get_required_claim correctly checks both type and emptiness of the claim value

258-317: __call__ metrics wiring is correct — LGTM.

start_time is correctly scoped to each request. Since every helper records metrics before raising, there is no risk of double-recording or metric skipping on any error path. The success metric at line 316 is only reached when all helpers return cleanly.

src/authentication/k8s.py (1)

541-618: __call__ metrics wiring looks correct — LGTM.

Health-probe and metrics skips are correctly guarded, and every error path (missing token, token review error, invalid token, SAR failures, unauthorized) records a bounded metric before raising. The success metric at line 612 is only reached after all checks pass.

tests/unit/metrics/test_recording.py (2)

197-230: test_counter_recorders_update_metrics_and_log_errors — correct two-phase approach — LGTM.

The success/no-warning assertion in phase 1 followed by reset_mock() + side_effect assignment for phase 2 correctly avoids any stale state interaction: reset_mock() by default does not clear side_effect, but since no side_effect is set during phase 1, there is nothing to carry over.

396-428: test_record_quota_check_updates_metrics_and_logs_errors verifies both failure points — LGTM.

The @pytest.mark.parametrize("failing_metric", ["counter", "histogram"]) correctly exercises both the counter-raises and histogram-raises paths. Because record_quota_check uses a single try/except, a counter failure means the histogram is never called, while a histogram failure means the counter succeeded. Both scenarios result in exactly one warning call — which this test correctly asserts.

Signed-off-by: Major Hayden <major@redhat.com>

major mentioned this pull request Apr 29, 2026

RSPEED-2943: add SLO monitoring metrics #1637

Closed

19 tasks

major force-pushed the split/quota-metrics branch from 2221254 to 7668c4b Compare April 30, 2026 01:47

coderabbitai Bot reviewed Apr 30, 2026

View reviewed changes

major force-pushed the split/quota-metrics branch from 7668c4b to a45bbd9 Compare April 30, 2026 21:19

tisnik requested changes May 4, 2026

View reviewed changes

major force-pushed the split/quota-metrics branch 2 times, most recently from c64fde1 to 17bdda5 Compare May 5, 2026 13:21

major requested a review from tisnik May 5, 2026 19:58

major force-pushed the split/quota-metrics branch 2 times, most recently from 6971abe to 4811b0d Compare May 8, 2026 12:09

This was referenced May 8, 2026

RSPEED-2943: add auth monitoring metrics #1638

Closed

RSPEED-2943: add authorization monitoring metrics #1639

Closed

major changed the title ~~RSPEED-2943: add quota monitoring metrics~~ RSPEED-2943: add observability metrics for auth, authorization, and quota May 8, 2026

major force-pushed the split/quota-metrics branch from 4811b0d to f71930b Compare May 8, 2026 16:21

coderabbitai Bot reviewed May 8, 2026

View reviewed changes

major added 3 commits May 8, 2026 11:52

feat: add auth monitoring metrics

8226f74

Signed-off-by: Major Hayden <major@redhat.com>

feat: add authorization monitoring metrics

1ed8411

Signed-off-by: Major Hayden <major@redhat.com>

feat: add quota monitoring metrics

0eccec0

Signed-off-by: Major Hayden <major@redhat.com>

major force-pushed the split/quota-metrics branch from f71930b to 0eccec0 Compare May 8, 2026 16:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RSPEED-2943: add observability metrics for auth, authorization, and quota#1640

RSPEED-2943: add observability metrics for auth, authorization, and quota#1640
major wants to merge 3 commits intolightspeed-core:mainfrom
major:split/quota-metrics

major commented Apr 29, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 29, 2026 •

edited

Loading

Rate limit exceeded

Uh oh!

coderabbitai Bot left a comment

Uh oh!

tisnik left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

major commented Apr 29, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Tools used to create PR

Related Tickets & Documents

Checklist before requesting a review

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

tisnik left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

major commented Apr 29, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 29, 2026 •

edited

Loading