Skip to content

fix: Gracefully handle missing speech profiles bucket#6223

Open
neooriginal wants to merge 2 commits intoBasedHardware:mainfrom
neooriginal:main
Open

fix: Gracefully handle missing speech profiles bucket#6223
neooriginal wants to merge 2 commits intoBasedHardware:mainfrom
neooriginal:main

Conversation

@neooriginal
Copy link
Copy Markdown
Collaborator

Fully tested e2e

Bildschirmfoto 2026-04-01 um 02 18 58

This pull request improves the robustness and error handling of all speech profile storage operations in backend/utils/other/storage.py. It introduces a helper function to safely access the speech profiles bucket, adds warnings and explicit error messages when the bucket is not configured, and ensures that all relevant functions gracefully handle missing configuration instead of failing unexpectedly.

Speech profile bucket access and error handling:

  • Added the _get_speech_profiles_bucket helper function to centralize logic for accessing the speech_profiles_bucket, including warning logging and raising errors when required. All speech profile functions now use this helper to prevent failures when the bucket is not configured.
  • Updated all speech profile-related functions (such as upload_profile_audio, get_user_has_speech_profile, get_profile_audio_if_exists, etc.) to use _get_speech_profiles_bucket and handle missing bucket configuration gracefully by returning early or raising informative exceptions. [1] [2] [3] [4] [5] [6] [7] [8]
  • Added explicit error raising in download_speech_profile_bytes and safe return in delete_speech_profile_blob if the speech profiles bucket is not configured, improving clarity and preventing silent failures. [1] [2]

Environment variable handling:

  • Improved the parsing of the BUCKET_SPEECH_PROFILES environment variable to correctly handle empty or whitespace-only values, ensuring the configuration is robust against misconfiguration.

Make BUCKET_SPEECH_PROFILES handling robust: trim the env var and treat empty value as unset. Introduce _get_speech_profiles_bucket(required=False) that returns a bucket, warns once when the bucket is not configured, and optionally raises if an operation requires the bucket. Update all speech-profile related functions to use the helper and either no-op/return safe defaults when the bucket is absent or raise for required uploads. Also add early checks for download/delete helpers to avoid calling storage APIs when configuration is missing. This prevents crashes and noisy repeated warnings when speech profile storage is not configured.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves robustness of speech profile storage operations by centralizing access to the speech profiles GCS bucket and handling missing bucket configuration more gracefully across read/write/delete paths.

Changes:

  • Parse BUCKET_SPEECH_PROFILES more robustly (treat empty/whitespace-only values as unset).
  • Add _get_speech_profiles_bucket() helper to centralize bucket resolution + one-time warning + optional hard failure.
  • Update speech-profile-related functions to early-return (or raise) when the bucket is not configured.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@neooriginal
Copy link
Copy Markdown
Collaborator Author

@copilot apply changes based on the comments in this thread

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 1, 2026

Greptile Summary

This PR improves robustness for deployments where BUCKET_SPEECH_PROFILES is not configured by centralizing bucket access through a _get_speech_profiles_bucket() helper. Read/list/delete functions now return early with safe empty values, while upload functions raise RuntimeError to prevent silent failures.

Key changes:

  • New _get_speech_profiles_bucket(required: bool) helper that emits a one-time warning and returns None (or raises RuntimeError) when the bucket env var is absent.
  • All speech profile functions updated to use the helper instead of calling storage_client.bucket() directly.
  • Env-var parsing hardened: (os.getenv('BUCKET_SPEECH_PROFILES') or '').strip() or None correctly normalises whitespace-only values to None.
  • Issue: download_speech_profile_bytes raises BlobNotFound (google.cloud.exceptions.NotFound) when the bucket is unconfigured. Callers like speaker_sample_migration.py specifically catch NotFound to mean "blob is gone — delete Firestore reference", so a missing bucket would silently mark all speech sample paths for Firestore deletion. Using RuntimeError (as the rest of this PR does) would land in the broader except Exception handler instead, preventing unintended data loss.

Confidence Score: 4/5

  • Safe to merge after fixing the BlobNotFound exception type in download_speech_profile_bytes; the rest of the error-handling refactor is well-structured
  • One P1 finding: raising BlobNotFound instead of RuntimeError in download_speech_profile_bytes can trigger the "delete from Firestore" branch in the migration path when the bucket is simply unconfigured, risking metadata loss. All other changes are correct and an improvement over the prior crash-on-None behaviour.
  • backend/utils/other/storage.py — specifically the download_speech_profile_bytes function around line 1138

Important Files Changed

Filename Overview
backend/utils/other/storage.py Adds _get_speech_profiles_bucket helper and gracefully handles missing BUCKET_SPEECH_PROFILES across all speech profile functions; raises BlobNotFound in download_speech_profile_bytes which conflicts with caller exception handling in the migration path

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Speech profile function called] --> B{_get_speech_profiles_bucket}
    B -->|bucket configured| C[Return storage_client.bucket obj]
    B -->|not configured + required=False| D[Log one-time warning\nReturn None]
    B -->|not configured + required=True| E[Log one-time warning\nRaise RuntimeError]
    C --> F[Proceed with GCS operation]
    D --> G{Function type}
    G -->|get_user_has_speech_profile| H[return False]
    G -->|get_profile_audio_if_exists| I[return None]
    G -->|get_additional_profile_recordings| J[return empty list]
    G -->|delete_* functions| K[return early / no-op]
    E --> L[Caller receives RuntimeError\n→ HTTP 500]

    M[download_speech_profile_bytes] --> N{speech_profiles_bucket set?}
    N -->|No| O[raise BlobNotFound ⚠️\nSame type as 'blob missing']
    N -->|Yes| P[download_blob_bytes]
    O --> Q[Migration catches NotFound\n→ marks for Firestore deletion ⚠️]
Loading

Reviews (1): Last reviewed commit: "Merge branch 'BasedHardware:main' into m..." | Re-trigger Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants