fix: Gracefully handle missing speech profiles bucket by neooriginal · Pull Request #6223 · BasedHardware/omi

neooriginal · 2026-04-01T00:19:13Z

Fully tested e2e

This pull request improves the robustness and error handling of all speech profile storage operations in backend/utils/other/storage.py. It introduces a helper function to safely access the speech profiles bucket, adds warnings and explicit error messages when the bucket is not configured, and ensures that all relevant functions gracefully handle missing configuration instead of failing unexpectedly.

Speech profile bucket access and error handling:

Added the _get_speech_profiles_bucket helper function to centralize logic for accessing the speech_profiles_bucket, including warning logging and raising errors when required. All speech profile functions now use this helper to prevent failures when the bucket is not configured.
Updated all speech profile-related functions (such as upload_profile_audio, get_user_has_speech_profile, get_profile_audio_if_exists, etc.) to use _get_speech_profiles_bucket and handle missing bucket configuration gracefully by returning early or raising informative exceptions. [1] [2] [3] [4] [5] [6] [7] [8]
Added explicit error raising in download_speech_profile_bytes and safe return in delete_speech_profile_blob if the speech profiles bucket is not configured, improving clarity and preventing silent failures. [1] [2]

Environment variable handling:

Improved the parsing of the BUCKET_SPEECH_PROFILES environment variable to correctly handle empty or whitespace-only values, ensuring the configuration is robust against misconfiguration.

Make BUCKET_SPEECH_PROFILES handling robust: trim the env var and treat empty value as unset. Introduce _get_speech_profiles_bucket(required=False) that returns a bucket, warns once when the bucket is not configured, and optionally raises if an operation requires the bucket. Update all speech-profile related functions to use the helper and either no-op/return safe defaults when the bucket is absent or raise for required uploads. Also add early checks for download/delete helpers to avoid calling storage APIs when configuration is missing. This prevents crashes and noisy repeated warnings when speech profile storage is not configured.

Copilot

Pull request overview

This PR improves robustness of speech profile storage operations by centralizing access to the speech profiles GCS bucket and handling missing bucket configuration more gracefully across read/write/delete paths.

Changes:

Parse BUCKET_SPEECH_PROFILES more robustly (treat empty/whitespace-only values as unset).
Add _get_speech_profiles_bucket() helper to centralize bucket resolution + one-time warning + optional hard failure.
Update speech-profile-related functions to early-return (or raise) when the bucket is not configured.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

backend/utils/other/storage.py

neooriginal · 2026-04-01T00:22:48Z

@copilot apply changes based on the comments in this thread

greptile-apps · 2026-04-01T00:24:47Z

Greptile Summary

This PR improves robustness for deployments where BUCKET_SPEECH_PROFILES is not configured by centralizing bucket access through a _get_speech_profiles_bucket() helper. Read/list/delete functions now return early with safe empty values, while upload functions raise RuntimeError to prevent silent failures.

Key changes:

New _get_speech_profiles_bucket(required: bool) helper that emits a one-time warning and returns None (or raises RuntimeError) when the bucket env var is absent.
All speech profile functions updated to use the helper instead of calling storage_client.bucket() directly.
Env-var parsing hardened: (os.getenv('BUCKET_SPEECH_PROFILES') or '').strip() or None correctly normalises whitespace-only values to None.
Issue: download_speech_profile_bytes raises BlobNotFound (google.cloud.exceptions.NotFound) when the bucket is unconfigured. Callers like speaker_sample_migration.py specifically catch NotFound to mean "blob is gone — delete Firestore reference", so a missing bucket would silently mark all speech sample paths for Firestore deletion. Using RuntimeError (as the rest of this PR does) would land in the broader except Exception handler instead, preventing unintended data loss.

Confidence Score: 4/5

Safe to merge after fixing the BlobNotFound exception type in download_speech_profile_bytes; the rest of the error-handling refactor is well-structured
One P1 finding: raising BlobNotFound instead of RuntimeError in download_speech_profile_bytes can trigger the "delete from Firestore" branch in the migration path when the bucket is simply unconfigured, risking metadata loss. All other changes are correct and an improvement over the prior crash-on-None behaviour.
backend/utils/other/storage.py — specifically the download_speech_profile_bytes function around line 1138

Important Files Changed

Filename	Overview
backend/utils/other/storage.py	Adds `_get_speech_profiles_bucket` helper and gracefully handles missing `BUCKET_SPEECH_PROFILES` across all speech profile functions; raises `BlobNotFound` in `download_speech_profile_bytes` which conflicts with caller exception handling in the migration path

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Speech profile function called] --> B{_get_speech_profiles_bucket}
    B -->|bucket configured| C[Return storage_client.bucket obj]
    B -->|not configured + required=False| D[Log one-time warning\nReturn None]
    B -->|not configured + required=True| E[Log one-time warning\nRaise RuntimeError]
    C --> F[Proceed with GCS operation]
    D --> G{Function type}
    G -->|get_user_has_speech_profile| H[return False]
    G -->|get_profile_audio_if_exists| I[return None]
    G -->|get_additional_profile_recordings| J[return empty list]
    G -->|delete_* functions| K[return early / no-op]
    E --> L[Caller receives RuntimeError\n→ HTTP 500]

    M[download_speech_profile_bytes] --> N{speech_profiles_bucket set?}
    N -->|No| O[raise BlobNotFound ⚠️\nSame type as 'blob missing']
    N -->|Yes| P[download_blob_bytes]
    O --> Q[Migration catches NotFound\n→ marks for Firestore deletion ⚠️]

_{Reviews (1): Last reviewed commit: "Merge branch 'BasedHardware:main' into m..." | Re-trigger Greptile}

backend/utils/other/storage.py

neooriginal added 2 commits April 1, 2026 02:17

Merge branch 'BasedHardware:main' into main

bff392d

neooriginal requested review from beastoin and Copilot April 1, 2026 00:19

Copilot started reviewing on behalf of neooriginal April 1, 2026 00:20 View session

Copilot AI reviewed Apr 1, 2026

View reviewed changes

backend/utils/other/storage.py Show resolved Hide resolved

backend/utils/other/storage.py Show resolved Hide resolved

backend/utils/other/storage.py Show resolved Hide resolved

backend/utils/other/storage.py Show resolved Hide resolved

greptile-apps bot reviewed Apr 1, 2026

View reviewed changes

backend/utils/other/storage.py Show resolved Hide resolved

backend/utils/other/storage.py Show resolved Hide resolved

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Gracefully handle missing speech profiles bucket#6223

fix: Gracefully handle missing speech profiles bucket#6223
neooriginal wants to merge 2 commits intoBasedHardware:mainfrom
neooriginal:main

neooriginal commented Apr 1, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

neooriginal commented Apr 1, 2026

Uh oh!

greptile-apps bot commented Apr 1, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

neooriginal commented Apr 1, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

neooriginal commented Apr 1, 2026

Uh oh!

greptile-apps bot commented Apr 1, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants