
fix: Fix SparkRetrievalJob.persist() failing for SparkSource #6410

Open

ntkathole wants to merge 1 commit into feast-dev:master from ntkathole:fix_6261

Conversation

@ntkathole
Member

What this PR does / why we need it:

Fixes #6261

SparkRetrievalJob.persist() failed in two scenarios:

  1. Remote offline store path: When using type: remote in feature_store.yaml pointing to a Spark offline server, the server calls SavedDatasetStorage.from_data_source(data_source) to convert the registered SparkSource into storage. This raised ValueError because SparkSource was not registered in the _DATA_SOURCE_TO_SAVED_DATASET_STORAGE mapping, and SavedDatasetSparkStorage lacked a from_data_source() method.

  2. Path-based SparkSource: When using a path-based SparkSource (e.g., S3 with parquet), persist() required a table name and raised ValueError if one wasn't provided, even though the storage had a valid path configured.

@ntkathole ntkathole self-assigned this May 16, 2026
@ntkathole ntkathole requested a review from a team as a code owner May 16, 2026 15:58
Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>

Development

Successfully merging this pull request may close these issues.

SparkRetrievalJob.persist() fails due to missing SparkSource mapping in SavedDatasetStorage.from_data_source

1 participant