
fix: Fix SparkRetrievalJob.persist() failing for SparkSource #6410

Open

ntkathole wants to merge 1 commit into feast-dev:master from ntkathole:fix_6261

Conversation

@ntkathole
Member

What this PR does / why we need it:

Fixes #6261

SparkRetrievalJob.persist() failed in two scenarios:

  1. Remote offline store path: When using type: remote in feature_store.yaml pointing to a Spark offline server, the server calls SavedDatasetStorage.from_data_source(data_source) to convert the registered SparkSource into storage. This raised ValueError because SparkSource was not registered in the _DATA_SOURCE_TO_SAVED_DATASET_STORAGE mapping, and SavedDatasetSparkStorage lacked a from_data_source() method.

  2. Path-based SparkSource: When using a path-based SparkSource (e.g., S3 with parquet), persist() required a table name and raised ValueError if one wasn't provided, even though the storage had a valid path configured.

@ntkathole ntkathole self-assigned this May 16, 2026
@ntkathole ntkathole requested a review from a team as a code owner May 16, 2026 15:58
Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>

Development

Successfully merging this pull request may close these issues.

SparkRetrievalJob.persist() fails due to missing SparkSource mapping in SavedDatasetStorage.from_data_source

1 participant