Spark: time-travel filter fails on renamed columns in BaseDistributedDataScan #16523

Open
lilei1128 wants to merge 1 commit into apache:main from lilei1128:fix-16510

@lilei1128
Contributor

A time-travel read with a filter on a column that was renamed after the
snapshot was taken throws a ValidationException:
"Cannot find field 'col' in struct: struct<..., 2: value: ...>"

Root cause: BaseDistributedDataScan.specCache() called table().specs()
directly, which returns partition specs bound to the current table schema.
When the filter expression was projected via Projections.inclusive() in
newManifestEvaluator(), it tried to resolve column names against the
current schema instead of the snapshot schema, causing the failure.
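
The failure mode can be illustrated with a toy model that has no Iceberg dependency. This is only a sketch: the class `RenamedColumnBinding`, its `bind` helper, and the two maps are hypothetical stand-ins for name-based binding against a schema, and the field names `id`, `col`, and `value` are taken from the error message above rather than from the real table.

```java
import java.util.Map;

// Hypothetical toy model, not Iceberg code: a schema is a map of column name -> field ID.
final class RenamedColumnBinding {

    // Schema as of the snapshot: field 2 is still named "col".
    static final Map<String, Integer> SNAPSHOT_SCHEMA = Map.of("id", 1, "col", 2);

    // Current table schema after the rename: same field ID 2, new name "value".
    static final Map<String, Integer> CURRENT_SCHEMA = Map.of("id", 1, "value", 2);

    /** Resolves a column name to its field ID, failing when the name is unknown. */
    static int bind(Map<String, Integer> schema, String column) {
        Integer id = schema.get(column);
        if (id == null) {
            throw new IllegalArgumentException(
                    "Cannot find field '" + column + "' in struct: " + schema);
        }
        return id;
    }

    /** Returns true when the column name can be bound against the given schema. */
    static boolean bindsAgainst(Map<String, Integer> schema, String column) {
        try {
            bind(schema, column);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The time-travel filter uses the name that was valid at the snapshot.
        System.out.println(bindsAgainst(SNAPSHOT_SCHEMA, "col")); // true
        System.out.println(bindsAgainst(CURRENT_SCHEMA, "col"));  // false
    }
}
```

The field ID is unchanged by the rename; only the name lookup differs, which is why the choice of schema used for binding decides whether the filter resolves.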

Fix:

  • Override useSnapshotSchema() to return true in BaseDistributedDataScan,
    consistent with DataTableScan
  • Change specCache() to use specs() instead of table().specs(), so
    partition specs are re-bound to the snapshot schema during time-travel
  • Also fix SparkTable.newScanBuilder() to resolve the scan schema against
    the requested snapshot when snapshot-id is passed via options, as a
    defensive fix for cases where SparkTable is not constructed with a
    snapshotId field
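
The effect of re-binding the cached specs can be sketched in the same spirit. Everything below is a hypothetical simplification, not the code in this PR: `SpecRebinding`, `Spec`, `rebindTo`, and `canResolve` are invented for illustration, and the `useSnapshotSchema` flag is modeled as a plain boolean parameter.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model, not Iceberg code: a schema is a map of field ID -> column name.
final class SpecRebinding {

    /** Toy partition spec: tracks its source field by ID and carries the schema it is bound to. */
    static final class Spec {
        final int specId;
        final int sourceFieldId;
        final Map<Integer, String> boundSchema;

        Spec(int specId, int sourceFieldId, Map<Integer, String> boundSchema) {
            this.specId = specId;
            this.sourceFieldId = sourceFieldId;
            this.boundSchema = boundSchema;
        }

        /** Same spec ID and source field, bound to a different schema. */
        Spec rebindTo(Map<Integer, String> schema) {
            return new Spec(specId, sourceFieldId, schema);
        }

        /** A name-based filter can only be projected if the bound schema knows the name. */
        boolean canResolve(String column) {
            return boundSchema.containsValue(column);
        }
    }

    static final Map<Integer, String> SNAPSHOT_SCHEMA = Map.of(1, "id", 2, "col");
    static final Map<Integer, String> CURRENT_SCHEMA = Map.of(1, "id", 2, "value");

    // Specs as the table hands them out: bound to the current schema.
    static final Map<Integer, Spec> TABLE_SPECS = Map.of(0, new Spec(0, 2, CURRENT_SCHEMA));

    /** Builds the spec cache, re-binding each spec to the snapshot schema when requested. */
    static Map<Integer, Spec> specCache(
            Map<Integer, Spec> tableSpecs,
            Map<Integer, String> snapshotSchema,
            boolean useSnapshotSchema) {
        Map<Integer, Spec> cache = new HashMap<>();
        for (Spec spec : tableSpecs.values()) {
            cache.put(spec.specId, useSnapshotSchema ? spec.rebindTo(snapshotSchema) : spec);
        }
        return cache;
    }

    public static void main(String[] args) {
        Spec before = specCache(TABLE_SPECS, SNAPSHOT_SCHEMA, false).get(0);
        Spec after = specCache(TABLE_SPECS, SNAPSHOT_SCHEMA, true).get(0);
        System.out.println(before.canResolve("col")); // false
        System.out.println(after.canResolve("col"));  // true
    }
}
```

Because the spec refers to its source column by field ID, re-binding changes only which names are visible during projection; the spec ID and source field stay the same.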

@lilei1128
Contributor Author

Closes #16510
