Conversation
2cf9a74 to
9b822d0
Compare
9b822d0 to
4c4ffff
Compare
wgtmac
left a comment
There was a problem hiding this comment.
Note: This review was generated by Gemini.
src/iceberg/table_scan.cc
Outdated
| for (const auto& snapshot : append_snapshots) { | ||
| SnapshotCache snapshot_cache(snapshot.get()); | ||
| ICEBERG_ASSIGN_OR_RAISE(auto manifests, snapshot_cache.DataManifests(io_)); | ||
| std::ranges::copy_if(manifests, std::back_inserter(data_manifests), |
There was a problem hiding this comment.
In Java's BaseIncrementalAppendScan.java, manifests from append snapshots are collected into a Set<ManifestFile> to prevent duplicate processing if multiple snapshots reference the same manifest. The C++ implementation pushes directly into std::vector<ManifestFile> data_manifests, which will cause duplicate processing if a manifest is retained across multiple append snapshots since ManifestGroup::Make does not deduplicate them internally.
Suggestion: Deduplicate data_manifests (e.g., by tracking seen manifest_paths using a std::unordered_set<std::string>) before passing them to ManifestGroup::Make.
src/iceberg/table_scan.h
Outdated
|
|
||
| /// \brief Returns true if this scan is a current lineage scan, which means it does not | ||
| /// specify from/to snapshot IDs. | ||
| bool IsScanCurrentLineage() const; |
There was a problem hiding this comment.
Can we just move these three new functions to the anonymous namespace in the table_scan.cc?
There was a problem hiding this comment.
I made these functions members because I don't want to keep writing context.xxx inside the method bodies
src/iceberg/table_scan.h
Outdated
| Result<std::vector<std::shared_ptr<ScanTaskType>>> | ||
| IncrementalScan<ScanTaskType>::PlanFiles() const { | ||
| if (context_.IsScanCurrentLineage()) { | ||
| ICEBERG_ASSIGN_OR_RAISE(auto current_snapshot, metadata_->Snapshot()); |
There was a problem hiding this comment.
Do we need to consider if branch has been specified?
There was a problem hiding this comment.
If current_snapshot is nullptr, it indicates that the current table is empty, so we can simply return an empty task list. Don't need to worry about the branch.
No description provided.