[WIP] Feature: Support TRUNCATE TABLE for Iceberg engine#1529
il9ue wants to merge 1 commit into antalya-26.1
This commit introduces native support for the TRUNCATE TABLE command for the Iceberg database engine. Execution no longer throws a NOT_IMPLEMENTED exception for DataLake engines. To align with Iceberg's architectural standards, this is a metadata-only operation. It creates a new snapshot with an explicitly generated, strictly typed empty Avro manifest list, increments the metadata version, and performs an atomic catalog update.
File changes:
- StorageObjectStorage.cpp: Remove hardcoded exception, delegate to data_lake_metadata->truncate().
- IDataLakeMetadata.h: Introduce supportsTruncate() and truncate() virtual methods.
- IcebergMetadata.h/cpp: Implement the Iceberg-specific metadata truncation, empty manifest list generation via MetadataGenerator, and atomic catalog swap.
- tests/integration/: Add PyIceberg integration tests.
- tests/queries/0_stateless/: Add SQL stateless tests.
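To make the delegation described above concrete, here is a minimal standalone sketch of the opt-in pattern: a base metadata interface that rejects truncation by default, an Iceberg-like subclass that opts in, and a storage-level function that checks the capability before delegating. Every type and function name below (NotImplemented, DataLakeMetadataSketch, IcebergMetadataSketch, truncateStorage) is a hypothetical stand-in, not the real ClickHouse class or signature, and the version bump only stands in for the actual metadata rewrite.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for ErrorCodes::NOT_IMPLEMENTED; not the real ClickHouse exception type.
struct NotImplemented : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Simplified analogue of IDataLakeMetadata: truncation is opt-in and rejected by default.
struct DataLakeMetadataSketch
{
    virtual ~DataLakeMetadataSketch() = default;
    virtual bool supportsTruncate() const { return false; }
    virtual void truncate()
    {
        throw NotImplemented("Truncate is not supported for this data lake engine");
    }
};

// Simplified analogue of the Iceberg implementation, which opts in.
struct IcebergMetadataSketch : DataLakeMetadataSketch
{
    int metadata_version = 1;

    bool supportsTruncate() const override { return true; }

    // Stands in for the real work: new snapshot, new metadata file, catalog update.
    void truncate() override { ++metadata_version; }
};

// Simplified analogue of the storage-level truncate: check the capability, then delegate.
inline void truncateStorage(DataLakeMetadataSketch & metadata)
{
    if (!metadata.supportsTruncate())
        throw NotImplemented("Truncate is not supported for this data lake engine");
    metadata.truncate();
}
```

With this shape, engines that do not override supportsTruncate() keep the previous behaviour (an exception), while the Iceberg path reaches its own truncate().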
arthurpassos
left a comment
I haven't implemented anything for Iceberg yet, though it is on my todo list. I left two small comments for now.
I also looked at the transactional model and it looks ok (assuming I understood it correctly).
My understanding of the Iceberg + catalog transactional model is that updating the catalog is the commit marker, and if it fails, the transaction isn't complete even if the new metadata files were already uploaded. Those files become orphans and must be ignored. This also implies an Iceberg table should always be read through a catalog if one exists, otherwise it becomes hard to determine the latest metadata snapshot.
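A small standalone sketch of that model, as I read it: the metadata file is uploaded first, and only a successful compare-and-swap of the catalog pointer makes it visible. The types here (CatalogSketch, ObjectStoreSketch, tryCommit) are hypothetical in-memory stand-ins, not the ICatalog interface or any real catalog client.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Hypothetical in-memory catalog that holds one pointer to the current metadata file.
class CatalogSketch
{
public:
    explicit CatalogSketch(std::string initial) : current_metadata(std::move(initial)) {}

    const std::string & currentMetadata() const { return current_metadata; }

    // The commit marker: succeeds only if the caller's base is still the current pointer.
    bool commit(const std::string & expected, const std::string & updated)
    {
        if (current_metadata != expected)
            return false;
        current_metadata = updated;
        return true;
    }

private:
    std::string current_metadata;
};

// Hypothetical object store: just the set of uploaded file names.
struct ObjectStoreSketch
{
    std::set<std::string> files;
};

// Upload first, then try to commit. A failed commit leaves the uploaded file orphaned.
inline bool tryCommit(
    ObjectStoreSketch & store,
    CatalogSketch & catalog,
    const std::string & base,
    const std::string & new_file)
{
    store.files.insert(new_file);          // uploaded, but not yet visible to readers
    return catalog.commit(base, new_file); // visible only if this succeeds
}
```

If two writers both start from the same base, only the first commit wins. The loser's file remains in the object store but is never referenced by the catalog, which is why readers that go through the catalog do not see it.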
I'll read the code more carefully later.
    {
        throw Exception(ErrorCodes::NOT_IMPLEMENTED,
            "Truncate is not supported for data lake engine");
        if (isDataLake())
Isn't isDataLake() the same as the configuration->isDataLakeConfiguration() check above? see
    virtual bool supportsParallelInsert() const { return false; }

    virtual void modifyFormatSettings(FormatSettings &, const Context &) const {}
    virtual void modifyFormatSettings(FormatSettings & /*format_settings*/, const Context & /*local_context*/) const {}
I would leave this method unchanged, simply to avoid merge conflicts with upstream later on.
[Feature] Support TRUNCATE TABLE for Iceberg Engine
Overview
As part of the Antalya release, v26.1 needs to natively support the TRUNCATE TABLE command for the Iceberg database engine. Currently, upstream ClickHouse explicitly rejects this operation. As of PR ClickHouse#91713, executing TRUNCATE down-casts to StorageObjectStorage, where it immediately throws an ErrorCodes::NOT_IMPLEMENTED exception for Data Lake engines.
To support standard analytics workflows and testing pipelines without requiring users to DROP and recreate tables (which breaks catalog bindings), implementing a metadata-only truncation is essential.
Proposed Architecture
Unlike a standard MergeTree truncation that physically drops parts from the local disk, Iceberg truncation must be entirely logical. The implementation will leave physical file garbage collection to standard Iceberg maintenance operations and focus strictly on metadata manipulation.
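The logical nature of the operation can be illustrated with a small standalone sketch. It assumes a heavily simplified metadata model: in real Iceberg a snapshot points to a manifest list file and snapshot-log entries carry timestamps, whereas here a snapshot simply holds manifest names. All names (SnapshotSketch, TableMetadataSketch, truncateMetadata, visibleManifests) are hypothetical and do not reflect the code in this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified snapshot: a snapshot id plus the manifests it references.
struct SnapshotSketch
{
    std::int64_t snapshot_id;
    std::vector<std::string> manifests; // empty for a truncated table
};

// Hypothetical, simplified table metadata (one instance per v<N>.metadata.json).
struct TableMetadataSketch
{
    int version = 1;
    std::int64_t current_snapshot_id = -1;
    std::vector<SnapshotSketch> snapshots;
    std::vector<std::int64_t> snapshot_log;
};

// Builds the next metadata version whose current snapshot references no manifests.
// Older snapshots are kept, and no data file is touched.
inline TableMetadataSketch truncateMetadata(const TableMetadataSketch & old_metadata, std::int64_t new_snapshot_id)
{
    TableMetadataSketch next = old_metadata;
    next.version = old_metadata.version + 1;
    next.snapshots.push_back(SnapshotSketch{new_snapshot_id, {}});
    next.snapshot_log.push_back(new_snapshot_id);
    next.current_snapshot_id = new_snapshot_id;
    return next;
}

// Manifests a reader of the current snapshot would see.
inline std::vector<std::string> visibleManifests(const TableMetadataSketch & metadata)
{
    for (const auto & snapshot : metadata.snapshots)
        if (snapshot.snapshot_id == metadata.current_snapshot_id)
            return snapshot.manifests;
    return {};
}
```

In this model the table reads as empty after truncation, yet the previous snapshot and its files still exist until a separate maintenance step removes them.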
Core Workflow:
- [...] StorageObjectStorage::truncate to check data_lake_metadata->supportsTruncate().
- [...] v<N+1>.metadata.json).
- [...] snapshots array, and update the snapshot-log and current-snapshot-id.
- [...] ICatalog interface (e.g., REST Catalog) to point the table to the newly generated metadata JSON.
Implementation Details
The required changes span the following internal abstractions:
- src/Storages/ObjectStorage/StorageObjectStorage.h/cpp: Override truncate and remove the hardcoded throw Exception. Delegate to IDataLakeMetadata.
- src/Storages/ObjectStorage/DataLakes/IDataLakeMetadata.h: Introduce supportsTruncate() and truncate(ContextPtr, ICatalog) virtual methods.
- src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.h/cpp: Implement the core truncation logic. Must safely obtain an IObjectStorage write buffer via context->getWriteSettings() to serialize the empty Avro file before committing the JSON metadata.
Acceptance Criteria