API: Add ReadRestrictions Actions #16198

Open
singhpk234 wants to merge 8 commits into apache:main from singhpk234:feature/add-action-to-core

@singhpk234 singhpk234 commented May 3, 2026

About the change

Adds the org.apache.iceberg.functions package - the engine-side Java API for
column projection functions defined by the ReadRestrictions spec. Each
function takes a column value, applies a masking or transformation operation,
and returns the result. The REST spec uses "action" as the wire-format
discriminator; this package uses IcebergFunction at the Java layer to avoid
collision with org.apache.iceberg.actions.Action.

Design

  • IcebergFunction<S, T> — base interface with name(), fieldId(), bind(Type),
    canBind(Type). Generic over input (S) and output (T) types to support future
    type-changing functions.
  • SaltedFunction<S, T> — sub-interface adding bind(Type, byte[] salt) for
    per-query salted operations. Only Sha256QueryLocal implements this.
  • BaseFunction<S, T> — abstract inner class holding field-id, with strict
    getClass() equals.
  • UnknownFunction — forward-compatibility: parses unrecognized function names
    without crashing, fails closed at bind time to prevent data leaks.
  • All functions are Serializable for use in distributed engines (Spark, Flink).
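
The interface hierarchy above could look roughly like the following sketch. It is not the PR's code: `Type` and `SerializableFunction` are simplified stand-ins for the real Iceberg types, the exact return types of `bind` and `canBind` are assumptions, and the wrapper class `FunctionSketch` and the function name `future-mask` are hypothetical. The one behavior taken from the PR is that an unknown function parses fine and throws `IllegalArgumentException` at bind time.

```java
import java.io.Serializable;
import java.util.function.Function;

final class FunctionSketch {
  // Stand-in for org.apache.iceberg.types.Type (assumption: reduced to an enum).
  enum Type { STRING, INT, DATE }

  // Stand-in for Iceberg's SerializableFunction.
  interface SerializableFunction<S, T> extends Function<S, T>, Serializable {}

  // Base interface: identifies the function and the column it applies to.
  interface IcebergFunction<S, T> extends Serializable {
    String name();

    int fieldId();

    boolean canBind(Type type);

    SerializableFunction<S, T> bind(Type type);
  }

  // Sub-interface for functions that also need a per-query salt.
  interface SaltedFunction<S, T> extends IcebergFunction<S, T> {
    SerializableFunction<S, T> bind(Type type, byte[] salt);
  }

  // Carrier for a function name this client does not recognize.
  static final class UnknownFunction implements IcebergFunction<Object, Object> {
    private final String name;
    private final int fieldId;

    UnknownFunction(String name, int fieldId) {
      this.name = name;
      this.fieldId = fieldId;
    }

    @Override
    public String name() {
      return name;
    }

    @Override
    public int fieldId() {
      return fieldId;
    }

    @Override
    public boolean canBind(Type type) {
      return false; // nothing is known about this function, so no type is accepted
    }

    @Override
    public SerializableFunction<Object, Object> bind(Type type) {
      // Fail closed: refusing to bind means the column is never returned unmasked.
      throw new IllegalArgumentException("Cannot bind unknown function: " + name);
    }
  }

  public static void main(String[] args) {
    UnknownFunction fn = new UnknownFunction("future-mask", 7);
    System.out.println(fn.name() + " canBind=" + fn.canBind(Type.STRING));
  }
}
```

Construction succeeding while `bind` throws is what lets a client read a response containing newer function names without crashing, yet never return the restricted column unmasked.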

@github-actions github-actions Bot added the core label May 3, 2026
@singhpk234 singhpk234 marked this pull request as ready for review May 5, 2026 19:56
@github-actions github-actions Bot added the API label May 11, 2026
Comment thread api/src/main/java/org/apache/iceberg/functions/Action.java Outdated
Comment thread api/src/main/java/org/apache/iceberg/functions/ApplyExpression.java Outdated
Introduces the Action abstraction from the ReadRestrictions spec
(PR apache#13879) as a standalone addition, covering the 10 predefined
column projection actions:

- MaskAlphanum, ShowFirst4, ShowLast4, MaskToFixedValue,
  ReplaceWithNull, TruncateToYear, TruncateToMonth, Sha256Global,
  Sha256QueryLocal, ApplyExpression
- Plus an Unknown forward-compat carrier for unrecognized server-side
  action types so the client fails closed at bind time

Each action carries a field id plus any action-specific payload and
owns its own bind(Type) returning a SerializableFunction. This mirrors
the Transform<S, T> pattern used in partition transforms. Null input
produces null output for every action, per spec.
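
As an illustration of the bind pattern described here, the sketch below binds a truncate-to-year operation to a date column. It rests on several assumptions: `Type` is a hypothetical stand-in enum, `java.util.function.Function` stands in for `SerializableFunction`, and truncating to January 1 of the same year is one reading of the action's name rather than quoted spec text. Iceberg stores dates as days since 1970-01-01, which is why the function works on `Integer`.

```java
import java.time.LocalDate;
import java.util.function.Function;

final class TruncateToYearSketch {
  // Hypothetical stand-in for Iceberg's Type.
  enum Type { DATE, STRING }

  // Rejects unsupported types at bind time and returns a per-value function.
  static Function<Integer, Integer> bind(Type type) {
    if (type != Type.DATE) {
      throw new IllegalArgumentException("Cannot bind TruncateToYear to type: " + type);
    }
    return days -> {
      if (days == null) {
        return null; // null input produces null output
      }
      // Days since epoch -> first day of the same year -> days since epoch.
      LocalDate truncated = LocalDate.ofEpochDay(days).withDayOfYear(1);
      return Math.toIntExact(truncated.toEpochDay());
    };
  }

  public static void main(String[] args) {
    Function<Integer, Integer> fn = bind(Type.DATE);
    int input = Math.toIntExact(LocalDate.of(2024, 3, 15).toEpochDay());
    System.out.println(LocalDate.ofEpochDay(fn.apply(input)));
  }
}
```

Type checking happens once in `bind`, so the returned function only has to handle values and nulls.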

No REST wire-format plumbing or engine integration yet; those follow
in separate PRs.

Mirrors the org.apache.iceberg.transforms.* layout: one file per concrete
action instead of 10 nested classes in Action.java. Action.java now holds
only the interface plus the BaseAction abstract (fieldId carrier).

Algorithmic simplifications along the way:
- ShowLast4 rewritten single-pass with a 4-offset ring buffer
  (was two passes: a separate codePointCount plus a mask loop).
- Sha256: the 4 per-type subclasses (Sha256String/Integer/Long/Binary)
  collapsed into one Sha256Fn with a Codec enum carrying the
  update/encode pair.
- Truncate: the 3 per-storage subclasses (Date/Timestamp/TimestampNano)
  collapsed into one TruncateTemporalFn with Unit + Storage enums.
- mapCodePoint moved from Actions to MaskAlphanum.maskCodePoint;
  matches the spec phrasing "redacts the remainder using mask-alphanum
  rules" that ShowFirst4/ShowLast4 reference.
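
One way to get a single-pass ShowLast4 is sketched below. It is a variant, not the PR's implementation: it buffers the last four code points themselves rather than output offsets, and emits the masked form of a code point only once it is pushed out of the buffer. The masking rule in `maskCodePoint` (digits become `n`, letters become `x`, everything else is kept) is a placeholder assumption, not the spec's definition of mask-alphanum.

```java
final class ShowLast4Sketch {
  // Placeholder masking rule (assumption): digits -> 'n', letters -> 'x'.
  static int maskCodePoint(int codePoint) {
    if (Character.isDigit(codePoint)) {
      return 'n';
    }
    if (Character.isLetter(codePoint)) {
      return 'x';
    }
    return codePoint;
  }

  static String showLast4(String value) {
    if (value == null) {
      return null; // null input produces null output
    }
    int[] ring = new int[4]; // the most recent four code points
    int seen = 0; // code points consumed so far
    StringBuilder out = new StringBuilder(value.length());
    for (int pos = 0; pos < value.length(); ) {
      int codePoint = value.codePointAt(pos);
      pos += Character.charCount(codePoint);
      int slot = seen % 4;
      if (seen >= 4) {
        // The slot's previous occupant is no longer among the last four.
        out.appendCodePoint(maskCodePoint(ring[slot]));
      }
      ring[slot] = codePoint;
      seen++;
    }
    // Whatever is still buffered is the tail; emit it unmasked, oldest first.
    for (int idx = seen - Math.min(seen, 4); idx < seen; idx++) {
      out.appendCodePoint(ring[idx % 4]);
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(showLast4("1234567890")); // prints nnnnnn7890
  }
}
```

Iterating by code point rather than by `char` keeps surrogate pairs intact, so a supplementary character counts as one of the four visible positions.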

UnknownAction.bind() now throws IllegalArgumentException for consistency
with the other actions' bind-time type rejection.

35 TestActions tests pass, spotless + revapi clean.

Rename Sha256Fn -> Sha256 and TruncateTemporalFn -> TruncateTemporal.
Both are package-private helpers that don't need the Fn abbreviation
to disambiguate from the public action classes (Sha256Global /
Sha256QueryLocal and TruncateToYear / TruncateToMonth respectively).

Iceberg checkstyle requires local variable names with at least 2
characters (pattern ^[a-z][a-zA-Z0-9]++$). The refactor introduced
five single-char locals that the CI build-checks job rejected:

- Sha256.java: int v / long v -> intVal / longVal
- ShowLast4.java: int o -> maskOffset
- TruncateTemporal.java: LocalDate d / LocalDateTime d -> date / truncated

Drops three explanatory paragraphs from the Action interface, Sha256
helper, and TruncateTemporal helper that restated what the signatures
already convey.

Adds equals/hashCode/toString to BaseAction (compares actionType +
fieldId) so concrete actions behave as value objects out of the box.
ApplyExpression overrides to include its Expression payload in the
comparison; UnknownAction inherits the base since its actionType()
already returns the raw discriminator string.

Per spec, all predefined actions preserve the input column type
("For all predefined actions except apply-expression, the output type
matches the input column type"). Only apply-expression could differ,
and that path currently throws since Iceberg Expressions are
boolean-only.

Drop the source/target distinction: Action<T> with bind(Type) returning
SerializableFunction<T, T>. SerializableFunction itself stays two-param
since it's shared with Transforms which legitimately use S != T (Bucket,
Days, etc.).

Mirrors the placement of org.apache.iceberg.transforms.Transform: pure
interface plus concrete value-object implementations belong in api,
not core. Action's only dependencies (Type, DateTimeUtil,
SerializableFunction, Expression) all live in api already, so no
module-boundary issues.

No code changes — just file relocation:
core/src/main/java/org/apache/iceberg/functions/ ->
api/src/main/java/org/apache/iceberg/functions/

PR title "API:" prefix now matches the actual module location.

Rename Action to IcebergFunction to avoid collision with
org.apache.iceberg.actions.Action. Make the interface fully generic
with separate input/output type parameters.

- Extract SaltedFunction<S, T> sub-interface for Sha256QueryLocal
- Fix BaseFunction.equals to use strict class equivalence
- Remove ApplyExpression (no implementation exists yet)
- Rename actionType() to name()
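
A minimal sketch of what "strict class equivalence" in `BaseFunction.equals` could mean is shown below. The class bodies, constructors, and the `EqualsSketch` wrapper are hypothetical and the generic parameters are omitted for brevity. The point is that comparing `getClass()` keeps two different function types from being equal just because they target the same field id.

```java
import java.util.Objects;

final class EqualsSketch {
  abstract static class BaseFunction {
    private final int fieldId;

    BaseFunction(int fieldId) {
      this.fieldId = fieldId;
    }

    abstract String name();

    int fieldId() {
      return fieldId;
    }

    @Override
    public boolean equals(Object other) {
      if (this == other) {
        return true;
      }
      // Strict comparison: a subclass or sibling class is never equal.
      if (other == null || getClass() != other.getClass()) {
        return false;
      }
      return fieldId == ((BaseFunction) other).fieldId;
    }

    @Override
    public int hashCode() {
      return Objects.hash(getClass(), fieldId);
    }

    @Override
    public String toString() {
      return name() + "(fieldId=" + fieldId + ")";
    }
  }

  static final class ShowFirst4 extends BaseFunction {
    ShowFirst4(int fieldId) {
      super(fieldId);
    }

    @Override
    String name() {
      return "ShowFirst4";
    }
  }

  static final class ShowLast4 extends BaseFunction {
    ShowLast4(int fieldId) {
      super(fieldId);
    }

    @Override
    String name() {
      return "ShowLast4";
    }
  }

  public static void main(String[] args) {
    System.out.println(new ShowFirst4(3).equals(new ShowLast4(3))); // prints false
  }
}
```

A subclass that carries extra payload would still need to override `equals` and `hashCode` to include that payload.
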
@sfc-gh-prsingh sfc-gh-prsingh force-pushed the feature/add-action-to-core branch from b39898e to c243020 May 22, 2026 09:27
3 participants