4 changes: 3 additions & 1 deletion docs/language/reference/functions/index.md
@@ -7,10 +7,11 @@ Today the concrete shipped surfaces are documented here:
- [Filter builders](../builders/filters.md)
- [Aggregate builders](../builders/aggregates.md)
- [Projection builders](../builders/projections.md)
- [Nested data functions](nested.md)

The canonical scalar literal helper is `lit(...)`. Typed literal helpers construct the same scalar-expression representation.

The current registry-backed helper surface is registered in the package-owned function registry. Registry types live in `src/function_registry.incn`, the shared package registry lives in `src/functions/registry.incn`, and concrete public helper entries are produced by `function_registry.add(...)` decorators in individual `src/functions/<family>/<name>.incn` modules. The registry-backed families are references, literals, casts, operators, predicates, conditionals, math, ordering, and aggregates. Each runtime entry exposes a stable function reference such as `inql.functions.col`, namespace, canonical name, typed lifecycle metadata (`since`, versioned changes, and optional deprecation), RFC 024 policy category, function class, null behavior, alias policy, aggregate modifier policy, and Substrait mapping metadata. Checked function signatures come from the public helper declaration, not from a second hand-written registry signature.
The current registry-backed helper surface is registered in the package-owned function registry. Registry types live in `src/function_registry.incn`, the shared package registry lives in `src/functions/registry.incn`, and concrete public helper entries are produced by `function_registry.add(...)` decorators in individual `src/functions/<family>/<name>.incn` modules. The registry-backed families are references, literals, casts, operators, predicates, conditionals, math, ordering, aggregates, and nested data. Each runtime entry exposes a stable function reference such as `inql.functions.col`, namespace, canonical name, typed lifecycle metadata (`since`, versioned changes, and optional deprecation), RFC 024 policy category, function class, null behavior, alias policy, aggregate modifier policy, and Substrait mapping metadata. Checked function signatures come from the public helper declaration, not from a second hand-written registry signature.

The registry is the source for non-derivable machine facts. Public helper declarations are the source for argument names, argument types, and return types. Docstrings remain human-facing explanation, examples, and parameter intent. The `registry-metadata` check validates the checked API metadata projections produced from public facade aliases, registry decorators, and decorated callable signatures. Runtime registry entries are lazy and process-local: they support helper execution and lowering for loaded helpers, while the complete public catalog comes from checked metadata. This matters for generated docs, diagnostics, Prism lowering, and backend capability checks as the catalog grows.

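The split between registry facts and declaration-derived signatures can be illustrated with a small Python model. This is a sketch only, not the Incan implementation: the `FunctionRegistry` class, the `RegistryEntry` fields, and the keyword names `family`, `since`, and `null_behavior` (including the value `"propagate"`) are hypothetical stand-ins for the metadata listed above. The point it shows is that the decorator stores non-derivable facts, while the signature is read back from the decorated declaration.

```python
import inspect
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RegistryEntry:
    """Non-derivable facts only; no hand-written signature is stored here."""
    ref: str            # stable function reference, e.g. inql.functions.<name>
    family: str         # hypothetical field name
    since: str          # hypothetical field name
    null_behavior: str  # hypothetical field name


class FunctionRegistry:
    """Hypothetical Python model of a declaration-site registry."""

    def __init__(self) -> None:
        self._entries = {}

    def add(self, *, family: str, since: str, null_behavior: str) -> Callable:
        def decorate(fn: Callable) -> Callable:
            entry = RegistryEntry(
                ref=f"inql.functions.{fn.__name__}",
                family=family,
                since=since,
                null_behavior=null_behavior,
            )
            self._entries[fn.__name__] = (entry, fn)
            return fn  # the public helper itself is left unchanged
        return decorate

    def entry(self, name: str) -> RegistryEntry:
        return self._entries[name][0]

    def signature(self, name: str) -> inspect.Signature:
        # Argument names and types come from the declaration, not the registry.
        return inspect.signature(self._entries[name][1])


function_registry = FunctionRegistry()


@function_registry.add(family="nested", since="0.1", null_behavior="propagate")
def cardinality(value: list) -> int:
    return len(value)
```

Because the signature is projected from the declaration, renaming a parameter in the helper changes the projected metadata without any second registry edit, which is the drift the `registry-metadata` check is described as guarding against.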
@@ -31,6 +32,7 @@ The registered helper surface currently includes:
| `coalesce(...)`, `nullif(...)`, `case_when(...)` | scalar | registered Substrait mappings; `case_when(...)` lowers as built-in `IfThen` |
| `in_(...)`, `between(...)` | scalar | built-in membership/range lowering (`SingularOrList` and `between`) |
| `abs(...)`, `ceil(...)`, `floor(...)`, `round(...)` | scalar | registered Substrait math scalar mappings; `round(...)` is currently the single-argument form |
| `array(...)`, `cardinality(...)`, `array_contains(...)`, `arrays_overlap(...)`, `array_position(...)`, `element_at(...)`, `array_sort(...)`, `array_distinct(...)`, `array_except(...)`, `array_intersect(...)`, `array_union(...)`, `array_join(...)`, `array_slice(...)`, `array_reverse(...)`, `array_flatten(...)`, `map_from_arrays(...)`, `map_extract(...)`, `map_contains_key(...)`, `map_keys(...)`, `map_values(...)`, `map_entries(...)`, `named_struct(...)` | scalar | registered nested scalar helpers backed by Substrait extension mappings; `map_contains_key(...)` lowers as a documented predicate rewrite |
| `asc(...)`, `desc(...)`, `asc_nulls_first(...)`, `asc_nulls_last(...)`, `desc_nulls_first(...)`, `desc_nulls_last(...)` | ordering | structural sort-field helpers consumed by `order_by(...)` and lowered to Substrait `SortRel.sorts` |
| `sum(...)`, `count()`, `count_expr(...)`, `avg(...)`, `min(...)`, `max(...)` | aggregate | registered Substrait extension functions; `count_expr(...)` is a compatibility spelling for future `count(expr)` helper overloading |
| `count_distinct(...)`, `count_if(...)` | aggregate | compatibility helpers that lower through aggregate modifiers over canonical `count` semantics |
58 changes: 58 additions & 0 deletions docs/language/reference/functions/nested.md
@@ -0,0 +1,58 @@
# Nested Data Functions (Reference)

Nested data helpers build and inspect row-level arrays, maps, and structs. They are scalar expressions: every helper returns one value for each input row and does not change relation cardinality.

Generator or table-valued operations, such as row-expanding `explode(...)`, change relation cardinality and are out of scope for this page.

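The difference between a row-level scalar helper and a row-expanding generator can be modeled in plain Python. This is an illustrative sketch, not InQL code: rows are modeled as dictionaries, and the `with_column` and `explode` functions here are hypothetical Python stand-ins that only mirror the cardinality behavior described above.

```python
def with_column(rows, name, scalar_fn):
    """Scalar model: one output value per input row, so row count is unchanged."""
    return [{**row, name: scalar_fn(row)} for row in rows]


def explode(rows, array_column):
    """Generator model: one output row per array element, so row count can change."""
    return [
        {**row, array_column: item}
        for row in rows
        for item in row[array_column]
    ]


rows = [
    {"id": 1, "tags": ["paid", "web"]},
    {"id": 2, "tags": ["trial"]},
]

# Scalar helper: still two rows, each with one extra value.
counted = with_column(rows, "tag_count", lambda row: len(row["tags"]))

# Generator: three rows, one per tag.
exploded = explode(rows, "tags")
```
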
## Arrays

| Function | Meaning |
| --- | --- |
| `array(values)` | Build an array expression from one or more scalar expressions. |
| `cardinality(value)` | Return the size of an array or map. |
| `array_contains(array_expr, value)` | Return whether an array contains a value. |
| `arrays_overlap(left, right)` | Return whether two arrays have any elements in common. |
| `array_position(array_expr, value)` | Return the one-based position of a value. |
| `element_at(array_expr, index)` | Return an array element by one-based index. |
| `array_sort(array_expr)` | Sort one array value. |
| `array_distinct(array_expr)` | Remove duplicate elements from one array value. |
| `array_except(left, right)` | Return elements from `left` that are not in `right`. |
| `array_intersect(left, right)` | Return elements shared by both arrays. |
| `array_union(left, right)` | Return the union of both arrays. |
| `array_join(array_expr, delimiter)` | Join a string array into one string. |
| `array_slice(array_expr, start, stop)` | Return a one-based array slice using the backend adapter's slice contract. |
| `array_reverse(array_expr)` | Reverse one array value. |
| `array_flatten(array_expr)` | Flatten an array-of-arrays into one row-level array value. |

## Maps And Structs

| Function | Meaning |
| --- | --- |
| `map_from_arrays(keys, values)` | Build a map from key and value arrays. |
| `map_extract(map_expr, key)` | Return the values associated with a key. |
| `map_contains_key(map_expr, key)` | Return whether `map_extract(...)` finds at least one value for the key. |
| `map_keys(map_expr)` | Return the map's keys as an array. |
| `map_values(map_expr)` | Return the map's values as an array. |
| `map_entries(map_expr)` | Return map entries. |
| `named_struct(field_names, values)` | Build a struct expression with explicit field names. |

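The relationship between `map_extract(...)` and `map_contains_key(...)` can be sketched as a plain Python model. The assumptions are loud ones: a map is modeled as a list of key/value pairs, `map_extract` returns a list of matching values, and rejecting key and value arrays of different lengths is a choice made by this model rather than documented InQL behavior.

```python
def map_from_arrays(keys, values):
    """Model a map as a list of (key, value) pairs built from two arrays."""
    if len(keys) != len(values):
        # Assumption of this model; the real length-mismatch policy is not stated here.
        raise ValueError("keys and values must have the same length")
    return list(zip(keys, values))


def map_extract(entries, key):
    """Return the values associated with a key (empty list when absent)."""
    return [value for entry_key, value in entries if entry_key == key]


def map_contains_key(entries, key):
    """Predicate rewrite: true when map_extract finds at least one value."""
    return len(map_extract(entries, key)) >= 1


def map_keys(entries):
    return [entry_key for entry_key, _ in entries]


def map_values(entries):
    return [value for _, value in entries]


metadata = map_from_arrays(["plan", "region"], ["paid", "eu"])
```

In this model `map_contains_key` carries no lookup logic of its own, which is the sense in which it can lower as a rewrite over `map_extract` rather than as a separate backend function.
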
## Example

```incan
from pub::inql.functions import array, array_contains, cardinality, col, element_at, lit

projected = (
    events
    .with_column("tags", array([lit("paid"), col("source")]))
    .with_column("tag_count", cardinality(col("tags")))
    .with_column("has_paid_tag", array_contains(col("tags"), lit("paid")))
    .with_column("first_tag", element_at(col("tags"), lit(1)))
)
```

## Semantics

- Array indexing is one-based for `element_at(...)`, `array_position(...)`, and `array_slice(...)`.
- `element_at(...)` currently maps to the portable array-element adapter path. Out-of-range behavior follows the current backend adapter's recoverable result until InQL has a richer static/runtime error-policy split for strict versus try-style element access.
- `array_flatten(...)` is intentionally named to avoid colliding with future table-valued or generator `flatten(...)` forms.
- Grouping or ordering by nested values is not documented as portable until equality and ordering semantics for arrays, maps, and structs are specified.
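
The one-based indexing rule can be made concrete with a small Python model. It is a sketch under stated assumptions, not the backend adapter's behavior: the recoverable out-of-range result is modeled as `None`, and a missing value in `array_position` is also modeled as `None`. `array_slice(...)` is left out because its exact contract belongs to the backend adapter.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def element_at(values: Sequence[T], index: int) -> Optional[T]:
    """One-based element access; out-of-range is modeled as None (assumption)."""
    if 1 <= index <= len(values):
        return values[index - 1]
    return None


def array_position(values: Sequence[T], value: T) -> Optional[int]:
    """One-based position of the first match; None when absent (assumption)."""
    for offset, item in enumerate(values):
        if item == value:
            return offset + 1
    return None


tags = ["paid", "web"]
first_tag = element_at(tags, 1)       # "paid": index 1 is the first element
web_position = array_position(tags, "web")
```

Because both helpers share the same origin, a position returned by `array_position` can be passed straight back to `element_at` without an offset adjustment.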
1 change: 1 addition & 0 deletions docs/release_notes/v0_1.md
@@ -15,6 +15,7 @@ Entries will be filled in as work lands (link RFCs and PRs when applicable).
- **Scalar expressions:** RFC 012 unifies filter predicates, computed projection values, grouping keys, and aggregate inputs around one `ColumnExpr` surface with canonical `lit(...)` and typed literal helpers.
- **Core scalar functions:** RFC 015 adds registry-backed scalar function applications and the first core helper slice for casts, comparisons, boolean logic, null/NaN predicates, arithmetic, conditionals, membership/range predicates, and ordering expressions. Implemented helpers lower to Substrait IR through registry metadata, built-in Rex shapes, or structural sort-field lowering; DataFusion remains the first execution adapter rather than the semantic boundary.
- **Common scalar functions:** The first RFC 018 slice adds registry-backed math helpers for `abs(...)`, `ceil(...)`, `floor(...)`, and single-argument `round(...)`, with Substrait mappings and DataFusion-backed execution coverage.
- **Nested data functions:** RFC 020 adds registry-backed scalar helpers for array construction/access, cardinality, containment, overlap, sorting, set-like operations, joining, slicing, reversing, scalar array flattening, map construction/access, map key/value/entry extraction, map key containment, and named struct construction. These helpers lower through Substrait extension metadata and execute through the DataFusion-backed Session path without introducing generator semantics.
- **Function registry:** RFC 014 adds declaration-site registry decorators for the current public helper surface, including stable function references, checked signature projection, lifecycle metadata, behavior categories, alias policy, Substrait mapping categories, and checked API metadata drift validation.
- **Function extension policy:** RFC 024 policy metadata now distinguishes portable core functions, namespaced extension-only functions, opt-in compatibility aliases, engine-specific functions, and rejected compatibility requests without adding an extension plugin system or backend-owned semantics.
- **Projection:** builder-based `with_column`, `add`, `mul`, and literal expression helpers now lower derived columns through Prism, Substrait, and Session execution.
44 changes: 23 additions & 21 deletions docs/rfcs/020_nested_data_functions.md
@@ -1,6 +1,6 @@
# InQL RFC 020: Nested data functions

- **Status:** Draft
- **Status:** Implemented
- **Created:** 2026-04-27
- **Author(s):** Danny Meijer (@dannymeijer)
- **Related:**
@@ -11,12 +11,12 @@
- InQL RFC 021 (generator and table-valued functions)
- **Issue:** [InQL #37](https://github.com/dannys-code-corner/InQL/issues/37)
- **RFC PR:** —
- **Written against:** Incan v0.2
- **Shipped in:**
- **Written against:** Incan v0.3-era InQL
- **Shipped in:** v0.1

## Summary

This RFC defines InQL functions for nested scalar values: arrays, maps, and structs. It covers construction, element access, cardinality, containment, sorting, set-like array operations, map entry access, and higher-order collection functions as a later extension point. Nested functions remain scalar when they produce one value per input row; cardinality-changing operations such as `explode` belong to a separate generator RFC.
This RFC defines InQL functions for nested scalar values: arrays, maps, and structs. It covers construction, element access, cardinality, containment, overlap checks, sorting, set-like array operations, scalar array flattening, map entry access, and higher-order collection functions as a later extension point. Nested functions remain scalar when they produce one value per input row; cardinality-changing operations such as `explode` belong to a separate generator RFC.

## Motivation

@@ -28,7 +28,7 @@ The split matters. `array_contains(.items, "x")` is a row-level scalar predicate

- Define scalar functions for arrays, maps, and structs.
- Distinguish nested scalar operations from generators.
- Define element access and safe element access.
- Define element access with an explicit one-based indexing policy.
- Define collection size, containment, sorting, and set-like operations.
- Leave lambda-based higher-order functions as a later design decision unless the host language surface is ready.

@@ -41,16 +41,16 @@ The split matters. `array_contains(.items, "x")` is a row-level scalar predicate

## Guide-level explanation (how authors think about it)

Authors should be able to inspect and manipulate nested values without changing relation cardinality:
Authors can inspect and manipulate nested values without changing relation cardinality:

```incan
from pub::inql.functions import array_contains, cardinality, col, element_at, map_keys
from pub::inql.functions import array_contains, cardinality, col, element_at, lit, map_keys

enriched = (
events
.filter(array_contains(col("tags"), "purchase"))
.filter(array_contains(col("tags"), lit("purchase")))
.with_column("tag_count", cardinality(col("tags")))
.with_column("first_item", element_at(col("items"), 1))
.with_column("first_item", element_at(col("items"), lit(1)))
.with_column("metadata_keys", map_keys(col("metadata")))
)
```
@@ -59,17 +59,17 @@ If an author wants one output row per item, that is a generator/table-valued ope

## Reference-level explanation (precise rules)

InQL should define array construction with `array`, struct construction with `struct` or `named_struct`, and map construction with `create_map` or an equivalent canonical name.
InQL defines array construction with `array`, struct construction with `named_struct`, and map construction with `map_from_arrays`.

InQL should define `cardinality` as the canonical size function for arrays and maps. Compatibility aliases such as `size`, `array_size`, and `array_length` may resolve to `cardinality` where semantics match.
InQL defines `cardinality` as the canonical size function for arrays and maps. Compatibility aliases such as `size`, `array_size`, and `array_length` may resolve to `cardinality` where semantics match, but the initial implemented surface keeps the canonical spelling.

InQL should define element access functions including `element_at`, `try_element_at`, and `get`. Strict element access must fail or diagnose according to its registry error policy when an index or key is invalid. `try_element_at` must produce the recoverable result defined by its registry entry.
InQL defines array element access with `element_at(array_expr, index)`. Indexes are one-based. Current lowering maps to the portable array-element adapter path and uses the backend adapter's recoverable out-of-range behavior until InQL has a richer static/runtime error-policy split for strict versus try-style element access.

InQL should define array predicates and transforms including `array_contains`, `array_position`, `array_sort`, `array_distinct`, `array_except`, `array_intersect`, `array_union`, `array_join`, `arrays_overlap`, `flatten`, `slice`, and `reverse` where type and null semantics are specified.
InQL defines array predicates and transforms including `array_contains`, `array_position`, `array_sort`, `array_distinct`, `array_except`, `array_intersect`, `array_union`, `array_join`, `arrays_overlap`, `array_flatten`, `array_slice`, and `array_reverse` where type and null semantics are specified by the registry and backend adapter boundary. The scalar array-flattening helper is named `array_flatten` so table-valued or generator `flatten` remains available for RFC 021.

InQL should define map functions including `map_contains_key`, `map_entries`, `map_from_arrays`, `map_from_entries`, `map_keys`, and `map_values`.
InQL defines map functions including `map_contains_key`, `map_entries`, `map_extract`, `map_from_arrays`, `map_keys`, and `map_values`.

InQL should account for object-style warehouse functions such as `object_construct`, `object_construct_keep_null`, `object_delete`, `object_insert`, `object_keys`, and `object_pick`. These should be modeled through typed object/map semantics where possible and through a variant/semi-structured family only when dynamic value semantics are required.
Object-style warehouse functions such as `object_construct`, `object_construct_keep_null`, `object_delete`, `object_insert`, `object_keys`, and `object_pick` are accounted for as semi-structured and dynamic-object concerns. They should be modeled through typed object/map semantics where possible and through the RFC 022 semi-structured family only when dynamic value semantics are required.

Higher-order functions such as `transform`, `filter`, `exists`, `forall`, `aggregate`, `reduce`, `zip_with`, `map_filter`, `transform_keys`, and `transform_values` must not reach Planned status until lambda or equivalent callback semantics are specified for InQL expressions.

@@ -87,7 +87,7 @@ Index origin, invalid-index behavior, null container behavior, null element beha

### Interaction with other InQL surfaces

Nested functions may appear wherever scalar expressions of their result type are valid. Grouping by nested values may be restricted until equality and ordering semantics for nested values are fully specified.
Nested functions may appear wherever scalar expressions of their result type are valid. Grouping by nested values is not documented as portable until equality and ordering semantics for nested values are fully specified.

### Compatibility / migration

@@ -113,10 +113,12 @@ No current InQL APIs are expected to break. Nested functions should be additive
- **Execution / interchange** — Prism and Substrait lowering must preserve nested value semantics or diagnose unsupported operations.
- **Documentation** — docs should separate nested scalar operations from generator functions.

## Unresolved questions
## Design Decisions

- Should element access use one-based indexing for SQL/Spark compatibility or zero-based indexing for host-language familiarity?
- What should strict `element_at` do on out-of-range indexes?
- Should grouping and ordering over arrays, maps, and structs be allowed initially?
### Resolved

<!-- When every question is resolved, rename this section to **Design Decisions**, group answers under ### Resolved, and remove this comment. -->
- Element access, array position results, and array slice boundaries are one-based for SQL/Spark compatibility.
- `element_at(...)` uses the current adapter's recoverable array-element behavior for out-of-range indexes. A separate strict/try split is deferred until registry error policy can distinguish static validation failures from runtime recoverable results.
- Grouping and ordering over arrays, maps, and structs are not documented as portable in the initial implementation.
- Scalar `array_flatten(...)` is separate from RFC 021 table-valued or generator flattening.
- Higher-order collection functions remain deferred until InQL expression callback or lambda semantics are specified.