3 changes: 3 additions & 0 deletions docs/language/reference/dataset_methods.md
@@ -20,6 +20,7 @@ The Substrait helper surface behind these methods is split by semantic role:
| `group_by` | `def group_by(self, columns: list[ColumnExpr]) -> Self` | Define grouping keys using scalar expressions. |
| `agg` | `def agg(self, measures: list[AggregateMeasure]) -> Self` | Apply aggregate measures over the current relation or current grouping. |
| `generate` | `def generate(self, generator: GeneratorApplication) -> Self` | Apply a relation-shaping generator such as `explode(...)` with explicit output aliases. |
| `with_window_column` | `def with_window_column(self, name: str, application: WindowFunctionApplication) -> Self` | Add or replace one projected column using a placed window function. |
| `order_by` | `def order_by(self, columns: list[ColumnExpr]) -> Self` | Sort rows by scalar expressions or ordering helpers such as `asc(...)` and `desc(...)`. |
| `limit` | `def limit(self, n: int) -> Self` | Cap row count. |
| `explode` | `def explode(self) -> Self` | Compatibility marker for the older EXPLODE extension path. Prefer `generate(explode(...))`. |
@@ -69,6 +70,7 @@ def enrich(orders: LazyFrame[Order]) -> LazyFrame[Order]:
- `join(...)` is constrained to same-carrier inputs and the boolean join predicate surface shown in the signature.
- `select(...)` preserves projection shape; explicit projection lists are represented today through `with_column(...)` and scalar-expression builders.
- `generate(...)` preserves all input columns and appends generated output aliases. Alias collisions are rejected during planning/lowering.
- `with_window_column(...)` currently supports ranking helpers over explicit window specs and lowers through Substrait window relations. Backend execution support is tracked separately from logical planning support.
- `DataFrame[T]` exposes materialized metadata and preview text; row-level accessors belong to the materialized DataFrame API surface.
- Query-block and scoped DSL surfaces lower into these builder APIs rather than defining separate method semantics.

@@ -77,3 +79,4 @@ def enrich(orders: LazyFrame[Order]) -> LazyFrame[Order]:
- [Filter builders](builders/filters.md)
- [Aggregate builders](builders/aggregates.md)
- [Projection builders](builders/projections.md)
- [Window functions](functions/windows.md)
4 changes: 3 additions & 1 deletion docs/language/reference/functions/index.md
@@ -9,10 +9,11 @@ Today the concrete shipped surfaces are documented here:
- [Projection builders](../builders/projections.md)
- [Generator and table-valued functions](generators.md)
- [Nested data functions](nested.md)
- [Window functions](windows.md)

The canonical scalar literal helper is `lit(...)`. Typed literal helpers construct the same scalar-expression representation.

The current registry-backed helper surface is registered in the package-owned function registry. Registry types live in `src/function_registry.incn`, the shared package registry lives in `src/functions/registry.incn`, and concrete public helper entries are produced by `function_registry.add(...)` decorators in individual `src/functions/<family>/<name>.incn` modules. The registry-backed families are references, literals, casts, operators, predicates, conditionals, math, ordering, aggregates, generators, and nested data. Each runtime entry exposes a stable function reference such as `inql.functions.col`, namespace, canonical name, typed lifecycle metadata (`since`, versioned changes, and optional deprecation), RFC 024 policy category, function class, null behavior, alias policy, aggregate modifier policy, and Substrait mapping metadata. Checked function signatures come from the public helper declaration, not from a second hand-written registry signature.
The current registry-backed helper surface is registered in the package-owned function registry. Registry types live in `src/function_registry.incn`, the shared package registry lives in `src/functions/registry.incn`, and concrete public helper entries are produced by `function_registry.add(...)` decorators in individual `src/functions/<family>/<name>.incn` modules. The registry-backed families are references, literals, casts, operators, predicates, conditionals, math, ordering, aggregates, generators, nested data, and windows. Each runtime entry exposes a stable function reference such as `inql.functions.col`, namespace, canonical name, typed lifecycle metadata (`since`, versioned changes, and optional deprecation), RFC 024 policy category, function class, null behavior, alias policy, aggregate modifier policy, and Substrait mapping metadata. Checked function signatures come from the public helper declaration, not from a second hand-written registry signature.

The registry is the source for non-derivable machine facts. Public helper declarations are the source for argument names, argument types, and return types. Docstrings remain human-facing explanation, examples, and parameter intent. The `registry-metadata` check validates the checked API metadata projections produced from public facade aliases, registry decorators, and decorated callable signatures. Runtime registry entries are lazy and process-local: they support helper execution and lowering for loaded helpers, while the complete public catalog comes from checked metadata. This matters for generated docs, diagnostics, Prism lowering, and backend capability checks as the catalog grows.

@@ -35,6 +36,7 @@ The registered helper surface currently includes:
| `abs(...)`, `ceil(...)`, `floor(...)`, `round(...)` | scalar | registered Substrait math scalar mappings; `round(...)` is currently the single-argument form |
| `array(...)`, `cardinality(...)`, `array_contains(...)`, `arrays_overlap(...)`, `array_position(...)`, `element_at(...)`, `array_sort(...)`, `array_distinct(...)`, `array_except(...)`, `array_intersect(...)`, `array_union(...)`, `array_join(...)`, `array_slice(...)`, `array_reverse(...)`, `array_flatten(...)`, `map_from_arrays(...)`, `map_extract(...)`, `map_contains_key(...)`, `map_keys(...)`, `map_values(...)`, `map_entries(...)`, `named_struct(...)` | scalar | registered nested scalar helpers backed by Substrait extension mappings; `map_contains_key(...)` lowers as a documented predicate rewrite |
| `explode(...)`, `explode_outer(...)`, `posexplode(...)`, `posexplode_outer(...)` | generator | relation-extension mappings consumed by `generate(...)`; positional forms use zero-based positions |
| `window()`, `row_number()`, `rank()`, `dense_rank()` | window | `window()` builds structural window-spec metadata; ranking helpers lower through `ConsistentPartitionWindowRel` when placed with `with_window_column(...)` |
| `asc(...)`, `desc(...)`, `asc_nulls_first(...)`, `asc_nulls_last(...)`, `desc_nulls_first(...)`, `desc_nulls_last(...)` | ordering | structural sort-field helpers consumed by `order_by(...)` and lowered to Substrait `SortRel.sorts` |
| `sum(...)`, `count()`, `count_expr(...)`, `avg(...)`, `min(...)`, `max(...)` | aggregate | registered Substrait extension functions; `count_expr(...)` is a compatibility spelling for future `count(expr)` helper overloading |
| `count_distinct(...)`, `count_if(...)` | aggregate | compatibility helpers that lower through aggregate modifiers over canonical `count` semantics |
33 changes: 33 additions & 0 deletions docs/language/reference/functions/windows.md
@@ -0,0 +1,33 @@
# Window Functions (Reference)

Window helpers are relation-aware. A window function application produces one output value per input row while reading a
partition of related rows. It is not an ordinary scalar expression and must be placed through a projection-like dataset
method.

```incan
from pub::inql import LazyFrame
from pub::inql.functions import col, desc, rank, window
from models import Order

def ranked_orders(orders: LazyFrame[Order]) -> LazyFrame[Order]:
    return orders.with_window_column(
        "customer_rank",
        rank().over(window().partition_by([col("customer_id")]).order_by([desc(col("amount"))])),
    )
```

The current foundation slice includes:

| Function | Meaning | Placement |
| --- | --- | --- |
| `window()` | Build an empty window specification. | Structural builder used before `.over(...)`. |
| `row_number()` | Assign a sequential row number inside the ordered window. | Use `.over(window().order_by(...))`, then `with_window_column(...)`. |
| `rank()` | Rank rows with gaps after ties inside the ordered window. | Use `.over(window().order_by(...))`, then `with_window_column(...)`. |
| `dense_rank()` | Rank rows without gaps after ties inside the ordered window. | Use `.over(window().order_by(...))`, then `with_window_column(...)`. |

`WindowSpec.partition_by(...)` replaces the partition expressions. `WindowSpec.order_by(...)` replaces the ordering
expressions. Ranking helpers require explicit ordering; missing ordering is rejected during logical lowering.
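
The difference between the three ranking helpers is easiest to see on a partition that contains ties. The following is a minimal sketch in plain Python, not InQL code: `rank_rows` and its parameters are hypothetical names introduced only for illustration, and it models the standard SQL-style meaning of `row_number`, `rank`, and `dense_rank` rather than the actual lowering or execution path.

```python
from itertools import groupby


def rank_rows(rows, partition_key, order_key, descending=False):
    """Annotate each row with row_number, rank, and dense_rank per partition.

    Hypothetical helper for illustration only; it is not part of InQL.
    """
    out = []
    # Sort by partition key so groupby sees each partition as one contiguous run.
    by_partition = sorted(rows, key=lambda r: r[partition_key])
    for _, part in groupby(by_partition, key=lambda r: r[partition_key]):
        # Explicit ordering inside the partition, mirroring the required order_by(...).
        ordered = sorted(part, key=lambda r: r[order_key], reverse=descending)
        rank = 0
        dense = 0
        prev = object()  # sentinel that never equals a real ordering value
        for position, row in enumerate(ordered, start=1):
            if row[order_key] != prev:
                rank = position  # rank jumps to the current position: gaps after ties
                dense += 1       # dense_rank advances by exactly one: no gaps
                prev = row[order_key]
            out.append({**row, "row_number": position, "rank": rank, "dense_rank": dense})
    return out


orders = [
    {"customer_id": 1, "amount": 300},
    {"customer_id": 1, "amount": 300},
    {"customer_id": 1, "amount": 100},
    {"customer_id": 2, "amount": 50},
]
result = rank_rows(orders, "customer_id", "amount", descending=True)
```

For customer 1, the two tied rows share `rank` 1 and `dense_rank` 1; the third row gets `rank` 3 but `dense_rank` 2, while `row_number` runs 1, 2, 3 regardless of ties.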

`with_window_column(name, application)` preserves input columns and either adds `name` as a new column or replaces the
existing column of that name. Each call lowers one window projection through Substrait `ConsistentPartitionWindowRel` with a
registry-backed function anchor. Backend execution support is separate from this logical planning surface.
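
The add-or-replace behavior of `with_window_column(...)` can be pictured with a small plain-Python model. This is a sketch under stated assumptions, not the InQL implementation: `add_or_replace_column` is a hypothetical name, a relation is modeled as a dict of column name to values, and keeping a replaced column in its original position is an assumption of this sketch that the reference text does not state.

```python
def add_or_replace_column(columns, name, values):
    """Return a new column mapping with `name` added or replaced.

    Hypothetical model for illustration only; it is not part of InQL.
    """
    result = dict(columns)  # copy, so the input relation is left untouched
    result[name] = values   # new key is appended; existing key is overwritten in place
    return result


relation = {"customer_id": [1, 1, 2], "amount": [300, 100, 50]}

# Adding a new name keeps every input column and appends the window column.
added = add_or_replace_column(relation, "customer_rank", [1, 2, 1])

# Reusing an existing name replaces that column instead of duplicating it.
replaced = add_or_replace_column(relation, "amount", [1, 2, 1])
```

In both cases the column count never shrinks and no name appears twice, which is the property the add-or-replace wording describes.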
2 changes: 1 addition & 1 deletion docs/language/reference/substrait/operator_catalog.md
@@ -34,7 +34,7 @@ The following table maps InQL plan capabilities to Substrait logical relations a
| Group by / aggregates | `AggregateRel` with scalar grouping keys and aggregate measures; grouping sets are tracked as a distinct capability below | core |
| Rollup / cube / grouping sets | `AggregateRel` with multiple groupings | core |
| Distinct rows | `AggregateRel` with grouping keys and no measures | core |
| Window / analytic functions | `ProjectRel` with window expressions | core |
| Window / analytic functions | `ConsistentPartitionWindowRel` with partition/order expressions and registered window function anchors | core |
| Sort | `SortRel` | core |
| Limit / offset | `FetchRel` | core |
| Union, intersect, except | `SetRel` with the appropriate set operation enum | core |
1 change: 1 addition & 0 deletions docs/release_notes/v0_1.md
@@ -17,6 +17,7 @@ Entries will be filled in as work lands (link RFCs and PRs when applicable).
- **Common scalar functions:** The first RFC 018 slice adds registry-backed math helpers for `abs(...)`, `ceil(...)`, `floor(...)`, and single-argument `round(...)`, with Substrait mappings and DataFusion-backed execution coverage.
- **Nested data functions:** RFC 020 adds registry-backed scalar helpers for array construction/access, cardinality, containment, overlap, sorting, set-like operations, joining, slicing, reversing, scalar array flattening, map construction/access, map key/value/entry extraction, map key containment, and named struct construction. These helpers lower through Substrait extension metadata and execute through the DataFusion-backed Session path without introducing generator semantics.
- **Generator functions:** RFC 021 adds registry-backed generator applications for `explode(...)`, `explode_outer(...)`, `posexplode(...)`, and `posexplode_outer(...)`. Generators remain relation-shaping operations applied with `generate(...)`; they preserve input columns, require explicit output aliases, and lower through the current Substrait extension-relation gap encoding.
- **Window functions:** RFC 019 adds the first window-function planning slice with `window()` specs, `row_number()`, `rank()`, `dense_rank()`, and `with_window_column(...)`. Ranking windows require explicit ordering and lower through Substrait `ConsistentPartitionWindowRel`; backend execution support remains a separate adapter capability.
- **Function registry:** RFC 014 adds declaration-site registry decorators for the current public helper surface, including stable function references, checked signature projection, lifecycle metadata, behavior categories, alias policy, Substrait mapping categories, and checked API metadata drift validation.
- **Function extension policy:** RFC 024 policy metadata now distinguishes portable core functions, namespaced extension-only functions, opt-in compatibility aliases, engine-specific functions, and rejected compatibility requests without adding an extension plugin system or backend-owned semantics.
- **Projection:** builder-based `with_column`, `add`, `mul`, and literal expression helpers now lower derived columns through Prism, Substrait, and Session execution.
28 changes: 17 additions & 11 deletions docs/rfcs/019_window_functions.md
@@ -1,6 +1,6 @@
# InQL RFC 019: Window functions

- **Status:** Draft
- **Status:** In Progress
- **Created:** 2026-04-27
- **Author(s):** Danny Meijer (@dannymeijer)
- **Related:**
@@ -11,7 +11,7 @@
- InQL RFC 016 (core aggregate functions)
- **Issue:** [InQL #36](https://github.com/dannys-code-corner/InQL/issues/36)
- **RFC PR:** —
- **Written against:** Incan v0.2
- **Written against:** Incan v0.3-era InQL
- **Shipped in:** —

## Summary
@@ -40,19 +40,18 @@ Window functions also force a clearer relation between row-level expressions and

## Guide-level explanation (how authors think about it)

Authors should be able to rank and compare rows within a partition:
Authors can rank rows within a partition using the builder surface:

```incan
from pub::inql.functions import col, desc, lag, rank, window
from pub::inql.functions import col, desc, rank, window

ranked = (
    orders
    .with_column("customer_rank", rank().over(window().partition_by([col("customer_id")]).order_by([desc(col("amount"))])))
    .with_column("previous_amount", lag(col("amount"), 1).over(window().partition_by([col("customer_id")]).order_by([col("created_at")])))
    .with_window_column("customer_rank", rank().over(window().partition_by([col("customer_id")]).order_by([desc(col("amount"))])))
)
```

The exact builder syntax may evolve, but authors should understand that a window function returns a row-level value computed with access to nearby or related rows.
The exact query-block syntax may evolve, but authors should understand that a window function returns a row-level value computed with access to nearby or related rows.

## Reference-level explanation (precise rules)

@@ -108,10 +107,17 @@ No current InQL function should be reclassified silently as a window function. A
- **Execution / interchange** — Prism and Substrait lowering must preserve window partitioning, ordering, frames, and function identity.
- **Documentation** — docs should clearly separate aggregate functions from window functions.

## Unresolved questions
## Design Decisions

### Resolved

- The first implementation slice exposes explicit `with_window_column(...)` projection-like placement rather than accepting window functions in arbitrary scalar-expression positions.
- Ranking helpers require explicit `order_by(...)` in the window spec. InQL does not invent a silent default ordering.
- The current foundation slice lowers `row_number`, `rank`, and `dense_rank` through `ConsistentPartitionWindowRel` with registry-backed function anchors.
- DataFusion execution for window relations is not claimed until a backend adapter slice explicitly supports the lowered window relation.

### Remaining

- What default frame should InQL use for ordered window functions?
- Should window functions be allowed in `WHERE` or only in projection/order positions?
- Should null treatment use explicit `IGNORE NULLS` / `RESPECT NULLS` style modifiers?

<!-- When every question is resolved, rename this section to **Design Decisions**, group answers under ### Resolved, and remove this comment. -->
- How should `lag`, `lead`, first/last/nth value functions, aggregate-over-window calls, and query-block `OVER (...)` syntax be phased on top of the foundation model?
2 changes: 1 addition & 1 deletion docs/rfcs/README.md
@@ -25,7 +25,7 @@ InQL uses its **own** RFC series (starting at 000), independent of the [Incan la
| [016][rfc-016] | Draft | Core aggregate functions | |
| [017][rfc-017] | Draft | Aggregate modifiers | |
| [018][rfc-018] | Draft | Common scalar function catalog | |
| [019][rfc-019] | Draft | Window functions | |
| [019][rfc-019] | In Progress | Window functions | |
| [020][rfc-020] | Draft | Nested data functions | |
| [021][rfc-021] | In Progress | Generator and table-valued functions | |
| [022][rfc-022] | Draft | Semi-structured and format functions | |