Add missing array functions#1468

Merged
timsaucer merged 10 commits into apache:main from timsaucer:feat/add-missing-array-fns
Apr 6, 2026

@timsaucer
Member

Which issue does this PR close?

Closes #1452

Rationale for this change

These functions are available upstream but are not exposed in the Python API.

What changes are included in this PR?

Add Python API
Add unit tests

Are there any user-facing changes?

Addition only.

timsaucer and others added 6 commits April 3, 2026 13:52
Add new array functions from upstream DataFusion v53: array_any_value,
array_distance, array_max, array_min, array_reverse, arrays_zip,
string_to_array, and gen_series. Add corresponding list_* aliases and
missing list_* aliases for existing functions (list_empty, list_pop_back,
list_pop_front, list_has, list_has_all, list_has_any). Also add
array_contains/list_contains as aliases for array_has, generate_series
as alias for gen_series, and string_to_list as alias for string_to_array.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests cover all functions and aliases added in the previous commit:
array_any_value, array_distance, array_max, array_min, array_reverse,
arrays_zip, string_to_array, gen_series, generate_series,
array_contains, list_contains, list_empty, list_pop_back,
list_pop_front, list_has, list_has_all, list_has_any, and list_*
aliases for the new functions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…comment

- Make null_string optional in string_to_array/string_to_list
- Make step optional in gen_series/generate_series
- Rename second_array to element in array_contains/list_has/list_contains
- Restore # Window Functions section comment in __all__
- Add tests for optional parameter variants

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reduce 26 individual tests to 14 test functions with parametrized
cases, eliminating boilerplate while maintaining full coverage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
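
A minimal sketch of the kind of consolidation this commit describes, assuming pytest. The functions below are pure-Python stand-ins that operate on plain lists, not the real datafusion bindings; only the names `array_has`, `list_has`, `array_contains`, and `list_contains` come from this PR, and the test name is hypothetical.

```python
import pytest


def array_has(values, element):
    """Stand-in for the primary function: True if element is in values."""
    return element in values


# Stand-in aliases: thin wrappers that delegate to the primary function.
def list_has(values, element):
    return array_has(values, element)


def array_contains(values, element):
    return array_has(values, element)


def list_contains(values, element):
    return array_has(values, element)


# One parametrized test replaces four near-identical test functions.
@pytest.mark.parametrize(
    ("func", "values", "element", "expected"),
    [
        (array_has, [1, 2, 3], 2, True),
        (list_has, [1, 2, 3], 2, True),
        (array_contains, [1, 2, 3], 4, False),
        (list_contains, [1, 2, 3], 4, False),
    ],
)
def test_has_and_aliases(func, values, element, expected):
    assert func(values, element) is expected
```

Each tuple becomes its own test case in the pytest report, so coverage per alias stays visible even though the boilerplate is written once.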
…block

Merge standalone tests for list_empty, list_pop_back, list_pop_front,
list_has, array_contains, list_contains, list_has_all, and list_has_any
into the existing parametrized test_array_functions block alongside
their array_* counterparts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use the richer multi-row dataset (including all-nulls case) for both
array_any_value and list_any_value via the parametrized test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR exposes several upstream DataFusion array/list scalar functions and aliases through the datafusion-python API, and adds Python unit tests to validate the new bindings and aliases (closing #1452).

Changes:

  • Added Python API exports and wrappers for new array/list functions and list_* aliases (e.g., array_any_value, array_distance, array_max/min, array_reverse, arrays_zip, string_to_array, gen_series, plus list_* aliases).
  • Added Rust pyo3 bindings for newly exposed functions that weren’t previously available in the Python extension module.
  • Expanded unit test coverage to exercise new functions and alias behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| python/datafusion/functions.py | Adds new public function exports (`__all__`) and Python-level wrappers/aliases for array/list functions. |
| crates/core/src/functions.rs | Adds pyo3 bindings for new DataFusion nested functions/UDFs and registers them in the Python extension module. |
| python/tests/test_functions.py | Adds unit tests for new functions and alias coverage in both the general array-function parametrized suite and targeted tests. |

These aliases match the upstream DataFusion SQL-level aliases, completing
the set of missing array functions from issue apache#1452.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@timsaucer timsaucer marked this pull request as ready for review April 3, 2026 19:14
Contributor

@ntjohnson1 ntjohnson1 left a comment

Two small things otherwise LGTM.

I didn't check the upstream details at all; I assumed that since the Python API tests look reasonable and everything compiles, things are well aligned.

I didn't cross things off one by one, but I did a general check that this resolves the items listed in the original issue (the original issue didn't call out list_overlap, but it is reasonable to include).


Any parts matching the optional ``null_string`` will be replaced with ``NULL``.

Examples:
Contributor

Doesn't demonstrate the optional parameter.

We can probably update the Copilot rules to say that functions should have examples covering base functionality, plus extra examples for optional arguments.
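
To make the behaviour the docstring describes concrete, here is a pure-Python model of the semantics, shown once without and once with the optional argument. It is a sketch on plain strings, not the datafusion binding: `string_to_array_model` is a hypothetical name, only the `null_string` parameter name comes from this PR, and edge cases such as an empty or NULL delimiter are not modeled.

```python
from typing import List, Optional


def string_to_array_model(
    string: str, delimiter: str, null_string: Optional[str] = None
) -> List[Optional[str]]:
    """Split on delimiter; parts equal to null_string become None (NULL)."""
    parts = string.split(delimiter)
    if null_string is None:
        # No null marker given: every part is kept as-is.
        return list(parts)
    return [None if part == null_string else part for part in parts]


# Same input, without and with the optional argument.
print(string_to_array_model("a,b,c", ","))                   # ['a', 'b', 'c']
print(string_to_array_model("a,b,c", ",", null_string="b"))  # ['a', None, 'c']
```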


Unlike :py:func:`range`, this includes the upper bound.

Examples:
Contributor

Missing optional parameter example
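
For reference, the inclusive upper bound and the optional `step` can be modeled in plain Python. This is a sketch on integers only, using Python's built-in `range` as a stand-in for exclusive-bound behaviour; `gen_series_model` is a hypothetical name and is not the datafusion binding.

```python
from typing import List


def gen_series_model(start: int, stop: int, step: int = 1) -> List[int]:
    """Integers from start to stop, including stop when the step lands on it."""
    if step == 0:
        raise ValueError("step must be non-zero")
    # Push the bound one unit past stop in the direction of travel so that
    # stop itself is included, unlike the built-in range.
    inclusive_stop = stop + 1 if step > 0 else stop - 1
    return list(range(start, inclusive_stop, step))


print(list(range(1, 7)))               # [1, 2, 3, 4, 5, 6]
print(gen_series_model(1, 7))          # [1, 2, 3, 4, 5, 6, 7]
print(gen_series_model(1, 7, step=3))  # [1, 4, 7]
```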

Comment on lines +28 to +44

## Python Function Docstrings

Every Python function must include a docstring with usage examples.

- **Examples are required**: Each function needs at least one doctest-style example
demonstrating basic usage.
- **Optional parameters**: If a function has optional parameters, include separate
examples that show usage both without and with the optional arguments. Pass
optional arguments using their keyword name (e.g., `step=dfn.lit(3)`) so readers
can immediately see which parameter is being demonstrated.
- **Reuse input data**: Use the same input data across examples wherever possible.
The examples should demonstrate how different optional arguments change the output
for the same input, making the effect of each option easy to understand.
- **Alias functions**: Functions that are simple aliases (e.g., `list_sort` aliasing
`array_sort`) only need a one-line description and a `See Also` reference to the
primary function. They do not need their own examples.
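
A sketch of what docstrings following these rules might look like. The two functions are pure-Python stand-ins on plain lists, not the real `array_sort`/`list_sort` bindings, and the `descending` parameter here is an assumption made for illustration.

```python
def array_sort(values, descending=False):
    """Return a sorted copy of ``values``.

    Examples:
        >>> array_sort([3, 1, 2])
        [1, 2, 3]

        Same input, with the optional ``descending`` argument by keyword:

        >>> array_sort([3, 1, 2], descending=True)
        [3, 2, 1]
    """
    return sorted(values, reverse=descending)


def list_sort(values, descending=False):
    """Sort a list. Alias for :py:func:`array_sort`.

    See Also:
        :py:func:`array_sort`
    """
    # Alias: delegate to the primary function; no separate examples needed.
    return array_sort(values, descending=descending)
```

Saved to a file, the examples can be checked with `python -m doctest <file>`.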
Member Author

@ntjohnson1 Do you think we should add anything else here?

Contributor

Looks like it covers the majority of things to me. The only other piece is the stylistic preference on specifying "Returns" or not. I don't know if there is a definitive position on that.

@timsaucer
Member Author

> I didn't cross things off one by one but did a general check that this resolved the items listed in the original issue (original issue didn't call out list_overlap but reasonable to include)

Yes, and I've updated the skill that generated the original issue because we had both some false positives and false negatives.

@timsaucer
Member Author

Thanks for the review, @ntjohnson1!

@timsaucer timsaucer merged commit 99bc960 into apache:main Apr 6, 2026
21 checks passed
@timsaucer timsaucer deleted the feat/add-missing-array-fns branch April 6, 2026 11:47

Development

Successfully merging this pull request may close these issues.

Add missing array/list functions and aliases

3 participants