Skip to content

feat: substrait builder extra api#773

Open
mbwhite wants to merge 5 commits intosubstrait-io:mainfrom
mbwhite:additional-substrait-builder
Open

feat: substrait builder extra api#773
mbwhite wants to merge 5 commits intosubstrait-io:mainfrom
mbwhite:additional-substrait-builder

Conversation

@mbwhite
Copy link
Copy Markdown
Contributor

@mbwhite mbwhite commented Mar 23, 2026

Follow on to pr #767 to fill some missing gaps in the substrait builder. We've recently used the Substrait Builder and found that there some variants and missing gaps in the APIs. We'd subclassed it to add these, so I'm now wanting to push these to the main codebase.

There wasn't a unit test already so I've added a basic one here - which I appreciate will is quite large.

@github-actions
Copy link
Copy Markdown

ACTION NEEDED

Substrait follows the Conventional Commits
specification
for
release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@mbwhite mbwhite force-pushed the additional-substrait-builder branch from bd62615 to d0cc543 Compare March 23, 2026 11:58
@mbwhite mbwhite changed the title Additional substrait builder feat: substrait builder extra api Mar 23, 2026
@mbwhite mbwhite force-pushed the additional-substrait-builder branch from d0cc543 to 6c079f0 Compare March 23, 2026 12:50
Signed-off-by: MBWhite <whitemat@uk.ibm.com>
@mbwhite mbwhite force-pushed the additional-substrait-builder branch 2 times, most recently from 8618365 to 434df46 Compare March 24, 2026 10:29
@mbwhite mbwhite marked this pull request as ready for review March 24, 2026 10:34
Signed-off-by: MBWhite <whitemat@uk.ibm.com>
@mbwhite mbwhite force-pushed the additional-substrait-builder branch from 434df46 to 83fa865 Compare March 24, 2026 11:50
@mbwhite mbwhite force-pushed the additional-substrait-builder branch 2 times, most recently from 56aaa2a to 5bdace0 Compare March 24, 2026 13:09
Signed-off-by: MBWhite <whitemat@uk.ibm.com>
@mbwhite mbwhite force-pushed the additional-substrait-builder branch from 5bdace0 to ef5e526 Compare March 24, 2026 13:10
Copy link
Copy Markdown
Member

@benbellick benbellick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments. Thanks!

@mbwhite
Copy link
Copy Markdown
Contributor Author

mbwhite commented Mar 25, 2026

@benbellick updated as suggested :-)

@mbwhite mbwhite requested a review from benbellick March 25, 2026 11:52
@mbwhite mbwhite force-pushed the additional-substrait-builder branch from efabd93 to fe68836 Compare March 25, 2026 13:15
@mbwhite mbwhite requested a review from benbellick March 26, 2026 09:28
Signed-off-by: MBWhite <whitemat@uk.ibm.com>
@mbwhite mbwhite force-pushed the additional-substrait-builder branch from fe68836 to 1ca14b2 Compare March 27, 2026 11:51
@mbwhite mbwhite requested a review from benbellick April 2, 2026 10:27
@nielspardon
Copy link
Copy Markdown
Member

@benbellick since you started this review, do you have time for another look?

Copy link
Copy Markdown
Member

@benbellick benbellick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few more comments. Thanks!

* @param expression the boolean expression to negate
* @return a scalar function invocation representing the logical NOT of the input expression
*/
public Expression not(Expression expression) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly relevant issue I just found FYI substrait-io/substrait#1041

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also and and or above return Expression.ScalarFunctionInvocation, but this one returns the more general Expression. Can we return Expression.ScalarFunctionInvocation to be consistent?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the nullability should be assumed to be MIRROR rather than DECLARED_OUTPUT to be consistent with and and or above. So I think we should check nullability here and only mark the output as nullable if the input is.

public Expression scalarFn(
String urn,
String key,
Type returnType,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking out loud here, but does this builder validate that the returnType is actually possible for this function?

}

/**
* Creates a scalar function invocation with function options.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This says

with function options

but only the above method accepts options AFAICT via optionsList, and this one does not. Maybe we just get rid of this whole one to keep things simple?

* @return a new {@link Plan}
*/
public Plan.Root root(final Rel input, final List<String> names) {
return Plan.Root.builder().input(input).names(names).build();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also curious if this one validates that the length of names matches the schema of input.

final Expression.I32Literal input = builder.i32(1);
final Expression cast = builder.cast(input, R.I64);
assertNotNull(cast);
assertTrue(cast instanceof Expression.Cast);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to actually check that the type of this expression is now R.I64.

DefaultExtensionCatalog.FUNCTIONS_COMPARISON,
"is_null:any",
TypeCreator.REQUIRED.BOOLEAN,
args,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could just simplify this to the below and drop the above construction of an ArrayList

Suggested change
args,
List.of(expression),

* @param input the root relational expression of the query plan
* @param names the ordered list of output column names corresponding to the input relation's
* output schema
* @return a new {@link Plan}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @return a new {@link Plan}
* @return a new {@link Plan.root}


final AggregateFunctionInvocation afi1 =
builder.aggregateFn(
"extension:io.substrait:functions_aggregate_generic",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
"extension:io.substrait:functions_aggregate_generic",
DefaultExtensionCatalog.FUNCTIONS_AGGREGATE_GENERIC,

Fetch limit = builder.limit(10, scan);
assertEquals(10, limit.getCount().getAsLong());
assertEquals(0, limit.getOffset());
limit = builder.limit(10, Remap.of(Arrays.asList(new Integer[] {0, 1})), scan);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
limit = builder.limit(10, Remap.of(Arrays.asList(new Integer[] {0, 1})), scan);
limit = builder.limit(10, Remap.of(List.of(0, 1)), scan);

* @param value value to create
* @return i64 instance
*/
public Expression.I64Literal i64(long value) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: all of the other existing methods use v for the parameter name. Should we do the same for all of the new ones to be consistent?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants