Split maven artifacts into component libraries by headius · Pull Request #4031 · ruby/prism

headius · 2026-03-23T19:31:04Z

This PR will move all Java-related components under the java/ dir and begin splitting up the aggregate prism-parser artifact into component libraries:

prism-parser-api, under java/api, is the Loader API and support classes.
prism-parser-native, under java/native, is the native JNI binding for the Prism shared library.
prism-parser-wasm, under java/wasm, is the WASM-based binding of the Prism library.

There will be at least two more components added to this list, either as part of this PR or as separate work.

prism-parser-complete, a build of the WASM binding that includes non-semantic information like comments and line numbers.

This will likely require some enhancements in the API module to support these other elements. This will also be the basis for the JRuby version of the Ruby-based Prism parser API.

prism-parser-native-<platform> or similar will handle shipping pre-built native binaries for the JNI backend.

* The Loader API lives under java/api.
* The current native endpoint for the Prism shared library lives under java/native.
* The WASM build and binding live under java/wasm.

The libraries will be released together but can be developed and snapshotted independently. Users who copy the source from the previous java/ will want to grab the contents of both java/api/src/main/java and java/native/src/main/java.

This uses the JRuby rake-maven-plugin to generate the templates as part of the Maven build. The generated output for the Java templates will be under java/api/target/generated-sources/java.

eregon · 2026-03-23T20:16:02Z

prism-parser-api will need to be templated with & without non-semantic fields, how are you thinking to handle that?
Different artifact names for the same pom.xml seems the easiest.

headius · 2026-03-23T20:17:16Z

@eregon Perhaps a different artifact or perhaps a sibling API within the same artifact. I have not decided what would be cleanest.

It seems like the ideal case would be that the API just includes empty elements for non-semantic elements when those are not enabled by the backend.

eregon · 2026-03-23T20:17:47Z

For example modifying Loader to handle both "all fields" and "only semantic fields" will not fly, because the Java fields of the nodes need to reflect the set of fields chosen (or if always including all fields then it's a huge memory overhead).

eregon · 2026-03-23T20:20:31Z

A different artifact seems the cleanest to me, also because each of these 2 artifacts would then depend on either prism-parser-wasm or prism-parser-complete but not both.
And we wouldn't need to make compromises on flexibility during templating time like identifier types, see #4009 (comment)

eregon · 2026-03-23T20:23:24Z

prism-parser-native
prism-parser-native-<platform>
prism-parser-wasm
prism-parser-complete

Regarding naming, prism-parser-complete seems too unclear (e.g. is it like jruby-complete and effectively includes all jruby-parser-*? No it isn't), prism-parser-wasm-complete would be clearer.

But, since all artifacts are either "all fields" or "only-semantic fields" I think separate prefixes would be best.
Note we'll also need "native with all fields", unless we're OK to always use the slower WASM for that case.

So I'd suggest this:

prism-semantic-parser-{api,native,native-<platform>,wasm}
prism-full-parser-{api,native,native-<platform>,wasm}
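
To make the size of the proposed matrix concrete, here is a small sketch that expands the two brace patterns above into individual names. The class and method names are made up for illustration, and `native-<platform>` is kept as the literal placeholder used in the suggestion.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: expands the naming pattern suggested above.
class ArtifactMatrix {

    /** Returns every name produced by prism-{semantic,full}-parser-{component}. */
    static List<String> artifactNames() {
        String[] fieldSets = {"semantic", "full"};
        String[] components = {"api", "native", "native-<platform>", "wasm"};
        List<String> names = new ArrayList<>();
        for (String fields : fieldSets) {
            for (String component : components) {
                names.add("prism-" + fields + "-parser-" + component);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = artifactNames();
        names.forEach(System.out::println);
        System.out.println(names.size() + " artifacts");
    }
}
```

Two field sets times four components gives eight names, which is the count discussed later in the thread.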

eregon · 2026-03-23T20:28:43Z

In terms of dependencies would there be any between those artifacts? Or rather the users would pick:

An -api artifact
And choose one of -{native,native-<platform>,wasm}

headius · 2026-03-23T20:33:36Z

> Regarding naming, prism-parser-complete seems too unclear

We can bikeshed the naming later.

> In terms of dependencies

Everyone on the Java side of things will use api and select one artifact that provides a parser backend. Those backends might be configured via SPI, but I haven't decided if that's worth it (really only useful if multiple backends will be used in a single JVM app, like JRuby's fallback on WASM, but that can be configured at a higher level).
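
As a rough illustration of the SPI option, the sketch below uses java.util.ServiceLoader to discover backends and falls back to the next one when the preferred backend is unusable. The `ParserBackend` interface, its methods, and the `BackendSelection` class are hypothetical and are not part of the Prism Java API; no decision on SPI has been made in this thread.

```java
import java.util.List;
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical sketch: none of these type names come from the Prism Java API.
class BackendSelection {

    /** Hypothetical SPI that each backend artifact (native, wasm) would implement. */
    interface ParserBackend {
        String name();

        /** For example, false when the native shared library cannot be loaded. */
        boolean isAvailable();
    }

    /** Returns the first usable backend in iteration order, if any. */
    static Optional<ParserBackend> select(Iterable<ParserBackend> candidates) {
        for (ParserBackend backend : candidates) {
            if (backend.isAvailable()) {
                return Optional.of(backend);
            }
        }
        return Optional.empty();
    }

    /** Builds a stand-in backend for the demo below. */
    static ParserBackend backend(String name, boolean available) {
        return new ParserBackend() {
            @Override public String name() { return name; }
            @Override public boolean isAvailable() { return available; }
        };
    }

    public static void main(String[] args) {
        // Real providers would be registered by each backend jar (META-INF/services
        // or a module-info "provides" clause). This file registers none.
        ServiceLoader<ParserBackend> discovered = ServiceLoader.load(ParserBackend.class);
        System.out.println("discovered: "
                + select(discovered).map(ParserBackend::name).orElse("none"));

        // Simulated classpath: native is present but unusable, so WASM is picked.
        List<ParserBackend> simulated = List.of(backend("native", false), backend("wasm", true));
        System.out.println("selected: "
                + select(simulated).map(ParserBackend::name).orElse("none"));
    }
}
```

If only one backend is ever on the classpath, the same selection could be done with a direct constructor call at a higher level, which is the alternative mentioned above.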

headius · 2026-03-23T20:35:26Z

Latest patch cleans up some path locations in the CI builds.

The "build java*" jobs use make, which uses the rake-compiler JavaExtensionTask to build, but that task has no way to fetch required Maven dependencies such as JUnit and Chicory. The make build for the Java API should use the Maven build instead of the extension task (which is designed for building JRuby extensions that have no other dependencies), but I'm still sorting out where that's all done.

headius · 2026-03-23T21:02:09Z

Bit of a chicken and egg issue trying to get the generated parts of the Java build in place:

generate-sources phase for the java/api build generates the .java sources, but also generates the .c sources.
generate-resources phase for java/wasm needs the .wasm to be already built, which needs make to run.
make needs the generated .c sources.

I'll play with different configurations to figure out an appropriate sequence and hopefully get all those steps to work in both the maven builds and the rake builds.
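
One way to reason about the sequencing is to write the steps as a dependency graph and sort it. The sketch below does that with Kahn's algorithm for the arrangement where generation happens before any Maven build. The step names are labels for this example rather than exact rake, make, or Maven targets, and the edge from the java/wasm build to the java/api build is an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Step names are labels for this sketch, not exact rake, make, or Maven targets.
class BuildOrder {

    /**
     * Kahn's algorithm: orders steps so each one follows all of its prerequisites.
     * Every prerequisite must itself appear as a key of the map.
     */
    static List<String> order(Map<String, List<String>> prerequisites) {
        Map<String, Integer> remaining = new LinkedHashMap<>();
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, List<String>> entry : prerequisites.entrySet()) {
            remaining.put(entry.getKey(), entry.getValue().size());
            if (entry.getValue().isEmpty()) {
                ready.add(entry.getKey());
            }
        }
        List<String> ordered = new ArrayList<>();
        while (!ready.isEmpty()) {
            String done = ready.remove();
            ordered.add(done);
            for (Map.Entry<String, List<String>> entry : prerequisites.entrySet()) {
                if (entry.getValue().contains(done)
                        && remaining.merge(entry.getKey(), -1, Integer::sum) == 0) {
                    ready.add(entry.getKey());
                }
            }
        }
        if (ordered.size() != prerequisites.size()) {
            throw new IllegalStateException("build steps contain a cycle");
        }
        return ordered;
    }

    /** Generate first, then build: one arrangement that has a valid order. */
    static Map<String, List<String>> prismSteps() {
        Map<String, List<String>> steps = new LinkedHashMap<>();
        steps.put("generate templates", List.of());
        steps.put("make wasm", List.of("generate templates"));    // needs the generated .c
        steps.put("mvn java/api", List.of("generate templates")); // needs the generated .java
        // Assumption: the wasm module also depends on the api module.
        steps.put("mvn java/wasm", List.of("make wasm", "mvn java/api")); // needs the .wasm
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(order(prismSteps()));
    }
}
```

A graph in which two steps each wait on the other's output has no valid order, and `order` reports that as a cycle.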

eregon · 2026-03-23T21:34:54Z

> We can bikeshed the naming later.

#4031 (comment) is not just about the name of that one but the general organization, I think that makes sense to discuss now. It doesn't need to block this PR, but that's an important discussion and architecture point to decide early on.

headius · 2026-03-23T21:46:24Z

> is it like jruby-complete

The jruby-complete naming was done a long time ago. I probably wouldn't use that naming now.

> So I'd suggest this

That's eight artifacts. If we also are publishing separate artifacts for every identifier form, that would be 24 artifacts.

I think that's overkill.

There are better ways to design the API to have optional fields in the AST, such as by having a set of full non-semantic AST subclass nodes that can be created by the full non-semantic parser build. The AST would look the same unless you require the non-semantic data, and if you don't it's the simpler API that doesn't include those fields.
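
A minimal sketch of the subclass idea, assuming a single node type for brevity. `CallNode`, `FullCallNode`, and their fields are invented for this example and do not reflect Prism's generated Nodes.java. The base node stores only semantic data, so a semantic-only parse pays no memory cost for comments or line numbers.

```java
import java.util.List;

// Hypothetical sketch: these classes are illustrative, not Prism's generated Nodes.java.
class OptionalFieldNodes {

    /** Semantic-only node: carries no storage for comments or line numbers. */
    static class CallNode {
        final String name;
        final int startOffset;

        CallNode(String name, int startOffset) {
            this.name = name;
            this.startOffset = startOffset;
        }

        /** Empty unless a full subclass supplies non-semantic data. */
        List<String> comments() {
            return List.of();
        }
    }

    /** Would be created only by a parser build that retains non-semantic data. */
    static final class FullCallNode extends CallNode {
        private final List<String> comments;
        final int line;

        FullCallNode(String name, int startOffset, List<String> comments, int line) {
            super(name, startOffset);
            this.comments = List.copyOf(comments);
            this.line = line;
        }

        @Override
        List<String> comments() {
            return comments;
        }
    }

    /** Consumers written against CallNode work with either kind of node. */
    static String describe(CallNode node) {
        return node.name + " comments=" + node.comments().size();
    }

    public static void main(String[] args) {
        System.out.println(describe(new CallNode("puts", 0)));
        System.out.println(describe(new FullCallNode("puts", 0, List.of("# greet"), 1)));
    }
}
```

The trade-off is that code needing fields such as `line` has to know it received the full subclass, for example through a cast or a visitor specific to the full AST.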

eregon · 2026-03-23T21:56:06Z

> That's eight artifacts.

Yes, and they can be created as needs arise; a subset is fine for now. But the general naming needs to take the fields distinction into account somehow.

> If we also are publishing separate artifacts for every identifier form, that would be 24 artifacts.

No, semantic implies RubySymbol (or byte[] if you prefer that for JRuby), and non-semantic implies either j.l.String or byte[] (whatever is best for general API users, still unclear at this stage), so it's still just 8.
See #4009 (comment)

eregon · 2026-03-24T09:04:34Z

templates/template.rb

+      "java/api/target/generated-sources/java/org/ruby_lang/prism/Loader.java",
+      "java/api/target/generated-sources/java/org/ruby_lang/prism/Nodes.java",
+      "java/api/target/generated-sources/java/org/ruby_lang/prism/AbstractNodeVisitor.java",


Is it necessary to put them under target/generated-sources instead of java/api/src/main/java?
I saw the docs mention mvn clean cleans that, but is that actually useful? There is rake clean and also the most common case is probably to run rake templates to update the generated files rather than cleaning.

This is the script used to import the .java files in TruffleRuby, it'd be nice if we don't have to manually merge from different folders.
I think it's also nicer in an editor when files in the same package are siblings on the filesystem.

headius · 2026-03-24

target/generated-sources is the standard location for sources generated by the Maven build, but after I reverted to generating outside of the build I'm not sure if that fits. I'll look into alternatives.

eregon · 2026-03-24

Right but the sources are not generated by the Maven build here, so it seems better under java/api/src/main/java/org/ruby_lang/prism

Continued in #4039

eregon · 2026-03-24T09:11:50Z

The nesting is quite deep with Maven, e.g.
java/org/ruby_lang/prism/ParseResult.java is moved to
java/api/src/main/java/org/ruby_lang/prism/ParseResult.java
that's 8 levels of nested folders before arriving at a source file (vs 4 before).

I looked and found there is:

<build>
    <sourceDirectory>${project.basedir}/src</sourceDirectory>
</build>

So it'd be java/api/src/org/ruby_lang/prism/ParseResult.java which would be nice since the main/java parts are purely redundant.

Could you try that if it doesn't cause any issue with the Maven plugins used?

eregon · 2026-03-24T09:19:29Z

> Bit of a chicken and egg issue trying to get the generated parts of the Java build in place:

What you landed on looks good to me, i.e. requiring rake to be run first.
That way there is no duplication and there is no need for Prism contributors to install Maven (if some Java-related CI job would fail), etc. Keeping to use Rake::JavaExtensionTask is good because it's low friction and ensures there are zero dependencies (on Maven packages).

headius · 2026-03-24T16:34:18Z

> that's 8 levels of nested folders

The layout I used is standard Maven layout requiring no configuration.

One of those levels is for splitting up the components, so that's unavoidable.
The src path is required either way to separate sources from other elements.

So only main/java is really extra, and only for the non-wasm sources. In the wasm component, all levels are being used:

src/main/java for Java sources
src/main/java-templates for the generated sources
src/test/java for the Java tests
src/test/resources for the WASM build

If we generate sources into src/main/java-templates (perhaps preferable to target/generated-sources) and add src/test/java for some minimal unit tests, the divisions are no longer redundant. I'd rather leave the layout like this than have to add it back again later.

> it'd be nice if we don't have to manually merge from different folders.

The generated C sources do go into src, but they are not versioned like the Java sources used to be:

[] prism $ ls -1 templates/src
diagnostic.c.erb
json.c.erb
node.c.erb
prettyprint.c.erb
serialize.c.erb
tokens.c.erb
[] prism $ diff <(git ls-files src) <(ls src/*)
4a5
> src/diagnostic.c
6a8
> src/json.c
9a12
> src/node.c
11a15
> src/prettyprint.c
13a18
> src/serialize.c
19a25
> src/tokens.c

Perhaps it would be easiest for you if the TruffleRuby (TR) test build saved an archive of sources?

> I think it's also nicer in an editor when files in the same package are siblings on the filesystem.

Actually, the -native and -wasm sources should be in subpackages, because JPMS does not allow separate modules to have files in the same package.
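
The split-package constraint can be illustrated with the java.lang.module API. The sketch below builds in-memory module descriptors and reports any package declared by two of them. The module names and the `org.ruby_lang.prism.wasm` subpackage are hypothetical, and the check is a simplified stand-in for what the module system does at resolution time.

```java
import java.lang.module.ModuleDescriptor;
import java.util.Set;
import java.util.TreeSet;

// Module and subpackage names are hypothetical; only org.ruby_lang.prism
// appears in this thread.
class SplitPackageCheck {

    /** Builds an in-memory module descriptor declaring the given packages. */
    static ModuleDescriptor module(String name, String... packages) {
        return ModuleDescriptor.newModule(name).packages(Set.of(packages)).build();
    }

    /** Packages declared by both modules; any entry here is a split package. */
    static Set<String> sharedPackages(ModuleDescriptor a, ModuleDescriptor b) {
        Set<String> shared = new TreeSet<>(a.packages());
        shared.retainAll(b.packages());
        return shared;
    }

    public static void main(String[] args) {
        ModuleDescriptor api = module("org.ruby_lang.prism.api", "org.ruby_lang.prism");
        ModuleDescriptor wasmSamePackage =
                module("org.ruby_lang.prism.wasm", "org.ruby_lang.prism");
        ModuleDescriptor wasmSubpackage =
                module("org.ruby_lang.prism.wasm", "org.ruby_lang.prism.wasm");

        System.out.println("same package: " + sharedPackages(api, wasmSamePackage));
        System.out.println("subpackage:   " + sharedPackages(api, wasmSubpackage));
    }
}
```

Moving each backend into its own subpackage leaves the shared set empty, which is the layout proposed above.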

> That way there is no duplication and there is no need for Prism contributors to install Maven

We can set up a ./mvnw Maven wrapper to avoid anyone having to install it.

> Keeping to use Rake::JavaExtensionTask is good because it's low friction and ensures there are zero dependencies (on Maven packages)

There may be test-time dependencies in the future, such as on JUnit for testing. JavaExtensionTask will be able to neither build nor run those tests. It already can't verify the wasm component because it knows nothing of the dependencies on JUnit and JRuby for testing or on Chicory for the build. It's just inadequate for non-trivial Java projects.

Verification that everything builds and passes its tests should be done by a Maven build.

headius changed the title ~~Begin splitting the Java artifact into components~~ Split maven artifacts into component libraries Mar 23, 2026

headius added 2 commits March 23, 2026 15:04

Rework Java template generation for Maven build

b257151

This uses the JRuby rake-maven-plugin to generate the templates as part of the Maven build. The generated output for the Java templates will be under java/api/target/generated-sources/java.

Fix up references to the old project structure

2df7c97

headius force-pushed the split-maven-artifacts branch 2 times, most recently from d5ab64f to 2b6c107 Compare March 23, 2026 20:41

headius force-pushed the split-maven-artifacts branch 3 times, most recently from 6745e86 to d0d894c Compare March 23, 2026 21:23

Fix documentation job for new layout

1983383

headius force-pushed the split-maven-artifacts branch from d0d894c to 1983383 Compare March 23, 2026 21:25

headius force-pushed the split-maven-artifacts branch from 911edde to e2e78c8 Compare March 23, 2026 21:39

eregon mentioned this pull request Mar 23, 2026

Eliminate templated STRING_TYPE and use a factory #4009

Closed

kddnewton added the java Pull requests that update Java code label Mar 23, 2026

Back off Maven template run and build from Rake

2ea5d59

There's a lot of chicken-and-egg issues with trying to have Maven do all the steps for the Java artifact builds right now, so back off and require that templates (and WASM) are generated before the Maven builds of the relevant modules.

headius force-pushed the split-maven-artifacts branch 2 times, most recently from 01093a6 to 0008f5e Compare March 23, 2026 22:28

Fix documentation job for new paths

f369cab

headius force-pushed the split-maven-artifacts branch from 0008f5e to f369cab Compare March 23, 2026 22:35

Re-fix Java build for new layout

32661ee

headius force-pushed the split-maven-artifacts branch from 84bb9c9 to 32661ee Compare March 24, 2026 00:22

headius added 2 commits March 23, 2026 21:40

Clean up Maven build for split artifacts

d746c2a

Clean up Java readme

1f824a8

eregon reviewed Mar 24, 2026

kddnewton approved these changes Mar 24, 2026

kddnewton marked this pull request as ready for review March 24, 2026 11:08

kddnewton merged commit a98ab82 into ruby:main Mar 24, 2026
68 checks passed

headius deleted the split-maven-artifacts branch March 24, 2026 15:48

headius mentioned this pull request Mar 24, 2026

Fix javadoc gen for GH pages #4038

Merged

headius mentioned this pull request Mar 24, 2026

Maven tweaks for usability #4039

Draft
