Build: Bump hadoop from 3.4.3 to 3.5.0.#15897

Open
slfan1989 wants to merge 1 commit into apache:main from slfan1989:bump-hadoop-from-3.4.3-to-3.5.0

Conversation

@slfan1989
Contributor

Build: Bump hadoop from 3.4.3 to 3.5.0.

@slfan1989 slfan1989 marked this pull request as draft April 6, 2026 12:15
@slfan1989 slfan1989 force-pushed the bump-hadoop-from-3.4.3-to-3.5.0 branch from afa9bc6 to f94972f Compare April 12, 2026 12:57
@github-actions github-actions Bot added the spark label Apr 12, 2026
@slfan1989 slfan1989 force-pushed the bump-hadoop-from-3.4.3-to-3.5.0 branch from 58516fe to aa181ac Compare April 13, 2026 02:15
@slfan1989 slfan1989 marked this pull request as ready for review April 13, 2026 02:16
@slfan1989
Contributor Author

@nastra @huaxingao Could you help review this PR? Thank you very much! This change adds support for the newly released Hadoop 3.5.0, which is the first version with full JDK 17 support. Compared with Hadoop 3.4.3, it upgrades Jersey to 2.46 and introduces related Jakarta dependency changes, including the switch to jakarta.servlet.jsp-api. As a result, Spark 3.4 and Spark 3.5 need an explicit javax.servlet:javax.servlet-api:4.0.1 dependency.

Comment thread gradle/libs.versions.toml Outdated
jetty-compression-server = { module = "org.eclipse.jetty.compression:jetty-compression-server", version.ref = "jetty" }
jetty-compression-gzip = { module = "org.eclipse.jetty.compression:jetty-compression-gzip", version.ref = "jetty" }
javax-servlet = { module = "javax.servlet:javax.servlet-api", version.ref = "javax-servlet-api" }
jetty-server = { module = "org.eclipse.jetty:jetty-server", version.ref = "jetty" }
Contributor

this shouldn't be needed, because we're using jetty-compression-server already

Contributor Author

Removed. That catalog alias was unused in this PR.

Comment thread gradle/libs.versions.toml Outdated
jakarta-servlet = {module = "jakarta.servlet:jakarta.servlet-api", version.ref = "jakarta-servlet-api"}
jetty-compression-server = { module = "org.eclipse.jetty.compression:jetty-compression-server", version.ref = "jetty" }
jetty-compression-gzip = { module = "org.eclipse.jetty.compression:jetty-compression-gzip", version.ref = "jetty" }
javax-servlet = { module = "javax.servlet:javax.servlet-api", version.ref = "javax-servlet-api" }
Contributor

javax-servlet is deprecated and was replaced by jakarta-servlet. Is this something that we can use here instead?

Contributor Author

jakarta.servlet can't replace javax.servlet here. Spark 3.4/3.5 with Hive 2 still load legacy javax.servlet.* classes at runtime, and those packages are not binary-compatible. I kept the workaround scoped to testRuntimeOnly / integrationRuntimeOnly in the Spark 3.4 and 3.5 modules only.
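
For illustration, the scoping described in this reply might look roughly like the following in a Spark module's build.gradle. This is a sketch rather than the PR's actual diff: it assumes the `libs.javax.servlet` catalog alias shown in the thread above and an `integration` source set, which is what makes the `integrationRuntimeOnly` configuration available.

```groovy
// Sketch only: keep the legacy servlet API off the compile and
// production runtime classpaths by adding it to test-time scopes only.
dependencies {
  // 'libs.javax.servlet' is the catalog alias quoted in this review thread
  testRuntimeOnly libs.javax.servlet

  // Assumes an 'integration' source set is defined in this module
  integrationRuntimeOnly libs.javax.servlet
}
```

Because both configurations are runtime-only, nothing in the module can compile against javax.servlet through them, and the artifact is not published as a dependency of the module.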

Contributor

@nastra nastra May 21, 2026

I would probably use the previous Hadoop version with Spark 3.5 to avoid pulling in legacy packages

@slfan1989 slfan1989 force-pushed the bump-hadoop-from-3.4.3-to-3.5.0 branch from ada4bb6 to 78ff01d Compare May 7, 2026 02:24
@slfan1989
Contributor Author

@nastra I’m very sorry that I wasn’t able to continue following up on this PR in a timely manner last month. Recently, I’ve made some improvements and refinements to the related changes. If you have time, could you please take another look when convenient? Thank you very much for your time and help!

@slfan1989
Contributor Author

slfan1989 commented May 14, 2026

@nastra Could you help review this PR again? Thank you very much!

Comment thread spark/v3.4/build.gradle Outdated
integrationRuntimeOnly project(path: ':iceberg-core', configuration: 'testArtifacts')
integrationRuntimeOnly (project(path: ':iceberg-open-api', configuration: 'testFixturesRuntimeElements'))
// Spark 3.4 + Hive 2 still load legacy javax.servlet classes at runtime
integrationRuntimeOnly libs.javax.servlet
Contributor

please rebase, because Spark 3.4 module has been removed

Contributor Author

Thanks for pointing this out. I rebased onto the current main branch and dropped the Spark 3.4 change since that module has been removed upstream.

Comment thread spark/v3.5/build.gradle Outdated
integrationRuntimeOnly project(path: ':iceberg-core', configuration: 'testArtifacts')
integrationRuntimeOnly (project(path: ':iceberg-open-api', configuration: 'testFixturesRuntimeElements'))
// Spark 3.5 + Hive 2 still load legacy javax.servlet classes at runtime
integrationRuntimeOnly libs.javax.servlet
Contributor

I would probably rather use the previous hadoop version with Spark 3.5 instead of pulling in these dependencies

Contributor Author

Good point, thanks. I checked the dependency path and Hadoop 3.5.0 is coming into the Spark 3.5 test and integration runtimes via :iceberg-open-api test fixtures. I dropped the javax.servlet workaround and pinned those runtimes back to Hadoop 3.4.3 instead.
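
One way such a pin could be expressed in Gradle is a resolution rule limited to the affected classpaths. The sketch below is an assumption about the approach, not the code in this PR: the hard-coded version string and the `integrationRuntimeClasspath` name (derived from an `integration` source set) are illustrative.

```groovy
// Hypothetical: hard-coded for illustration; the PR uses a catalog version instead.
def pinnedHadoopVersion = '3.4.3'

configurations
    .matching { it.name in ['testRuntimeClasspath', 'integrationRuntimeClasspath'] }
    .configureEach { conf ->
      conf.resolutionStrategy.eachDependency { details ->
        // Exact group match, so org.apache.hadoop.thirdparty is left untouched
        if (details.requested.group == 'org.apache.hadoop') {
          details.useVersion(pinnedHadoopVersion)
          details.because('Spark 3.5 test runtimes stay on the previous Hadoop release')
        }
      }
    }
```

Restricting the rule to the two resolvable runtime classpaths leaves every other configuration in the build on the default Hadoop version, including transitive Hadoop modules that arrive through test fixtures.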

@slfan1989 slfan1989 force-pushed the bump-hadoop-from-3.4.3-to-3.5.0 branch from 088f497 to 58a1abc Compare May 21, 2026 15:45
@slfan1989 slfan1989 force-pushed the bump-hadoop-from-3.4.3-to-3.5.0 branch from 306b8f4 to 5d6bbcb Compare May 22, 2026 01:01
Comment thread gradle/libs.versions.toml
  guava = "33.6.0-jre"
- hadoop3 = "3.4.3"
+ hadoop3 = "3.5.0"
+ hadoop3-previous = "3.4.3"
Contributor

let's name this hadoop3-spark35 to clearly indicate that this is only for Spark 3.5

Contributor Author

Done. Renamed it to hadoop3-spark35 to make the scope explicit and updated the Spark 3.5 references accordingly.

org.codehaus.mojo:animal-sniffer-annotations:1.27
org.codehaus.woodstox:stax2-api:4.2
org.conscrypt:conscrypt-openjdk-uber:2.5
org.glassfish.hk2.external:aopalliance-repackaged:2.6
Contributor

we should probably exclude all of the glassfish dependencies from the runtime module

Contributor

please also exclude this from testFixturesImplementation(libs.hadoop3.common)

Contributor Author

Added the Glassfish excludes on the Kafka Connect runtime hadoop3.common dependency and also mirrored them on :iceberg-open-api test fixtures. I also excluded org.javassist:javassist. Regenerated runtime-deps.txt and verified the runtime baseline again.
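
As a rough sketch of what these excludes can look like: Gradle exclude rules match group and module names exactly, so each Glassfish group has to be listed on its own. Only `org.glassfish.hk2.external` and `org.javassist:javassist` are named in this thread; the other group IDs and the `implementation` configuration name below are assumptions for illustration.

```groovy
dependencies {
  // 'implementation' is a placeholder for whichever configuration the runtime module uses
  implementation(libs.hadoop3.common) {
    exclude group: 'org.glassfish.hk2.external'   // appears in the runtime-deps listing above
    exclude group: 'org.glassfish.hk2'            // assumed: other Glassfish groups pulled in by Jersey
    exclude group: 'org.glassfish.jersey.core'
    exclude group: 'org.glassfish.jersey.containers'
    exclude group: 'org.glassfish.jersey.inject'
    exclude group: 'org.javassist', module: 'javassist'
  }

  // Same excludes mirrored on the test fixtures, as requested in the review
  testFixturesImplementation(libs.hadoop3.common) {
    exclude group: 'org.glassfish.hk2.external'
    exclude group: 'org.glassfish.hk2'
    exclude group: 'org.glassfish.jersey.core'
    exclude group: 'org.glassfish.jersey.containers'
    exclude group: 'org.glassfish.jersey.inject'
    exclude group: 'org.javassist', module: 'javassist'
  }
}
```

The actual set of groups to exclude is best read off the resolved dependency tree rather than copied from this sketch.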

exclude group: 'org.apache.hadoop', module: 'hadoop-auth'
exclude group: 'org.apache.commons', module: 'commons-configuration2'
exclude group: 'org.apache.hadoop.thirdparty', module: 'hadoop-shaded-protobuf_3_7'
exclude group: 'org.eclipse.jetty'
Contributor

please add all of the glassfish excludes here. If possible also exclude javassist

com.microsoft.azure:msal4j-persistence-extension:1.3
com.microsoft.azure:msal4j:1.23
com.sun.xml.bind:jaxb-impl:2.2
com.sun.istack:istack-commons-runtime:3.0
Contributor

we need to check whether this is already included in the LICENSE file for the kafka-connect-runtime (most likely not, so it needs to be added there if this is a required dependency)

javax.servlet:javax.servlet-api:3.1
javax.xml.bind:jaxb-api:2.2
javax.xml.stream:stax-api:1.0-2
jakarta.annotation:jakarta.annotation-api:1.3
Contributor

LICENSE must be updated to reflect these new dependencies and the old ones must be removed
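
To make the LICENSE comparison easier, a small helper task can print the resolved runtime coordinates in sorted order. This is a hypothetical helper, not the mechanism the project uses to generate runtime-deps.txt; the task name `printRuntimeDeps` is invented, and the sketch assumes the module applies a Java plugin so that `runtimeClasspath` exists.

```groovy
// Hypothetical helper: lists group:name:version for every resolved runtime artifact.
tasks.register('printRuntimeDeps') {
  doLast {
    configurations.runtimeClasspath.resolvedConfiguration.resolvedArtifacts
        .collect { artifact ->
          def id = artifact.moduleVersion.id
          "${id.group}:${id.name}:${id.version}".toString()
        }
        .unique()
        .sort()
        .each { println it }
  }
}
```

Running the task before and after the Hadoop bump and diffing the two outputs shows which entries need to be added to or removed from LICENSE.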

2 participants