TEZ-4682: [Cloud] Tez AM docker image #456

Merged: abstractdog merged 12 commits into apache:master from Aggarwal-Raghav:TEZ-4682 on Mar 24, 2026
@Aggarwal-Raghav
Contributor

No description provided.

@Aggarwal-Raghav
Contributor Author

Aggarwal-Raghav commented Jan 24, 2026

@abstractdog , I was able to start DagAppMaster with ZK on local. Attaching logs for the container docker_logs.txt

docker run -d \
        --name tez-am \
        -p 10001:10001 \
        -e TEZ_FRAMEWORK_MODE="STANDALONE_ZOOKEEPER" apache/tez-am:1.0.0-SNAPSHOT
brew install zookeeper
zkServer start

But this PR has a lot of open items, and I need some advice on the following:

  1. Is the docker directory inside tez-dist fine, or should I create a separate sub-module for the Dockerfile-related code, to be executed after the tez-dist module?
  2. This image will presumably be run with ZK + K8s + S3. The question is whether we need a Hadoop tarball inside this image, just in case some 3rd-party jars etc. are needed. If my understanding is correct, it shouldn't be there, but I've kept it for now. Will remove it if you say so.
  3. In DAGAppMaster#main() there are a lot of ENV variables, which I have mocked for now in entrypoint.sh. I'll try to improve this (suggestions are welcome here).
  4. My tez-site.xml is not getting picked up from the classpath by Configuration conf = new Configuration(); I will debug that.
  5. Is there any way to test this AM container without YARN by running some job?

@abstractdog
Contributor

abstractdog commented Jan 26, 2026

very good, very good, let me check this in detail sometime this week; here are some pointers in the meantime, responding to your questions:

  1. I believe we can follow Apache Hive in this area, feel free to do something like here: https://github.com/apache/hive/tree/master/packaging

  2. We should keep the Hadoop jars. Even if the k8s environment is no longer a Hadoop/YARN environment, Tez depends heavily on Hadoop at both compile time and runtime, and this is something we don't intend to break in the short or mid term.

  3. I'll check it. What we should really be clear about is e.g.

# 3. NodeManager Details
export NM_HOST=${NM_HOST:-"localhost"}
export NM_PORT=${NM_PORT:-"12345"}

There is no YARN NodeManager in a k8s environment, so the reader of entrypoint.sh should see code that clearly distinguishes the env vars that are actually needed from the legacy/backward-compatible ones. That's what should be handled with care, in my opinion.

  4. Okay.

  5. Yeah. Given that neither Tez containers (TEZ-4665) nor LLAP containers (HIVE-29411) are implemented yet, we cannot successfully run a whole DAG, but we can get to a point where at least a DAG is successfully submitted from Hive to this AM container. To make this happen, I believe we need to make an HS2 container (see the Hive instructions for the dockerized setup) able to find this Tez AM container, so most probably we need to stop using tez.local.mode=true for this experiment.
    UPDATE: after creating HIVE-29419 for a separate TezAM image in Hive, testing this AM could be as simple as opening a TezClient to the AM container and submitting a DAG (with documentation attached).
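
One way to make the needed-versus-legacy distinction from point 3 visible in entrypoint.sh is sketched below. This is only an illustration, not the script from this PR: the helper names required_env and legacy_env are hypothetical, and only TEZ_FRAMEWORK_MODE, NM_HOST and NM_PORT (with their defaults) are taken from this thread.

```shell
#!/bin/sh
# Sketch only: required_env and legacy_env are hypothetical helpers,
# not part of the entrypoint.sh in this PR.

# Fail fast when a variable the standalone AM really needs is missing.
required_env() {
  eval "_value=\${$1:-}"
  if [ -z "$_value" ]; then
    echo "ERROR: required env var $1 is not set" >&2
    return 1
  fi
}

# Export a placeholder default for a variable that is kept only for
# backward compatibility; an existing value is left untouched.
legacy_env() {
  eval "[ -n \"\${$1:-}\" ] || $1=\$2"
  export "$1"
  echo "INFO: legacy env var $1 exported for backward compatibility" >&2
}

# --- Needed in standalone (ZooKeeper) mode ---
# required_env TEZ_FRAMEWORK_MODE || exit 1

# --- Legacy / backward-compatible (no YARN NodeManager on k8s) ---
legacy_env NM_HOST "localhost"
legacy_env NM_PORT "12345"
```

Grouping the calls under two labelled sections keeps the placeholders in one place, so they can be dropped together if DAGAppMaster later stops reading them.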

@Aggarwal-Raghav
Contributor Author

Aggarwal-Raghav commented Jan 27, 2026

Thanks for the pointers @abstractdog .

  1. Yes, the implementation is reminiscent of hive (TBH, pom.xml and build-docker.sh and some parts of Dockerfile are taken from hive to some extent)
  2. For a basic startup of the Tez AM without the Hadoop jars, I didn't observe any issue. The Tez tarball already contains a few Hadoop jars, and I think they and their transitive dependencies are sufficient for tez-am to act as a client of Hadoop services (I have a commit ready in case we later want to remove the Hadoop tarball).
  3. No Update. I believe, code change in DagAppMaster is required for segregation.
  4. Raised TEZ-4685: DagAppMaster is not picking tez-site.xml from classpath in zookeeper mode #458

A few additional things:

  1. DAGAppMaster#serviceInit() => DAGAppMaster#createTaskSchedulerManager tries to connect to the ResourceManager even in ZooKeeper mode. I think we shouldn't use the YARN scheduler and should maybe move to YuniKorn (we use that with Spark internally). Let me know how to proceed with this. For now, should I raise a PR that skips it if ZK mode is enabled?
2026-01-27 19:13:06,207 INFO zookeeper.ZkAMRegistry: Added AMRecord to zkpath /tez-external-sessions/tez_am/server/application_1769280834537_0000
2026-01-27 19:13:06,208 INFO app.DAGAppMaster: Added AMRecord: {hostName=2d0733bd53ae, externalId=tez-session-, hostIp=172.17.0.2, port=10001, computeName=default-compute, appId=application_1769280834537_0000} to registry..
2026-01-27 19:13:06,210 INFO rm.TaskSchedulerManager: Creating YARN TaskScheduler: org.apache.tez.dag.app.rm.DagAwareYarnTaskScheduler
2026-01-27 19:13:06,253 INFO conf.Configuration: resource-types.xml not found
2026-01-27 19:13:06,253 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2026-01-27 19:13:06,259 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2026-01-27 19:13:06,259 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2026-01-27 19:13:06,263 INFO rm.DagAwareYarnTaskScheduler: scheduler initialized with maxRMHeartbeatInterval:1000 reuseEnabled:true reuseRack:true reuseAny:false localityDelay:250 preemptPercentage:10 preemptMaxWaitTime:60000 numHeartbeatsBetweenPreemptions:3 idleContainerMinTimeout:5000 idleContainerMaxTimeout:10000 sessionMinHeldContainers:0
2026-01-27 19:13:06,267 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8030
2026-01-27 19:13:07,572 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2026-01-27 19:13:08,580 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2026-01-27 19:13:09,588 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2026-01-27 19:13:10,595 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
  2. Disable tez.am.ui, as it also uses the YARN RM proxy.

@Aggarwal-Raghav
Contributor Author

DAGAppMaster#serviceInit() => DAGAppMaster#createTaskSchedulerManager is trying to connect to ResourceManager even in zookeeper mode . I think we shouldn't use YARN scheduler and maybe move to Yunikorn (we are using that in spark internally). Let me know how to proceed for this? For now should I raise a PR for skipping it if zk mode is enabled?

Using tez.local.mode=true solves this, as it will use LocalTaskScheduler.
DAG is up and ready:
[Screenshot 2026-02-13 at 12 16 32 AM]

Aggarwal-Raghav changed the title from "[DRAFT] [WIP] TEZ-4682: [Cloud] Tez AM docker image" to "TEZ-4682: [Cloud] Tez AM docker image" on Feb 15, 2026

@Aggarwal-Raghav
Contributor Author

@abstractdog , Can you please help with review?

@abstractdog
Contributor

let me get back to this next week

@Aggarwal-Raghav
Contributor Author

sure

mvn clean install -DskipTests -Pdocker,tools
```

2. Install zookeeper in mac by:
Contributor

Can you add Ubuntu steps? We might be so kind as to make Linux users' lives easier.

Contributor

@abstractdog commented Feb 26, 2026

UPDATE: can we use a dockerized ZooKeeper instead? Installing ZK on the host machine seems to go against this whole cloud/docker initiative (also, in case of problems or messed-up ZK nodes, deleting and restarting a container feels easier and cleaner to me).
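
A dockerized ZooKeeper for local testing might look like the sketch below. It assumes the official zookeeper image on Docker Hub and its default client port 2181; the container name and the helper names zk_start and zk_reset are arbitrary choices for illustration, not something taken from this PR.

```shell
#!/bin/sh
# Sketch only: assumes the official "zookeeper" image from Docker Hub.
ZK_CONTAINER="${ZK_CONTAINER:-zookeeper}"   # arbitrary container name
ZK_IMAGE="${ZK_IMAGE:-zookeeper}"           # pin a specific tag in real use

# Start ZooKeeper in the background, exposing the default client port.
zk_start() {
  docker run -d --name "$ZK_CONTAINER" -p 2181:2181 "$ZK_IMAGE"
}

# Throw away the container (and its anonymous volumes, hence any
# messed-up znodes), then start a fresh one.
zk_reset() {
  docker rm -f -v "$ZK_CONTAINER" >/dev/null 2>&1 || true
  zk_start
}
```

Calling zk_start once, and zk_reset whenever the registry state gets confusing, would stand in for the host-level install and start steps.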


# Tez AM Container Environment Configuration

HADOOP_USER_NAME=tez
Contributor

nitpicking, can you order the env vars here the same as they are ordered in the entrypoint script?

tez-dist/pom.xml Outdated
Comment on lines +138 to +144
<argument>${project.basedir}/src/docker/build-docker.sh</argument>
<argument>-hadoop</argument>
<argument>${hadoop.version}</argument>
<argument>-tez</argument>
<argument>${project.version}</argument>
<argument>-repo</argument>
<argument>apache</argument>
Contributor

can you make it similar to what I can see in Hive? much less verbose, e.g.

              <arguments>
                <argument>.... .sh</argument>
                <argument>-hadoop ${hadoop.version}</argument>
                <argument>-tez ${tez.version}</argument>
              </arguments>

@Aggarwal-Raghav
Contributor Author

Thanks for the thorough review @abstractdog . I'll address the review comments shortly, you can continue the review in the meantime. I hope you are able to build the image and start tez am standalone process 😅

I still believe we can get rid of the Hadoop tarball dependency completely, as the required Hadoop jars are already part of the Tez tarball. It might unnecessarily increase the Docker image size.

Also, please suggest whether I should use eclipse-temurin:21.0.3_9-jre-ubi9-minimal or a JDK image; if we want to take a jstack or use other Java debugging tools, the JDK image is required.

@abstractdog
Contributor

abstractdog commented Feb 25, 2026

  1. I believe that as long as a simple DAG can run without adding Hadoop jars (other than what's already inside tez.tar.gz), it's fine to get rid of them. We cannot test that now, but we can still check whether the AM starts correctly, and if so, we're good.

  2. regarding debugging tools: I would add them in the first round, and we can still optimize later: these images are not for production in the first round, so I would rather have a slightly bigger image with debugging tools than having a small image that's harder to use while investigating something

# HADOOP FETCH LOGIC #
######################
HADOOP_FILE_NAME="hadoop-$HADOOP_VERSION.tar.gz"
HADOOP_URL=${HADOOP_URL:-"https://archive.apache.org/dist/hadoop/core/hadoop-$HADOOP_VERSION/$HADOOP_FILE_NAME"}
Contributor

What about trying this first:
https://dlcdn.apache.org/hadoop/common/hadoop-3.4.2/hadoop-3.4.2.tar.gz
and then falling back to the archive?

archive.apache.org is crazy slow for me at the moment (not for the first time); maybe it would be worth trying dlcdn.apache.org.
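
The "CDN first, archive as fallback" idea could be sketched roughly as follows. This is not the fetch logic from this PR: download_hadoop and fetch are hypothetical names, the 3.4.2 default is taken from the URL above, and the HADOOP_URL override from the original script is left out for brevity.

```shell
#!/bin/sh
# Sketch only: download_hadoop and fetch are hypothetical helpers.
HADOOP_VERSION="${HADOOP_VERSION:-3.4.2}"
HADOOP_FILE_NAME="hadoop-$HADOOP_VERSION.tar.gz"

# fetch URL DEST: download one URL to a file; non-zero exit on any HTTP error.
fetch() {
  curl -fsSL --retry 2 -o "$2" "$1"
}

# Try the CDN first, then the (slower) archive. Prints the URL that worked.
download_hadoop() {
  dest="$1"
  for url in \
    "https://dlcdn.apache.org/hadoop/common/hadoop-$HADOOP_VERSION/$HADOOP_FILE_NAME" \
    "https://archive.apache.org/dist/hadoop/core/hadoop-$HADOOP_VERSION/$HADOOP_FILE_NAME"
  do
    if fetch "$url" "$dest"; then
      echo "$url"
      return 0
    fi
    echo "WARN: download failed from $url, trying next source" >&2
  done
  return 1
}
```

Keeping the archive as the second entry matters because the CDN tends to carry only current releases, so an older HADOOP_VERSION may only be available from the archive.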

Comment on lines +21 to +22
USER=tez
HADOOP_USER_NAME=tez
Contributor

I can see that DAGAppMaster is finally started with:

    -Duser.name="$HADOOP_USER_NAME" \

do we need a separate HADOOP_USER_NAME env var?

@abstractdog
Contributor

abstractdog commented Mar 17, 2026

thanks @Aggarwal-Raghav , this is so close, left minor comments
2 more things in general:

  1. would you consider replacing "tez_am" and "tez-am" simply with "am" in the file names? this would simplify a lot and also remove the need of thinking about underscore vs. hyphen in different contexts
  2. would you create a separate jira for a simple client example or include it here, I'm fine with whatever you choose: ideally this should work without hive, so the user arrives and 1) starts docker-compose 2) submits a DAG easily according to README (sample DAG creation and submit is the point where Tez repo might want to give good standalone example)

other than that, this is in a very good shape and I feel it's good to go in, we can fix anything later that might come along the way

@Aggarwal-Raghav
Contributor Author

  1. cd tez-dist/src/docker/tez-am/{Dockerfile.am, build-am-docker.sh, am-entrypoint.sh, am.env} is this ok?
  2. I would prefer to put it in this JIRA to showcase the end-to-end value. I'll see if I can make use of the tez-examples wordcount example here; otherwise I will create a new program, ExternalAMWordCount (like https://github.com/Aggarwal-Raghav/ZkWordCount).

@abstractdog
Contributor

  1. yes, perfect IMO
  2. okay, I'll let you decide

@Aggarwal-Raghav
Contributor Author

Aggarwal-Raghav commented Mar 19, 2026

  1. I have used the official Hadoop Docker image, 3.4.2-lean.
  2. The namenode and datanode will be separate services; otherwise we would need a custom entrypoint to start both in a single container, as the minimal-hadoop Docker image does. I think it is better to keep them separate. I have also added a wait to ensure the namenode is not in safemode before the datanode and tez-am are up.
  3. Handled the review comments regarding docs and file naming conventions.
  4. Added a sample program to run in the Tez AM Docker container, and here it gets tricky ⚠️. It throws the same error as mentioned in TEZ-4686 with a standalone program.
    Steps:
1. cd tez-dist/src/docker/tez-am/
2. docker compose up -d --build
[Screenshot 2026-03-20 at 12 26 15 AM]
Everything should be running at this point.
3. docker exec -it tez-am bash
4. echo "Hello world Hello" > /tmp/input.txt
5. java -cp ./*:./lib/*:tez-examples-1.0.0-SNAPSHOT.jar org.apache.tez.examples.ExternalAmWordCount /tmp/input.txt /tmp/output

With TEZ-4686 cherrypick:

❯ docker exec -it tez-am bash
bash-5.1$ echo "Hello world Hello" > /tmp/input.txt
java -cp ./*:./lib/*:tez-examples-1.0.0-SNAPSHOT.jar org.apache.tez.examples.ExternalAmWordCount /tmp/input.txt /tmp/output
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
bash-5.1$ ls /tmp/output
attempt_1773945902335_0000_1_00_000000_0_10003	attempt_1773945902335_0000_1_00_000000_0_10003_0  part-v001-o000-r-00000  _SUCCESS
bash-5.1$ cat /tmp/output/part-v001-o000-r-00000
Hello	2
world	1
bash-5.1$

Without TEZ-4686 cherrypick:

❯ docker exec -it tez-am bash
bash-5.1$ echo "Hello world Hello" > /tmp/input.txt
java -cp ./*:./lib/*:tez-examples-1.0.0-SNAPSHOT.jar org.apache.tez.examples.ExternalAmWordCount /tmp/input.txt /tmp/output
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.lang.NullPointerException: Cannot invoke "org.apache.tez.client.registry.AMRecord.getApplicationId()" because "this.amRecord" is null
	at org.apache.tez.client.registry.zookeeper.ZkFrameworkClient.createApplication(ZkFrameworkClient.java:114)
	at org.apache.tez.client.TezClient.createApplication(TezClient.java:1103)
	at org.apache.tez.client.TezClient.start(TezClient.java:399)
	at org.apache.tez.examples.ExternalAmWordCount.main(ExternalAmWordCount.java:74)
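
The wait for the namenode to leave safemode (point 2 of this comment) could be sketched roughly like this. It is not the script used in this PR: wait_for and namenode_out_of_safemode are hypothetical helper names, WAIT_ATTEMPTS and WAIT_DELAY_SECONDS are made-up knobs, and the check assumes that hdfs dfsadmin -safemode get reports "Safe mode is OFF" once the namenode is ready.

```shell
#!/bin/sh
# Sketch only: wait_for and namenode_out_of_safemode are hypothetical helpers.

# wait_for CMD [ARGS...]: retry CMD until it succeeds or attempts run out.
wait_for() {
  attempts="${WAIT_ATTEMPTS:-30}"
  delay="${WAIT_DELAY_SECONDS:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Succeeds once the namenode reports that safemode is off.
namenode_out_of_safemode() {
  hdfs dfsadmin -safemode get 2>/dev/null | grep -q "Safe mode is OFF"
}

# Usage in a service's startup script:
#   wait_for namenode_out_of_safemode || exit 1
```

hdfs dfsadmin -safemode wait is a built-in alternative that blocks until safemode is off; the explicit loop is shown because it adds an upper bound on the waiting time.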

@Aggarwal-Raghav
Contributor Author

@abstractdog , any input on this?

@tez-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| --- | --- | --- | --- | --- |
| +0 🆗 | reexec | 16m 32s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 🆗 | xmllint | 0m 0s | | xmllint was not available. |
| +0 🆗 | shelldocs | 0m 0s | | Shelldocs was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ master Compile Tests _ |
| +0 🆗 | mvndep | 0m 53s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 6m 11s | | master passed |
| +1 💚 | compile | 4m 4s | | master passed |
| +1 💚 | checkstyle | 2m 28s | | master passed |
| +1 💚 | javadoc | 2m 44s | | master passed |
| +0 🆗 | spotbugs | 0m 50s | | tez-examples in master has 2 extant spotbugs warnings. |
| +0 🆗 | spotbugs | 1m 44s | | tez-dag in master has 749 extant spotbugs warnings. |
| +0 🆗 | spotbugs | 7m 13s | | root in master has 1935 extant spotbugs warnings. |
| +0 🆗 | spotbugs | 0m 31s | | branch/tez-dist no spotbugs output file (spotbugsXml.xml) |
| -0 ⚠️ | patch | 1m 3s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 9s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 5m 31s | | the patch passed |
| +1 💚 | codespell | 1m 35s | | No new issues. |
| +1 💚 | compile | 4m 6s | | the patch passed |
| +1 💚 | javac | 4m 6s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 2m 17s | | the patch passed |
| +1 💚 | hadolint | 0m 0s | | No new issues. |
| +1 💚 | markdownlint | 0m 2s | | No new issues. |
| +1 💚 | shellcheck | 0m 1s | | No new issues. |
| +1 💚 | yamllint | 0m 1s | | No new issues. |
| +1 💚 | javadoc | 2m 42s | | the patch passed |
| +0 🆗 | spotbugs | 0m 29s | | tez-dist has no data from spotbugs |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 0m 32s | | tez-examples in the patch passed. |
| +1 💚 | unit | 6m 11s | | tez-dag in the patch passed. |
| +1 💚 | unit | 0m 29s | | tez-dist in the patch passed. |
| +1 💚 | unit | 73m 45s | | root in the patch passed. |
| +1 💚 | asflicense | 1m 52s | | The patch does not generate ASF License warnings. |
| | | 156m 43s | | |

| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-456/13/artifact/out/Dockerfile |
| GITHUB PR | #456 |
| Optional Tests | dupname asflicense codespell detsecrets javac javadoc unit xmllint compile shellcheck shelldocs yamllint hadolint markdownlint spotbugs checkstyle |
| uname | Linux 20120a19e997 5.15.0-173-generic #183-Ubuntu SMP Fri Mar 6 13:29:34 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-home/workspace/tez-multibranch_PR-456/src/.yetus/personality.sh |
| git revision | master / 4ddcff7 |
| Default Java | Ubuntu-21.0.10+7-Ubuntu-124.04 |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-456/13/testReport/ |
| Max. process+thread count | 1398 (vs. ulimit of 5500) |
| modules | C: tez-examples tez-dag tez-dist . U: . |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-456/13/console |
| versions | git=2.43.0 maven=3.8.7 hadolint=1.18.0-0-g76eee5c spotbugs=4.9.3 codespell=2.4.1 markdownlint=0.46.0 shellcheck=0.7.1 yamllint=1.38.0 |
| Powered by | Apache Yetus 0.15.1 https://yetus.apache.org |

This message was automatically generated.

@abstractdog
Contributor

Looks good @Aggarwal-Raghav: the NPE occurs because the example program uses TezClient.start, which eventually leads to a createApplication call instead of discovering one from the registry. This is what's described in HIVE-29477 and worked around like this:
https://github.com/apache/hive/pull/6343/changes#diff-acfe9151d94581378224e8c415c3eb8eedbfa2c5118c1ac3a1945a2146c62802R104-R118

In the long run we can make TezClient.start magically handle this, but in the meantime, to unblock this PR, the example program can use TezClient the same way the above Hive class does. If you can make this work, this PR is ready.

Other than that, I left minor comments regarding the example program.

@abstractdog abstractdog self-requested a review March 23, 2026 12:19
Contributor

@abstractdog left a comment

minor comments

import org.slf4j.LoggerFactory;

/**
* Sample Program inspired for WordCount but to run with External Tez AM with Zookeeper.
Contributor

nit: inspired by

@Aggarwal-Raghav
Contributor Author

Thanks for the suggestions. I'll address these today only.

@Aggarwal-Raghav
Contributor Author

Aggarwal-Raghav commented Mar 23, 2026

Hopefully I have addressed all the review comments; I also added a log4j.properties to the tez-examples jar.
Here are the logs: ExternalAmWordCount.log

@tez-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 5m 26s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 🆗 | xmllint | 0m 0s | | xmllint was not available. |
| +0 🆗 | shelldocs | 0m 0s | | Shelldocs was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ master Compile Tests _ |
| +0 🆗 | mvndep | 0m 55s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 4m 55s | | master passed |
| +1 💚 | compile | 2m 31s | | master passed |
| +1 💚 | checkstyle | 1m 36s | | master passed |
| +1 💚 | javadoc | 1m 48s | | master passed |
| +0 🆗 | spotbugs | 0m 35s | | tez-examples in master has 2 extant spotbugs warnings. |
| +0 🆗 | spotbugs | 1m 0s | | tez-dag in master has 749 extant spotbugs warnings. |
| +0 🆗 | spotbugs | 4m 14s | | root in master has 1935 extant spotbugs warnings. |
| +0 🆗 | spotbugs | 0m 19s | | branch/tez-dist no spotbugs output file (spotbugsXml.xml) |
| -0 ⚠️ | patch | 0m 43s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 5s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 3m 25s | | the patch passed |
| +1 💚 | codespell | 0m 51s | | No new issues. |
| +1 💚 | compile | 2m 37s | | the patch passed |
| +1 💚 | javac | 2m 37s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 1m 24s | | the patch passed |
| +1 💚 | hadolint | 0m 1s | | No new issues. |
| +1 💚 | markdownlint | 0m 1s | | No new issues. |
| +1 💚 | shellcheck | 0m 1s | | No new issues. |
| +1 💚 | yamllint | 0m 0s | | No new issues. |
| +1 💚 | javadoc | 1m 45s | | the patch passed |
| +0 🆗 | spotbugs | 0m 20s | | tez-dist has no data from spotbugs |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 0m 21s | | tez-examples in the patch passed. |
| +1 💚 | unit | 5m 11s | | tez-dag in the patch passed. |
| +1 💚 | unit | 0m 20s | | tez-dist in the patch passed. |
| +1 💚 | unit | 61m 7s | | root in the patch passed. |
| +1 💚 | asflicense | 1m 17s | | The patch does not generate ASF License warnings. |
| | | 110m 41s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-456/14/artifact/out/Dockerfile |
| GITHUB PR | #456 |
| Optional Tests | dupname asflicense codespell detsecrets javac javadoc unit xmllint compile shellcheck shelldocs yamllint hadolint markdownlint spotbugs checkstyle |
| uname | Linux 685dcb65dc83 5.15.0-173-generic #183-Ubuntu SMP Fri Mar 6 13:29:34 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-home/workspace/tez-multibranch_PR-456/src/.yetus/personality.sh |
| git revision | master / d82bb55 |
| Default Java | Ubuntu-21.0.10+7-Ubuntu-124.04 |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-456/14/testReport/ |
| Max. process+thread count | 1359 (vs. ulimit of 5500) |
| modules | C: tez-examples tez-dag tez-dist . U: . |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-456/14/console |
| versions | git=2.43.0 maven=3.8.7 hadolint=1.18.0-0-g76eee5c spotbugs=4.9.3 codespell=2.4.1 markdownlint=0.46.0 shellcheck=0.7.1 yamllint=1.38.0 |
| Powered by | Apache Yetus 0.15.1 https://yetus.apache.org |

This message was automatically generated.

@abstractdog abstractdog self-requested a review March 24, 2026 07:57
@abstractdog
Contributor

Hopefully have addressed all the review comments and added a log4j.properties in tez-examples jar.

@Aggarwal-Raghav, this is awesome, I just validated the patch with these steps:

docker-compose -f tez-dist/src/docker/tez-am/docker-compose.yml down --rmi all #if any previous containers were present
mvn clean install -DskipTests -Pdocker
docker-compose -f tez-dist/src/docker/tez-am/docker-compose.yml up -d --build

docker exec -it tez-am bash
echo "Hello world Hello" > /tmp/input.txt
java -cp ./*:./lib/*:tez-examples-1.0.0-SNAPSHOT.jar org.apache.tez.examples.ExternalAmWordCount /tmp/input.txt /tmp/output
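
To check the job's result against something, the expected counts for the one-line input above can be computed locally. The sketch below assumes that ExternalAmWordCount splits on whitespace and is case-sensitive, as a conventional word count does; that is an assumption about the example, not something confirmed in this thread. `ExpectedWordCount` is a hypothetical helper name, and the sketch says nothing about the format or file names the job writes under /tmp/output.

```java
import java.util.Map;
import java.util.TreeMap;

public class ExpectedWordCount {

  /** Counts whitespace-separated tokens; keys are kept in sorted order. */
  static Map<String, Integer> count(String text) {
    Map<String, Integer> counts = new TreeMap<>();
    for (String token : text.trim().split("\\s+")) {
      if (!token.isEmpty()) {
        counts.merge(token, 1, Integer::sum);
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    // Same content as /tmp/input.txt in the validation steps.
    count("Hello world Hello")
        .forEach((word, n) -> System.out.println(word + "\t" + n));
  }
}
```

Under these assumptions the job should report a count of 2 for `Hello` and 1 for `world`.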

I believe this patch is finally ready to go in

@abstractdog abstractdog merged commit fd03af6 into apache:master Mar 24, 2026
4 checks passed