
AVRO-4254: [Java] Avoid logging datum values in UnresolvedUnionException#3764

Merged
RyanSkraba merged 1 commit into apache:main from kunalmnnit:fix/unresolved-union-exception-data-leak on May 7, 2026

Conversation

@kunalmnnit (Contributor) commented on May 7, 2026

Summary

  • UnresolvedUnionException previously included the full toString() of the unresolved datum in its exception message. When this exception propagates to generic error handlers or gets logged, the datum value — which may contain sensitive user data — gets written to application logs.
  • Replace the datum's toString() with its class name in the exception message. The actual datum object remains accessible via getUnresolvedDatum() for callers that need programmatic access.
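The difference between the two message formats is easiest to see side by side. The following is a minimal, self-contained sketch, not Avro's actual code: the class `UnionMessageSketch`, its two methods, and the `"Not in union ..."` message text are illustrative assumptions. It contrasts a message built from the datum's `toString()` with one built only from the datum's runtime class name.

```java
public class UnionMessageSketch {
  // Before: the datum's toString() is concatenated into the message, so whatever
  // the datum contains ends up wherever the message is logged.
  static String valueMessage(String unionSchema, Object datum) {
    return "Not in union " + unionSchema + ": " + datum;
  }

  // After: only the datum's runtime class name goes into the message.
  // A null datum has no class, so it is reported as "null" rather than throwing.
  static String typeMessage(String unionSchema, Object datum) {
    String type = (datum == null) ? "null" : datum.getClass().getName();
    return "Not in union " + unionSchema + ": " + type;
  }

  public static void main(String[] args) {
    String schema = "[\"null\",\"int\"]";
    Object datum = "alice@example.com"; // a String resolves to neither null nor int

    // Prints: Not in union ["null","int"]: alice@example.com
    System.out.println(valueMessage(schema, datum));
    // Prints: Not in union ["null","int"]: java.lang.String
    System.out.println(typeMessage(schema, datum));
  }
}
```

The explicit null check is a choice made in this sketch: calling `getClass()` on a null datum would throw a `NullPointerException` while the original exception is still being constructed, hiding the real error.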

Motivation

When an UnresolvedUnionException is thrown during serialization, it is common for upstream frameworks to catch and log the full exception message. Since the message contains the raw datum value, this can result in sensitive data being written to application logs — a data leak.

The fix is minimal and backwards-compatible:

  • The exception message now shows the type of the datum (e.g. java.lang.Integer) instead of its value
  • The getUnresolvedDatum() accessor still returns the original object for any caller that needs the actual value
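From a caller's point of view, the split between message and accessor works roughly as follows. This is another hedged sketch: `NotInUnionException` is a hypothetical stand-in for `UnresolvedUnionException` (only the `getUnresolvedDatum()` accessor name comes from this PR), and `ErrorHandlerSketch.handle` is a hypothetical generic error handler that logs `getMessage()`, the pattern described under Motivation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an exception like UnresolvedUnionException:
// the message carries only the datum's type, while the accessor keeps the value.
class NotInUnionException extends RuntimeException {
  private final Object unresolvedDatum;

  NotInUnionException(String unionSchema, Object unresolvedDatum) {
    super("Not in union " + unionSchema + ": "
        + (unresolvedDatum == null ? "null" : unresolvedDatum.getClass().getName()));
    this.unresolvedDatum = unresolvedDatum;
  }

  Object getUnresolvedDatum() {
    return unresolvedDatum;
  }
}

public class ErrorHandlerSketch {
  // Stand-in for an application log; a real handler would use a logging framework.
  static final List<String> LOG = new ArrayList<>();

  // A generic handler that logs the exception message, as many frameworks do.
  static void handle(RuntimeException e) {
    LOG.add("record failed: " + e.getMessage());
  }

  public static void main(String[] args) {
    String secret = "alice@example.com";
    try {
      throw new NotInUnionException("[\"null\",\"int\"]", secret);
    } catch (NotInUnionException e) {
      handle(e);
      // Code that genuinely needs the value can still read it programmatically;
      // the accessor returns the original object, not a copy.
      System.out.println(e.getUnresolvedDatum() == secret); // true
    }
    // Prints: [record failed: Not in union ["null","int"]: java.lang.String]
    System.out.println(LOG);
  }
}
```

The logged line identifies the offending type, which is usually enough to diagnose a schema mismatch, while the value itself reaches only code that explicitly asks for it.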

Test plan

  • Updated the existing test TestGenericDatumWriter.unionUnresolvedExceptionExplicitWhichField to assert the new message format
  • All 15 tests in TestGenericDatumWriter pass

…sage

UnresolvedUnionException previously included the string representation
of the unresolved datum in its exception message. When this exception
propagates to generic error handlers (e.g. in Kafka Connect runtime),
the datum value — which may contain sensitive user data — gets written
to log files.

Replace the datum's toString() with its class name in the exception
message. The actual datum object remains accessible via
getUnresolvedDatum() for callers that need it for programmatic use.
@github-actions (bot) added the Java label ("Pull Requests for Java binding") on May 7, 2026
@kunalmnnit changed the title from "AVRO-4253: Avoid logging datum values in UnresolvedUnionException" to "Avoid logging datum values in UnresolvedUnionException" on May 7, 2026
@kunalmnnit (Contributor, Author)

cc @RyanSkraba @martin-g @iemejia — would appreciate a review on this. The change prevents sensitive datum values from leaking into exception messages (and subsequently into application logs) by replacing the datum's toString() with its class name.

@RyanSkraba changed the title from "Avoid logging datum values in UnresolvedUnionException" to "AVRO-4254: [Java] Avoid logging datum values in UnresolvedUnionException" on May 7, 2026
@RyanSkraba (Contributor)

Created AVRO-4254 for you! Yeah, this is a good idea.

@RyanSkraba merged commit 96389eb into apache:main on May 7, 2026
9 checks passed
RyanSkraba pushed a commit that referenced this pull request May 7, 2026
@RyanSkraba (Contributor)

Cherry-picked to branch-1.12.

@RyanSkraba (Contributor)

Thanks for the contribution :D I've closed the JIRA -- if you want credit in JIRA please don't hesitate to request an account, but be assured that you have the credit in the git log 😄 And don't hesitate to create other fixes if you find logs that contain user data in the log message.

@kunalmnnit (Contributor, Author)


Thank you for your quick review and words!



2 participants