AVRO-4254: [Java] Avoid logging datum values in UnresolvedUnionException #3764
…sage UnresolvedUnionException previously included the string representation of the unresolved datum in its exception message. When this exception propagates to generic error handlers (e.g. in Kafka Connect runtime), the datum value — which may contain sensitive user data — gets written to log files. Replace the datum's toString() with its class name in the exception message. The actual datum object remains accessible via getUnresolvedDatum() for callers that need it for programmatic use.
cc @RyanSkraba @martin-g @iemejia: would appreciate a review on this. The change prevents sensitive datum values from leaking into exception messages (and subsequently into application logs) by replacing the datum's toString() with its class name.
Created AVRO-4254 for you! Yeah, this is a good idea.
…sage (#3764) UnresolvedUnionException previously included the string representation of the unresolved datum in its exception message. When this exception propagates to generic error handlers (e.g. in Kafka Connect runtime), the datum value — which may contain sensitive user data — gets written to log files. Replace the datum's toString() with its class name in the exception message. The actual datum object remains accessible via getUnresolvedDatum() for callers that need it for programmatic use.
Cherry-picked to branch-1.12.
Thanks for the contribution :D I've closed the JIRA -- if you want credit in JIRA please don't hesitate to request an account, but be assured that you have the credit in the git log 😄 And don't hesitate to create other fixes if you find logs that contain user data in the log message.
Thank you for your quick review and words! |
Summary
- UnresolvedUnionException previously included the full toString() of the unresolved datum in its exception message. When this exception propagates to generic error handlers or gets logged, the datum value, which may contain sensitive user data, gets written to application logs.
- Replace the datum's toString() with its class name in the exception message. The actual datum object remains accessible via getUnresolvedDatum() for callers that need programmatic access.

Motivation
When an UnresolvedUnionException is thrown during serialization, it is common for upstream frameworks to catch and log the full exception message. Since the message contains the raw datum value, this can write sensitive data to application logs, which is a data leak.

The fix is minimal and backwards-compatible:
- The exception message now shows the datum's class name (e.g. java.lang.Integer) instead of its value
- The getUnresolvedDatum() accessor still returns the original object for any caller that needs the actual value

Test plan
- Update TestGenericDatumWriter.unionUnresolvedExceptionExplicitWhichField to assert the new message format
- Tests in TestGenericDatumWriter pass
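
To illustrate the idea described above, here is a minimal, self-contained sketch. It is not the Avro source: `UnionMessageSketch`, `SketchUnionException`, and `typeNameOf` are hypothetical names, and the exact message wording is an assumption. It only shows the pattern of putting the datum's class name in the message while keeping the datum itself behind an accessor.

```java
// Hypothetical stand-in for illustration only; this is not the Avro source.
final class UnionMessageSketch {

    /** Simplified exception that keeps the datum but never prints its value. */
    static final class SketchUnionException extends RuntimeException {
        private final transient Object unresolvedDatum;

        SketchUnionException(String unionDescription, Object unresolvedDatum) {
            // Only the datum's type goes into the message, never its toString().
            // The message wording here is assumed, not copied from Avro.
            super("Not in union " + unionDescription + ": " + typeNameOf(unresolvedDatum));
            this.unresolvedDatum = unresolvedDatum;
        }

        /** The original object, for callers that need it programmatically. */
        Object getUnresolvedDatum() {
            return unresolvedDatum;
        }
    }

    /** Returns the class name of the datum, or "null" when there is none. */
    static String typeNameOf(Object datum) {
        return datum == null ? "null" : datum.getClass().getName();
    }

    public static void main(String[] args) {
        String sensitive = "ssn=123-45-6789";
        SketchUnionException e = new SketchUnionException("[\"null\",\"int\"]", sensitive);

        // The message names the type only, so a generic handler that logs
        // getMessage() does not write the value to the log.
        System.out.println(e.getMessage());

        // Code that needs the value can still reach it explicitly.
        System.out.println(sensitive.equals(e.getUnresolvedDatum()));
    }
}
```

A test in the spirit of the test plan would assert that the message contains the class name and does not contain the datum's string form, while the accessor still returns the original object.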