HDDS-11234. [doc] Manage Netty native memory consumption #448
Pull request overview
Adds a new troubleshooting document to help operators diagnose and cap Netty native (direct) memory usage in Ozone daemons (notably DataNode and S3 Gateway), addressing scenarios where RSS grows significantly beyond the JVM heap due to Netty’s pooled direct-buffer allocator.
Changes:
- Documented symptoms and root cause of high native/direct memory usage (Netty pooled allocator behavior).
- Added recommended mitigation via Netty system properties (both unshaded Netty and Ratis-shaded Netty) and how to configure them through `ozone-env.sh` env vars.
- Provided initial sizing guidance and pointed to relevant Ozone `NettyMetrics` for observability.
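As a rough illustration of the mitigation summarized above, the following sketch shows what capping Netty direct memory from `ozone-env.sh` might look like. It is not taken from the document under review: the 4 GiB cap and the `netty_cap_opts` helper are hypothetical, `io.netty.maxDirectMemory` is the standard Netty property, and the `org.apache.ratis.thirdparty.` prefix for the Ratis-shaded copy and the `OZONE_DATANODE_OPTS` / `OZONE_S3G_OPTS` variable names are assumptions that should be checked against the actual doc and your Ozone version.

```shell
# Hypothetical cap (4 GiB); size it for your own workload.
NETTY_DIRECT_CAP_BYTES=$((4 * 1024 * 1024 * 1024))

# Build matching -D flags for unshaded Netty and the Ratis-shaded copy.
# The shaded property prefix is an assumption, not confirmed by the PR text.
netty_cap_opts() {
  cap="$1"
  printf '%s %s' \
    "-Dio.netty.maxDirectMemory=${cap}" \
    "-Dorg.apache.ratis.thirdparty.io.netty.maxDirectMemory=${cap}"
}

# Append to any options that are already set instead of overwriting them.
export OZONE_DATANODE_OPTS="${OZONE_DATANODE_OPTS:-} $(netty_cap_opts "$NETTY_DIRECT_CAP_BYTES")"
export OZONE_S3G_OPTS="${OZONE_S3G_OPTS:-} $(netty_cap_opts "$NETTY_DIRECT_CAP_BYTES")"
```

Setting both property names in one helper keeps the two Netty copies under the same limit, so a later change to the cap cannot leave one of them uncapped.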
> trigger `OutOfDirectMemoryError` under load; that is the signal the
> cap is too tight. Watch the daemon log and back off if you see it.
>
> The `NettyMetrics` source on each Ozone daemon (see
We track NettyMetrics in the OM Overview dashboard (https://github.com/apache/ozone/blob/master/hadoop-ozone/dist/src/main/compose/common/grafana/dashboards/Ozone%20-%20OM%20Overview.json), but that dashboard covers OM only. The Ozone memory dashboard (https://github.com/apache/ozone/blob/master/hadoop-ozone/dist/src/main/compose/common/grafana/dashboards/Ozone%20-%20Memory%20Consumption%20Metrics.json) does not include NettyMetrics.
What changes were proposed in this pull request?
Companion doc for apache/ozone#10354.
What is the link to the Apache Jira?
apache/ozone#10354
How was this patch tested?