Improve table-style forest plot rendering and subgroup support#122

Open
takua624 wants to merge 18 commits into LSYS:main from takua624:main

takua624 commented Apr 2, 2026

Summary of changes

  • Fixed a row-rendering issue where some rows could be cut off or omitted depending on which annotation columns were displayed.
  • Fixed a header-rendering issue where annoteheaders could be present internally but not actually appear in the final plot.
  • Reworked y-axis row placement so plotting no longer depends on Matplotlib’s auto-generated categorical ticks, which could become inconsistent with the number of dataframe rows.
  • Fixed cases where duplicate y-label values caused rows to collapse or disappear silently.
  • Improved behavior for all-NaN inputs so the function can return an empty plot gracefully instead of failing.
  • Added support for weight-scaled markers, using marker size to reflect study weight more directly.
  • Added subtotal diamonds and subtotal statistics display for grouped analyses.
  • Added arrows for confidence intervals that extend beyond the plotting range.
  • Fixed x-axis/tick handling for cases where log scaling is needed even when the plot type is not captured cleanly by the previous MH/IV-based logic.
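The out-of-range confidence interval handling in the list above can be illustrated with a small sketch. This is not the code from this PR: `clip_interval` and `ClippedInterval` are hypothetical names, and the approach shown (clip each interval to the axis limits and record which ends were truncated so an arrow can be drawn there) is only one plausible way to do it.

```python
from typing import NamedTuple


class ClippedInterval(NamedTuple):
    """A confidence interval restricted to the visible x-range (hypothetical type)."""
    low: float
    high: float
    arrow_left: bool   # True if the lower bound was cut off at x_min
    arrow_right: bool  # True if the upper bound was cut off at x_max


def clip_interval(ci_low: float, ci_high: float,
                  x_min: float, x_max: float) -> ClippedInterval:
    """Clip [ci_low, ci_high] to [x_min, x_max] and flag the truncated ends."""
    arrow_left = ci_low < x_min
    arrow_right = ci_high > x_max
    return ClippedInterval(
        low=max(ci_low, x_min),
        high=min(ci_high, x_max),
        arrow_left=arrow_left,
        arrow_right=arrow_right,
    )


# An interval that fits inside the axis range is returned unchanged.
inside = clip_interval(0.8, 1.5, x_min=0.1, x_max=10.0)

# An interval wider than the axis range is clipped on both sides.
overflow = clip_interval(0.05, 25.0, x_min=0.1, x_max=10.0)
```

The whisker would then be drawn from `low` to `high`, with an arrowhead added at whichever end has its flag set.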

Motivation

The main motivation was to make the package more reliable for dense, publication-style forest plots where:

  • the number of plotted rows must match the dataframe exactly,
  • headers and subgroup summaries need to be displayed consistently,
  • duplicated study labels should not cause silent row loss,
  • and edge cases such as empty/all-NaN inputs should fail gracefully.

In my use case, these issues became apparent when generating large batches of forest plots from structured meta-analysis data. In particular, relying on auto-generated y-ticks or inferred axis limits was fragile when headers, duplicate labels, or unusual annotation columns were present. My changes make row placement explicit and deterministic, which substantially improves robustness for complex plots.
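A minimal sketch of explicit row placement, assuming one numeric y coordinate per dataframe row. The helper `row_positions` and the sample labels are hypothetical and are not taken from this PR. The sketch only shows why keying rows by label loses duplicates while positional indexing does not.

```python
def row_positions(labels):
    """Return one y coordinate per row, top row first, independent of label text."""
    n = len(labels)
    return [n - 1 - i for i in range(n)]


# Sample study labels containing a duplicate (illustrative data only).
labels = ["Smith 2010", "Lee 2012", "Smith 2010", "Subtotal"]

# Keying rows by label merges the two "Smith 2010" rows into a single slot.
by_label = {label: i for i, label in enumerate(labels)}

# Positional coordinates keep every row, duplicates included.
positions = row_positions(labels)

# Axis limits derived from the row count rather than inferred from the data.
y_limits = (-0.5, len(labels) - 0.5)
```

With Matplotlib, these values could be applied through `ax.set_yticks(positions)`, `ax.set_yticklabels(labels)`, and `ax.set_ylim(*y_limits)`, so the number of ticks always equals the number of rows.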

Notes

Some of these changes were motivated by reproducing real-world Cochrane forest plots, including subgroup summaries and weighted study markers. The fixes around missing headers, row/tick mismatches, duplicate labels, empty-plot handling, weight-based marker sizing, subtotal diamonds, subtotal statistics, and out-of-range CI arrows all came directly from those use cases.
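The subtotal diamonds and weighted markers mentioned above reduce to simple geometry. The sketch below is illustrative only and assumes non-empty, positive weights. The names `diamond_vertices` and `marker_areas`, the `half_height` and `max_area` defaults, and the sample numbers are all hypothetical rather than parameters of this package.

```python
def diamond_vertices(estimate, ci_low, ci_high, y, half_height=0.3):
    """Return the four corners of a horizontal diamond spanning the CI at row y."""
    return [
        (ci_low, y),                   # left tip at the lower CI bound
        (estimate, y + half_height),   # top corner at the point estimate
        (ci_high, y),                  # right tip at the upper CI bound
        (estimate, y - half_height),   # bottom corner at the point estimate
    ]


def marker_areas(weights, max_area=120.0):
    """Scale marker areas linearly so the heaviest study gets max_area."""
    heaviest = max(weights)
    return [max_area * w / heaviest for w in weights]


# Illustrative subtotal estimate with its confidence interval, drawn on row y = 0.
diamond = diamond_vertices(0.82, 0.70, 0.96, y=0.0)

# Illustrative study weights (percentages of the pooled weight).
areas = marker_areas([10.0, 40.0, 50.0])
```

In Matplotlib, the vertex list could be passed to `matplotlib.patches.Polygon`, and the areas to the `s` argument of `Axes.scatter`, which is interpreted as marker area in points squared.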

takua624 added 18 commits July 13, 2025 13:59
Determine ylim so that it reflects the number of rows needed to display the header, especially when the dataframe has few rows (e.g., 3 rows).
Handle dataframes with null values.
Add a parameter to enable drawing marker sizes proportional to study weights.
Let the user specify which row(s) contain subtotal information, and draw a horizontal diamond to indicate the CI of the subtotal stats, rather than a square and whiskers.
Allow the user to specify stats of the total effect, and show them underneath the total row.
Ignore capitalization for rows containing subtotal stats and info.
Originally, y was specified as dataframe[yticklabel]. This works until there are duplicate values in the yticklabel column, in which case pyplot skips the duplicated values without any warning. When plotting, we should always specify numerical x-y coordinates!
Enable color-flagging of rows with suspicious values.
Enable consistent padding width across renderers.