
Some updates for the impact plot naming and style #2

Open
kdlong wants to merge 4122 commits into bendavid:main from kdlong:impactPlots

Conversation


kdlong commented Nov 18, 2024

Some changes made for the comparison of impacts probably need to be made more general.

bendavid and others added 30 commits August 19, 2024 15:39
… [0.1,0.1] for c1 and c2; Fix postfit plotting
bendavid pushed a commit that referenced this pull request Feb 3, 2026
bendavid added a commit that referenced this pull request May 27, 2026
…ar-mode

The two-stage continuity pipeline with the deterministic width smear is now the
only design. Removes ~1560 lines of superseded code across the four jpsi mass-fit
modules.

Legacy path (the single-stage θ-conditioned forward-fold):
  * model: drop theta_conditioning, linearize_scale, detach_flow_on_data,
    fixed_theta_sampling, *_sample_center buffers, theta_*_cond_scale; remove
    nll, log_p_signal_data/mc, event_nll; the flow now conditions only on
    muon_kin (n_cond = N_MUON_KIN); _build_flow_cond folded into log_p_nominal.
  * trainer: drop _nll_step, _nll_components, _maybe_mc_only_batch,
    _epoch_metrics, _adaptive_sigma, compute_fisher_info (legacy), and
    _train_loop_legacy; drop --stage legacy, --linearize-scale, --mc-only,
    --detach-flow-on-data, --fixed-theta-sampling, the θ-sampling noise/
    adaptive-σ args, and the legacy θ-lr args.
  * diagnostics: drop the θ-conditioned _flow_density_on_grid and the legacy
    fold branch; the signal grid is always the #2 direct-eval tilt and the MC
    template is always the per-muon physical fold.

Convolution smear-mode (the stochastic GH mass convolution):
  * width is now the only smear: drop smear_mode, _continuity_logp_conv,
    _source_rho_std, _gh_nodes, sigma_qop_pm, _smear_per_event, _softplus_inv,
    the softplus θ_smear init (→ 0) and effective_theta_smear softplus branch;
    _continuity_response returns only the advection s_adv (no diffusion V);
    _continuity_logp is the single deterministic width-fold density.
  * trainer/diagnostics: drop --smear-mode, --continuity-n-gh, --smear-init-a/c
    and all n_gh plumbing.

The per-muon qop folding scheme for the validation plots is kept
(fold_sigma_qop_pm reads the signed width coeffs clipped at 0).

Verified end-to-end on synthetic data: --stage flow → fit → diagnostics (all
closure/θ plots) → uncertainties (observed + empirical Fisher) all run finite.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
bendavid added a commit that referenced this pull request May 28, 2026
The continuity-density Jacobian was capped by a hard `log(G'.clamp_min(0.05))`,
whose log is FLAT past the floor — a gradient-free NLL basin the optimiser was
escaping into at large |η| (V ∝ c/sin²θ large, the linearised one-step Euler
Jacobian goes negative, the floor activates, the fit drifts to large negative
θ_c with no cost). unbinnedcal22/testvalidation5 showed θ_c → −6.6 at |η|>1.4
for an injection that should give θ_c=+10.

Two Jacobian forms, controlled by --jacobian-form / model.jacobian_form:

softlog (default): log(G'.clamp_min(floor)) + (floor−G')⁺²/(2·floor²) — exact
log in the physical region, plus a C¹ quadratic BARRIER past the floor (value
and slope zero at the seam; grows quadratically below it). Bounded from below
by log(floor) so the density can't diverge to ±∞ from a degenerate G'.
Restores the gradient (slope ≈ −(floor−G')/floor² past the floor) so the
optimiser is *pulled back* toward physical G' > 0 instead of parked in the
floored basin. An NLL scan on the same checkpoint+injection now has its argmin
at θ_c = +2 in the forward bin (previously it drifted to large negative values).
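
A minimal scalar sketch of the softlog form described above, written in plain Python rather than the tensor code the commit touches. The function names `softlog_jacobian` and `softlog_slope` are hypothetical; only the formula and the 0.05 floor come from the message.

```python
import math

SMEAR_GP_FLOOR = 0.05  # floor / barrier seam quoted in the commit message


def softlog_jacobian(gp: float, floor: float = SMEAR_GP_FLOOR) -> float:
    """log(max(G', floor)) + max(floor - G', 0)^2 / (2 * floor^2)."""
    clamped_log = math.log(max(gp, floor))   # exact log in the physical region
    deficit = max(floor - gp, 0.0)           # zero above the seam
    return clamped_log + deficit * deficit / (2.0 * floor * floor)


def softlog_slope(gp: float, floor: float = SMEAR_GP_FLOOR) -> float:
    """Analytic d(softlog)/dG' on either side of the seam."""
    if gp >= floor:
        return 1.0 / gp
    # The clamped log is flat here, so only the barrier contributes.
    return -(floor - gp) / (floor * floor)
```

With this form the value at the seam is log(floor), which is the lower bound, and a Jacobian of 0 costs an extra 0.5 on top of it. Below the seam the slope is nonzero, which is what restores a gradient where the hard clamp had none.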

exp (option): frozen-score continuous-flow approximation log G' =
log(1+s_adv'(m')) − V·∂²_m log p₀(m')/2, no floor. Always finite — the
closed-form N→∞ Euler limit assuming the score is constant along the
trajectory. Caveats: different operator approximation than softlog's exact
N-step Jacobian (extra "frozen-score" assumption — breaks near sharp peak
features); rewards unphysical sharpening UNBOUNDEDLY by −log G' = +V·∂²/2 (no
natural cap), so issue #2 is not as well-defended as with softlog's barrier.
The NLL scan still has its argmin at +2, but with a much shallower minimum, and
θ_c=−10 still scores NLL = −0.74 (vs softlog's +0.04 penalty there). Use only when
the V-too-large breakdown (issue #1) is the dominant concern.

Adopted in _build_model, _load_full_fit, diagnostics' build. SMEAR_GP_FLOOR
unchanged (legacy 0.05); now used by softlog as the barrier seam, ignored by
exp. Backward-compatible: old checkpoints without `jacobian_form` in args
default to "softlog" via getattr fallback.
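
The backward-compatible fallback can be illustrated with a short sketch. The helper name `resolve_jacobian_form` and the `stage` field are assumptions for illustration; only the `jacobian_form` attribute and the "softlog" default are taken from the message.

```python
from argparse import Namespace


def resolve_jacobian_form(args) -> str:
    """Return the stored Jacobian form, or "softlog" if the attribute is absent."""
    return getattr(args, "jacobian_form", "softlog")


# Args saved by an old checkpoint (no jacobian_form) and by a new one.
old_args = Namespace(stage="fit")
new_args = Namespace(stage="fit", jacobian_form="exp")
```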

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
bendavid added a commit that referenced this pull request May 28, 2026
… singularities

`--jacobian-form exp` crashed with `non-finite loss [fit] epoch 1` at high LR
(--fit-scale-lr 0.9 --fit-smear-lr 0.9): a single event whose source mass m'
strayed into the flow's tail produced an inf/NaN in ∂²log p₀, and even when
that was contained, the exp form's unbounded sharpening reward (log_Gp = −V·∂²
/2 with no floor) drove log_p_theta = logp0 − log_Gp huge positive on
overshoot events → downstream `.exp()` in data_nll_continuity overflowed →
mixture density inf → NaN loss → trainer abort.

Two guards (both no-ops in the softlog path):

1. `_log_jacobian_exp`: `nan_to_num` on both log_J_scale and log_J_smear so
   the (small minority of) boundary events with inf/NaN d2 contribute 0 to
   the Jacobian instead of poisoning the batch. The exp form exposes d2 = ∂²
   log p₀ directly; the softlog autograd Gp absorbs the singularity through
   the chain rule.

2. `_continuity_logp`: cap `log_p_theta` from ABOVE at +50 (the physical
   log-density on the m-window scale is well inside [−30, +5], so the cap
   leaves the operating regime untouched while preventing exp() overflow
   downstream). No lower cap — that would kill the softlog barrier's
   pull-back gradient; .exp() of a very-negative log_p_theta underflows
   cleanly to 0 and the mixture collapses to the Bernstein background.
   nan_to_num catches any residual NaN.
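
The second guard can be sketched on scalars as follows. This is an illustration under assumptions, not the code in `_continuity_logp`: the name `guard_log_density` is hypothetical, and mapping NaN to 0.0 mirrors the default of `torch.nan_to_num`, which may differ from the value the commit actually uses.

```python
import math

LOG_P_CAP = 50.0  # upper cap quoted in the commit message


def guard_log_density(log_p: float, cap: float = LOG_P_CAP) -> float:
    """Cap a log-density from above only; leave very negative values alone."""
    if math.isnan(log_p):
        return 0.0  # assumption: NaN -> 0, as torch.nan_to_num does by default
    # No lower bound: exp() of a very negative value underflows cleanly to 0.
    return min(log_p, cap)
```

Capping only from above keeps `exp(log_p)` finite for overshoot events, while values in the quoted operating range of [−30, +5] pass through unchanged.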

Verified: both forms now survive 7/7 batches at lr=0.9 (the user's config)
without non-finite loss. The exp form still drifts θ_c → −0.6 at this lr (its
unbounded sharpening reward + lr=0.9 ≫ softlog at the same lr → −1.9 with
the barrier counter-force); use --jacobian-form softlog (the default) for
proper #2 defence at high LR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
bendavid added a commit that referenced this pull request May 28, 2026
…ty on (a, c)

Add a positivity reparameterisation for θ_smear, selectable via
`--smear-param-form` / `model.smear_param_form`:

* `linear` (default): the O(1) θ_smear is the coefficient directly — signed,
  supports both broadening (V>0) and unsmear (V<0). Backward-compatible.

* `softplus`: each of (a, c) INDIVIDUALLY constrained to ≥ 0 via
  `physical = softplus(θ)·SMEAR_VAR_SCALE`. Use this to defend against the
  negative-c drift (issue #2) by construction, at the cost of losing the
  two-sided fit (the model can no longer represent MC that's too broad vs
  data).

The per-η fit mask is applied AFTER the transform in the single accessor
`_smear_raw_to_effective`, so frozen parameters (smear_fit_params='a'/'c',
or smearing_enabled=False) and inactive coefficients evaluate to EXACTLY
zero in the per-muon σ_qop and all downstream transformations (the
continuity density forward map, _continuity_logZ via _smear_per_event_linear,
the fold via _qop_var_pm, effective_theta_smear for plots/Fisher). Verified:
under softplus + smear_fit_params='c', the frozen 'a' column is bit-exact 0
regardless of the raw θ_smear[:,0] value; the active 'c' column stays ≥ 0
across all η bins through training.

Fisher σ on the smear: the σ_eff conversion now applies the delta-method
Jacobian — identity for `linear` (unchanged), sigmoid(θ̂) for `softplus`
(matches `d softplus/dθ`). The param_space label records the actual form.
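
A scalar sketch of the two parameter forms and the delta-method conversion, assuming values are expressed in units of SMEAR_VAR_SCALE so the scale factor is omitted. The function names are hypothetical and this is not the `_smear_raw_to_effective` implementation.

```python
import math


def softplus(theta: float) -> float:
    """Numerically stable softplus: max(theta, 0) + log1p(exp(-|theta|))."""
    return max(theta, 0.0) + math.log1p(math.exp(-abs(theta)))


def sigmoid(theta: float) -> float:
    """Derivative of softplus, evaluated without overflow for large |theta|."""
    if theta >= 0.0:
        return 1.0 / (1.0 + math.exp(-theta))
    e = math.exp(theta)
    return e / (1.0 + e)


def smear_transform(theta: float, form: str = "linear") -> float:
    """Raw parameter -> effective coefficient (in units of the scale)."""
    return softplus(theta) if form == "softplus" else theta


def delta_method_sigma(theta_hat: float, sigma_theta: float,
                       form: str = "linear") -> float:
    """Propagate the Fisher sigma on theta through the transform."""
    jac = sigmoid(theta_hat) if form == "softplus" else 1.0
    return jac * sigma_theta
```

For example, at θ̂ = 0 the softplus form halves the raw uncertainty, since sigmoid(0) = 0.5, while the linear form leaves it unchanged.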

Wired through _build_model, _load_full_fit's adopt list, and the diagnostics
build with safe `getattr(..., "linear")` fallbacks so old checkpoints
default to the original (linear) form.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
bendavid added a commit that referenced this pull request May 28, 2026
…osure mode

Two related fixes for the forward-|η| smear-collapse / bkg-absorption pathology
the validation closures were showing:

1) Post-injection window cut (loader). The injection adds the per-muon qop
   Gaussian kick to m_ll; for a c≈5e-5 smear ~3% of |η|>1.8 pseudo-data
   events were pushed outside [m_lo, m_hi]. The loader was passing those
   through, and downstream the Bernstein-d1 basis evaluated to NEGATIVE
   values (linear u-extrapolation), the mixture `f₀p₀ + f₁p₁ + f_s p_s`
   could go negative, the per-event NLL `−log(p_mix.clamp_min(eps))` saturated at
   log(1/eps) ≈ +69, AND the MLP had a strong incentive to grow f_bkg so the
   Bernstein extrapolation (large positive on the "other side") would absorb
   them at finite NLL — directly feeding the f_bkg=38% at |η|>1.8 problem.
   Fix: in `_batch_tensors`, zero the per-event weight for events outside
   [stats.m_lo, stats.m_hi] after the injection is applied. For real data the
   snapshot already cut to the window, so this is a no-op; for injected MC
   it drops ~0.06% central / ~3% forward — small statistical loss, removes a
   real systematic. Verified: 415/415 out-of-window pseudo-data events from a
   c=5e-5 injection now have weight 0 (323 of them in |η|∈[1.8,2.4]).

2) --no-background (model + trainer). Even with the window cut, the in-window
   tails of the broadened signal still degenerate with the Bernstein bkg at
   forward |η|: the model can broaden (positive c, pays −log G' per event)
   OR let the MLP grow f_bkg (no Jacobian cost). The optimiser prefers the
   bkg "free lunch" and the smear collapses to 0. For validation closures the
   truth is f_bkg=0 by construction (MC pseudo-data is signal-only) — there's
   no physical degree of freedom for the MLP to represent. Add
   `background_enabled: bool = True` to the model; when False,
   `data_nll_continuity` reduces to `−log p_signal` (MLP and Bernstein both
   bypassed), and `train_stage2` / `run_bootstrap_continuity` skip the MLP
   parameter group and freeze its grads. CLI flag `--no-background` (default
   OFF so real-data fits are unchanged). Diagnostics' f_data overlay reads
   the model flag and forces f ≡ [0, 0, 1] when disabled instead of showing
   the random-init MLP output. Verified: with bkg disabled, scrambling the
   MLP parameters to 1e6 still gives a finite per-event NLL — the data
   branch is genuinely MLP-independent.
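
Both fixes can be sketched per event in plain Python. Everything here is illustrative: the function names, the list-based interface and the window values in the test are assumptions, and eps = 1e-30 is inferred from the quoted log(1/eps) ≈ 69 rather than read from the code.

```python
import math


def mask_out_of_window(masses, weights, m_lo, m_hi):
    """Zero the weight of every event whose (post-injection) mass left the window."""
    return [w if m_lo <= m <= m_hi else 0.0 for m, w in zip(masses, weights)]


def event_nll(p_signal, p_bkg, fractions, background_enabled=True, eps=1e-30):
    """Per-event NLL of the mixture f0*p0 + f1*p1 + fs*p_signal.

    With background_enabled=False the fractions and background densities are
    ignored and the result is -log(p_signal), clamped at eps (an assumption).
    """
    if not background_enabled:
        return -math.log(max(p_signal, eps))
    f0, f1, fs = fractions
    p0, p1 = p_bkg
    p_mix = f0 * p0 + f1 * p1 + fs * p_signal
    return -math.log(max(p_mix, eps))
```

A negative background density from the extrapolated basis drives the mixture below eps and the NLL to about 69, which is the pathology the window cut removes. With the background disabled, the result no longer depends on the fractions at all.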

The two effects compound and showed up together (cal22/testvalidation7: 38%
f_bkg + θ_c→0 at |η|>1.8); they're also independent — #1 fixes the
injection-induced contribution and is correct for any injection magnitude; #2
fixes the underlying degeneracy in the validation setup. Use both together
for validation closures of large smearings.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

5 participants