Add steady state detection using modified CUSUM algorithm (#71) #532
chraibi wants to merge 3 commits into PedestrianDynamics:main
Conversation
Implement the method from Liao et al. (2016, Physica A 461) to detect steady-state intervals in density/speed time series. The detection threshold is calibrated via an autoregressive model solved with the block-tridiagonal Thomas algorithm.

- Add pedpy/methods/steady_state_detector.py with detect_steady_state() and combine_steady_states()
- Add 21 unit tests including validation against reference data
- Add user guide section with usage examples and visualisation
- Add API reference entry
- Add reference dataset (experiment AO, b=240cm)
Pull request overview
Adds steady-state interval detection for density/speed time series using the modified CUSUM approach from Liao et al. (2016), and exposes it as a public PedPy API with accompanying docs and examples.
Changes:
- Introduces detect_steady_state() (CUSUM + AR(1)-calibrated threshold) and utilities to combine intervals across series.
- Exports the new API from pedpy and documents it in Sphinx + the user guide notebook.
- Adds unit/integration tests and ships a demo/reference dataset.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| pedpy/methods/steady_state_detector.py | New implementation of modified CUSUM steady-state detection + interval combination helpers. |
| tests/unit_tests/methods/test_steady_state_detector.py | New unit/integration tests + attempted regression validation against legacy reference outputs. |
| pedpy/__init__.py | Re-exports steady-state detection API (detect_steady_state, combine_steady_states, SteadyStateResult). |
| docs/source/api/methods.rst | Adds API docs entry for the new steady-state detection module. |
| notebooks/user_guide.ipynb | Adds tutorial section demonstrating steady-state detection and visualization. |
| notebooks/demo-data/bottleneck/rho_v_Voronoi_traj_AO_b240.txt_id_1.dat | Adds demo dataset used by the user guide (and suitable for tests). |
```python
ref_acf = float(pearsonr(ref_series[:-1], ref_series[1:])[0])

if ref_std == 0:
    raise ValueError("Reference series has zero standard deviation.")
```
pearsonr() is called before checking ref_std == 0. For a constant (or near-constant) reference series, SciPy emits a ConstantInputWarning and returns nan for the correlation before you raise the intended error. Compute/validate ref_std first (and consider guarding against nan ACF) before calling pearsonr to avoid noisy warnings and downstream nan behavior.
Suggested change:

```diff
-ref_acf = float(pearsonr(ref_series[:-1], ref_series[1:])[0])
-if ref_std == 0:
-    raise ValueError("Reference series has zero standard deviation.")
+if ref_std == 0:
+    raise ValueError("Reference series has zero standard deviation.")
+ref_acf = float(pearsonr(ref_series[:-1], ref_series[1:])[0])
+if not np.isfinite(ref_acf):
+    raise ValueError("Reference series has invalid lag-1 autocorrelation.")
```
```python
if len(frames) != len(values):
    raise ValueError("frames and values must have the same length.")

if ref_start > ref_end:
    raise ValueError(f"ref_start ({ref_start}) must be <= ref_end ({ref_end}).")
```
This module raises built-in ValueError for input validation, while the rest of the codebase typically raises pedpy.errors.InputError / PedPyValueError for user-facing API validation failures (e.g. pedpy/methods/profile_calculator.py:362). Consider switching these to PedPy’s custom exception types for consistent error handling by library consumers.
```python
# build block matrices b1 (outside [-q,q]) and b2 (inside [-q,q])
b1, b2 = _build_block_matrices(xi, ia, ib, kp1, c, d)
eye = np.eye(kp1)

# solve block-tridiagonal system via Thomas algorithm
td = _solve_thomas(b1, b2, eye, kp1, n_blocks, dnorm)

# assemble full solution vector and normalise
tms = np.vstack(td)
tms = tms / (d * np.sum(tms))
```
_compute_theta() performs multiple 501×501 dense linear solves inside a loop over s_max blocks. Calling detect_steady_state() repeatedly (e.g., for many series) will be expensive. Consider caching theta by (acf, alpha, gamma, s_max, grid_size, grid_half_width) (or precomputing a lookup table) so repeated runs with similar parameters don’t redo this heavy calibration step.
```python
data_path = os.path.join(
    os.path.dirname(__file__),
    "..",
    "..",
    "..",
    "materials",
    "SteadyState",
    "rho_v_Voronoi_traj_AO_b240.txt_id_1.dat",
)
if not os.path.exists(data_path):
    pytest.skip("Reference data not available")
```
The reference-data fixture points to materials/SteadyState/..., but this repository doesn’t have a materials/ directory (so these tests will always skip). Update the fixture to load the dataset that’s actually included in this PR (e.g. under notebooks/demo-data/bottleneck/) or add the expected materials/SteadyState path to the repo so the validation tests run in CI.
Suggested change:

```diff
-data_path = os.path.join(
-    os.path.dirname(__file__),
-    "..",
-    "..",
-    "..",
-    "materials",
-    "SteadyState",
-    "rho_v_Voronoi_traj_AO_b240.txt_id_1.dat",
-)
-if not os.path.exists(data_path):
-    pytest.skip("Reference data not available")
+base_dir = os.path.abspath(
+    os.path.join(os.path.dirname(__file__), "..", "..", "..")
+)
+file_name = "rho_v_Voronoi_traj_AO_b240.txt_id_1.dat"
+candidate_paths = [
+    os.path.join(base_dir, "materials", "SteadyState", file_name),
+    os.path.join(
+        base_dir, "notebooks", "demo-data", "bottleneck", file_name
+    ),
+]
+data_path = next((path for path in candidate_paths if os.path.exists(path)), None)
+if data_path is None:
+    pytest.skip(
+        "Reference data not available. Checked: "
+        + ", ".join(candidate_paths)
+    )
```
```python
assert len(starts) == 1
# the interval should be adjusted by reaction times
```
test_single_steady_interval sets up inputs and calls _find_steady_intervals(), but it never asserts anything about starts/ends (the test currently passes even if the function is broken). Add assertions for the expected start/end values (including the reaction-time adjustment) to make this test meaningful.
Suggested change:

```diff
 assert len(starts) == 1
+assert len(ends) == 1
 # the interval should be adjusted by reaction times
+np.testing.assert_array_equal(starts, np.array([0.0]))
+np.testing.assert_array_equal(ends, np.array([199.0]))
```
```python
# collect all intervals from all results
all_intervals = []
for r in results:
    for s, e in zip(r.frame_start, r.frame_end, strict=True):
        all_intervals.append((s, e))

if not all_intervals:
    return []

if len(results) == 1:
    return all_intervals

# find pairwise overlaps across all results
# start with intervals from the first result, intersect with each
# subsequent result
current = list(zip(results[0].frame_start, results[0].frame_end, strict=True))
for r in results[1:]:
    next_intervals = list(zip(r.frame_start, r.frame_end, strict=True))
    current = _intersect_interval_lists(current, next_intervals)

return current
```
combine_steady_states() builds all_intervals for every result even when len(results) > 1, but then discards it and recomputes intervals from results[0] for the intersection loop. This adds unnecessary work and makes the control flow harder to follow; consider only building the intervals list in the len(results) == 1 branch (or reuse all_intervals appropriately).
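A sketch of the simplified control flow: the interval list from results[0] serves both the single-result branch and as the seed for the intersection loop. The result type and _intersect_interval_lists are simplified stand-ins for the PR's helpers, included only to make the sketch self-contained:

```python
from collections import namedtuple

# Simplified stand-in for the PR's result object.
SteadyStateResult = namedtuple("SteadyStateResult", ["frame_start", "frame_end"])


def _intersect_interval_lists(a, b):
    # Pairwise overlap of two interval lists (naive O(n*m) version).
    out = []
    for s1, e1 in a:
        for s2, e2 in b:
            s, e = max(s1, s2), min(e1, e2)
            if s <= e:
                out.append((s, e))
    return out


def combine_steady_states(results):
    if not results:
        return []
    # Seed with the first result's intervals; this doubles as the return
    # value in the single-result case, so no separate flat list is built.
    current = list(zip(results[0].frame_start, results[0].frame_end))
    if len(results) == 1:
        return current
    for r in results[1:]:
        current = _intersect_interval_lists(
            current, list(zip(r.frame_start, r.frame_end))
        )
    return current
```

This keeps one code path producing intervals instead of two, which was the review's point.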
schroedtert left a comment

Took a first look, focusing on the API of the steady-state detection. From my point of view, some fundamental changes are needed to integrate the feature better into the PedPy environment. These changes will not really affect the underlying parts, if I see it correctly. I will have a more detailed look at the math part later.
Why is this file needed? Wouldn't it make sense to just compute these values directly with PedPy in the notebook?
```json
    "#### Load density/speed time series\n",
    "\n",
    "We use Voronoi density and speed data from a bottleneck experiment (experiment AO, b=240cm).\n",
    "The data file has three columns: frame, density, and speed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "data = np.loadtxt(\n",
    "    \"demo-data/bottleneck/rho_v_Voronoi_traj_AO_b240.txt_id_1.dat\",\n",
    "    usecols=[0, 1, 2],\n",
    ")\n",
    "# Remove rows where density is zero (no measurement)\n",
    "data = data[data[:, 1] != 0]\n",
    "\n",
    "frames = data[:, 0]\n",
    "density = data[:, 1]\n",
    "speed = data[:, 2]\n",
    "\n",
    "print(f\"Frames: {frames[0]:.0f} to {frames[-1]:.0f} ({len(frames)} observations)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
```
I don't really get why we should read in a file with density and speed values in the user guide. From my point of view, it would be more beneficial for users to see how they can use the results of their own density and speed computations to detect the steady state.
```python
def detect_steady_state(
    *,
    frames: np.ndarray,
    values: np.ndarray,
    ref_start: int,
    ref_end: int,
    alpha: float = 0.99,
    gamma: float = 0.99,
    s_max: int = 100,
    grid_size: int = 500,
    grid_half_width: float = 3.2,
) -> SteadyStateResult:
```
Why not use a DataFrame as input here? This seems to be the only function where NumPy arrays are the main input rather than TrajectoryData or a DataFrame. I would prefer to keep it consistent. Maybe something like:
```diff
 def detect_steady_state(
     *,
-    frames: np.ndarray,
-    values: np.ndarray,
+    data: pd.DataFrame,
+    column: str,
     ref_start: int,
     ref_end: int,
     alpha: float = 0.99,
     gamma: float = 0.99,
     s_max: int = 100,
     grid_size: int = 500,
     grid_half_width: float = 3.2,
 ) -> SteadyStateResult:
```
So the results from compute_classic_density, compute_voronoi_density, compute_mean_speed_per_frame, and compute_voronoi_speed could be reused directly.
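To illustrate the call shape of that proposed signature (the function body here is a stub that just echoes the reference window, and the "density" column name is an assumption; the real detector would run the CUSUM analysis on data[column]):

```python
from collections import namedtuple

import pandas as pd

# Simplified stand-in for the PR's result object.
SteadyStateResult = namedtuple("SteadyStateResult", ["frame_start", "frame_end"])


def detect_steady_state(*, data, column, ref_start, ref_end, **params):
    # Stub: extract the series the detector would analyse, then echo the
    # reference window instead of running the actual CUSUM detection.
    values = data[column].to_numpy()
    assert len(values) > 0
    return SteadyStateResult(frame_start=[ref_start], frame_end=[ref_end])


# A density DataFrame as e.g. compute_voronoi_density() might produce
# (column names assumed for this sketch) feeds straight into the detector:
density = pd.DataFrame(
    {"frame": range(6), "density": [0.8, 1.0, 1.2, 1.2, 1.2, 1.2]}
)
result = detect_steady_state(
    data=density, column="density", ref_start=2, ref_end=5
)
```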
```python
alpha: critical probability for the step function F
    (default 0.99)
gamma: confidence level for the detection threshold theta
    (default 0.99)
s_max: upper boundary of the CUSUM statistics (default 100)
grid_size: number of grid points K for the numerical theta
    computation (default 500)
grid_half_width: half-width of the discretisation grid
    (default 3.2)
```
Are these typical, well-known parameters of the CUSUM algorithm, and are the defaults well chosen? Without digging too much into it, they seem arbitrary. Would it be possible to display their effect in a figure to explain them better?
Thanks, Tobi, for the review. I'm still in the process of translating the old Python code from this paper into PedPy.