Note: Mark vague entries that lack a measurable target, interface specification, or test strategy with
`<!-- TODO: add measurable target, interface spec, test strategy -->`.
Version: 1.8.0
Status: 📋 Active
Last Updated: 2026-06-09
Module Path: src/analytics/
The analytics module provides the full pipeline from raw event ingestion to insight delivery:
OLAP aggregation (SUM/AVG/MIN/MAX/STDDEV/PERCENTILE over columnar data), streaming window
operators (tumbling, sliding, session, hop), a Complex Event Processing (CEP) engine with
NFA-based pattern matching, incremental materialized view maintenance (IVM), time-series
forecasting (ARIMA/Yule–Walker/Holt–Winters), multi-algorithm anomaly detection
(Isolation Forest, Z-Score, LOF, ensemble), AutoML model training and serving, process
mining, NLP analysis, Arrow/Parquet export, Arrow Flight RPC, and distributed shard-based
aggregation. Twelve .cpp implementation files are covered below; all
identified issues reference exact file names and function names.
- [x] `std::lock_guard` / `std::unique_lock` must never be held across user callbacks, network I/O, or O(N²) computation — all identified cases resolved (CEP timerLoop, StreamingAnomalyDetector, ModelServingEngine, MLServingEngine, IncrementalView)
- [x] AVX-512 and ARM NEON kernel results must be bit-identical (tolerance ≤ 1 ULP) to the scalar baseline on the same input dataset
- [ ] Streaming aggregation peak memory must not exceed 512 MB per active window; enforced via a compile-time configurable hard cap
- [x] IVM delta-application latency must be ≤ 50 ms for batches ≤ 10 000 rows; `applyChanges()` must not hold its exclusive lock for the full batch
- [x] `ExporterFactory::createExporter(format)` must return a format-specific exporter, not the universal `StubAnalyticsExporter` for every format
- [ ] Windows platform build stubs in `olap.cpp` and `process_mining.cpp` must be replaced by real cross-platform implementations before v2.0.0
- [x] All background loops (`expiryLoop`, `timerLoop`, `workerLoop`, `metricsLoop`) honour stop signals via condition variables — `CEPEngine::metricsLoop()` uses `metrics_cv_.wait_for` with a stop predicate
- [ ] No dynamic memory allocation inside SIMD hot loops; intermediate buffers must be pre-allocated in `Impl` structs
| Interface | Consumer | Notes |
|---|---|---|
| `ExporterFactory::createExporter(format) → IFormatExporter` | Export pipeline | Must dispatch to Arrow IPC / Parquet / Feather exporter, not always `StubAnalyticsExporter` |
| `IncrementalView::applyChanges(batch)` | Storage CDC pipeline | Needs batch-split to bound lock-hold duration |
| `StreamingAnomalyDetector::process(point)` | Real-time alerting | Must perform training outside the `mu_` lock |
| `ModelServingEngine::predict(name, version, point)` | Query executor | Inference must run outside the registry shared-lock |
| `CEPEngine::timerLoop()` | CEP runtime | Window callbacks must be dispatched after lock release |
| `DistributedAnalyticsSharding::getHealthyShardCount()` | Health dashboard | Network I/O must not run under `mutex_` |
| `LLMProcessAnalyzer::Impl::putInCache(key, response)` | LLM integration | ✅ Fixed v1.8.0: O(N) eviction replaced with O(1) LRU (doubly-linked list + hash map); SHA256 cache key; `max_cache_entries` in `LLMConfig` |
| `AutoMLModel::KNNRegressorModel::predictOneReg(x)` | AutoML serving | Stub return 0.0 must be replaced with real k-NN regression |
| `OLAPEngine` (Windows) | Cross-platform build | Full implementation needed; current stub emits warnings and returns empty results |
| `ProcessMining` (Windows) | Cross-platform build | Stub returns `Status::Error` for every operation |
Priority: High
Target Version: v1.8.0
Files: src/analytics/analytics_export.cpp lines 728–734
ExporterFactory::createExporter(ExportFormat) and createDefaultExporter() both return
std::make_unique<StubAnalyticsExporter>() unconditionally. The comment on line 728 reads
"For now, return stub exporter for all formats – In the future, this would return
format-specific exporters". The StubAnalyticsExporter class itself (line 203) delegates
to exportToFileArrow() only when THEMIS_HAS_ARROW is set, and for all three Arrow
formats falls through to a NOT_SUPPORTED status when Arrow is absent, but the factory
never instantiates any specialised class regardless.
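A minimal sketch of the format-dispatching factory the notes call for is shown below. `FMT_ARROW_PARQUET` matches the name used in this document; the other enum values, the class bodies, and the `THEMIS_HAS_ARROW` stand-in macro are illustrative assumptions, not the in-tree definitions.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

#define THEMIS_HAS_ARROW 1  // stand-in for the real build flag

// Hypothetical enum; only FMT_ARROW_PARQUET appears in this document.
enum class ExportFormat { FMT_JSON, FMT_CSV, FMT_ARROW_PARQUET };

struct IFormatExporter {
    virtual ~IFormatExporter() = default;
    virtual std::string name() const = 0;
};

// Renamed from StubAnalyticsExporter: it only ever handled JSON/CSV.
struct JSONCSVExporter final : IFormatExporter {
    std::string name() const override { return "json-csv"; }
};

#if THEMIS_HAS_ARROW
struct ParquetExporter final : IFormatExporter {
    std::string name() const override { return "parquet"; }
};
#endif

std::unique_ptr<IFormatExporter> createExporter(ExportFormat f) {
    switch (f) {
        case ExportFormat::FMT_JSON:
        case ExportFormat::FMT_CSV:
            return std::make_unique<JSONCSVExporter>();
        case ExportFormat::FMT_ARROW_PARQUET:
#if THEMIS_HAS_ARROW
            return std::make_unique<ParquetExporter>();
#else
            // Fail loudly instead of silently returning the fallback.
            throw std::runtime_error("Parquet export requires an Arrow-enabled build");
#endif
    }
    throw std::runtime_error("unknown export format");
}
```

The key behavioural change is the Arrow-absent branch: an exception (or `std::unexpected` in an `std::expected`-based API) replaces the silent stub fallback.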
Implementation Notes:
- [x] Introduce `ArrowIPCExporter`, `ParquetExporter`, and `FeatherExporter` classes that wrap the existing `exportToFileArrow()` logic — remove the dead `StubAnalyticsExporter` wrapper
- [x] Rename `StubAnalyticsExporter` to `JSONCSVExporter` to reflect its actual capability scope
- [x] `createExporter(ExportFormat)` must switch on `format` and return the correct concrete type; formats unavailable without Arrow must return `std::unexpected` / throw `std::runtime_error` with a clear message instead of silently returning the fallback
- [x] Add a unit test that asserts `createExporter(ExportFormat::FMT_ARROW_PARQUET)` returns a non-stub type when `THEMIS_HAS_ARROW` is defined
- [x] Suppress the `6 Stubs` annotation in the file header once all stubs are promoted to real implementations
Performance Targets:
- Parquet export of 1 M rows: ≤ 2 s wall time with snappy compression on a single core
- CSV export of 1 M rows: ≤ 500 ms (streaming write, no full in-memory serialization)
Priority: High
Target Version: v1.8.0
Files: src/analytics/cep_engine.cpp lines 1071–1095
WindowManager::timerLoop() acquires windows_mutex_ (line 1079) and then immediately
calls callback_(w.events, w.start, …) for every open GLOBAL window (line 1082–1084).
User-supplied callbacks are arbitrary code and can perform I/O, database writes, or other
blocking work. While windows_mutex_ is held, no other thread can add events, close
windows, or read window state.
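The copy-and-dispatch idiom proposed below can be sketched as follows; the `Window`/`Event` types and the callback signature are simplified stand-ins for the real CEP types.

```cpp
#include <functional>
#include <mutex>
#include <vector>

struct Event { int id; };
struct Window { std::vector<Event> events; long start; };

class WindowManagerSketch {
public:
    using Callback = std::function<void(std::vector<Event>, long, long)>;
    explicit WindowManagerSketch(Callback cb) : callback_(std::move(cb)) {}

    void addWindow(Window w) {
        std::lock_guard<std::mutex> lk(windows_mutex_);
        windows_.push_back(std::move(w));
    }

    // Snapshot under the lock, then invoke user callbacks with no lock held.
    void timerTick(long now) {
        std::vector<Window> snapshot;
        {
            std::lock_guard<std::mutex> lk(windows_mutex_);
            snapshot = windows_;  // copy; lock-hold is bounded by the copy
        }
        for (auto& w : snapshot)
            callback_(std::move(w.events), w.start, now);  // arbitrary user code, lock-free
    }

private:
    std::mutex windows_mutex_;
    std::vector<Window> windows_;
    Callback callback_;
};
```

A slow or blocking callback then only delays other callbacks, never `addWindow()` or window-state reads.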
Implementation Notes:
- [x] In `timerLoop()`, snapshot the callbacks and their arguments under the lock (copy event vectors and timestamps), release `windows_mutex_`, then invoke callbacks on the snapshot — identical to the copy-and-dispatch idiom
- [x] Introduce a `WindowCallbackBatch` value type that carries `(events_copy, start, now)` to make snapshots cheap via move semantics
- [x] Apply the same pattern to `closeWindow()` callers that invoke user callbacks while holding partition locks in `cep_engine.cpp` lines 428–440
- [x] `metricsLoop()` (line 2403) uses a bare `std::this_thread::sleep_for(config_.metrics_interval)` — replace with a `condition_variable::wait_for` so the thread wakes immediately on `running_ = false`; the current implementation can delay engine shutdown by one full `metrics_interval`
- [x] Add a regression test that calls `CEPEngine::stop()` and asserts it returns within 100 ms regardless of the `metrics_interval` value
Performance Targets:
- `CEPEngine::stop()` must return within 100 ms across all background threads
Priority: High
Target Version: v1.8.0
Files: src/analytics/anomaly_detection.cpp lines 1035–1070
StreamingAnomalyDetector::process() acquires mu_ at line 1040 and holds it for the
entire execution, including:
- Line 1051: `std::vector<DataPoint> buf(window_.begin(), window_.end())` — full deque-to-vector copy
- Line 1053: `detector_.train(buf)` — O(N·T) for IsolationForest (N = window size, T = trees), O(N²) for LOF
- Lines 1063–1064: `detector_.predict(point)` — model scoring while holding the lock
Every concurrent call to process() (from any producer thread) blocks for the entire
training duration.
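The snapshot-then-train-async shape proposed below can be sketched as follows; `DataPoint` and the trainer are simplified stand-ins, and a production version would also need to handle detector swap-in after training completes.

```cpp
#include <atomic>
#include <deque>
#include <future>
#include <mutex>
#include <vector>

struct DataPoint { double value; };

class StreamingDetectorSketch {
public:
    void process(const DataPoint& p) {
        std::vector<DataPoint> snapshot;
        {
            std::lock_guard<std::mutex> lk(mu_);   // brief: push + copy only
            window_.push_back(p);
            snapshot.assign(window_.begin(), window_.end());
        }
        // Launch training outside the lock; skip if one is already running.
        bool expected = false;
        if (retraining_.compare_exchange_strong(expected, true)) {
            training_ = std::async(std::launch::async,
                                   [this, s = std::move(snapshot)] {
                                       train(s);          // O(N·T) work, no lock held
                                       retraining_ = false;
                                   });
        }
    }

    std::size_t trainedSize() const { return trained_size_.load(); }
    void waitForTraining() { if (training_.valid()) training_.get(); }

private:
    void train(const std::vector<DataPoint>& buf) { trained_size_ = buf.size(); }

    std::mutex mu_;
    std::deque<DataPoint> window_;
    std::atomic<bool> retraining_{false};
    std::atomic<std::size_t> trained_size_{0};
    std::future<void> training_;
};
```

The lock now covers only the deque push and copy, so concurrent `process()` callers never wait on training.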
Implementation Notes:
- [x] Extract a private `snapshotWindow()` helper that copies the deque under a brief lock scope and returns a `std::vector<DataPoint>` — the lock is released before calling `train()` or `predict()`
- [x] Gate retrain (`retrain_on_window`) behind an `std::atomic<bool>` retraining flag and schedule training on a dedicated background thread using `std::async(std::launch::async, …)` to keep `process()` non-blocking
- [x] `detector_.predict(point)` is stateless once trained — hold only a `std::shared_lock<std::shared_mutex>` during prediction and upgrade to `unique_lock` only when `isTrained()` state changes
- [x] `getAnomalies()` (line 1080) and `getStats()` (lines 1085/1090) each take their own `lock_guard` — these are read-only accessors; use `shared_lock` for them
- [x] Add a concurrency stress test: 8 producer threads calling `process()` at 100 kHz; assert P99 latency ≤ 1 ms with no deadlocks
Performance Targets:
- `process()` lock-hold duration: ≤ 50 µs (deque copy only; training async)
- Training throughput: IsolationForest on a 1 000-point window ≤ 10 ms
Priority: High
Target Version: v1.8.0
Files: src/analytics/model_serving.cpp lines 196–230
predict() acquires std::shared_lock lock(impl_->mu) (line 200) to look up the model
entry in impl_->registry, then calls e.model.predictOne(point) (line 206) while still
holding the shared lock. Inference is O(depth) for trees or O(k·N) for k-NN and can take
several milliseconds for large ensembles. Although it is a shared lock, any concurrent
registerModel() or unregisterModel() caller waiting for an exclusive lock is starved for
the full inference duration. Additionally, line 211 takes e.health_mu under the outer
impl_->mu — nested lock acquisition creates an implicit lock-order dependency.
Implementation Notes:
- [x] Restructure `predict()` to: (1) take `shared_lock` for a brief pointer/ref capture of `*it->second`, (2) release `shared_lock`, (3) run `e.model.predictOne(point)` outside any registry lock, (4) take only `e.health_mu` for the health-metric update
- [x] Use a `std::shared_ptr<Entry>` inside the registry so callers can retain a reference-counted handle after releasing the registry lock — eliminates the use-after-free risk from concurrent `unregisterModel()`
- [x] Apply the same pattern to `predictBatch()` (line 244), `explain()` (line 283), and `evaluate()` (line 379), which exhibit the same lock-held-during-compute pattern
- [x] Add a benchmark: 16 concurrent `predict()` callers on the same model; assert throughput ≥ 10 000 predictions/s per core
Performance Targets:
- Registry lock-hold per prediction: ≤ 5 µs (pointer capture only)
- Inference throughput (decision tree depth=10): ≥ 500 000 predictions/s on 8 cores
Priority: High
Target Version: v1.8.0
Files: src/analytics/ml_serving.cpp lines 175–210
Two separate issues:
5a – TOCTOU session check: lines 178–188 take sessions_mutex, check whether the
session exists and call loadSession(), then release. Lines 190–200 immediately re-acquire
the same mutex and call sessions.at(req.model_name). Between the two lock acquisitions
another thread can have evicted the session, causing sessions.at() to throw.
5b – ONNX inference under global mutex: lines 190–210 hold sessions_mutex for the
entire ONNX Run() call, serializing all model inferences regardless of which model is
targeted.
Implementation Notes:
- [x] Replace the double-lock pattern with a single lock acquisition that obtains a `shared_ptr<OrtSession>` reference (or equivalent), then releases the mutex before calling ONNX `Run()`
- [x] Move the session map from `std::map<string, unique_ptr<Session>>` to `std::map<string, shared_ptr<Session>>` so per-model handles can be retained outside the map lock
- [x] Add a per-model `std::shared_mutex` (or `std::atomic<bool> loading_`) to serialize concurrent loads of the same model without blocking unrelated models
- [x] Add a test: two threads simultaneously infer on two different models; assert neither blocks the other
Performance Targets:
- Lock-hold per inference call: ≤ 5 µs (handle capture only)
- Two independent-model inferences: must proceed concurrently with no serialization
Priority: High
Target Version: v1.8.0
Files: src/analytics/incremental_view.cpp lines 325–400
applyChanges(const std::vector<ChangeRecord>& changes) acquires unique_lock lk(rw_mutex_)
at line 325 and holds it for the entire iteration over changes, which may contain
thousands of records. Concurrent readers (query() at line 371 uses shared_lock) are
blocked for the full batch duration, violating the 50 ms IVM constraint when batches exceed
a few hundred rows under load.
applyChange() (single-record path, line 284) exhibits the same pattern: the unique lock
spans passesBaseFilters(), applyRow(), and pruneEmptyGroup(), all of which involve
unordered_map lookups and string parsing.
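The micro-batching proposed below can be sketched as follows: the exclusive lock is taken per chunk of at most 256 rows, with a yield between chunks so `shared_lock` readers can slip in. `ChangeRecord` and the apply step are simplified stand-ins.

```cpp
#include <algorithm>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

struct ChangeRecord { int delta; };

class ViewSketch {
public:
    void applyChanges(const std::vector<ChangeRecord>& changes) {
        constexpr std::size_t kMicroBatch = 256;
        for (std::size_t i = 0; i < changes.size(); i += kMicroBatch) {
            const std::size_t end = std::min(i + kMicroBatch, changes.size());
            {
                std::unique_lock lk(rw_mutex_);   // held per micro-batch only
                for (std::size_t j = i; j < end; ++j) sum_ += changes[j].delta;
            }
            std::this_thread::yield();            // let readers in between batches
        }
    }

    int query() const {
        std::shared_lock lk(rw_mutex_);
        return sum_;
    }

private:
    mutable std::shared_mutex rw_mutex_;
    int sum_ = 0;   // stand-in for the aggregated view state
};
```

Readers may observe a partially applied batch between micro-batches; whether that relaxation is acceptable for IVM query semantics is a design decision the real implementation has to make explicitly.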
Implementation Notes:
- [x] In `applyChanges()`, process changes in micro-batches of ≤ 256 rows: acquire `unique_lock`, apply the micro-batch, release, yield with `std::this_thread::yield()`, repeat — readers can slip in between micro-batches
- [x] Pre-compute `passesBaseFilters()` outside the write lock using a read-only snapshot of `def_` (immutable after construction); only `applyRow()` and `pruneEmptyGroup()` need the exclusive lock
- [x] Add a read-latency regression test: a background writer calls `applyChanges(10 000 rows)` while a reader thread calls `query()` in a tight loop; assert reader P99 ≤ 10 ms
Performance Targets:
- Reader P99 latency during a 10 000-row batch apply: ≤ 10 ms
- `applyChanges()` throughput: ≥ 200 000 rows/s
Priority: Medium
Target Version: v1.8.0
Files: src/analytics/llm_process_analyzer.cpp lines 93–115, 515–530
7a – O(N) eviction: putInCache() (line 93) holds cache_mutex and scans all 1 000
entries linearly to find the one with the earliest expiry (lines 105–112). Under high LLM
call rates this becomes a serialization bottleneck. The hard-coded limit 1000 (line 105)
is not configurable from LLMConfig.
7b – Expensive cache-key serialization: getCacheKey() (line 515) calls
request.process_trace.dump() which serializes the full nlohmann::json object to a string
on every call — even for cache hits. For large process traces (hundreds of events) this can
take several milliseconds in the hot request path.
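A minimal O(1) LRU of the kind the notes propose (doubly-linked list + hash map) can be sketched as follows; the real code would presumably be a generic `LRUCache<K,V>` template holding `nlohmann::json` values rather than strings.

```cpp
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

class LruCacheSketch {
public:
    explicit LruCacheSketch(std::size_t cap) : cap_(cap) {}

    void put(const std::string& key, std::string value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            order_.splice(order_.begin(), order_, it->second);  // move to front
            return;
        }
        order_.emplace_front(key, std::move(value));
        index_[key] = order_.begin();
        if (index_.size() > cap_) {          // O(1) eviction from the tail
            index_.erase(order_.back().first);
            order_.pop_back();
        }
    }

    std::optional<std::string> get(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        order_.splice(order_.begin(), order_, it->second);      // mark recently used
        return it->second->second;
    }

    std::size_t size() const { return index_.size(); }

private:
    using Node = std::pair<std::string, std::string>;
    std::size_t cap_;
    std::list<Node> order_;                                     // front = most recent
    std::unordered_map<std::string, std::list<Node>::iterator> index_;
};
```

Every operation is a hash lookup plus an O(1) `splice`, so eviction no longer scans the full map under `cache_mutex`.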
Implementation Notes:
- [x] Replace `std::unordered_map<string, CacheEntry>` + manual linear eviction with an `LRUCache<string, nlohmann::json>` backed by a doubly-linked list and hash map, giving O(1) get/put/evict — the pattern already proposed in the OLAP section above, or a simple `boost::compute::detail::lru_cache` adapter
- [x] Expose `max_cache_entries` in `LLMConfig` (default 1 000) so operators can tune it without recompiling
- [x] In `getCacheKey()`, compute a SHA256 digest of `request.process_trace.dump()` rather than embedding the full dump string in the key — key comparison and hash-map lookup are O(1) for fixed-size SHA256 digests; key building is still O(trace_size) once per request, but large JSON blobs no longer live inside the map keys
- [x] Add a microbenchmark: `putInCache()` with 1 000 existing entries must complete in ≤ 1 µs
Performance Targets:
- `putInCache()` / `getFromCache()`: O(1) amortised, ≤ 1 µs P99 under 16 concurrent callers
- `getCacheKey()` for a 500-event trace: ≤ 50 µs (hash-based, not JSON dump)
Priority: Medium
Target Version: v1.8.0
Files: src/analytics/distributed_analytics.cpp lines 317–325
getHealthyShardCount() acquires mutex_ (line 317) and calls
e.executor->isHealthy() for every shard entry (line 321). ShardQueryExecutor::isHealthy()
is a virtual call on a remote executor abstraction — in production implementations this
involves a network ping or gRPC health-check. Holding mutex_ for the entire health-check
sweep blocks addShard(), removeShard(), getShardIds(), and the scatter-gather
executeOnAllShards() for the full network round-trip multiplied by the shard count.
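The cached-health pattern proposed below can be sketched as follows: the registry mutex only ever guards map access, and health flags are atomics that the background monitor thread flips without blocking the registry. The types here are illustrative stand-ins.

```cpp
#include <atomic>
#include <map>
#include <memory>
#include <mutex>
#include <string>

class ShardRegistrySketch {
public:
    void addShard(const std::string& id) {
        std::lock_guard lk(mutex_);
        shards_[id] = std::make_shared<std::atomic<bool>>(true);
    }

    // Called by the health-monitor thread after its (slow) network probe;
    // the registry mutex is held only for the map lookup, never for I/O.
    void setHealth(const std::string& id, bool healthy) {
        std::shared_ptr<std::atomic<bool>> flag;
        {
            std::lock_guard lk(mutex_);
            auto it = shards_.find(id);
            if (it == shards_.end()) return;
            flag = it->second;
        }
        flag->store(healthy);
    }

    std::size_t getHealthyShardCount() {
        std::lock_guard lk(mutex_);       // fast: reads cached atomics only
        std::size_t n = 0;
        for (auto& [id, flag] : shards_) {
            (void)id;
            if (flag->load()) ++n;
        }
        return n;
    }

private:
    std::mutex mutex_;
    std::map<std::string, std::shared_ptr<std::atomic<bool>>> shards_;
};
```

`addShard()`/`removeShard()` now contend only with sub-microsecond map operations, never with a health-check round-trip.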
Implementation Notes:
- [x] Introduced `ShardEntry::cached_healthy` (`shared_ptr<atomic<bool>>`) updated by a background health-monitor thread; `getHealthyShardCount()` reads the cached value under the lock (< 1 µs) instead of doing live checks
- [x] Background health monitor runs at a configurable `health_check_interval` (default 5 s); it uses its own dedicated mutex so it does not contend with the main `mutex_`
- [x] Exposed `getHealthyShardCountAsync() → std::future<size_t>` for callers that explicitly want live health data without blocking the shard registry
- [x] Test added: simulate one shard health check that takes 500 ms; assert `addShard()` completes within 5 ms during the health check
Performance Targets:
- `getHealthyShardCount()` (cached path): ≤ 2 µs
- Health monitor cycle for 64 shards: ≤ 5 s wall time with a per-shard 1 s timeout
Priority: Medium
Target Version: v1.8.0
Files: src/analytics/diff_engine.cpp lines 175–220
computeDiff() checks the cache under cache_mutex_ (line 181), releases the lock, then
performs a linear scan of the entire changefeed (listEvents with limit=0, line 198),
then re-acquires cache_mutex_ to write the result (line 217). Two concurrent callers
requesting the same diff range will both miss the cache, both perform the expensive scan,
and both write the result — a classic cache stampede. The O(N) post-filter loop (lines
200–207) over all events then discards events outside the requested range; the changefeed
should be queried with both from_sequence and to_sequence bounds to avoid scanning the
entire log.
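The in-flight deduplication proposed below can be sketched as follows: the second caller for the same range waits on a condition variable instead of recomputing. Note that `std::set` is used here because `std::unordered_set<std::pair<…>>` needs a custom hash; the result type and compute signature are simplified stand-ins.

```cpp
#include <condition_variable>
#include <cstdint>
#include <map>
#include <mutex>
#include <set>
#include <utility>

class DiffCacheSketch {
public:
    using Range = std::pair<std::int64_t, std::int64_t>;

    template <typename ComputeFn>
    std::int64_t getOrCompute(Range r, ComputeFn compute) {
        std::unique_lock lk(mu_);
        for (;;) {
            auto hit = cache_.find(r);
            if (hit != cache_.end()) return hit->second;  // cached result
            if (in_flight_.insert(r).second) break;       // we own the compute
            cv_.wait(lk);                                 // another caller is computing
        }
        lk.unlock();
        const std::int64_t result = compute();            // expensive scan, no lock held
        lk.lock();
        cache_[r] = result;
        in_flight_.erase(r);
        cv_.notify_all();                                 // wake waiters for this range
        return result;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::map<Range, std::int64_t> cache_;
    std::set<Range> in_flight_;
};
```

A production version would also erase the in-flight marker on exception so a failed compute does not strand waiters.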
Implementation Notes:
- [x] Add an in-flight-request set (`std::unordered_set<std::pair<int64_t,int64_t>>` with a custom pair hash) so the second caller for the same range waits on a `condition_variable` rather than re-computing
- [x] Pass `from_sequence` and `to_sequence` as bounds to `changefeed_.listEvents()` when the `Changefeed::ListOptions` struct supports it — avoids materializing the entire event log
- [x] Replace the raw `listEvents(…)` filter-in-loop pattern with a binary-search or indexed range query when the changefeed is backed by a sorted store
- [x] `evictOldCacheEntries()` (called while holding `cache_mutex_` at line 217) performs an unguarded iteration — apply the same copy-evict-then-lock pattern to keep lock duration short
Performance Targets:
- `computeDiff()` cache-miss path for a 1 M-event log, range [N−1000, N]: ≤ 50 ms
- Stampede prevention: a second concurrent caller for the same range must wait ≤ 5 ms
Priority: Medium
Target Version: v1.8.0
Status: ✅ Implemented (v1.8.0)
Files: src/analytics/automl.cpp
KNNModel::predictOneReg() is fully implemented as a weighted inverse-distance mean of the
k nearest neighbours' target values (weight = 1/d² where d² is the squared L2 distance;
threshold on d² > 1e-15 before applying weight, else w = 1e15).
The neighbors() private helper uses squared L2 distance with std::nth_element for O(n)
nearest-neighbour selection.
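The weighted inverse-distance mean this section describes (w = 1/d², clamped to 1e15 when d² ≤ 1e-15, neighbours selected via `std::nth_element` on squared L2 distance) can be sketched for the 1-D case as follows; the container layout and function shape are simplifications of the real `KNNModel` internals.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// train: (x, y) pairs; returns the inverse-distance-weighted mean of the
// k nearest neighbours' targets, matching the clamping rule in the doc.
double knnPredictOneReg(const std::vector<std::pair<double, double>>& train,
                        double x, std::size_t k) {
    if (train.empty() || k == 0) return 0.0;
    struct Neigh { double d2; double y; };
    std::vector<Neigh> n;
    n.reserve(train.size());
    for (const auto& [xi, yi] : train)
        n.push_back({(xi - x) * (xi - x), yi});
    k = std::min(k, n.size());
    // O(n) selection: the first k elements are the k smallest d2 (unordered).
    std::nth_element(n.begin(), n.begin() + (k - 1), n.end(),
                     [](const Neigh& a, const Neigh& b) { return a.d2 < b.d2; });
    double wsum = 0.0, wy = 0.0;
    for (std::size_t i = 0; i < k; ++i) {
        const double w = (n[i].d2 > 1e-15) ? 1.0 / n[i].d2 : 1e15;  // clamp at d2 ~ 0
        wsum += w;
        wy += w * n[i].y;
    }
    return wy / wsum;
}
```

On the `y = 2x` test data from the section, a query at an exact training point is dominated by the clamped weight, so the prediction lands on the training target.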
Implementation Notes:
- [x] Implement `predictOneReg()` as the weighted mean of the `k_` nearest neighbours' target values, using the existing `neighbors()` helper in `KNNModel`
- [x] Unit test added: train a KNN model on `y = 2x` with 100 training points; `predictOneReg({5.0})` returns a value within ±0.5 of 10.0 (`KNNRegressionTest.PredictOneRegLinearRelation`)
- [x] Opt-in performance test added: `KNNRegressionTest.PredictOneRegPerformance` (enabled via `THEMIS_RUN_PERF_TESTS=1`)
Performance Targets:
- `predictOneReg()` for k=5 on a 10 000-sample training set: ≤ 1 ms
Priority: Medium
Target Version: v1.8.0 ✅ Resolved
Files: src/analytics/cep_engine.cpp line 140
`double computePercentile(std::vector<double> vals, double p)` — the signature takes `vals` by value, forcing a full heap copy of the event-window data on every call. For a 10 000-event window this allocates and copies 80 KB per percentile computation. The function is called from `AggregationWindow::computeValue()` in the hot event-processing path.
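The pass-by-const-ref fix can be sketched as follows: the scratch copy is made once inside the helper instead of forcing callers to copy the whole window per call. The linear-interpolation percentile convention shown here is one common choice and an assumption, not the confirmed in-tree definition.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// p in [0, 100]; callers no longer pay a copy at the call site — the single
// internal scratch copy is needed because sorting must not mutate the input.
double computePercentile(const std::vector<double>& vals, double p) {
    if (vals.empty()) return 0.0;
    std::vector<double> scratch(vals);                 // one internal copy
    std::sort(scratch.begin(), scratch.end());
    const double rank = (p / 100.0) * (scratch.size() - 1);
    const auto lo = static_cast<std::size_t>(std::floor(rank));
    const auto hi = static_cast<std::size_t>(std::ceil(rank));
    const double frac = rank - static_cast<double>(lo);
    return scratch[lo] + frac * (scratch[hi] - scratch[lo]);
}
```

A `std::span<const double>` overload (as the notes describe) would let callers pass raw column slices without materializing a vector at all.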
Implementation Notes:
- [x] Change the signature to `computePercentile(const std::vector<double>& vals, double p)` — pass-by-const-ref; the internal scratch copy is made once inside the shared utility
- [x] The same pattern in `streaming_window.cpp` (`calcPercentile`) is also fixed — both now delegate to `themis::analytics::detail::computePercentile` defined in `include/analytics/detail/stats.h`
- [x] `include/analytics/detail/stats.h` added with `computePercentile(const std::vector<double>&, double)` and `computePercentile(std::span<const double>, double)` overloads
Performance Targets:
- Copy elimination: ≥ 50 % reduction in heap allocations on the CEP event-processing hot path
Priority: Medium
Target Version: v2.0.0
Files: src/analytics/olap.cpp lines 53–100; src/analytics/process_mining.cpp lines 24–end
Status: ✅ Completed (v2.0.0)
olap.cpp previously compiled an entire no-op OLAPEngine on _WIN32. The whole-class
stub has been removed: the full cross-platform implementation is now active on all platforms
(SIMD intrinsics remain guarded per-instruction via #if defined(__AVX512F__) etc.).
process_mining.cpp Windows stub remains gated behind the opt-in flag
THEMIS_PROCESS_MINING_WINDOWS_STUB and now emits spdlog::error for every call.
Arrow-absent export stubs emit spdlog::warn instead of silently returning false.
Implementation Notes:
- [x] Audit `OLAPEngine` for Windows-specific blockers — no POSIX `mmap`/`pread` calls were found; SIMD intrinsics are already guarded by their own `#if defined(__AVX512F__)` / `#if defined(__AVX2__)` / `#if defined(__ARM_NEON)` blocks. The whole-class `#if defined(_WIN32)` stub has been removed entirely.
- [x] CMake CI job for Windows (MSVC 2022 + vcpkg) added at `.github/workflows/02-feature-modules_analytics_windows-olap-ci.yml`; it builds `test_olap_lru_cache_focused` and runs it via CTest on `windows-latest`. A pre-build static audit step verifies that no whole-class `_WIN32` stub is re-introduced and that the `Stubs:` counter in the file header is ≤ 2.
- [x] `exportToParquet()` / `exportCollectionToParquet()` now emit `spdlog::warn(...)` when Arrow is not compiled in (`olap.cpp` `#else` block); `throwArrowUnavailable()` in `analytics_export.cpp` also emits `spdlog::warn` before throwing
- [x] The `ProcessMining` Windows stub now calls `spdlog::error(...)` before returning `Status::Error` — operators see a log entry when the capability is absent
- [x] `olap.cpp` file-header `Stubs:` counter updated from 4 → 2 (only the two Arrow-absent export stubs remain); the Windows CI workflow enforces this limit of ≤ 2
Performance Targets:
- Full `OLAPEngine::execute()` on Windows: feature parity with Linux for non-SIMD code paths
Priority: Medium
Target Version: v1.8.0
Files: src/analytics/streaming_window.cpp (header reports TODOs: 8)
The file header (line 14) self-reports 8 open TODOs and scores 85/100 for quality. Two concrete structural issues are observable:
13a – Hard-coded expiry poll intervals:
- `SessionWindow::expiryLoop()` line 792: `expiry_cv_.wait_for(lk, std::chrono::milliseconds(200), …)` — 200 ms is hard-coded
- `WindowManager::timerLoop()` line 1073: `timer_cv_.wait_for(lk, std::chrono::milliseconds(500), …)` — 500 ms is hard-coded
These intervals control session-gap detection resolution and GLOBAL-window emission latency, respectively. Operators with sub-second SLAs cannot tune them without recompiling.
13b – timerLoop() holds windows_mutex_ while calling user callback_:
Lines 1079–1085 lock windows_mutex_, iterate windows, and invoke callback_(w.events, …)
inside the lock — the same pattern described in section 2 for CEP, but in the streaming
window layer.
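The configurable, promptly-stoppable poll loop the notes call for can be sketched as follows: `wait_for` with a stop predicate replaces a hard-coded sleep, so the interval is tunable and shutdown is immediate. The `WindowConfig` field name follows the proposal; everything else is an illustrative stand-in.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

struct WindowConfig { int session_expiry_check_interval_ms = 200; };

class ExpiryLoopSketch {
public:
    explicit ExpiryLoopSketch(WindowConfig cfg) : cfg_(cfg) {}
    ~ExpiryLoopSketch() { stop(); }

    void run() {
        worker_ = std::thread([this] {
            std::unique_lock lk(m_);
            while (!stop_) {
                // Configurable interval; the predicate makes stop() wake us at once.
                cv_.wait_for(lk,
                             std::chrono::milliseconds(cfg_.session_expiry_check_interval_ms),
                             [this] { return stop_.load(); });
                ++ticks_;   // stand-in for the expiry sweep
            }
        });
    }

    void stop() {
        stop_ = true;
        cv_.notify_all();
        if (worker_.joinable()) worker_.join();
    }

private:
    WindowConfig cfg_;
    std::mutex m_;
    std::condition_variable cv_;
    std::atomic<bool> stop_{false};
    std::atomic<int> ticks_{0};
    std::thread worker_;
};
```

With a hard-coded `sleep_for`, `stop()` could block for one full interval; here it returns as soon as the worker observes the flag.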
Implementation Notes:
- [x] Add `session_expiry_check_interval_ms` and `global_window_emit_interval_ms` fields to `WindowConfig` (defaults 200 ms and 500 ms respectively) — pass them to `wait_for` in `expiryLoop()` and `timerLoop()` instead of literals
- [x] In `timerLoop()`, collect `(events_copy, start, now)` snapshots into a local vector under `windows_mutex_`, release the lock, then call all callbacks on the snapshot (already implemented in `cep_engine.cpp` — the snapshot-then-dispatch pattern was present before this change; marked complete)
- [x] Identify and document all 8 open TODOs in a `KNOWN_ISSUES.md` or inline comments so they are trackable in code review; the file-header counter is not sufficient
- [x] Add a test asserting that `SessionWindow` emits a result within `gap + expiry_check_interval_ms + 50 ms` of the last event — validates the configurable interval end-to-end
Performance Targets:
- Session expiry detection latency: `gap + config.session_expiry_check_interval_ms` ± 20 ms
Priority: High
Target Version: v1.8.0
Files: src/analytics/olap.cpp, src/analytics/columnar_execution.cpp, src/analytics/forecasting.cpp
The existing SIMD acceleration covers AVX2 for aggregation kernels in olap.cpp and the
Yule–Walker autocovariance loop in forecasting.cpp. AVX-512 (2× AVX2 width for double)
and ARM NEON (Cortex-A78 and Apple Silicon) paths are absent.
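The compile-time-guarded kernel dispatch described below can be sketched for the SUM kernel as follows: an AVX-512 path processing 8 doubles per vector with a scalar tail/fallback. Runtime CPUID gating (`__builtin_cpu_supports`) would layer on top of this for heterogeneous fleets; the function name mirrors the `vectorizedSum` mentioned in the notes but the exact in-tree shape is an assumption.

```cpp
#include <cstddef>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

double vectorizedSum(const double* data, std::size_t n) {
    double total = 0.0;
    std::size_t i = 0;
#if defined(__AVX512F__)
    __m512d acc = _mm512_setzero_pd();
    for (; i + 8 <= n; i += 8)                       // 8 doubles per iteration
        acc = _mm512_add_pd(acc, _mm512_loadu_pd(data + i));
    total = _mm512_reduce_add_pd(acc);               // horizontal reduction
#endif
    for (; i < n; ++i) total += data[i];             // scalar tail / fallback path
    return total;
}
```

Note the parity caveat from the acceptance criteria: vectorized accumulation reorders floating-point additions, so the ≤ 1 ULP tolerance against the scalar baseline must be asserted explicitly rather than assumed.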
Implementation Notes:
- [x] Add an `#ifdef __AVX512F__` path in `olap.cpp` `vectorizedSum`/`Avg`/`Min`/`Max` — process 8 `double` per cycle vs AVX2's 4; use `_mm512_reduce_add_pd` for horizontal reduction
- [x] Add an `#ifdef __ARM_NEON` path with `float64x2_t` NEON intrinsics for `ColumnAggregator` in `columnar_execution.cpp` — ARM builds currently fall back to scalar
- [x] Gate all SIMD paths behind runtime CPUID checks (`__builtin_cpu_supports("avx512f")`) when the binary must run on heterogeneous hardware
- [x] Extend the `forecasting.cpp` Yule–Walker AVX2 inner loop to AVX-512 (8 doubles/cycle) for the `acov0_avx2` function already scaffolded in the existing doc
- [x] ARM NEON and AVX2 results must produce bit-identical output (within 1 ULP) to the scalar baseline — add a parity assertion in the CI test suite
Performance Targets:
- AVX-512 SUM over 10 M doubles: ≥ 2× throughput vs AVX2 baseline
- ARM NEON aggregation throughput: ≥ 4 GB/s on Cortex-A78
Priority: High
Target Version: v1.8.0
Files: src/analytics/olap.cpp, src/analytics/columnar_execution.cpp, src/analytics/cep_engine.cpp
Repeated std::vector construction/destruction for intermediate aggregation buffers (group
key maps, scratch arrays in ColumnarAggregator::execute(), CEPEngine::workerLoop()
event copies) causes frequent heap allocations in the hot path.
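A minimal bump-pointer arena in the spirit of the proposed `AnalyticsMemoryPool` can be sketched as follows: `allocate()` bumps an aligned offset, `reset()` reclaims everything at once, and there is no per-object free. Names and the API shape follow the notes but are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

class ArenaSketch {
public:
    explicit ArenaSketch(std::size_t bytes) : buf_(bytes) {}

    // align must be a power of two; returns nullptr when the pool is exhausted
    // (a production version would grow or fall back to the heap instead).
    void* allocate(std::size_t size, std::size_t align) {
        const std::size_t off = (offset_ + align - 1) & ~(align - 1);  // align up
        if (off + size > buf_.size()) return nullptr;
        offset_ = off + size;
        return buf_.data() + off;
    }

    void reset() { offset_ = 0; }          // O(1): reclaim per query
    std::size_t used() const { return offset_; }

private:
    std::vector<std::uint8_t> buf_;
    std::size_t offset_ = 0;               // bump pointer
};
```

Because reclamation is a single pointer reset at the start of each `execute()`, per-allocation cost drops to an add and a compare, which is the point of the ≤ 5 % overhead target below.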
Implementation Notes:
- [x] Introduce `AnalyticsMemoryPool` (arena allocator, initial size 64 MB) in `src/analytics/detail/memory_pool.h` with `allocate(size, align)` and `reset()` — no individual free, reset per query
- [x] Wire the pool into `OLAPEngine::Impl` and `ColumnarAggregator` so intermediate group-key strings and `AggState` maps allocate from the pool; `pool_.reset()` at the start of each `execute()` call
- [x] For `CEPEngine`, use a lock-free ring buffer (SPSC if single producer, MPSC if multi) for the event queue rather than `std::queue<std::pair<string,Event>>` — eliminates the per-event `std::string` copy for the stream_id
- [x] Ensure the pool is not shared across threads; each `OLAPEngine::Impl` thread gets its own pool or uses thread-local storage
Performance Targets:
- Allocation overhead in `OLAPEngine::execute()`: ≤ 5 % of total query time (currently estimated at 15–30 % for GROUP BY with many groups)
Priority: Medium
Target Version: v1.9.0
Files: src/analytics/forecasting.cpp, include/analytics/forecasting.h
The forecasting engine supports fit() + predict(steps) but lacks the following
capabilities needed for production deployments.
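The O(1) incremental update described below can be sketched for the level/trend (Holt) components as follows; alpha/beta values and the two-component form are illustrative assumptions, not the engine's confirmed state layout (the real ETS path also carries a seasonal component).

```cpp
// One Holt smoothing step per observation: constant work, no refit of history.
struct HoltStateSketch {
    double level = 0.0, trend = 0.0;
    double alpha = 0.5, beta = 0.3;   // smoothing parameters (assumed values)

    void update(double y) {           // absorb one observation in O(1)
        const double prev_level = level;
        level = alpha * y + (1.0 - alpha) * (level + trend);
        trend = beta * (level - prev_level) + (1.0 - beta) * trend;
    }

    double forecast(int steps) const { return level + steps * trend; }
};
```

Repeated `update()` calls converge toward the series level without ever touching past observations, which is what makes the ≤ 10 µs per-call target plausible.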
Implementation Notes:
- [x] Add `predictBatch(const std::vector<TimeSeries>& batch, int steps) → std::vector<std::vector<ForecastPoint>>` to amortise model-state copies across independent series — the existing `predict()` re-copies internal state on every call
- [x] Add `update(double new_value)` for O(1) one-step incremental absorption of a new observation without a full `fit()` rerun — update only the ETS level/trend/seasonal components
- [x] Auto-tune (HES `auto_tune=true`) grid search over alpha/beta/gamma is single-threaded — parallelize with `std::async` or OpenMP; a 9-point grid on a 500-sample series currently takes up to 50 ms single-threaded
- [x] Cache the last `fit()` result indexed by `(xxHash(training_data), config_hash)` so repeated fits on unchanged data are O(1) hash lookups
- [x] Extend the existing AVX2 Yule–Walker scaffold to a compiled-in AVX-512 path (see section 14)
Performance Targets:
- `predictBatch()` for 1 000 series × 30 steps: ≤ 50 ms on a single core
- `update(new_value)`: O(1), ≤ 10 µs per call
- Auto-tune grid (9 α, n=500): ≤ 5 ms with parallel search
Priority: Medium
Target Version: v1.8.0
Files: src/analytics/analytics_export.cpp, src/analytics/olap.cpp, src/analytics/arrow_export.cpp
`analytics_export.cpp` line 341 allocates a `std::vector<uint8_t> chunk(data.begin()+offset, …)` for every chunk during Arrow IPC streaming — an unnecessary copy when the source buffer is already contiguous. The OLAP result cache in `olap.cpp` can grow unbounded (no eviction policy).
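The zero-copy chunking idea can be sketched in a library-free way as follows: emit lightweight views over the contiguous source buffer instead of copying each chunk into a fresh vector. In the real Arrow IPC path, `arrow::Buffer::Wrap()` plays the role of `ByteView` here; the helper name and shape are illustrative.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Non-owning view: valid only while the source buffer outlives it.
struct ByteView { const std::uint8_t* data; std::size_t size; };

std::vector<ByteView> chunkViews(const std::vector<std::uint8_t>& src,
                                 std::size_t chunk_size) {
    std::vector<ByteView> chunks;
    for (std::size_t off = 0; off < src.size(); off += chunk_size)
        chunks.push_back({src.data() + off,
                          std::min(chunk_size, src.size() - off)});  // last chunk may be short
    return chunks;
}
```

The lifetime caveat in the comment is the real design cost of zero-copy: the exporter must guarantee the source buffer stays alive until every chunk is flushed.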
Implementation Notes:
- [x] Use `arrow::Buffer::Wrap()` or `arrow::MutableBuffer` zero-copy wrappers instead of copying bytes into `std::vector<uint8_t>` during Arrow IPC serialization in `analytics_export.cpp` line 341
- [x] Implement `LRUCache<std::string, OLAPResult>` (doubly-linked list + `unordered_map`, max 1 000 entries, configurable) for OLAP query result caching — the current implementation has no eviction
- [x] The cache key for OLAP must be computed from a normalized query representation (sorted dimensions, canonical filter order) so semantically equivalent queries hit the same entry
- [x] Add TTL-based invalidation: cached entries older than `cache_ttl_ms` (configurable, default 60 s) are evicted on next access or by a background cleanup thread
Performance Targets:
- Arrow IPC export copy overhead: ≤ 1 % of total export time (zero-copy path)
- OLAP cache hit rate for repeated identical queries: ≥ 80 % in typical dashboard workloads
- [x] Define the `IFormatExporter` hierarchy and finalize the `ExporterFactory` dispatch API (section 1)
- [x] Draft the `LRUCache<K,V>` utility header in `include/analytics/detail/lru_cache.h` (sections 7, 17)
- [x] Define the `AnalyticsMemoryPool` API (section 15)
- [x] Add `session_expiry_check_interval_ms` / `global_window_emit_interval_ms` to `WindowConfig` (section 13)
- [x] Add `max_cache_entries` to `LLMConfig` (section 7)
- [x] Implement per-format `IAnalyticsExporter` classes; retire `StubAnalyticsExporter` (section 1)
- [x] Refactor `StreamingAnomalyDetector::process()` to async training (section 3)
- [x] Refactor `ModelServingEngine::predict()` to the inference-outside-lock pattern (section 4)
- [x] Implement `LRUCache` in `llm_process_analyzer.cpp` (section 7)
- [x] Implement `KNNRegressorModel::predictOneReg()` via `KNNModel` (section 10)
- [x] Fix the `CEPEngine::timerLoop()` callback-under-lock and the `metricsLoop()` shutdown race (section 2)
- [x] Add spdlog warnings to all silent Arrow/Windows `return false` stubs (section 12)
- [x] TOCTOU fix for `MLServingEngine::infer()` (section 5)
- [x] Stampede prevention for `DiffEngine::computeDiff()` (section 9)
- [x] `DistributedAnalyticsSharding` cached health state (section 8)
- [x] `IncrementalView::applyChanges()` micro-batch lock release (section 6)
- [x] Concurrency stress test for `StreamingAnomalyDetector` (8 threads, 100 kHz, P99 ≤ 1 ms) — `tests/analytics/test_anomaly_detection.cpp` `StreamingConcurrencyStress::EightProducersP99Latency` (run with `THEMIS_RUN_PERF_TESTS=1`)
- [x] OLAP cache eviction test: assert bounded memory growth under 10 000 unique queries — `tests/analytics/test_olap_lru_cache.cpp` `OLAPLRUCache::BoundedMemoryGrowthUnder10kUniqueQueries` (Linux RSS assertion + cross-platform functional variant)
- [x] `CEPEngine::stop()` latency test: returns within 100 ms regardless of `metrics_interval`
- [x] IVM reader-latency test: P99 ≤ 10 ms during a 10 000-row batch apply
- [x] `KNNRegressorModel` regression accuracy test on `y = 2x`
- [x] AVX-512 and ARM NEON kernels with CI parity assertions (section 14)
- [x] `AnalyticsMemoryPool` integration in OLAP and columnar execution (section 15)
- [x] `computePercentile` pass-by-value elimination (section 11)
- [x] Zero-copy Arrow IPC export (section 17)
- [x] Forecasting batch prediction and streaming update API (section 16)
- [ ] Update `README.md` performance numbers after Phase 5 benchmarks
- [x] Document all resolved TODOs in the `streaming_window.cpp` header (TODO #6 resolved)
- [x] Update `include/analytics/FUTURE_ENHANCEMENTS.md` to reflect new public API additions (v1.8.0–v1.9.0 APIs; completed feature statuses)
- [ ] Add a Windows CI job and set the stub-count CI gate to 0 for non-Windows builds (section 12)
- [x] `ExporterFactory` returns the correct type for every `ExportFormat` value
- [x] All `std::lock_guard` scopes verified to hold ≤ 1 ms under worst-case production load
- [x] `CEPEngine::stop()` completes within 100 ms in all code paths
- [ ] `ModelServingEngine` inference throughput ≥ 10 000 predictions/s on 8 cores
- [x] `IncrementalView` reader P99 ≤ 10 ms under 10 000-row batch writes
- [x] Windows `OLAPEngine` stubs emit spdlog::error; the `ProcessMining` Windows stub now logs via spdlog::error
- [x] `KNNRegressorModel::predictOneReg()` stub replaced with a real implementation (via `KNNModel`)
- [x] All hard-coded poll intervals (200 ms, 500 ms, 100 ms) moved to configuration structs
- [x] `LLMProcessAnalyzer` cache eviction is O(1)
- [x] SIMD parity tests passing on AVX2 + scalar; AVX-512 and NEON paths added
| Issue | File | Severity | Notes |
|---|---|---|---|
| `ExporterFactory` always returns stub | `analytics_export.cpp:728` | High | Parquet/Feather silently unavailable without error |
| Training under `StreamingAnomalyDetector` lock | `anomaly_detection.cpp:1051` | High | O(N²) LOF train blocks all producers |
| ONNX inference under global `sessions_mutex` | `ml_serving.cpp:190` | High | Serializes all model inferences |
| Inference under registry `shared_lock` | `model_serving.cpp:206` | High | Starves writers during long inference |
| User callback under `windows_mutex_` | `cep_engine.cpp:1082` | High | Any slow callback freezes the CEP window layer |
| O(N) LLM cache eviction under lock | `llm_process_analyzer.cpp:105` | Medium | Degrades under high LLM call rates |
| Network I/O in `getHealthyShardCount()` | `distributed_analytics.cpp:321` | Medium | ✅ Fixed v1.8.0: background monitor + cached_healthy atomic |
| Cache stampede in `DiffEngine` | `diff_engine.cpp:181` | Medium | Two threads can duplicate expensive changefeed scan |
| `KNNRegressorModel::predictOneReg()` = 0.0 | `automl.cpp:833` | Medium | ✅ Fixed v1.8.0: weighted inverse-distance mean of k nearest neighbours |
| 8 unresolved TODOs | `streaming_window.cpp` | Medium | Enumerated as inline TODO(v1.8.0) comments in file header (§13 resolved) |
| Windows OLAP/ProcessMining stubs | `olap.cpp:53`, `process_mining.cpp:24` | Low | Not a blocker on Linux; silently fails on Windows |
| `computePercentile` by-value copy | `cep_engine.cpp:140` | Low | 80 KB copy per percentile on 10k-event windows |
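The `computePercentile` entry above is the classic pass-by-value pitfall: the whole window (80 KB at 10k events) is copied on every call. A minimal sketch of the intended fix, with illustrative names rather than the actual `cep_engine.cpp` signature: take the window by const reference, reuse a pre-allocated scratch buffer across calls, and use `std::nth_element` for an O(N) partial sort instead of a full sort.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative fix sketch (names are hypothetical): the window is taken by
// const reference; only one copy into a caller-owned, reusable scratch buffer
// is made, and nth_element selects the rank in O(N) without a full sort.
double computePercentile(const std::vector<double>& window, double pct,
                         std::vector<double>& scratch /* reused across calls */) {
    if (window.empty()) return 0.0;
    scratch.assign(window.begin(), window.end());  // reuses scratch capacity
    const std::size_t n = scratch.size();
    // Nearest-rank percentile: rank = ceil(pct/100 * n), clamped to [1, n].
    const double raw_rank = std::ceil(pct / 100.0 * static_cast<double>(n));
    const std::size_t idx =
        raw_rank < 1.0 ? 0 : std::min(static_cast<std::size_t>(raw_rank) - 1, n - 1);
    std::nth_element(scratch.begin(), scratch.begin() + idx, scratch.end());
    return scratch[idx];
}
```

Because `std::nth_element` mutates its range, one copy is unavoidable if the window must stay intact; the win is eliminating the per-call allocation and the by-value parameter.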
None expected through v1.9.0 — all changes are either internal refactors or additive API
extensions. The WindowConfig struct additions (section 13) and LLMConfig.max_cache_entries
(section 7) are backwards-compatible with default values matching current hard-coded constants.
- Unit tests (≥ 90 % line coverage per file): each fix in sections 1–13 must have a corresponding isolated test in `tests/analytics/`
- Concurrency tests: `StreamingAnomalyDetector` (8 producers, 100 kHz), `ModelServingEngine` (16 concurrent predictors), `IncrementalView` (writer + 4 readers)
- Regression benchmarks (Google Benchmark): tracked for `OLAPEngine::execute`, `CEPEngine` event throughput, `IncrementalView::applyChanges`, `computePercentile` — PRs blocked on ≥ 5 % regression
- Platform tests: Linux x86_64 (AVX2 + AVX-512 if available), ARM64, Windows 2022 MSVC
- Parity tests: AVX-512 / ARM NEON vs scalar results, tolerance ≤ 1 ULP
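The ≤ 1 ULP parity criterion needs a comparator that counts units in the last place rather than an epsilon. A sketch of such a helper (the name is ours, not from the test suite): reinterpret the IEEE-754 bit patterns as integers on a monotonic line, so that adjacent representable doubles differ by exactly 1.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Distance in units-in-the-last-place between two doubles. Maps each IEEE-754
// bit pattern onto a monotonic signed integer line, so adjacent representable
// doubles differ by exactly 1 (and +0.0 / -0.0 coincide).
std::int64_t ulpDistance(double a, double b) {
    if (std::isnan(a) || std::isnan(b)) return INT64_MAX;
    std::int64_t ia, ib;
    std::memcpy(&ia, &a, sizeof a);
    std::memcpy(&ib, &b, sizeof b);
    // Remap negative-float bit patterns so ordering becomes monotonic.
    if (ia < 0) ia = std::int64_t(0x8000000000000000ULL) - ia;
    if (ib < 0) ib = std::int64_t(0x8000000000000000ULL) - ib;
    const std::uint64_t d = ia >= ib ? std::uint64_t(ia) - std::uint64_t(ib)
                                     : std::uint64_t(ib) - std::uint64_t(ia);
    return d > std::uint64_t(INT64_MAX) ? INT64_MAX : static_cast<std::int64_t>(d);
}

// Parity predicate for the SIMD-vs-scalar tests: tolerance ≤ 1 ULP.
bool withinOneUlp(double simd, double scalar) {
    return ulpDistance(simd, scalar) <= 1;
}
```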
| Operation | Current (estimated) | Target |
|---|---|---|
| `ModelServingEngine::predict()` (8-core, decision tree depth 10) | ~20 000/s (lock-serialized) | ≥ 500 000/s |
| `IncrementalView::applyChanges()` reader P99 during 10k-row batch | ~500 ms | ≤ 10 ms |
| `StreamingAnomalyDetector::process()` lock-hold | ~10 ms (includes train) | ≤ 50 µs |
| `LLMProcessAnalyzer` cache put/get | O(N) | ✅ O(1) ≤ 1 µs (v1.8.0) |
| `DiffEngine::computeDiff()` cache-miss (1 M event log, range 1000) | ~500 ms | ≤ 50 ms |
| AVX-512 SUM over 10 M doubles | N/A (unimplemented) | ≥ 2× AVX2 |
| `forecasting.cpp` auto-tune (9α, n=500) | ~50 ms single-thread | ≤ 5 ms parallel |
- All SIMD code paths compiled with `-fstack-protector-strong`; no pointer arithmetic on user-controlled offsets
- GPU kernel launches gated behind the `GPUKernelValidator` checksum registry when GPU support is enabled
- IVM delta messages validated for schema conformance before `applyChange()` — invalid deltas rejected with `EINVAL`, never silently ignored
- Streaming aggregation enforces a configurable row-count hard cap (default 10 M rows/window) to prevent OOM via adversarial input
- `LLMProcessAnalyzer` API key sanitised from all log output; existing sanitization in `analytics_export.cpp` must be extended to cover the retry-path exception messages
- All public API functions return `Result<T>` / status codes; exceptions must not propagate across module boundaries into the query executor
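The last rule, that exceptions must never cross a module boundary, is typically enforced with a boundary wrapper. A minimal sketch, assuming a simple `Result<T>` shape (the production type presumably lives in a shared header with richer status codes):

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>

// Minimal illustrative Result<T>; names and shape are assumptions.
template <typename T>
class Result {
public:
    static Result ok(T v) { return Result(std::move(v)); }
    static Result error(std::string msg) { return Result(Error{std::move(msg)}); }
    bool isOk() const { return std::holds_alternative<T>(value_); }
    const T& value() const { return std::get<T>(value_); }
    const std::string& errorMessage() const { return std::get<Error>(value_).msg; }
private:
    struct Error { std::string msg; };
    explicit Result(T v) : value_(std::move(v)) {}
    explicit Result(Error e) : value_(std::move(e)) {}
    std::variant<T, Error> value_;
};

// Boundary wrapper: internal exceptions are converted to an error Result
// here, so nothing propagates into the query executor.
template <typename F>
auto guarded(F&& f) -> Result<decltype(f())> {
    using R = Result<decltype(f())>;
    try {
        return R::ok(f());
    } catch (const std::exception& e) {
        return R::error(e.what());
    }
}
```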
- API Enhancements: `../../include/analytics/FUTURE_ENHANCEMENTS.md`
- Current Implementation: `README.md`
- Architecture: `ARCHITECTURE.md`
- Roadmap: `ROADMAP.md`
- Performance Guide: `../../docs/de/analytics/performance_guide.md`
Priority: High · Target Version: v2.1.0 · Issue: #PLANNED
Building on the existing CEP engine (cep_engine.cpp) with NFA pattern matching, EPL parser,
and rule engine, a fully fledged expert-system engine is to be built. The CEP components
form the rule-execution subsystem (working memory + agenda + NFA matcher); ExpertSystemEngine
extends them with a persistent knowledge base, forward/backward chaining, and an
explanation component.
- `include/analytics/expert_system_engine.h` (new)
- `src/analytics/expert_system_engine.cpp` (new)
- `include/analytics/knowledge_base.h` (new)
- `src/analytics/knowledge_base.cpp` (new)
- Integration: `src/analytics/cep_engine.cpp` (rule-execution layer), `src/analytics/model_serving.cpp` (ML scorer)
- Integration: `src/graph/knowledge_graph_reasoner.cpp` (knowledge-graph facts)
- [ ] Working memory: max 10 000 active facts (ring eviction when exceeded)
- [ ] Rule set: max 100 Horn-clause rules; loaded from a YAML file or programmatically
- [ ] Forward-chaining cycle ≤ 50 ms for 10 000 facts + 100 rules
- [ ] Backward chaining ≤ 20 ms for depth ≤ 10 (depth-limited search)
- [ ] Explanation generation ≤ 10 ms (proof trace as an ordered sequence of rule applications)
- [ ] Thread safety: `assertFact()` and `forwardChain()` via `std::mutex`; `explain()` read-only
- [ ] ML-scorer integration optional (`THEMIS_ENABLE_ANALYTICS_ML_SCORER`); deterministic fallback when not enabled
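The 10 000-fact cap with ring eviction can be sketched as a predicate-indexed fact store plus an insertion-order queue; the names below are illustrative, not the planned header. For clarity the eviction does a linear scan; a production version would keep an id-to-iterator index to make eviction O(1).

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative working-memory sketch: predicate-indexed triple store with
// FIFO ring eviction at a fixed capacity (10 000 in the target spec).
struct Fact {
    std::string subject, predicate, object;
    std::uint64_t id;
};

class WorkingMemory {
public:
    explicit WorkingMemory(std::size_t capacity) : capacity_(capacity) {}

    std::uint64_t assertFact(std::string s, std::string p, std::string o) {
        if (order_.size() >= capacity_) evictOldest();  // ring eviction
        Fact f{std::move(s), std::move(p), std::move(o), next_id_++};
        order_.push_back(f.id);
        by_predicate_.emplace(f.predicate, std::move(f));
        return next_id_ - 1;
    }

    std::vector<Fact> getFacts(const std::string& predicate) const {
        std::vector<Fact> out;
        auto [lo, hi] = by_predicate_.equal_range(predicate);
        for (auto it = lo; it != hi; ++it) out.push_back(it->second);
        return out;
    }

    std::size_t size() const { return order_.size(); }

private:
    void evictOldest() {
        const std::uint64_t victim = order_.front();
        order_.pop_front();
        // O(N) scan for clarity only; production code would index id -> iterator.
        for (auto it = by_predicate_.begin(); it != by_predicate_.end(); ++it) {
            if (it->second.id == victim) { by_predicate_.erase(it); break; }
        }
    }

    std::size_t capacity_;
    std::uint64_t next_id_ = 0;
    std::deque<std::uint64_t> order_;
    std::unordered_multimap<std::string, Fact> by_predicate_;
};
```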
| Interface | Consumer | Notes |
|---|---|---|
| `ExpertSystemEngine::assertFact(fact)` | CDC pipeline, AQL layer | Writes (subject, predicate, object) triples into working memory |
| `ExpertSystemEngine::retractFact(fact_id)` | CDC pipeline | Removes a fact; triggers agenda re-evaluation |
| `ExpertSystemEngine::forwardChain(max_cycles)` | Scheduler, CDC callback | Forward chaining until fixpoint; returns the number of fired rules |
| `ExpertSystemEngine::queryGoal(goal)` | AQL layer | Backward chaining; returns a GoalResult with proof trace |
| `ExpertSystemEngine::explain(decision_id)` | Audit API, explanation endpoint | Exports the proof trace as a JSON array of {rule_id, matched_facts, derived_fact} |
| `ExpertSystemEngine::setMLScorer(ModelServingEngine*)` | Server startup | Registers an ML model for confidence scoring of rule premises |
| `KnowledgeBase::loadRulesFromYaml(path)` | Server startup, hot reload | Loads the Horn-clause rule set; validates it for consistency |
| `KnowledgeBase::assertFact(triple)` | Reasoner, CDC | Persists a fact (in-memory + optional RocksDB) |
| `KnowledgeBase::getFacts(predicate)` | Reasoner | Index lookup by predicate; O(log N) |
```yaml
# Horn-clause rules in the ThemisDB expert-system format
rules:
  - id: compliance_violation_detected
    priority: 10
    description: "Flags an incident as a compliance violation when the threshold is exceeded"
    if:
      - [?incident, type, SecurityIncident]
      - [?incident, severity, critical]
      - [?incident, affected_records, "?count > 1000"]
    then:
      - [?incident, requires_action, compliance_review]
      - [?incident, notification_level, regulatory]
    ml_confidence_threshold: 0.85  # ML scorer must confirm with >= 0.85
  - id: expert_domain_inference
    priority: 5
    if:
      - [?person, authored, ?document]
      - [?document, hasKeyword, ?keyword]
      - [?keyword, inDomain, ?domain]
    then:
      - [?person, expertIn, ?domain]
```

- [ ] `ExpertSystemEngine` holds a non-owning pointer to `CEPEngine::RuleEngine`; Horn clauses are registered as CEP rules in EPL syntax; the NFA matcher serves as a Rete-like pattern-execution subsystem
- [ ] Working memory: `std::unordered_multimap<std::string, Fact>` (predicate → facts); ring eviction via LRU displacement at 10 000 facts
- [ ] `KnowledgeBase` stores facts as (subject, predicate, object) triples; compatible with `KnowledgeGraphReasoner` — facts can be exchanged bidirectionally
- [ ] Backward chaining: depth-limited search with max depth = 10; circular proofs are detected via a visited set and aborted as `CycleDetected`
- [ ] ML-scorer augmentation: `ModelServingEngine::predict()` scores rule premises; confidence < threshold → rule is marked as a "soft hint", not a hard decision
- [ ] `LoRAPatternClassifier` (see below) can be used as the ML scorer
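The forward-chaining-to-fixpoint loop can be sketched in a few dozen lines. This is a deliberately simplified model: ground triples only, rules constrain a single shared subject variable, and there is no general unification (the real engine delegates matching to the CEP NFA matcher). All names are illustrative.

```cpp
#include <set>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Simplified forward-chaining sketch over (subject, predicate, object) triples.
using Triple = std::tuple<std::string, std::string, std::string>;

struct Rule {
    std::vector<std::pair<std::string, std::string>> if_;  // (predicate, object) on ?s
    std::pair<std::string, std::string> then_;             // derived (predicate, object) on ?s
};

// Fires rules until fixpoint or max_cycles; returns the number of fired rules.
int forwardChain(std::set<Triple>& facts, const std::vector<Rule>& rules, int max_cycles) {
    int fired = 0;
    for (int cycle = 0; cycle < max_cycles; ++cycle) {
        std::vector<Triple> derived;
        for (const auto& rule : rules) {
            for (const auto& [s, p, o] : facts) {
                // Candidate subjects come from the first condition...
                if (p != rule.if_.front().first || o != rule.if_.front().second) continue;
                bool all = true;  // ...then every remaining condition must hold for ?s.
                for (const auto& [cp, co] : rule.if_)
                    if (!facts.count({s, cp, co})) { all = false; break; }
                Triple t{s, rule.then_.first, rule.then_.second};
                if (all && !facts.count(t)) derived.push_back(t);
            }
        }
        if (derived.empty()) return fired;  // fixpoint reached
        for (auto& t : derived) { facts.insert(std::move(t)); ++fired; }
    }
    return fired;
}
```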
- `tests/analytics/test_expert_system_engine.cpp` — ES-01..ES-20
  - ES-01..ES-05: `assertFact` + `forwardChain` (forward chaining until fixpoint)
  - ES-06..ES-10: `queryGoal` backward chaining + proof-trace serialization
  - ES-11..ES-14: ML-scorer integration (mock `ModelServingEngine`)
  - ES-15..ES-17: rule-conflict detection + `ConflictError`
  - ES-18..ES-20: concurrency (8 threads, 10 000 facts)
- `tests/analytics/test_knowledge_base.cpp` — KB-01..KB-08
  - KB-01..KB-03: YAML loading + validation
  - KB-04..KB-05: `assertFact`/`retractFact` consistency
  - KB-06..KB-08: `getFacts(predicate)` index correctness
- `forwardChain(max=100)` on 10 000 facts + 100 rules: ≤ 50 ms
- `queryGoal` depth ≤ 10: ≤ 20 ms
- `explain(decision_id)` proof trace: ≤ 10 ms
- `KnowledgeBase::loadRulesFromYaml` (100 rules): ≤ 50 ms
- Rule YAML is validated against a JSON schema before loading; invalid rules → `INVALID_ARGUMENT`
- No shell execution or filesystem access in rule actions
- ML-scorer confidence values are logged (audit trail); no silent override of rule results
Priority: High · Target Version: v2.1.0 – v2.2.0 · Issue: #PLANNED
LoRA-fine-tuned LLM adapters provide domain-specific pattern recognition in event streams,
time-series data, and graph paths. LoRAPatternClassifier wraps MultiLoRAManager and
integrates with the CEP engine, ExpertSystemEngine, and KnowledgeGraphReasoner.
- `include/analytics/lora_pattern_classifier.h` (new)
- `src/analytics/lora_pattern_classifier.cpp` (new)
- Integration: `src/llm/multi_lora_manager.cpp`, `src/analytics/cep_engine.cpp`, `src/analytics/model_serving.cpp`, `src/graph/knowledge_graph_reasoner.cpp`
- [ ] Batch classification (≤ 64 events): ≤ 100 ms
- [ ] Adapter selection via embedding similarity: ≤ 5 ms
- [ ] Guard: `THEMIS_ENABLE_LLM`; AutoML fallback (`automl.cpp`) always active
- [ ] Thread safety: `batchClassify()` reentrant; no global mutex across LoRA inference
- [ ] Pattern-recognition metrics: precision ≥ 0.90 (fraud), F1 ≥ 0.88 (time-series anomaly)
| Interface | Consumer | Notes |
|---|---|---|
| `LoRAPatternClassifier::classify(events, adapter_id)` | CEP engine, ExpertSystem | Classifies an event batch; returns PatternResult{label, confidence} |
| `LoRAPatternClassifier::selectAdapter(context)` | Internal, AQL | Selects an adapter via embedding cosine similarity to the context domain |
| `LoRAPatternClassifier::batchClassify(event_batch)` | High-throughput path | Parallel classification via thread pool; returns an ordered PatternResult list |
| `CEPEngine::setLoRAPatternClassifier(classifier)` | Server startup | Registers the classifier; enables the PATTERN CLASSIFIED_AS EPL expression |
| `ExpertSystemEngine::setMLScorer(LoRAPatternClassifier*)` | Server startup | LoRA classifier as ML scorer for rule premises |
| `KnowledgeGraphReasoner::applyLoRAScore(chain, adapter_id)` | Reasoning layer | Soft plausibility scoring for inference edges |
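The top-1 cosine-similarity selection behind `selectAdapter` is simple enough to sketch directly. Names and the flat `std::vector<float>` embedding type are assumptions; the real code would obtain embeddings via `IEmbeddingProvider`.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Cosine similarity between two equal-length embeddings, in [-1, 1].
double cosineSimilarity(const std::vector<float>& a, const std::vector<float>& b) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += static_cast<double>(a[i]) * b[i];
        na  += static_cast<double>(a[i]) * a[i];
        nb  += static_cast<double>(b[i]) * b[i];
    }
    if (na == 0.0 || nb == 0.0) return 0.0;  // degenerate zero vector
    return dot / (std::sqrt(na) * std::sqrt(nb));
}

// Top-1 adapter selection against pre-registered domain embeddings.
std::string selectAdapter(
    const std::vector<float>& context_embedding,
    const std::vector<std::pair<std::string, std::vector<float>>>& adapters) {
    std::string best;
    double best_score = -2.0;  // below the cosine range [-1, 1]
    for (const auto& [adapter_id, domain_embedding] : adapters) {
        const double score = cosineSimilarity(context_embedding, domain_embedding);
        if (score > best_score) { best_score = score; best = adapter_id; }
    }
    return best;
}
```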
```sql
-- CEP rule: fraud sequence via LoRA classification
CREATE RULE fraud_sequence_lora
AS SELECT COUNT(*) AS event_count, FIRST(user_id) AS user
FROM STREAM events
WINDOW (TUMBLING 60s)
WHERE CLASSIFIED_AS('fraud_sequence', min_confidence=0.90)
  AND amount > 10000
PATTERN WITHIN 300s
ACTION alert(channel="fraud_ops");

-- CEP rule: compliance violation with expert-system confirmation
CREATE RULE compliance_expert
AS SELECT *
FROM STREAM audit_events
WHERE EXPERT_SYSTEM_CONFIRMS('compliance_violation_detected', confidence>=0.85)
ACTION db_write(table="compliance_violations"), slack(channel="#legal");
```

- [ ] `LoRAPatternClassifier::classify()` builds a structured prompt from event features (type, timestamp, values) and calls `MultiLoRAManager::generateWithAdapter(adapter_id, prompt)`; the JSON response contains `{"label": "...", "confidence": 0.92}`
- [ ] `selectAdapter(context)` computes cosine similarity between the context embedding (via `IEmbeddingProvider`) and pre-registered adapter-domain embeddings; selects the top-1
- [ ] `batchClassify()` spawns worker threads via `std::async`; max 4 parallel LoRA calls
- [ ] AutoML fallback: `MLServingClient::predict()` with the currently best AutoML model when no LoRA adapter is available or `THEMIS_ENABLE_LLM=OFF`
- [ ] Adapter training: LoRA adapters are trained via `IncrementalLoRATrainer` (training module); export via `exportWeights()` + import via `MultiLoRAManager::loadAdapter()`
- [ ] Pattern recognition in the graph: `KnowledgeGraphReasoner::applyLoRAScore()` uses `LoRAPatternClassifier::classify(graph_context_events, "graph_patterns_v1")`
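The `std::async` fan-out with at most 4 concurrent LoRA calls can be sketched as chunked parallelism; collecting futures in chunk order preserves event order in the result. `classifyOne()` below is a hypothetical stand-in for the real per-event `MultiLoRAManager` call.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

struct PatternResult { std::string label; double confidence; };

// Hypothetical stand-in for MultiLoRAManager::generateWithAdapter + JSON parse.
PatternResult classifyOne(int event_value) {
    return event_value > 10000 ? PatternResult{"fraud_sequence", 0.92}
                               : PatternResult{"benign", 0.97};
}

// Partition the batch into at most 4 chunks (the max-parallel-LoRA budget),
// classify each chunk in a std::async task, and concatenate in chunk order.
std::vector<PatternResult> batchClassify(const std::vector<int>& events) {
    const std::size_t kMaxParallel = 4;
    const std::size_t n = events.size();
    if (n == 0) return {};
    const std::size_t chunks = std::min(kMaxParallel, n);
    const std::size_t per_chunk = (n + chunks - 1) / chunks;  // ceil division

    std::vector<std::future<std::vector<PatternResult>>> futures;
    for (std::size_t c = 0; c < chunks; ++c) {
        const std::size_t begin = c * per_chunk;
        const std::size_t end = std::min(begin + per_chunk, n);
        futures.push_back(std::async(std::launch::async, [&events, begin, end] {
            std::vector<PatternResult> part;
            for (std::size_t i = begin; i < end; ++i) part.push_back(classifyOne(events[i]));
            return part;
        }));
    }
    std::vector<PatternResult> out;  // chunk order preserves event order
    for (auto& f : futures)
        for (auto& r : f.get()) out.push_back(std::move(r));
    return out;
}
```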
- `tests/analytics/test_lora_pattern_classifier.cpp` — LPC-01..LPC-15
  - LPC-01..LPC-05: single-event classification (mock `MultiLoRAManager`)
  - LPC-06..LPC-08: batch classification + thread-pool parallelism
  - LPC-09..LPC-11: adapter selection via cosine similarity (3 adapters, 3 domains)
  - LPC-12..LPC-13: CEP integration (`CLASSIFIED_AS` EPL expression)
  - LPC-14..LPC-15: AutoML fallback when LoRA is disabled
- Batch classification of 64 events: ≤ 100 ms (incl. LoRA inference)
- Adapter selection: ≤ 5 ms
- AutoML fallback: ≤ 20 ms per event
- LoRA adapter paths are validated by `isLoRAPathTrusted()` (`multi_lora_manager.cpp`)
- Classification outputs are never used directly in database writes without human confirmation or a confidence threshold
- Adapter confidence values are recorded in the audit log
Stub: `src/analytics/process_mining.cpp` — `THEMIS_PROCESS_MINING_WINDOWS_STUB` block
Risk: Windows nodes in a mixed cluster cannot execute process-mining operations. All ProcessMining public methods return Status::Error immediately, so BPM conformance checking and Petri-net analysis are unavailable on Windows.
- Audit all POSIX dependencies in `process_mining.cpp` and `process_mining.h`:
  - `fork()`/`exec()` — if used, replace with `CreateProcess()` or a cross-platform subprocess library.
  - `mmap()`/`mprotect()` — replace with `MapViewOfFile()` or in-memory alternatives.
  - `pread()`/`pwrite()` — replace with `ReadFile()`/`WriteFile()` with seek.
- Remove the `THEMIS_PROCESS_MINING_WINDOWS_STUB` CMake option once all blockers are resolved.
- Add a `test_process_mining_windows.yml` CI workflow on `windows-latest`.
- Cross-platform abstraction must not change the public API in `process_mining.h`.
- Windows build must pass the full `ProcessMiningTests` test suite (`tests/test_process_mining.cpp`).
- BPMN runtime and Petri-net evaluator must produce bit-identical results on Windows and Linux for deterministic event logs.
- Windows CI: build without `THEMIS_PROCESS_MINING_WINDOWS_STUB`; run all `ProcessMiningTests`.
- Cross-platform parity: same event log → same conformance-check output on Linux and Windows.
- Windows subprocess handling must apply the same input validation as the Linux path.
- No `PROCESS_CREATE_NO_WINDOW` races; subprocess output must be captured deterministically.