
fix(plugins): restore RTLD_DEEPBIND with ASAN guard (plan B) #75

Closed
pabloinigoblasco wants to merge 301 commits into development from fix/restore-rtld-deepbind


@pabloinigoblasco
Collaborator

Summary

  • Restores RTLD_DEEPBIND on Linux for plugin symbol isolation
  • Prevents Conan-built deps (paho-mqtt + OpenSSL) from resolving to a different version already loaded by the host (e.g. Qt's libssl.so.3). The crash that prompted this was a heap-buffer-overflow when connecting MQTT with SSL
  • Skipped when PJ_ASAN_ACTIVE (set by -DPJ_ENABLE_SANITIZERS=ON) to preserve ASAN's LD_PRELOAD malloc interposer, which is incompatible with RTLD_DEEPBIND (google/sanitizers#611)
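
The guard described in the summary can be sketched roughly as follows. This is a minimal illustration, not the loader's actual code: `pluginDlopenFlags` and `openPlugin` are hypothetical names, and only `PJ_ASAN_ACTIVE` and the `dlopen` flags come from the description above.

```cpp
#include <dlfcn.h>

// Hypothetical helper: computes the flags a loader could pass to dlopen().
// PJ_ASAN_ACTIVE is the macro named in the PR; the function name is illustrative.
inline int pluginDlopenFlags() {
  int flags = RTLD_NOW | RTLD_LOCAL;
#if defined(RTLD_DEEPBIND) && !defined(PJ_ASAN_ACTIVE)
  // Make the plugin prefer its own copies of symbols (e.g. its bundled
  // OpenSSL) over same-named symbols already loaded by the host process.
  flags |= RTLD_DEEPBIND;
#endif
  return flags;
}

// Usage sketch: `path` is a placeholder supplied by the caller.
inline void* openPlugin(const char* path) {
  return dlopen(path, pluginDlopenFlags());
}
```

Checking `defined(RTLD_DEEPBIND)` keeps the sketch compiling on platforms whose `<dlfcn.h>` lacks the glibc extension.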

Context

This restores the behaviour that was working before Phase 1d (commit 1a732ba). It is kept as plan B alongside PR #73 (fix/plugin-visibility-hidden), which pursues -fvisibility=hidden + -Wl,-Bsymbolic-functions as a longer-term alternative that avoids RTLD_DEEPBIND entirely.

Test plan

  • Release build: MQTT SSL connection works
  • Debug+ASAN build: all plugin tests pass (DEEPBIND skipped via PJ_ASAN_ACTIVE)

🤖 Generated with Claude Code

pabloinigoblasco and others added 30 commits March 11, 2026 16:55
UIBasica-MarketplaceWindow

See merge request client-projets/p.2026-plotjuggler/plotjuggler_core!7
feat(marketplace): switch from local registry to GitHub public URL

See merge request client-projets/p.2026-plotjuggler/plotjuggler_core!32
refactor(marketplace): apply series naming convention fixes

See merge request client-projets/p.2026-plotjuggler/plotjuggler_core!30
fix(marketplace): strip ZIP wrapper folder on extension extraction

See merge request client-projets/p.2026-plotjuggler/plotjuggler_core!37
Refactor Extension State Management to use Manifest Discovery

See merge request client-projets/p.2026-plotjuggler/plotjuggler_core!36
…_main'

feat(registry): point to PlotJuggler development branch

See merge request client-projets/p.2026-plotjuggler/plotjuggler_core!38
fix(marketplace): emit install_finished on all paths and follow HTTP redirects

See merge request client-projets/p.2026-plotjuggler/plotjuggler_core!31
fix(ExtensionManager): flatten extension installation directory structure

See merge request client-projets/p.2026-plotjuggler/plotjuggler_core!33
…l_main'

docs(marketplace): sync REQUIREMENTS and ARCHITECTURE with current implementation

See merge request client-projets/p.2026-plotjuggler/plotjuggler_core!39
fix(ExtensionManager): implement backup before update to prevent data loss

See merge request client-projets/p.2026-plotjuggler/plotjuggler_core!27
Vlozano-ibrobotics and others added 27 commits April 24, 2026 05:55
…l_main'

fix(pj_proto_app): refresh Tools menu after marketplace install/uninstall

See merge request client-projets/p.2026-plotjuggler/plotjuggler_core!147
The previous attempt used '@' as the raw-string delimiter to sidestep
an MSVC tokenizer quirk, but '@' is not part of the C++ basic source
character set and GCC rejects it with:

  error: invalid character '@' in raw string delimiter

Use a plain ASCII letter instead. R"x( ... )x" is accepted as a
valid d-char sequence by every conforming compiler and still avoids
the MSVC issue with a trailing backslash before the default closing
)". The asserted literal content is byte-identical to what the
original R"( ... )" produced — only the delimiter marker changes.

Contents:
- pj_base/tests/media_metadata_test.cpp: change R"@(...)@" to
  R"x(...)x" in the two literals of EscapesQuotesAndBackslashes and
  update the workaround comment
…nal_main'

fix(pj_marketplace): advance update queue after staging on Windows

See merge request client-projets/p.2026-plotjuggler/plotjuggler_core!151
…MSVC

Two earlier attempts at keeping the raw-string form in this test did
not settle the CI build:

  1. R"@(...)@"  — '@' is not in the basic source character set, so
                   GCC rejected it ("invalid character '@' in raw
                   string delimiter").
  2. R"x(...)x"  — GCC accepted it, but MSVC on the CI runner kept
                   failing with "illegal escape sequence" + "invalid
                   literal suffix 'name'", i.e. the preprocessor is
                   falling out of raw-string mode on bodies that
                   combine '"' and '\' and reinterpreting the tail as
                   a user-defined literal suffix. (A global flip to
                   /Zc:preprocessor would work but changes shared
                   compiler settings.)

Switch the two literals to ordinary escaped strings. The escapes
produce identical byte content, the test asserts the exact same two
strings, and every compiler accepts the form regardless of flags.

Contents:
- pj_base/tests/media_metadata_test.cpp: replace R"x(...)x" with
  escaped "..." in both the schema() argument and the EXPECT_EQ
  expected value of EscapesQuotesAndBackslashes; update the explanatory
  comment
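
A small sketch of why the swap is safe: a raw string with a custom delimiter and a hand-escaped ordinary literal can denote the same bytes. The payload below is illustrative only, since the test's real literal is not shown in the commit message, and `rawForm` / `escapedForm` are hypothetical names.

```cpp
#include <string>

// Raw string with the custom delimiter "x": the body may contain '"' and '\'
// without escaping. The payload is an assumption made for illustration.
inline std::string rawForm() { return R"x({"name":"a\"b\\c"})x"; }

// Ordinary literal: every '"' and '\' of the same body is escaped by hand.
inline std::string escapedForm() { return "{\"name\":\"a\\\"b\\\\c\"}"; }
```

Comparing the two return values for equality is a quick way to confirm that a rewrite of this kind did not change the asserted content.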
pj_media_core depends on pj_datastore (object_store.hpp), so it cannot
build when PJ_BUILD_DATASTORE=OFF. Moving the three pj_media
add_subdirectory calls inside the existing PJ_BUILD_DATASTORE block
expresses the real dependency: media requires the datastore, and
consumers that opt out of the datastore (e.g. plugin-only builds) no
longer hit a missing-header failure.
fix(cmake): build pj_media inside PJ_BUILD_DATASTORE guard

See merge request client-projets/p.2026-plotjuggler/plotjuggler_core!153
* feat(plugins): v3.1 plugin protocol — service registry + ABI hardening

Protocol v3 replaces the per-service `bind_<X>_host` slots with a single
`bind(registry, err)` entry point. All fallible ABI calls now carry a
structured `PJ_error_t*` out-param (inline 304-byte struct with domain +
message + growth-path `extended`/`extended_kind` slots). The three
dedicated write-host vtables become services registered under canonical
reverse-DNS names (`pj.source_write.v1`, `pj.parser_write.v1`,
`pj.toolbox_host.v1`, `pj.runtime.v1`, `pj.toolbox_runtime.v1`,
`pj.colormap.v1`). `get_dialog` returns a typed `PJ_borrowed_dialog_t`
fat pointer instead of an untyped `void*`. Every family vtable shares
the same 9-slot lifecycle prefix.

On top of that, this commit lands the v3.1 hardening that locks down
forward compatibility so future additive growth doesn't break existing
plugins:

  E0 — Boot-level ABI symbol + min-vtable-size floor
       Every plugin .so exports `pj_plugin_abi_version` as a C symbol;
       loaders dlsym it before touching the vtable. Each family header
       defines `PJ_<FAMILY>_MIN_VTABLE_SIZE` (pinned at v3.0, never
       grows); loaders accept `struct_size >= MIN_SIZE` instead of
       `>= sizeof(host_struct)`, which would falsely reject plugins
       compiled against older headers. New tail slots are gated by
       `PJ_HAS_TAIL_SLOT(vtable_type, ptr, field)`.

  E1 — PJ_error_t growth path
       Appended `const void* extended` + `char extended_kind[32]` to
       PJ_error_t (struct grew from 260 B to 304 B). Future cause chains,
       stack traces, and structured payloads fit without a v4 break.
       `sdk::fillError` clears both new fields on every write to
       prevent stale-pointer reuse. Helpers `setExtended`/`hasExtended`
       added.

  E2 — CLAP-style plugin extension query
       Each family vtable grows a tail slot
       `const void* get_plugin_extension(ctx, id)`. SDK base classes
       expose `pluginExtension(std::string_view)` virtual; host handles
       expose `getPluginExtension(id)` with tail-slot gating. Mock
       toolbox advertises `pj.experimental.mock_diagnostics/draft-1` for
       integration testing.

  E3 — Compile-time ABI layout sentinels
       New `pj_base/tests/abi_layout_sentinels_test.cpp` pins
       sizeof/alignof/offsetof for every ABI-visible struct, enum sizes
       (defends against -fshort-enums), and `sizeof(void*) == 8`. A
       failing static_assert catches accidental field reorders in PRs.

  E4 — Compile-time service-name validation
       `detail::isValidServiceName` is constexpr; every trait's `kName`
       gets a static_assert. Enforces `"pj.<name>.v<N>"` (stable) or
       `"pj.experimental.<name>/draft-<N>"` (unstable) at definition
       site — no runtime string-parse on every registration.

  E5 — FROZEN vs APPENDABLE struct labels
       Header comments at every ABI-visible struct declare the policy.
       FROZEN = layout permanent (PJ_error_t, fat pointers, handles);
       APPENDABLE = tail slots may grow (all *_vtable_t types).

  E6 — Registry runtime hardening
       `ServiceRegistryBuilder::tryRegisterService` returns
       Expected<void>; rejects null ctx/vtable and silent-overwrite
       duplicates. `dispatchGetService` null-checks before returning
       the fat pointer.
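
The size-floor and tail-slot gating idea from E0 can be sketched as below. The vtable layout, `kDemoMinVtableSize`, and `DEMO_HAS_TAIL_SLOT` are hypothetical stand-ins; the real `*_vtable_t` types and `PJ_HAS_TAIL_SLOT` are not reproduced in this commit message.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical vtable used only for illustration.
struct DemoVtable {
  std::uint32_t struct_size;  // set by the plugin to sizeof(its own vtable)
  int (*start)(void* ctx);    // baseline slot, part of the pinned floor
  const void* (*get_extension)(void* ctx, const char* id);  // tail slot added later
};

// Floor pinned at the first release: everything before the tail slot.
constexpr std::size_t kDemoMinVtableSize = offsetof(DemoVtable, get_extension);

// True when the plugin's vtable is large enough to contain `field` entirely.
#define DEMO_HAS_TAIL_SLOT(vtable_type, ptr, field) \
  ((ptr)->struct_size >= offsetof(vtable_type, field) + sizeof((ptr)->field))

// Accept any vtable at or above the floor, not only ones matching the host's sizeof.
inline bool acceptVtable(const DemoVtable* vt) {
  return vt != nullptr && vt->struct_size >= kDemoMinVtableSize;
}

inline const void* queryExtension(const DemoVtable* vt, void* ctx, const char* id) {
  if (!DEMO_HAS_TAIL_SLOT(DemoVtable, vt, get_extension) || vt->get_extension == nullptr) {
    return nullptr;  // older plugin: slot absent, treated as "no extension"
  }
  return vt->get_extension(ctx, id);
}
```

Under these assumptions a plugin built against older headers reports a smaller `struct_size`, passes `acceptVtable`, and is never asked for a slot it does not have.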

Documentation: `pj_plugins/docs/ARCHITECTURE.md` gains a "§0a. ABI
stability and evolution rules (v3.1)" section listing all seven rules
plus the plugin-extension query contract.
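
As an illustration of the compile-time name check in E4, a constexpr validator could look like the sketch below. It assumes names use lowercase letters, digits, and underscores, which the commit message does not state; the `demo` namespace and helper names are hypothetical.

```cpp
#include <string_view>

namespace demo {

// Assumption: identifiers are non-empty and use [a-z0-9_] only.
constexpr bool isLowerIdent(std::string_view s) {
  if (s.empty()) return false;
  for (char c : s) {
    const bool ok = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_';
    if (!ok) return false;
  }
  return true;
}

constexpr bool isNumber(std::string_view s) {
  if (s.empty()) return false;
  for (char c : s) {
    if (c < '0' || c > '9') return false;
  }
  return true;
}

constexpr bool startsWith(std::string_view s, std::string_view prefix) {
  return s.size() >= prefix.size() && s.substr(0, prefix.size()) == prefix;
}

// Accepts "pj.<name>.v<N>" or "pj.experimental.<name>/draft-<N>".
constexpr bool isValidServiceName(std::string_view name) {
  constexpr std::string_view kExperimental = "pj.experimental.";
  constexpr std::string_view kDraft = "/draft-";
  constexpr std::string_view kStable = "pj.";
  constexpr std::string_view kVersion = ".v";
  if (startsWith(name, kExperimental)) {
    const std::string_view rest = name.substr(kExperimental.size());
    const auto pos = rest.rfind(kDraft);
    if (pos == std::string_view::npos) return false;
    return isLowerIdent(rest.substr(0, pos)) && isNumber(rest.substr(pos + kDraft.size()));
  }
  if (startsWith(name, kStable)) {
    const std::string_view rest = name.substr(kStable.size());
    const auto pos = rest.rfind(kVersion);
    if (pos == std::string_view::npos) return false;
    return isLowerIdent(rest.substr(0, pos)) && isNumber(rest.substr(pos + kVersion.size()));
  }
  return false;
}

// A malformed name fails the build at the definition site.
static_assert(isValidServiceName("pj.source_write.v1"), "stable name");
static_assert(isValidServiceName("pj.experimental.mock_diagnostics/draft-1"), "draft name");
static_assert(!isValidServiceName("pj.toolbox"), "missing version suffix");

}  // namespace demo
```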

Tests that exercised the removed v1/v2 slots (data_source_plugin_base_test,
message_parser_plugin_base_test, delegated_ingest_integration_test) are
gated off in CMakeLists with TODO(v3-port) markers; coverage is retained
by the integration-level *_library_test.cpp suite.

* fix(v3): enforce one-shot bind + toolbox service name + error hygiene

Three correctness fixes identified by the ABI migration review:

1. Double bind() on DataSource plugins.
   DataSourceSession::bindRuntimeHostForDialog() was binding a
   runtime-only registry before the dialog, and setupAndStart() was
   rebinding the full registry afterward — calling bind() twice per
   plugin instance. The v3 protocol requires bind() to be one-shot.

   Fixed by creating the dataset + write host up-front in a new
   DataSourceSession::bindForDialog() method, so the full registry
   (source_write + runtime) is ready before the dialog is shown.
   Renamed setupAndStart() → applyConfigAndStart() to reflect that it
   no longer binds; added an idempotency guard (bound_ flag) so a
   second call is a no-op. Side-effect: an empty dataset remains if
   the user cancels the dialog — acceptable for now, documented inline.

   Updated call sites in main_window.cpp (onLoadFile, onStartStream,
   startDummyStream, restartSession).

2. Toolbox service name mismatch.
   service_traits.hpp defined ToolboxHostService::kName as
   "pj.toolbox_host.v1" but every doc and comment referenced
   "pj.toolbox_write.v1" — the name used for consistency with
   "pj.source_write.v1" and "pj.parser_write.v1". Renamed to match
   the docs; the C++ trait keeps its historical name because the
   underlying vtable type is PJ_toolbox_host_t.

3. DialogPluginBase::storeError left v3.1 growth slots uncleared.
   The local storeError() helper in dialog_plugin_base.hpp set
   code/domain/message via a writeField lambda but did not reset
   the new extended / extended_kind slots added in v3.1. A reused
   PJ_error_t struct could therefore carry a stale extended pointer
   across calls. Fixed by clearing both slots, matching the
   sdk::fillError discipline.
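
The error-hygiene rule in fix 3 can be sketched as follows. `DemoError` is a hypothetical stand-in whose field names follow the commit message but whose sizes and order are illustrative; `demoFillError` is not the SDK's `sdk::fillError`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for PJ_error_t (layout is an assumption).
struct DemoError {
  std::int32_t code;
  char domain[32];
  char message[128];
  const void* extended;    // growth slot
  char extended_kind[32];  // growth slot
};

// Bounded copy that always leaves the destination NUL-terminated.
inline void copyField(char* dst, std::size_t cap, const char* src) {
  std::strncpy(dst, src != nullptr ? src : "", cap - 1);
  dst[cap - 1] = '\0';
}

// Writes a fresh error and resets the growth slots, so a reused struct
// cannot carry a stale `extended` pointer over from an earlier call.
inline void demoFillError(DemoError* err, std::int32_t code, const char* domain,
                          const char* message) {
  if (err == nullptr) return;
  err->code = code;
  copyField(err->domain, sizeof(err->domain), domain);
  copyField(err->message, sizeof(err->message), message);
  err->extended = nullptr;
  err->extended_kind[0] = '\0';
}
```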

All 38 tests that can run under ASAN pass.

* feat(v4 ABI): Phase 1a — Arrow C Data Interface + noexcept + thread tags

Part 1 of the v4 ABI migration (Arrow C Data Interface at the plugin
boundary — see .claude/plans/brainstorm-if-what-the-cosmic-wozniak.md).

Data-plane changes (pj_base/plugin_data_api.h):
  * Inlined Arrow C Data Interface POD types (ArrowSchema / ArrowArray /
    ArrowArrayStream) under the standard ARROW_C_DATA_INTERFACE guard.
  * SourceWriteHost and ToolboxHost: append_arrow_ipc REMOVED, replaced
    with append_arrow_stream (producer-owned release, pull-model ingest).
  * ToolboxHost: read_series + PJ_materialized_series_t REMOVED, replaced
    with read_series_arrow (host-owned ArrowSchema + ArrowArray).
  * ParserWriteHost: append_arrow_ipc REMOVED (parsers are per-record;
    host coalesces internally).

ABI hardening:
  * Every vtable slot is now PJ_NOEXCEPT (C++17 type-level noexcept, no-op
    in C). Trampolines that let an exception escape through the ABI now hit
    std::terminate deterministically instead of unwinding.
  * Every slot carries a thread-class tag: [main-thread], [stream-thread],
    [thread-safe].
  * PJ_ABI_VERSION bumped 3 -> 4. Per-family PROTOCOL_VERSION bumped 3 -> 4.
  * MIN_VTABLE_SIZE re-pinned at v4.0 (get_plugin_extension is now part
    of the baseline, no longer a tail slot).
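
A minimal sketch of a type-level noexcept slot, assuming a C++17 compiler. `DEMO_NOEXCEPT`, `DemoStartFn`, and `DemoPlugin` are hypothetical names, and catching inside the trampoline is one possible way to stay clear of the boundary rather than a description of the SDK's thunks.

```cpp
#include <stdexcept>

// Hypothetical macro mirroring the idea behind PJ_NOEXCEPT.
#ifdef __cplusplus
#define DEMO_NOEXCEPT noexcept
#else
#define DEMO_NOEXCEPT
#endif

// Since C++17 `noexcept` is part of the function type, so this slot type
// only accepts functions that are themselves declared noexcept.
using DemoStartFn = int (*)(void* ctx) DEMO_NOEXCEPT;

struct DemoPlugin {
  bool fail = false;
  void start() {
    if (fail) throw std::runtime_error("start failed");
  }
};

// Trampoline: converts any exception into an error code, so nothing reaches
// the noexcept boundary (where an escaping exception calls std::terminate).
inline int demoStartTrampoline(void* ctx) DEMO_NOEXCEPT {
  try {
    static_cast<DemoPlugin*>(ctx)->start();
    return 0;
  } catch (...) {
    return 1;
  }
}

// Calls through the slot type are known to be non-throwing at compile time.
static_assert(noexcept(DemoStartFn{}(nullptr)), "slot type is noexcept");
```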

SDK updates (pj_base/sdk/*):
  * All four base classes (DataSourcePluginBase, MessageParserPluginBase,
    ToolboxPluginBase, DialogPluginBase) + their detail/*_trampolines.hpp
    thunks updated to noexcept.
  * PJ_*_PLUGIN macros emit noexcept on PJ_get_*_vtable entry points.
  * SourceWriteHostView / ToolboxHostView:
      - appendArrowIpc replaced with appendArrowStream (ownership-
        transfer on success).
      - readSeries replaced with readSeriesArrow (caller-owned Arrow
        structs).
  * ParserWriteHostView: appendArrowIpc removed.

Host-side (pj_datastore/src/plugin_data_host.cpp):
  * Stubbed implementations of append_arrow_stream and read_series_arrow
    return a clear "not yet implemented (Phase 1b)" error. The real
    nanoarrow-backed implementations land in Phase 1b.
  * All trampolines noexcept.
  * Dropped MaterializedSeriesState and its 200-line readSeries method.

Verification:
  * abi_layout_sentinels_test updated with v4 offsets/sizes. MIN floors
    now at v4.0: DataSource=128, MessageParser=80, Toolbox=88.
  * Release build: 60/60 tests pass.
  * Debug+ASAN build: 36/41 pass; the 5 failures are the pre-existing
    RTLD_DEEPBIND + ASAN dlopen incompatibilities (fixed in Phase 1d).
  * plugin_host_write_test and plugin_host_read_test disabled in CMake
    pending Phase 1b (they exercise the v3 materialised-vector read path
    that no longer exists at the ABI).

ABI_migration_PLAN.md retires in Phase 3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v4 ABI): Phase 1b — host-side Arrow stream implementation

Fills in the stubs left by Phase 1a with real, working implementations of
append_arrow_stream and read_series_arrow.

arrow_import refactor:
  * Factored per-batch ingest logic out of importIpcStream into a new
    private ingestBatchesFromStream helper that works on any
    ArrowArrayStream* — IPC-backed or producer-owned. No functional
    change for the IPC path; shared code path guarantees the two stay
    in sync.
  * Factored schema-parsing logic into a private mappingsFromSchema
    helper. schemaFromIpc and the new schemaFromArrowStream both use it.
  * New public entry points:
      importArrowStream(writer, topic, stream, mappings, ts_col)
      schemaFromArrowStream(stream)
    Both preserve caller-side ownership of the stream — the importer
    never calls stream->release.

Host-side wiring (plugin_data_host.cpp):
  * WriteCore::appendArrowStream replaces the old appendArrowIpc.
  * ToolboxCore::readSeriesArrow materialises one field's time series
    into a host-owned struct ArrowArray with two columns:
    ["timestamp" (int64), <field_name> (typed)]. Built via nanoarrow.
    Supports all primitive types including strings.
  * sourceAppendArrowStream / toolboxAppendArrowStream trampolines now
    enforce the ABI ownership contract: on success the host calls
    stream->release before returning; on failure the plugin retains
    responsibility.
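
The ownership contract above can be sketched like this. The `ArrowArrayStream` layout follows the Arrow C Data Interface specification; `hostAppendArrowStream` and its `ingest` callback are hypothetical stand-ins for the real trampolines and importer.

```cpp
// Arrow C Data Interface stream struct (layout per the Arrow specification).
struct ArrowSchema;
struct ArrowArray;
struct ArrowArrayStream {
  int (*get_schema)(ArrowArrayStream*, ArrowSchema* out);
  int (*get_next)(ArrowArrayStream*, ArrowArray* out);
  const char* (*get_last_error)(ArrowArrayStream*);
  void (*release)(ArrowArrayStream*);
  void* private_data;
};

// Hypothetical host entry point; `ingest` stands in for the real import step.
// Returns true on success. On success the host releases the stream; on
// failure it leaves the stream untouched so the caller still owns it.
inline bool hostAppendArrowStream(ArrowArrayStream* stream,
                                  bool (*ingest)(ArrowArrayStream*)) {
  if (stream == nullptr || stream->release == nullptr) return false;
  if (!ingest(stream)) return false;  // failure: plugin retains responsibility
  stream->release(stream);            // success: host consumed the stream
  return true;
}
```

A caller that checks the return value therefore knows whether it still has to release the stream itself.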

Test (new):
  * arrow_stream_round_trip_test — end-to-end round trip through the
    v4 ABI. Builds an in-memory ArrowArrayStream, feeds it through
    append_arrow_stream, reads back via read_series_arrow, compares
    values exactly. Confirms schema shape and release-callback hygiene.

Verification:
  * Release build: 60/60 tests pass.
  * Debug+ASAN: 37/42 pass (new round-trip test added and green).
    The 5 failures are the pre-existing RTLD_DEEPBIND + ASAN dlopen
    incompatibilities (Phase 1d fixes them).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v4 ABI): Phase 1c — SDK Arrow holders + manifest sidecar emission

Two Phase 1c deliverables, both additive (no ABI change):

1. Arrow C Data Interface RAII holders (pj_base/sdk/arrow.hpp).

   Move-only wrappers around the three Arrow C Data Interface POD types
   declared in the ABI header (ArrowSchema, ArrowArray, ArrowArrayStream).
   Each holder calls release on destruction iff release != nullptr. Makes
   the common "produce -> hand to host -> release" and
   "receive from host -> use -> release" patterns exception-safe and
   terse:

     ArrowSchemaHolder schema;
     ArrowArrayHolder  array;
     auto s = toolbox.readSeriesArrow(field, schema.out(), array.out());
     // schema/array auto-release at scope exit

   Zero-dep: stdlib only. Plugins that want richer Arrow builders link
   nanoarrow themselves.

   New test arrow_holders_test verifies destructor/move/reset/out/release
   semantics plus the post-host-takes-ownership inert path.
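
For readers unfamiliar with the pattern, a move-only holder of this kind might look like the sketch below. The `ArrowSchema` layout follows the Arrow C Data Interface specification; `DemoSchemaHolder` is a hypothetical simplification and not the class shipped in pj_base/sdk/arrow.hpp.

```cpp
#include <cstdint>
#include <utility>

// Arrow C Data Interface schema struct (layout per the Arrow specification).
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  std::int64_t flags;
  std::int64_t n_children;
  ArrowSchema** children;
  ArrowSchema* dictionary;
  void (*release)(ArrowSchema*);
  void* private_data;
};

// Hypothetical move-only holder: releases on destruction iff still live.
class DemoSchemaHolder {
 public:
  DemoSchemaHolder() = default;
  ~DemoSchemaHolder() { reset(); }
  DemoSchemaHolder(const DemoSchemaHolder&) = delete;
  DemoSchemaHolder& operator=(const DemoSchemaHolder&) = delete;
  DemoSchemaHolder(DemoSchemaHolder&& other) noexcept : schema_(other.schema_) {
    other.schema_.release = nullptr;  // moved-from holder becomes inert
  }
  DemoSchemaHolder& operator=(DemoSchemaHolder&& other) noexcept {
    if (this != &other) {
      reset();
      schema_ = other.schema_;
      other.schema_.release = nullptr;
    }
    return *this;
  }

  // Releases the held schema if it is still live.
  void reset() {
    if (schema_.release != nullptr) schema_.release(&schema_);
    schema_.release = nullptr;
  }

  // Out-pointer for a producer to fill; previous content is released first.
  ArrowSchema* out() {
    reset();
    return &schema_;
  }

 private:
  ArrowSchema schema_{};
};
```

Moving by copying the struct and nulling the source's `release` matches how the Arrow specification describes moving these structures.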

2. Plugin manifest sidecar emission (cmake/PjPluginManifest.cmake).

   CMake function pj_emit_plugin_manifest(target FAMILY <f>
   [MANIFEST_FILE <path>] [ABI_MAJOR <n>]) reads the plugin's existing
   manifest.json (the same file pj_embed_manifest bakes into the DSO),
   augments it with auto-generated "abi_major" and "family" keys, and
   writes a sidecar <target>.pjmanifest.json next to the built DSO and
   at install time. Lets a host scan all installed plugins at startup
   without dlopen'ing any — essential at the 20-50 plugin target scale.

   The DSO manifest is still the source of truth; host-side scanning
   will (Phase 1d) verify sidecar vs DSO on activation and fall back to
   DSO on mismatch.

   Root CMakeLists.txt now prepends cmake/ to CMAKE_MODULE_PATH and
   unconditionally includes the helper, so plugins just call the
   function without boilerplate include lines.

Verification:
  * Release build: 60/60 tests pass.
  * Debug+ASAN: 38/43 pass (new arrow_holders_test green; same 5
    pre-existing RTLD_DEEPBIND failures to be fixed in Phase 1d).
  * Verified sidecar emission end-to-end: data_load_csv_plugin writes
    csv_source_plugin.pjmanifest.json with abi_major=4 and
    family="data_source" injected on top of the plugin's existing
    manifest.json content. (The CMake wiring of the CSV plugin lives
    in the pj_ported_plugins repo; committed there separately.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v4 ABI): Phase 1d — drop RTLD_DEEPBIND + sidecar scanner

Two Phase 1d deliverables:

1. Loader hardening: drop RTLD_DEEPBIND (pj_plugins/src/detail/library_loader.hpp)

   The v3 loader set RTLD_NOW | RTLD_LOCAL | RTLD_DEEPBIND on glibc.
   DEEPBIND is a documented AddressSanitizer trap — ASAN flat-out
   refuses to dlopen anything with DEEPBIND because it bypasses
   LD_PRELOAD'd malloc interposition (same issue with jemalloc/
   tcmalloc/mimalloc in production). That cost us all five pre-existing
   debug+ASAN failures. Dropping DEEPBIND fixes them.

   Plugin-local symbol isolation is instead left to -fvisibility=hidden
   on plugin builds, to be enforced when each plugin is ported.

2. Sidecar-based plugin discovery (plugin_catalog.hpp + plugin_catalog.cpp).

   New public API:
     PJ::scanPluginSidecars(directory) -> Expected<vector<PluginDescriptor>>

   Scans a directory non-recursively for *.pjmanifest.json files,
   decodes each into a typed PluginDescriptor (name, version, abi_major,
   family, description, category, encoding, file_extensions,
   capabilities, sidecar_path, dso_path), and returns them sorted.
   Malformed sidecars are skipped silently. Uses nlohmann_json. Zero
   dlopen — the whole point of sidecar discovery.

   Companion to pj_emit_plugin_manifest (Phase 1c): CMake writes the
   sidecars at build time; scanPluginSidecars reads them at startup.

   Test plugin_catalog_test exercises nine cases: missing directory,
   empty directory, valid sidecar round-trip, malformed JSON skipped,
   missing required keys skipped, unknown family skipped, non-sidecar
   files ignored, sorted output, family toString round-trip.

Verification:
  * Release build: 60/60 tests pass.
  * Debug+ASAN: 44/44 tests pass — the 5 pre-existing
    data_source_library_test / source_dialog_integration_test /
    file_source_integration_test / message_parser_library_test /
    toolbox_plugin_test failures that survived v3.1 are now GREEN.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v4 ABI): Phase 2 (core side) — sidecar integration test + v3 note drop

Phase 2 is "port plugins to v4." On core-side (plotjuggler_core) the work
is scaffolding:

- plugin_catalog_test gains an optional integration test that, when
  PJ_BUILD_PORTED_PLUGINS is on, scans the build-tree output directory
  for the 4 file-source plugin sidecars produced by pj_emit_plugin_manifest
  and verifies every entry parses cleanly with abi_major==4 and a known
  family. Closes the loop end-to-end: CMake emits sidecars on build,
  scanner reads them at test time.
- CMake wires the test with a generator-expression sidecar dir and
  depends on the four plugin targets so the sidecars exist by the time
  the test runs.

Plus one cleanup, flagged as "not an official version":
- pj_base/CMakeLists.txt: drop the stale TODO(v3-port) tag. The two
  retired unit tests (data_source_plugin_base_test,
  message_parser_plugin_base_test) exercised ABI slots that are long
  gone; the service-registry-era coverage lives in the *_library_test.cpp
  integration tests. Either rewrite or delete them — noted but not
  tackled here.

The plugin-side edits (pj_emit_plugin_manifest calls in data_load_mcap,
data_load_parquet, data_load_ulog; and the v3-era comment in
pj_ported_plugins/CMakeLists.txt) live in the pj_official_plugins repo
and are committed separately.

Verification: release 60/60, debug+ASAN 44/44. Integration test proves
the catalog scanner successfully parses the sidecars emitted by the
four v4-ready file-source plugins.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(v4 ABI): Phase 3 — align plugin docs to v4 + retire migration plan

Plugin ARCHITECTURE.md gets the v4 landing pass:
  * Header "ABI stability and evolution rules" re-tagged v4 (was v3.1).
  * MIN_VTABLE_SIZE floor text now pinned at v4.0 instead of v3.0.
  * "Protocol v3 (current)" section renamed to "Protocol v4 (current)"
    and rewritten to list the v4-distinguishing features up front:
    Arrow C Data Interface at the boundary (append_arrow_stream +
    read_series_arrow), PJ_NOEXCEPT on every slot, thread-class tags,
    sidecar-based discovery via pj_emit_plugin_manifest +
    scanPluginSidecars, RTLD_DEEPBIND removal.
  * The "inherited from the pre-v4 design" callout acknowledges v3 as
    the internal-only iteration it was: its structural changes (service
    registry, error out-params, typed borrowed dialog) carry forward
    into v4 verbatim but it was never an official release.
  * Protocol-version table now shows 4 for every family (was a mix of
    stale 1s and 2s).

All four plugin author guides gain a short "> Tracks the v4 plugin ABI"
callout at the top pointing readers to ARCHITECTURE.md as the binding
reference. The bodies of the guides still have some v1/v2-era lifecycle
prose (bind_write_host, bind_runtime_host) that describes the old
design; those sections are historical context and need rewriting in a
separate author-ergonomics pass. The four reference plugins
(data_load_csv, data_load_mcap, data_load_parquet, data_load_ulog) are
the working v4 examples.

Also retires the scratch ABI_migration_PLAN.md at the repo root — it
was superseded by this whole v4 effort.

Verification: release 60/60, debug+ASAN 44/44.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v4 ABI): Phase 0 — abidiff ABI drift gate + v4.0 baseline

Closes the v4 migration loop: with every other phase landed, the ABI is
now frozen against drift by an opt-in abidiff CI gate.

Components:

  * cmake/PjAbiCheck.cmake — opt-in via -DPJ_ENABLE_ABI_CHECK=ON.
    Emits two targets:
      abi_check            run abidiff baseline.abi vs current DSO
      abi_update_baseline  regenerate baseline (intentional ABI change)
    Also registers abi_check_test with CTest so it runs as part of the
    normal ./test.sh flow when the option is on.

  * cmake/PjAbiCheckRun.cmake — interprets abidiff exit-bit mask:
      value 1/2 (bits 0/1, tool/usage error)   hard fail
      value 4   (bit 2, compatible change)     warn + continue
      value 8   (bit 3, INCOMPATIBLE change)   hard fail
    Warnings carry a pointer to the abi_update_baseline target so
    refreshing is one command.

  * pj_base/abi/baseline.abi — XML snapshot of the v4.0 ABI surface
    from libmock_data_source_plugin.so, filtered to types reachable
    from pj_base/include via --headers-dir. Exempted from the
    check-added-large-files pre-commit hook (it's an intentional
    reference artifact, ~1.1 MB).

  * Top-level CMakeLists.txt adds PJ_ENABLE_ABI_CHECK (default OFF)
    and includes the helper after all targets are defined.
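
The exit-status policy can be expressed compactly. The sketch below is in C++ for illustration only (the real logic lives in a CMake script); the constant values follow the abidiff documentation, while `AbiVerdict` and `classifyAbidiffExit` are hypothetical names.

```cpp
// abidiff exit status is a bit mask (values per the libabigail documentation).
constexpr int kAbidiffError = 1;                  // bit 0: tool error
constexpr int kAbidiffUsageError = 2;             // bit 1: bad invocation
constexpr int kAbidiffAbiChange = 4;              // bit 2: ABI changed
constexpr int kAbidiffAbiIncompatibleChange = 8;  // bit 3: change is incompatible

enum class AbiVerdict { Pass, Warn, Fail };

// Hypothetical policy mirroring the commit message: errors and incompatible
// changes fail hard, a compatible change only warns.
constexpr AbiVerdict classifyAbidiffExit(int status) {
  if (status & (kAbidiffError | kAbidiffUsageError)) return AbiVerdict::Fail;
  if (status & kAbidiffAbiIncompatibleChange) return AbiVerdict::Fail;
  if (status & kAbidiffAbiChange) return AbiVerdict::Warn;
  return AbiVerdict::Pass;
}

static_assert(classifyAbidiffExit(0) == AbiVerdict::Pass, "no drift");
static_assert(classifyAbidiffExit(4) == AbiVerdict::Warn, "compatible change");
static_assert(classifyAbidiffExit(12) == AbiVerdict::Fail, "incompatible change");
```

An incompatible change is reported with both the change bit and the incompatible bit set, which is why the incompatible check runs before the warning branch.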

Verified:
  * abi_check passes on the current tree (exit 0, no drift).
  * Simulated drift correctly produces a value-4 (ABI change) warning.
  * debug+ASAN: 45/45 (new abi_check_test green).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(v4 ABI): align plugin guides to v4 Arrow C Data Interface reality

The v4 merge replaced append_arrow_ipc with append_arrow_stream on the
source and toolbox write hosts, dropped the Arrow slot from parsers
entirely, and added read_series_arrow + the ArrowSchemaHolder /
ArrowArrayHolder / ArrowStreamHolder RAII wrappers — but the prose in
the plugin guides still showed the old appendArrowIpc/readSeries names.

This sweep:
- Fixes every remaining appendArrowIpc / readSeries reference in the
  four SDK tutorials and REQUIREMENTS.md.
- Adds a worked readSeriesArrow example to toolbox-guide.md and a
  worked appendArrowStream example (with ownership-transfer dance) to
  data-source-guide.md.
- Rewrites the parser-guide Arrow section as an explicit 'per-record
  only; redirect bulk flows to DataSource' note.
- Documents the Arrow-at-boundary ownership contract, the manifest
  sidecar format + pj_emit_plugin_manifest CMake helper, and the
  abidiff drift gate as new subsections in ARCHITECTURE.md.
- Clarifies that MaterializedSeries is a host-internal C++ type on
  DatastoreToolboxHost, not part of the ABI surface.

No source changes; build + tests still green.

* feat(v4 ABI): SDK MaterializedSeriesView for toolbox reads

Adds a C++ view around the ArrowSchema + ArrowArray pair returned by
ToolboxHostView::readSeriesArrow. Owns both holders (move-only), decodes
the Arrow format string into PJ::PrimitiveType, exposes timestamps() as
Span<const int64_t> aliasing the Arrow buffer, and a family of
valuesAs{Float64,Float32,Int32,...}() typed pointer accessors that
return nullptr on type mismatch.

Also adds ToolboxHostView::readSeries(field) as a convenience wrapper
that calls the raw readSeriesArrow slot and returns the view. This gives
toolbox plugins a near-drop-in replacement for the pre-v4
'series->timestamps() + series->raw().values.as_float64' API — the port
ends up as a ~2-line find/replace per readSeries call rather than a full
Arrow-walk rewrite.

Format-string → PrimitiveType decoding lives in PJ::sdk::detail, covers
the primitive set defined in the Arrow C Data Interface spec.
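
A sketch of what such a decoder could look like. The single-character codes are the ones defined by the Arrow C Data Interface specification; `DemoPrimitiveType` and `decodeArrowFormat` are hypothetical, since the real `PJ::PrimitiveType` enumerators are not listed in this commit message.

```cpp
#include <string_view>

// Hypothetical enum standing in for PJ::PrimitiveType.
enum class DemoPrimitiveType {
  Unknown, Bool, Int8, UInt8, Int16, UInt16, Int32, UInt32,
  Int64, UInt64, Float32, Float64, String
};

// Maps single-character Arrow format strings to a type tag.
// Anything else (nested, temporal, parameterised formats) is Unknown here.
constexpr DemoPrimitiveType decodeArrowFormat(std::string_view format) {
  if (format.size() != 1) return DemoPrimitiveType::Unknown;
  switch (format[0]) {
    case 'b': return DemoPrimitiveType::Bool;
    case 'c': return DemoPrimitiveType::Int8;
    case 'C': return DemoPrimitiveType::UInt8;
    case 's': return DemoPrimitiveType::Int16;
    case 'S': return DemoPrimitiveType::UInt16;
    case 'i': return DemoPrimitiveType::Int32;
    case 'I': return DemoPrimitiveType::UInt32;
    case 'l': return DemoPrimitiveType::Int64;
    case 'L': return DemoPrimitiveType::UInt64;
    case 'f': return DemoPrimitiveType::Float32;
    case 'g': return DemoPrimitiveType::Float64;
    case 'u': return DemoPrimitiveType::String;
    default:  return DemoPrimitiveType::Unknown;
  }
}

static_assert(decodeArrowFormat("g") == DemoPrimitiveType::Float64, "float64");
static_assert(decodeArrowFormat("l") == DemoPrimitiveType::Int64, "int64");
```

A typed accessor can compare the decoded tag against the requested type and return nullptr on mismatch, which is the behaviour the commit message describes for the valuesAs family.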

* chore(sdk): fix stale 'protocol v3' comments in v4 SDK headers

Eight SDK headers still documented themselves as v3 despite the protocol
version constant reading 4. No behavioural change — comment-only fixup.
New plugin authors tend to trust header comments before constants, so
keeping these accurate matters.

* feat(sdk): add PJ::borrowDialog helper; drop plumbing from mock_source_with_dialog

Plugin authors no longer need to write

  extern "C" PJ_DIALOG_EXPORT const PJ_dialog_vtable_t*
      PJ_get_dialog_vtable() noexcept;

at the top of their source just to satisfy getDialog(). The
PJ_DIALOG_PLUGIN(DialogT) macro now also specialises a new template
PJ::dialogVtableFor<DialogT>(), and PJ::borrowDialog(dialog_) wraps
the type-safe vtable lookup + fat-pointer construction into one call.

Before:
  extern "C" PJ_DIALOG_EXPORT const PJ_dialog_vtable_t*
      PJ_get_dialog_vtable() noexcept;
  ...
  PJ_borrowed_dialog_t getDialog() override {
    return PJ_borrowed_dialog_t{&dialog_, PJ_get_dialog_vtable()};
  }

After:
  PJ_borrowed_dialog_t getDialog() override {
    return PJ::borrowDialog(dialog_);
  }

No ABI change: the exported C symbol PJ_get_dialog_vtable() is still
emitted for host dlsym lookup. Only the C++ plugin-author surface is
cleaner. mock_source_with_dialog updated as the reference example.

* feat(sdk): add appendArrowStream(ArrowStreamHolder&&) rvalue overload

The ownership-transfer dance for appendArrowStream previously required
plugin authors to manually call (void)stream.release() after a successful
append, or the destructor would double-release the stream. Easy to forget.

Add an rvalue-reference overload on both SourceWriteHostView and
ToolboxHostView that takes the ArrowStreamHolder directly and disarms it
on success:

  PJ::sdk::ArrowStreamHolder stream(buildStream());
  auto status = writeHost().appendArrowStream(topic, std::move(stream), "timestamp");
  // On success: stream is inert — destructor is a no-op.
  // On failure: plugin retains ownership — destructor releases.

The raw-pointer overload is kept for ABI-escape-hatch use (callers that
own the stream through some other mechanism). The ArrowStreamHolder
doc-comment in arrow.hpp is updated to recommend the rvalue form first.

No behavioural change for existing raw-pointer call sites; new authors
pick up the safer pattern by default.
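
The disarm-on-success idea behind the rvalue overload can be sketched as below. Every name here (`DemoStream`, `DemoStreamHolder`, `appendRaw`, `appendOwned`) is hypothetical, and `host_accepts` simply simulates whether the host call succeeds.

```cpp
#include <utility>

// Stand-in for ArrowArrayStream, reduced to the fields this sketch needs.
struct DemoStream {
  void (*release)(DemoStream*);
  void* private_data;
};

class DemoStreamHolder {
 public:
  explicit DemoStreamHolder(DemoStream s) : stream_(s) {}
  ~DemoStreamHolder() {
    if (stream_.release != nullptr) stream_.release(&stream_);
  }
  DemoStreamHolder(const DemoStreamHolder&) = delete;
  DemoStreamHolder& operator=(const DemoStreamHolder&) = delete;
  DemoStream* get() { return &stream_; }
  void disarm() { stream_.release = nullptr; }  // host already released it

 private:
  DemoStream stream_;
};

// Raw-pointer form: on success the (simulated) host releases the stream.
inline bool appendRaw(DemoStream* s, bool host_accepts) {
  if (!host_accepts) return false;
  s->release(s);
  return true;
}

// Rvalue overload: disarms the holder only when the host took ownership,
// so the destructor neither double-releases nor leaks.
inline bool appendOwned(DemoStreamHolder&& holder, bool host_accepts) {
  const bool ok = appendRaw(holder.get(), host_accepts);
  if (ok) holder.disarm();
  return ok;
}
```

Taking the holder by rvalue reference makes the hand-off visible at the call site through std::move, while leaving ownership with the caller when the append fails.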

* feat(sdk-testing): ship ParserWriteRecorder + port message_parser_library_test

Every parser test used to define its own ~60 line ParserWriteRecorder
struct with three identical C vtable trampolines + a makeWriteHost()
factory. Lift that into a new installed header

  pj_base/include/pj_base/sdk/testing/parser_write_recorder.hpp

exposing:

  PJ::sdk::testing::ParserWriteRecorder recorder;
  PJ::ServiceRegistryBuilder registry;
  registry.registerService<PJ::sdk::ParserWriteHostService>(recorder.makeHost());
  // ... run parser ...
  EXPECT_EQ(recorder.rows()[0].fields[0].numeric, 3.14);

RecordedField exposes typed slots: .numeric (double, populated for all
int/float/bool), .bool_value, .string_value, plus .type (PJ::PrimitiveType)
and .is_null. bool values populate both .bool_value and .numeric (1.0/0.0)
so tests can assert uniformly.

Port message_parser_library_test.cpp as the first user.

* feat(sdk-testing): ship ToolboxTestStore — fake toolbox host with Arrow read path

The quaternion test previously needed ~130 lines of hand-rolled Arrow C
Data Interface plumbing — disjoint ArrowSchema / ArrowArray payload
blocks, release callbacks, buffer arrays — just to feed fake data into
the toolbox via readSeriesArrow. Lift all of that into a new installed
header:

  pj_plugins/include/pj_plugins/testing/toolbox_test_store.hpp

exposing a small builder-style API:

  PJ::testing::ToolboxTestStore store;
  store.addTopic("quat")
       .addField("quat", "x", timestamps, xs)
       .addField("quat", "y", timestamps, ys);
  registry.registerService<PJ::sdk::ToolboxHostService>(store.makeHost());
  registry.registerService<PJ::sdk::ToolboxRuntimeHostService>(store.makeRuntimeHost());

  // ... run toolbox ...

  EXPECT_EQ(store.writtenRecords().size(), N);
  EXPECT_EQ(store.notifyDataChangedCalls(), 1);

The store captures append_record writes (reusing the parser-write
recorder's RecordedRow shape) and counts host-side activity. Internally
it emits the two-column Arrow struct layout readSeriesArrow expects —
with disjoint schema/array ownership so holder destruction order doesn't
matter.

Also exposes:
  - extendField(): append more samples to simulate incremental data
  - flatRecords(): flattened (ts, name, value) view for tests that
    prefer a single linear list

* docs(v4 SDK): rewrite Quick Starts to use new helpers

The SDK guides had been updated with v4 callouts at the top, but their
Quick Start sections and later snippets still showed pre-helper
boilerplate — raw dialogContext() overrides, manual ArrowStreamHolder
release() dances, no references to the new testing helpers.

This pass:
- Replaces dialogContext()/void*-returning getters with the new
  PJ_borrowed_dialog_t getDialog() override + PJ::borrowDialog(dialog_)
  helper across data-source-guide, toolbox-guide, dialog-plugin-guide,
  REQUIREMENTS.md and ARCHITECTURE.md.
- Updates Arrow-bulk-write examples (data-source-guide, toolbox-guide)
  to use the new rvalue-ref overload — std::move(stream) — instead of
  the manual (void)stream.release() after success pattern.
- Adds Testing sections to toolbox-guide and message-parser-guide
  pointing plugin authors at ToolboxTestStore / ParserWriteRecorder
  so unit tests no longer hand-roll Arrow C Data Interface or
  host-vtable plumbing.

All prose examples now compile mentally against the actual current SDK
surface. The one remaining 'appendArrowIpc' reference in
message-parser-guide is an intentional negation ('no appendArrowIpc
slot on parser write host') kept for documentation value.

* docs: V4_STORE plan for plugin-ABI ObjectStore surface

Approved plan describing how to extend the v4 plugin ABI so plugins can
read/write ObjectStore alongside DataEngine. Six phases; phase 5 (toolbox
object write) and auxiliary topic indices (keyframe etc.) are deferred.

Canary use case: MCAP plugin — scalars via delegated parser, small
markers via pushOwned (eager), image/pointcloud bytes via pushLazy.
Video topics deferred in full.

* feat(v4 ABI): phase 1 — source object write host

Adds the plugin-visible surface for DataSource plugins to write into
ObjectStore alongside the existing scalar write host. Composed via the
v4 service registry as an optional service — no ABI break, no protocol
bump, scalar-only plugins unaffected.

New surface:
- C ABI: PJ_object_topic_handle_t, PJ_lazy_fetch_fn_t,
  PJ_object_write_host_vtable_t, PJ_object_write_host_t.
- Service trait: SourceObjectWriteHostService -> "pj.source_object_write.v1".
- SDK view: sdk::SourceObjectWriteHostView with registerTopic /
  pushOwned / pushLazy / setRetentionBudget. pushLazy(Fetch&&) hides
  the fetch_ctx / fetch_ctx_destroy ABI dance behind a C++ lambda via
  a heap-allocated move-capture box.
- DataSourcePluginBase::bind() resolves the service optionally;
  objectWriteHost() returns nullptr on hosts that don't register it.
- Host plumbing: DatastoreSourceObjectWriteHost(ObjectStore&, DatasetId)
  in pj_datastore, with trampolines that wrap the C-ABI fetch callback
  in a shared_ptr<PluginFetchCtx> so fetch_ctx_destroy runs exactly
  once when ObjectStore drops the entry.
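
The move-capture box and the exactly-once destroy can be illustrated with a minimal sketch. Everything named `Demo*` or `demo_*` below is hypothetical: the real `PJ_lazy_fetch_fn_t` signature and the `PluginFetchCtx` layout are not shown in this message and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical C-style callback types (assumed shape, not the real ABI).
using demo_fetch_fn_t = bool (*)(void* ctx, const std::uint8_t** out_data, std::size_t* out_size);
using demo_destroy_fn_t = void (*)(void* ctx);

struct DemoLazyEntry {
  demo_fetch_fn_t fetch;
  void* ctx;
  demo_destroy_fn_t destroy;
};

// Heap box that owns the moved-in callable and the buffer it last produced.
template <typename Fetch>
struct DemoFetchBox {
  Fetch fn;
  std::vector<std::uint8_t> last;
};

// Plugin side: hide the ctx / destroy pair behind a C++ callable.
template <typename Fetch>
DemoLazyEntry makeLazyEntry(Fetch&& fetch) {
  using Box = DemoFetchBox<std::decay_t<Fetch>>;
  auto* box = new Box{std::forward<Fetch>(fetch), {}};
  demo_fetch_fn_t fetch_fn = [](void* ctx, const std::uint8_t** out_data,
                                std::size_t* out_size) -> bool {
    auto* b = static_cast<Box*>(ctx);
    b->last = b->fn();  // keep the bytes alive inside the box
    *out_data = b->last.data();
    *out_size = b->last.size();
    return true;
  };
  demo_destroy_fn_t destroy_fn = [](void* ctx) { delete static_cast<Box*>(ctx); };
  return DemoLazyEntry{fetch_fn, box, destroy_fn};
}

// Host side: a shared_ptr with the destroy callback as deleter, so every copy
// shares one context and destroy runs once when the last copy is dropped.
inline std::shared_ptr<void> adoptCtx(const DemoLazyEntry& entry) {
  assert(entry.destroy != nullptr);
  return std::shared_ptr<void>(entry.ctx, entry.destroy);
}
```

In this sketch the host must call `adoptCtx` at most once per entry; any further owners should be copies of the returned `shared_ptr`.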

Tests: 9 new cases in plugin_data_host_object_test covering register/
push_owned/push_lazy/retention/invalid-topic/unbound-view paths plus
an explicit destroy-callback exact-once verification via the raw C
ABI. 67/67 ctest green under Debug+ASAN.

* feat(v4 ABI): phase 2 — toolbox object read host

Exposes ObjectStore read access to Toolbox plugins via a separate
service (`pj.toolbox_object_read.v1`) rather than extending the
scalar toolbox vtable. Keeps each capability on its own service so
transformer-style plugins (read bytes, emit results) that don't need
scalar read/write can resolve only what they care about.

Read path uses an opaque owning-handle model that mirrors
`shared_ptr<const vector<uint8_t>>`:
- C ABI: PJ_object_bytes_handle_t (opaque forward-declared pointer),
  PJ_object_read_host_vtable_t with lookup_topic / list_topics /
  topic_metadata / read_latest_at / get_bytes / release_bytes /
  entry_count / time_range. get_bytes / release_bytes take the
  handle directly (no ctx) — the handle carries its own state.
- SDK: PJ::sdk::ObjectBytes move-only RAII wrapper that calls
  release_bytes in its destructor. Usable across worker threads
  because the handle keeps bytes alive independent of the store.
- Service trait: ToolboxObjectReadHostService.
- SDK view: ToolboxObjectReadHostView with lookupTopic / listTopics
  / topicMetadata / readLatestAt / entryCount / timeRange. listTopics
  does a two-call resize dance matching the C ABI.
- ToolboxPluginBase::bind() resolves the service optionally;
  objectReadHost() returns nullptr when the host doesn't register it.
- Host plumbing: DatastoreToolboxObjectReadHost allocates an
  ObjectBytesBox (holding the shared_ptr) per successful
  read_latest_at, freed by release_bytes.
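
A rough sketch of the owning-handle model follows. `DemoBytesBox`, `demoReadLatest`, `demoReleaseBytes` and `DemoObjectBytes` are hypothetical stand-ins for the box, the vtable slots and the SDK wrapper; the real signatures may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using DemoBuffer = std::shared_ptr<const std::vector<std::uint8_t>>;

// Host-side box; in a real C ABI the plugin would only see an opaque pointer.
struct DemoBytesBox {
  DemoBuffer bytes;
};
using DemoBytesHandle = DemoBytesBox*;

// Hypothetical host functions: one box per successful read, freed on release.
inline DemoBytesHandle demoReadLatest(const DemoBuffer& stored) {
  assert(stored != nullptr);
  return new DemoBytesBox{stored};
}
inline void demoReleaseBytes(DemoBytesHandle handle) { delete handle; }

// Move-only RAII wrapper: releases the handle exactly once, in the destructor.
class DemoObjectBytes {
 public:
  DemoObjectBytes() = default;
  explicit DemoObjectBytes(DemoBytesHandle handle) : handle_(handle) {}
  DemoObjectBytes(const DemoObjectBytes&) = delete;
  DemoObjectBytes& operator=(const DemoObjectBytes&) = delete;
  DemoObjectBytes(DemoObjectBytes&& other) noexcept
      : handle_(std::exchange(other.handle_, nullptr)) {}
  DemoObjectBytes& operator=(DemoObjectBytes&& other) noexcept {
    if (this != &other) {
      reset();
      handle_ = std::exchange(other.handle_, nullptr);
    }
    return *this;
  }
  ~DemoObjectBytes() { reset(); }

  bool empty() const { return handle_ == nullptr; }
  std::size_t size() const { return empty() ? 0 : handle_->bytes->size(); }
  const std::uint8_t* data() const {
    assert(!empty());
    return handle_->bytes->data();
  }

 private:
  void reset() {
    if (handle_ != nullptr) {
      demoReleaseBytes(handle_);
      handle_ = nullptr;
    }
  }
  DemoBytesHandle handle_ = nullptr;
};
```

Because the box holds its own `shared_ptr`, the bytes stay readable through the wrapper even after the store drops its reference, which is the property the owning-handle-survives-eviction test exercises.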

Tests: 11 new cases - read-after-write, destructor exact-once,
owning-handle-survives-eviction, lookup/list/metadata round-trip,
time_range, read-miss, cross-thread handle move, unbound-view
fallbacks, moved-from-holder is empty. 68/68 ctest green under
Debug+ASAN.

* feat(v4 ABI): phase 3 — parser optional object write service

Delivers the "two-host parse()" contract from
pj_media/docs/REQUIREMENTS.md Prerequisites **without bumping the
parser protocol version**. Achieved by adding an optional second
service the parser resolves alongside the scalar write host, same
pattern already used for pj.colormap.v1.

- C ABI: PJ_parser_object_write_host_vtable_t (push_owned +
  push_lazy; topic bound by the host at service-creation time - same
  shape as the scalar PJ_parser_write_host_vtable_t but for the
  object path) + PJ_parser_object_write_host_t.
- Service trait: ParserObjectWriteHostService ->
  "pj.parser_object_write.v1".
- SDK view: sdk::ParserObjectWriteHostView with pushOwned /
  pushLazy(Fetch&&). Same heap-allocated move-capture box pattern as
  SourceObjectWriteHostView for the lazy closure.
- MessageParserPluginBase::bind() resolves the service via
  services.optional<>(); objectWriteHost() returns nullptr when
  absent. Media-capable parsers check inside parse() and emit
  header scalars to writeHost() plus the media payload to
  objectWriteHost() from one call.
- Host plumbing: DatastoreParserObjectWriteHost(ObjectStore&,
  uint32_t topic_id) - holds the bound ObjectTopicId in state, the
  parser never names topics. Reuses the PluginFetchCtx shared_ptr
  pattern from phase 1 so fetch_ctx_destroy runs exactly once per
  evicted lazy entry.
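
The scalar-plus-optional-object contract can be sketched with plain structs. `DemoScalarHost`, `DemoObjectHost` and `demoParse` are hypothetical; the real parser goes through `writeHost()` and `objectWriteHost()`, whose signatures are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the two hosts a parser may see.
struct DemoScalarHost {
  std::vector<double> values;
};
struct DemoObjectHost {
  std::vector<std::vector<std::uint8_t>> payloads;
};

// One parse call: header scalars always go to the scalar host; the media
// payload goes to the object host only when that optional service is present.
// Returns true when the payload was stored.
inline bool demoParse(DemoScalarHost& scalars, DemoObjectHost* objects, double stamp,
                      std::vector<std::uint8_t> payload) {
  scalars.values.push_back(stamp);
  if (objects == nullptr) {
    return false;  // scalar-only fallback: the host did not register the service
  }
  objects->payloads.push_back(std::move(payload));
  return true;
}
```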

Tests: 4 cases in plugin_parser_object_write_test - parser writes
both scalar + object from one parse() call; parser falls back to
scalar-only when the object service is absent; SDK pushLazy wires
through the parser vtable; unbound view returns error. 69/69 ctest
green under Debug+ASAN.

Note: PJ_parser_binding_request_t extension (adding optional
object_topic field for delegated ingest from DataSources) is
deferred to the MCAP port in phase 6 - the host-side plumbing to
register the second service per binding lives in pj_plugins and
will land with the MCAP changes.

* feat(v4 ABI): phase 4 — MediaMetadataBuilder SDK helper

Tiny JSON builder for the metadata_json string attached to ObjectStore
topics at registration time. Three documented keys from
OBJECT_STORE_DESIGN.md §4 become typed methods so typos fail to
compile; raw extras + quoted-string extras for format-specific fields.

Usage:
  auto meta = MediaMetadataBuilder()
      .mediaClass("image")
      .encoding("jpeg")
      .schema("sensor_msgs/CompressedImage")
      .extraString("source", "camera_0")
      .build();
  host.registerTopic(name, meta);

Minimal - no external JSON library dependency, proper escaping for
quotes/backslashes/control chars. Empty keys omitted, non-empty keys
emitted in canonical order (media_class, encoding, schema, extras).
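
For illustration only, the escaping and canonical ordering described above could look like the sketch below. `escapeJson` and `buildMeta` are hypothetical names, not the SDK's implementation, and the sketch covers only the three documented keys.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Escapes quotes, backslashes and control characters for use inside a JSON string.
inline std::string escapeJson(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  for (unsigned char c : in) {
    switch (c) {
      case '"': out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n"; break;
      case '\r': out += "\\r"; break;
      case '\t': out += "\\t"; break;
      default:
        if (c < 0x20) {  // remaining control characters become \u00XX
          char buf[8];
          std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
          out += buf;
        } else {
          out += static_cast<char>(c);
        }
    }
  }
  return out;
}

// Emits only non-empty keys, in the order media_class, encoding, schema.
inline std::string buildMeta(const std::string& media_class, const std::string& encoding,
                             const std::string& schema) {
  std::string out = "{";
  bool first = true;
  auto add = [&](const char* key, const std::string& value) {
    assert(key != nullptr && *key != '\0');
    if (value.empty()) return;
    if (!first) out += ",";
    first = false;
    out += "\"";
    out += key;
    out += "\":\"";
    out += escapeJson(value);
    out += "\"";
  };
  add("media_class", media_class);
  add("encoding", encoding);
  add("schema", schema);
  out += "}";
  return out;
}
```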

Tests: 9 cases covering empty builder, single-key round-trips,
canonical ordering, extras (raw + quoted), escape sequences. 70/70
ctest green under Debug+ASAN.

Notes on Phase 4 scope:
- Typed ObjectTopicHandle already landed in phase 1 via the
  struct-wrapped uint32_t pattern matching TopicHandle / FieldHandle.
- pushOwned(vector&&) rvalue overload is not worth it: the C ABI
  requires (const uint8_t*, size_t), so the host-side trampoline
  copies into its own std::vector regardless.

* fix(cmake): derive PJ_HAS_PORTED_PLUGINS and gate plugin_catalog_test on it

The top-level already guards add_subdirectory(pj_ported_plugins) with
`PJ_BUILD_PORTED_PLUGINS AND EXISTS pj_ported_plugins/CMakeLists.txt`,
but pj_plugins/CMakeLists.txt was gating the plugin_catalog_test
integration wiring on `PJ_BUILD_PORTED_PLUGINS` alone. In a checkout
without the pj_ported_plugins/ tree (the standalone core build), the
option stays ON by default and the test ends up referencing targets
(csv_source_plugin etc.) that were never added, producing:

  Error evaluating generator expression: No target "csv_source_plugin"
  add_dependencies: The dependency target "csv_source_plugin" does not exist.

Expose a derived PJ_HAS_PORTED_PLUGINS flag from the top-level — set
inside the same existing guard that decides to add the subdirectory —
and gate the integration test wiring on it. Convention: PJ_BUILD_* are
user inputs; PJ_HAS_* are derived state. plugin_catalog_test itself
already falls back to GTEST_SKIP() when PJ_PORTED_PLUGINS_BIN_DIR is
not defined, so when the wiring is skipped the test compiles clean and
just reports SKIPPED at runtime.

* test(pj_base): use custom raw-string delimiter to unblock MSVC build

MSVC's preprocessor was tokenizing part of the raw-string body as a
user-defined-literal suffix when a backslash sat immediately before
the closing quote-paren, producing:

  error C2017: illegal escape sequence
  error C3688: invalid literal suffix 'name';
               literal operator 'operator ""name' not found

GCC and Clang accept the form as the standard prescribes, but switching
to a custom delimiter (R"@( ... )@") sidesteps the MSVC quirk without
changing the literal's content — the two asserted JSON strings still
carry the same bytes.

Contents:
- pj_base/tests/media_metadata_test.cpp: change R"(...)" to R"@(...)@"
  in the two literals of EscapesQuotesAndBackslashes + add a short
  comment explaining the MSVC workaround

* test(pj_base): switch raw-string delimiter to 'x' so GCC also accepts it

The previous attempt used '@' as the raw-string delimiter to sidestep
an MSVC tokenizer quirk, but '@' is not part of the C++ basic source
character set and GCC rejects it with:

  error: invalid character '@' in raw string delimiter

Use a plain ASCII letter instead. R"x( ... )x" is accepted as a
valid d-char sequence by every conforming compiler and still avoids
the MSVC issue with a trailing backslash before the default closing
)". The asserted literal content is byte-identical to what the
original R"( ... )" produced — only the delimiter marker changes.

Contents:
- pj_base/tests/media_metadata_test.cpp: change R"@(...)@" to
  R"x(...)x" in the two literals of EscapesQuotesAndBackslashes and
  update the workaround comment

* test(pj_base): replace failing raw strings with escaped literals for MSVC

Two earlier attempts at keeping the raw-string form in this test did
not settle the CI build:

  1. R"@(...)@"  — '@' is not in the basic source character set, so
                   GCC rejected it ("invalid character '@' in raw
                   string delimiter").
  2. R"x(...)x"  — GCC accepted it, but MSVC on the CI runner kept
                   failing with "illegal escape sequence" + "invalid
                   literal suffix 'name'", i.e. the preprocessor is
                   falling out of raw-string mode on bodies that
                   combine '"' and '\' and reinterpreting the tail as
                   a user-defined literal suffix. (A global flip to
                   /Zc:preprocessor would work but changes shared
                   compiler settings.)

Switch the two literals to ordinary escaped strings. The escapes
produce identical byte content, the test asserts the exact same two
strings, and every compiler accepts the form regardless of flags.

Contents:
- pj_base/tests/media_metadata_test.cpp: replace R"x(...)x" with
  escaped "..." in both the schema() argument and the EXPECT_EQ
  expected value of EscapesQuotesAndBackslashes; update the explanatory
  comment
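
As a small illustration of why the switch is content-neutral, the sketch below writes the same bytes once as a raw string with a custom delimiter and once as an ordinary escaped literal. The sample text is made up and is not one of the test's JSON strings; it also avoids the trailing-backslash shape that tripped MSVC.

```cpp
#include <cassert>
#include <string>

// Raw form: quotes and the backslash appear literally inside R"x( ... )x".
inline std::string rawForm() { return R"x(say "hi" \ bye)x"; }

// Escaped form: the same characters spelled with \" and \\ escapes.
inline std::string escapedForm() { return "say \"hi\" \\ bye"; }

// True when both spellings yield identical byte content.
inline bool sameBytes() { return rawForm() == escapedForm(); }
```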

---------

Co-authored-by: Davide Faconti <davide.faconti@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sync: feat/v4-abi (PR #68) → internal_main

See merge request client-projets/p.2026-plotjuggler/plotjuggler_core!154
fix(pj_media): port libjpeg-turbo + libpng Conan migration onto v4-abi

See merge request client-projets/p.2026-plotjuggler/plotjuggler_core!152
…specialisation

The PJ_DIALOG_PLUGIN macro emits an inline template specialisation of
dialogVtableFor<T>. Clang (macOS) warns -Wunused-function because the
definition and its only caller (borrowDialog) live in different
translation units. GCC does not warn here.

Adding [[maybe_unused]] to the inline specialisation silences the warning
without changing behaviour.
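
A minimal sketch of the attribute placement, assuming a simplified shape for the specialisation: `DemoVtable`, `demoVtableFor` and `DemoDialog` are hypothetical, and the real macro expansion is not reproduced here.

```cpp
#include <cassert>

struct DemoVtable {
  int abi_version;
};

// Primary template is only declared; each plugin type supplies a specialisation.
template <typename T>
const DemoVtable* demoVtableFor();

struct DemoDialog {};

// [[maybe_unused]] on the inline specialisation tells the compiler that a
// translation unit which never calls it is intentional.
template <>
[[maybe_unused]] inline const DemoVtable* demoVtableFor<DemoDialog>() {
  static const DemoVtable vtable{4};
  return &vtable;
}
```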
sync: clang -Wunused-function fix (PR #71) → feat/v4-abi

See merge request client-projets/p.2026-plotjuggler/plotjuggler_core!156
sync: clang -Wunused-function fix (PR #71) → internal_main

See merge request client-projets/p.2026-plotjuggler/plotjuggler_core!157
fix(dialog-sdk): suppress clang -Wunused-function on dialogVtableFor specialisation

See merge request client-projets/p.2026-plotjuggler/plotjuggler_core!155
Merges all v4 ABI changes (ObjectStore surface, MaterializedSeriesView,
parser/toolbox helpers, Clang unused-function fix) into internal_main.

Conflict in pj_media/demos/CMakeLists.txt resolved by taking the
feat/v4-abi side (remove legacy elseif(PJ_BUILD_DIALOG_ENGINE_QT) block
for Qt-only image viewer, superseded by pj_media_qt target check).
chore(merge): bring feat/v4-abi into internal_main

See merge request client-projets/p.2026-plotjuggler/plotjuggler_core!158
sync: plotjuggler/development → ibrobotics/development → internal_main (PRs #69-#71, #68)

See merge request client-projets/p.2026-plotjuggler/plotjuggler_core!159
The elseif(PJ_BUILD_DIALOG_ENGINE_QT) branch built a basic Qt::Widgets
mcap_image_viewer as a fallback when pj_media_qt was not available. Now
that pj_media_qt exists and is the correct path for media demos, this
fallback is dead code: it never executes when pj_media_qt is present,
and building a raw-widget demo without QRhi support provides no value
when pj_media_qt is absent. Remove it to keep the demo CMakeLists clean.
fix(pj_media): remove stale Qt-fallback demo from pj_media/demos

See merge request client-projets/p.2026-plotjuggler/plotjuggler_core!160
Restores RTLD_DEEPBIND on Linux for plugin isolation: prevents
Conan-built deps (paho-mqtt + OpenSSL) from resolving to a different
version already loaded by the host (e.g. Qt's libssl.so.3), which
caused heap-buffer-overflow crashes when connecting MQTT with SSL.

Skipped when PJ_ASAN_ACTIVE (set by -DPJ_ENABLE_SANITIZERS=ON) to
preserve ASAN's malloc interposer — the known incompatibility between
RTLD_DEEPBIND and LD_PRELOAD'd sanitizer runtimes (google/sanitizers#611).
PJ_ASAN_ACTIVE is defined in CMakeLists.txt when PJ_ENABLE_SANITIZERS
is on, so ASAN builds continue to pass without DEEPBIND.
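
The guard can be sketched as a flag-selection helper. `PJ_ASAN_ACTIVE` is the define named above; `pluginDlopenFlags` and the `RTLD_NOW | RTLD_LOCAL` baseline are assumptions for illustration, not the loader's actual code.

```cpp
#include <cassert>
#include <dlfcn.h>

// Chooses the dlopen() flags for a plugin. RTLD_DEEPBIND is a glibc extension,
// so it is added only where the macro exists and ASAN is not active.
inline int pluginDlopenFlags() {
  int flags = RTLD_NOW | RTLD_LOCAL;
#if defined(__linux__) && defined(RTLD_DEEPBIND) && !defined(PJ_ASAN_ACTIVE)
  flags |= RTLD_DEEPBIND;  // plugin resolves symbols from its own deps first
#endif
  return flags;
}

// Intended use (hypothetical call site):
//   void* handle = dlopen(path, pluginDlopenFlags());
```

Keeping the decision in one function means an ASAN build and a release build differ only in this compile-time branch.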

This restores the behaviour that was working before Phase 1d (commit
1a732ba). It is kept as plan B alongside PR #73
(fix/plugin-visibility-hidden), which pursues the -fvisibility=hidden +
-Bsymbolic-functions approach as the longer-term alternative.
@facontidavide facontidavide deleted the branch development May 4, 2026 12:12
@pabloinigoblasco pabloinigoblasco deleted the fix/restore-rtld-deepbind branch May 4, 2026 12:35