Skip to content

refactor: replace the collectd mapper with local flows#4136

Open
didier-wenzek wants to merge 13 commits into
thin-edge:mainfrom
didier-wenzek:refactor/replace-collectd-mapper-with-flows
Open

refactor: replace the collectd mapper with local flows#4136
didier-wenzek wants to merge 13 commits into
thin-edge:mainfrom
didier-wenzek:refactor/replace-collectd-mapper-with-flows

Conversation

@didier-wenzek
Copy link
Copy Markdown
Contributor

@didier-wenzek didier-wenzek commented Apr 16, 2026

Proposed changes

The goal of this PR is to deprecate the collectd mapper providing the tools to implement the same using flows

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue


Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s. You can activate automatic signing by running just prepare-dev once)
  • I ran just format as mentioned in CODING_GUIDELINES
  • I used just check as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

@didier-wenzek didier-wenzek added theme:monitoring Theme: Service monitoring and watchdogs theme:flows labels Apr 16, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 16, 2026

Robot Results

✅ Passed ❌ Failed ⏭️ Skipped Total Pass % ⏱️ Duration
919 0 7 919 100 2h18m42.062401s

@didier-wenzek didier-wenzek force-pushed the refactor/replace-collectd-mapper-with-flows branch from a8ba280 to e460f13 Compare April 20, 2026 14:15
@didier-wenzek didier-wenzek force-pushed the refactor/replace-collectd-mapper-with-flows branch from e460f13 to e3a4f91 Compare April 21, 2026 11:22
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 91.06700% with 36 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...tedge_flows/src/transformers/group_measurements.rs 91.56% 24 Missing and 3 partials ⚠️
crates/extensions/tedge_flows/src/flow.rs 43.75% 9 Missing ⚠️

📢 Thoughts on this report? Let us know!

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment on lines -30 to -35
Check running collectd
Service Should Be Running collectd

Is collectd publishing MQTT messages?
${messages}= Should Have MQTT Messages topic=collectd/# minimum=1 maximum=None

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These checks have been moved to the suite setup. They are not system tests but prerequisite checks.

Comment thread crates/common/batcher/src/batcher.rs Outdated
@didier-wenzek didier-wenzek force-pushed the refactor/replace-collectd-mapper-with-flows branch from dedbc3a to e00dafc Compare April 22, 2026 13:29
Comment thread crates/common/batcher/src/flows.rs Outdated
Comment thread crates/common/batcher/src/flows.rs Outdated
Copy link
Copy Markdown
Member

@rina23q rina23q left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess still code is in progress, so I reviewed the documentation changes.

## Enabling the systemd watchdog feature for a tedge service

Enabling systemd watchdog for a %%te%% service (tedge-agent, tedge-mapper-c8y/az/collectd) is a two-step process.
Enabling systemd watchdog for a %%te%% service (tedge-agent, tedge-mapper-c8y/az) is a two-step process.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(unrelated) Out of curiosity, The watchdog feature doesn't work also for aws and local mappers?
I found this document is actually unlisted, so maybe it doesn't matter how precise information we have here.

Comment thread docs/src/operate/troubleshooting/log-files.md Outdated
Comment thread docs/src/operate/configuration/mosquitto-configuration.md Outdated
Run `tedge_mapper --debug aws` to log more debug messages
:::

### Device monitoring logs {#device-logs}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we can also remove the section "Collectd logs" below (inside "Third-party component logs"). Instead, we can mention how to see the log of collectd in the new repository for the collectd flow.

Install Collectd
Execute Command
... sudo apt-get update && sudo apt-get install -y --no-install-recommends collectd-core libmosquitto1
Execute Command sudo cp /etc/tedge/contrib/collectd/collectd.conf /etc/collectd/collectd.conf
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this collectd.conf file is also a candidate to move out of the thin-edge.io repository later.

@didier-wenzek didier-wenzek force-pushed the refactor/replace-collectd-mapper-with-flows branch from e00dafc to 9bb4852 Compare April 23, 2026 08:28
The fact that the batcher is always initialized with default values
highlights that something is wrong. It's really difficult to pick sensible values.
And in practice nobody care.

Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
The following sections have still to be deeply rewritten:
- Getting Started / A tour of thin-edge.io / Monitor the device
- Getting Started / Monitoring
- Operate Devices / Troubleshooting / Device Monitoring

Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
A value created in JS using `new Date()` cannot be directly returned to the flow engine.
A script has then to invoke `valueOf()` method to assign a date to a message timestamp.

Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
@didier-wenzek didier-wenzek force-pushed the refactor/replace-collectd-mapper-with-flows branch from c972afe to 988c4df Compare May 28, 2026 12:52
@didier-wenzek didier-wenzek temporarily deployed to Test Pull Request May 28, 2026 12:52 — with GitHub Actions Inactive
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
@didier-wenzek didier-wenzek temporarily deployed to Test Pull Request May 28, 2026 14:38 — with GitHub Actions Inactive
@didier-wenzek
Copy link
Copy Markdown
Contributor Author

Updating the docs

The following sections use collectd as an example of an app that can be managed with thin-edge => okay, unchanged

Partially updated

Fully removed because only focused on collectd

What about those ones?

I decided to fully remove the first one i.e. from "A tour of thin-edge.io". Indeed, the point of this tour is to highlight key features, not extensions.
Not perfect, because collectd is used as an example for configuration management :-(

The section "device monitoring" can be kept.
But, I propose to move it under https://thin-edge.github.io/thin-edge.io/operate/monitoring/.

Copy link
Copy Markdown
Contributor

@reubenmiller reubenmiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great that we can split the collectd stuff and decouple it from the thin-edge.io project, though moving this functionality outside of the core package will required a multi-step transition to ensure we don't break the existing package and can cleaning transition the core collectd logic to the tedge-collectd-setup package (and rename the package in the future).

So I would propose the following actions to facilitate the transition:

  1. Split out the Rust related commits from this PR so that we can merge the new "group-measurements" Rust based step for future use by the collectd flow
  2. Create a naive "group-measurement" JavaScript implementation of the "group-measurements" so that it can be used in the tedge-collectd-setup in this transition period until the next thin-edge.io release
  3. After the next official thin-edge.io release, merge the collectd packaging changes (and docs) to remove the tedge-mapper-collectd service handling. This will require a change in the maintainer script generator to also cleanly stop any existing tedge-mapper-collectd service before removing it so that we don't leave a stale service afterwards (this is something I can help)
  4. In the tedge-collectd-setup, transition to use the Rust based "group-measurements" step instead of the naive Javascript implementation (and add the specific thin-edge.io package version as a dependency to the package)

@didier-wenzek
Copy link
Copy Markdown
Contributor Author

1. Split out the Rust related commits from this PR so that we can merge the new "group-measurements" Rust based step for future use by the collectd flow

Done: #4201

2. Create a naive "group-measurement" JavaScript implementation of the "group-measurements" so that it can be used in the tedge-collectd-setup in this transition period until the next thin-edge.io release

Done: thin-edge/tedge-collectd-setup@c7e0b73

3. After the next official thin-edge.io release, merge the collectd packaging changes (and docs) to remove the tedge-mapper-collectd service handling. This will require a change in the maintainer script generator to also cleanly stop any existing tedge-mapper-collectd service before removing it so that we don't leave a stale service afterwards (this is something I can help)

I will clean up that PR, once merged #4201

4. In the tedge-collectd-setup, transition to use the Rust based "group-measurements" step instead of the naive Javascript implementation (and add the specific thin-edge.io package version as a dependency to the package)

This will be a simple change reverting thin-edge/tedge-collectd-setup@c7e0b73

-   { script = "group-measurements.js", interval = "1s" },
+   { builtin = "group-measurements", interval = "1s", config = { time_window = "500ms" } },

Copy link
Copy Markdown
Contributor

@albinsuresh albinsuresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The refactoring has significantly improved the testability of the batching logic and that itself is a huge win.

The only thing that I found weird is batches getting split when conflicting measurements arrived within a batch window. I was wondering if replacing the old values with newer would have been better, so that batches are always emitted at the configured intervals. But, I understand our current rationale to push those into separate batches so that no mesurements are lost and the batch timestamps don't need dynamic re-compute either, which would have further complicated the batching logic.

@didier-wenzek
Copy link
Copy Markdown
Contributor Author

The only thing that I found weird is batches getting split when conflicting measurements arrived within a batch window. I was wondering if replacing the old values with newer would have been better, so that batches are always emitted at the configured intervals.

Yeah, the main difference with the deprecated batcher is that there is no attempt to concurrently build more than one batch of messages per topic. So when a message is received for a topic but cannot be added to the associated batch we are forced to take a decision:

  • either to keep the batch unchanged and to forward the new message: this is what is done for late messages
  • or to close the current batch and to create a new one with the new messages: this is what is done for conflicting messages and those more recent than the batch time-window

But, I understand our current rationale to push those into separate batches so that no mesurements are lost

Indeed, avoiding message lost has been the driver.

the batch timestamps don't need dynamic re-compute either, which would have further complicated the batching logic.

There is a bit of window re-computation logic though. When a message is received late (i.e. its event time is older than the oldest batched so far) but not too late (i.e. the difference between that message and the newest batched so far is within the configured time window), then that message is accepted and the batch time window adjusted accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

theme:flows theme:monitoring Theme: Service monitoring and watchdogs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants