refactor: replace the collectd mapper with local flows#4136
refactor: replace the collectd mapper with local flows#4136didier-wenzek wants to merge 13 commits into
Conversation
Robot Results
|
a8ba280 to
e460f13
Compare
e460f13 to
e3a4f91
Compare
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! 🚀 New features to boost your workflow:
|
| Check running collectd | ||
| Service Should Be Running collectd | ||
|
|
||
| Is collectd publishing MQTT messages? | ||
| ${messages}= Should Have MQTT Messages topic=collectd/# minimum=1 maximum=None | ||
|
|
There was a problem hiding this comment.
These checks have been moved to the suite setup. They are not system tests but prerequisite checks.
dedbc3a to
e00dafc
Compare
rina23q
left a comment
There was a problem hiding this comment.
I guess still code is in progress, so I reviewed the documentation changes.
| ## Enabling the systemd watchdog feature for a tedge service | ||
|
|
||
| Enabling systemd watchdog for a %%te%% service (tedge-agent, tedge-mapper-c8y/az/collectd) is a two-step process. | ||
| Enabling systemd watchdog for a %%te%% service (tedge-agent, tedge-mapper-c8y/az) is a two-step process. |
There was a problem hiding this comment.
(unrelated) Out of curiosity, The watchdog feature doesn't work also for aws and local mappers?
I found this document is actually unlisted, so maybe it doesn't matter how precise information we have here.
| Run `tedge_mapper --debug aws` to log more debug messages | ||
| ::: | ||
|
|
||
| ### Device monitoring logs {#device-logs} |
There was a problem hiding this comment.
Probably we can also remove the section "Collectd logs" below (inside "Third-party component logs"). Instead, we can mention how to see the log of collectd in the new repository for the collectd flow.
| Install Collectd | ||
| Execute Command | ||
| ... sudo apt-get update && sudo apt-get install -y --no-install-recommends collectd-core libmosquitto1 | ||
| Execute Command sudo cp /etc/tedge/contrib/collectd/collectd.conf /etc/collectd/collectd.conf |
There was a problem hiding this comment.
I guess this collectd.conf file is also a candidate to move out of the thin-edge.io repository later.
e00dafc to
9bb4852
Compare
The fact that the batcher is always initialized with default values highlights that something is wrong. It's really difficult to pick sensible values. And in practice nobody care. Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
The following sections have still to be deeply rewritten: - Getting Started / A tour of thin-edge.io / Monitor the device - Getting Started / Monitoring - Operate Devices / Troubleshooting / Device Monitoring Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
A value created in JS using `new Date()` cannot be directly returned to the flow engine. A script has then to invoke `valueOf()` method to assign a date to a message timestamp. Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
c972afe to
988c4df
Compare
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
reubenmiller
left a comment
There was a problem hiding this comment.
Great that we can split the collectd stuff and decouple it from the thin-edge.io project, though moving this functionality outside of the core package will required a multi-step transition to ensure we don't break the existing package and can cleaning transition the core collectd logic to the tedge-collectd-setup package (and rename the package in the future).
So I would propose the following actions to facilitate the transition:
- Split out the Rust related commits from this PR so that we can merge the new "group-measurements" Rust based step for future use by the collectd flow
- Create a naive "group-measurement" JavaScript implementation of the "group-measurements" so that it can be used in the tedge-collectd-setup in this transition period until the next thin-edge.io release
- After the next official thin-edge.io release, merge the collectd packaging changes (and docs) to remove the tedge-mapper-collectd service handling. This will require a change in the maintainer script generator to also cleanly stop any existing tedge-mapper-collectd service before removing it so that we don't leave a stale service afterwards (this is something I can help)
- In the tedge-collectd-setup, transition to use the Rust based "group-measurements" step instead of the naive Javascript implementation (and add the specific thin-edge.io package version as a dependency to the package)
Done: #4201
Done: thin-edge/tedge-collectd-setup@c7e0b73
I will clean up that PR, once merged #4201
This will be a simple change reverting thin-edge/tedge-collectd-setup@c7e0b73 - { script = "group-measurements.js", interval = "1s" },
+ { builtin = "group-measurements", interval = "1s", config = { time_window = "500ms" } }, |
albinsuresh
left a comment
There was a problem hiding this comment.
The refactoring has significantly improved the testability of the batching logic and that itself is a huge win.
The only thing that I found weird is batches getting split when conflicting measurements arrived within a batch window. I was wondering if replacing the old values with newer would have been better, so that batches are always emitted at the configured intervals. But, I understand our current rationale to push those into separate batches so that no mesurements are lost and the batch timestamps don't need dynamic re-compute either, which would have further complicated the batching logic.
Yeah, the main difference with the deprecated batcher is that there is no attempt to concurrently build more than one batch of messages per topic. So when a message is received for a topic but cannot be added to the associated batch we are forced to take a decision:
Indeed, avoiding message lost has been the driver.
There is a bit of window re-computation logic though. When a message is received late (i.e. its event time is older than the oldest batched so far) but not too late (i.e. the difference between that message and the newest batched so far is within the configured time window), then that message is accepted and the batch time window adjusted accordingly. |
Proposed changes
The goal of this PR is to deprecate the collectd mapper providing the tools to implement the same using flows
Merge Collectd Flow example: feat: translate collect metrics into tedge measurements tedge-flows-examples#121Types of changes
Paste Link to the issue
Checklist
just prepare-devonce)just formatas mentioned in CODING_GUIDELINESjust checkas mentioned in CODING_GUIDELINESFurther comments