Skip to content

[metrics] add switch info/meta to ddmd/mgd related timeseries#402

Open
zeeshanlakhani wants to merge 2 commits intomainfrom
zl/mgd-ddm-meta
Open

[metrics] add switch info/meta to ddmd/mgd related timeseries#402
zeeshanlakhani wants to merge 2 commits intomainfrom
zl/mgd-ddm-meta

Conversation

@zeeshanlakhani
Copy link
Copy Markdown

@zeeshanlakhani zeeshanlakhani commented Oct 30, 2024

Add additional switch and sled metadata to bfd/bgp/routing/ddm timeseries.

Related:

@zeeshanlakhani zeeshanlakhani force-pushed the zl/mgd-ddm-meta branch 2 times, most recently from dc92119 to 2ed1c9f Compare October 30, 2024 13:26
@zeeshanlakhani zeeshanlakhani changed the title .. add switch info/meta to ddmd/mgd related timeseries Oct 30, 2024
@zeeshanlakhani zeeshanlakhani changed the title add switch info/meta to ddmd/mgd related timeseries [metrics] add switch info/meta to ddmd/mgd related timeseries Feb 14, 2025
@zeeshanlakhani zeeshanlakhani marked this pull request as ready for review February 14, 2025 18:46
@zeeshanlakhani
Copy link
Copy Markdown
Author

Will update cargo.lock post review.

let props = match get_stats_server_props(pg) {
Ok(props) => props,
Err(e) => {
info!(log, "stats server not running on refresh: {e}");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of a weird message. Maybe "failed to get server properties on SMF refresh"? It seems like this should be an error! too.

@@ -134,16 +227,16 @@ impl Producer for Stats {

samples.push(ddm_router_quantity!(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why these are all macros instead of functions. We're generating a ton of duplicated code for no real benefit that I can see. Something like

impl Stats {
  fn router_quantity(&self, kind: WhatEverTheFuckThisIs, stats) -> Sample {
    Sample::new(
      &DdmRouter {
          hostname: self.hostname.clone().into(),
          sled_idents_macro!(self),
          switch_idents_macro!(self),
          kind,
          stats
        }
      )
  }
}

Then each of the next things just becomes:

samples.push(self.router_quantity(
    OriginatedUnderlayPrefixes, 
    self.router_stats.originated_underlay_prefixes));

I'm guessing the kind is an enum of some sort, but I can't figure it out. I got as far as finding a macro that generated code based on a toml file before bailing out.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exact same code is repeated at 70-100, 125-155, 175-205, and a bunch of times in mgd/src/oxstats.rs. A macro that just expanded to that sequence of assignments would be useful.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I plugged into what's there, but I'll do a nicer refactor.

Comment on lines +114 to +138
rack_id: $sled_idents.rack_id,
sled_id: $sled_idents.sled_id,
sled_model: $sled_idents.model.clone().into(),
sled_revision: $sled_idents.revision,
sled_serial: $sled_idents.serial.clone().into(),
switch_id: $switch_idents.sidecar_id,
switch_model: $switch_idents.model.clone().into(),
switch_revision: $switch_idents.revision,
switch_serial: $switch_idents.serial.clone().into(),
switch_slot: $switch_idents.slot,
asic_fab: $switch_idents
.fab
.clone()
.map(|c| c.to_string())
.unwrap_or_else(|| $switch_idents.asic_backend.to_string())
.into(),
asic_lot: $switch_idents
.lot
.clone()
.map(|c| c.to_string())
.unwrap_or_else(|| $switch_idents.asic_backend.to_string())
.into(),
asic_wafer: $switch_idents.wafer.unwrap_or(0),
asic_wafer_loc_x: $switch_idents
.wafer_loc
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can extract all of this logic into a single macro that is called by all the other _counter macros?
Unless I'm reading it wrong, it seems like we're repeating the same logic in each macro

Comment on lines +192 to +200
asic_wafer: $switch_idents.wafer.unwrap_or(0),
asic_wafer_loc_x: $switch_idents
.wafer_loc
.map(|[x, _]| x)
.unwrap_or(0),
asic_wafer_loc_y: $switch_idents
.wafer_loc
.map(|[_, y]| y)
.unwrap_or(0),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is .unwrap_or(0) the right thing to do in these calls?

Admittedly I'm unfamiliar with the oxstats side of things, but if 0 is valid then it seems like there would be no way to distinguish between "valid 0" and "hit an error 0".

level: ConfigLoggingLevel::Debug,
});
let registry = ProducerRegistry::new();
let _handle = tokio::spawn(async move {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_handle is returned, so it probably shouldn't have an underscore prefix

Comment on lines +124 to +135
asic_fab: $switch_idents
.fab
.clone()
.map(|c| c.to_string())
.unwrap_or_else(|| $switch_idents.asic_backend.to_string())
.into(),
asic_lot: $switch_idents
.lot
.clone()
.map(|c| c.to_string())
.unwrap_or_else(|| $switch_idents.asic_backend.to_string())
.into(),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intentional that both fab and lot default to a backend string instead of a different sentinel value?

Comment on lines +85 to +88
oximeter = { git = "https://github.com/oxidecomputer/omicron", branch = "zl/mgd-ddm-meta"}
oximeter-producer = { git = "https://github.com/oxidecomputer/omicron", branch = "zl/mgd-ddm-meta"}
oxnet = { version = "0.1.0", default-features = false, features = ["schemars", "serde"] }
omicron-common = { git = "https://github.com/oxidecomputer/omicron", branch = "main"}
omicron-common = { git = "https://github.com/oxidecomputer/omicron", branch = "zl/mgd-ddm-meta"}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting here that we'll need to follow-up and point this back to main after the corresponding omicron PR is merged.

Comment on lines -30 to +34
<propval name='rack_uuid' type='astring' value='unknown' />
<propval name='sled_uuid' type='astring' value='unknown' />
<propval name='rack_id' type='astring' value='unknown' />
<propval name='sled_id' type='astring' value='unknown' />
<propval name='sled_model' type='astring' value='unknown' />
<propval name='sled_revision' type='astring' value='unknown' />
<propval name='sled_serial' type='astring' value='unknown' />
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unfamiliar with the pre-existing implementation, but won't the string "unknown" fail to parse?
Is the assumption here that omicron will update the smf service with valid/parse-able values?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants