[metrics] add switch info/meta to ddmd/mgd related timeseries#402
[metrics] add switch info/meta to ddmd/mgd related timeseries#402zeeshanlakhani wants to merge 2 commits intomainfrom
Conversation
dc92119 to
2ed1c9f
Compare
2ed1c9f to
a259ea2
Compare
|
Will update cargo.lock post review. |
| let props = match get_stats_server_props(pg) { | ||
| Ok(props) => props, | ||
| Err(e) => { | ||
| info!(log, "stats server not running on refresh: {e}"); |
There was a problem hiding this comment.
Kind of a weird message. Maybe "failed to get server properties on SMF refresh"? It seems like this should be an error! too.
| @@ -134,16 +227,16 @@ impl Producer for Stats { | |||
|
|
|||
| samples.push(ddm_router_quantity!( | |||
There was a problem hiding this comment.
I'm not sure why these are all macros instead of functions. We're generating a ton of duplicated code for no real benefit that I can see. Something like
impl Stats {
fn router_quantity(&self, kind: WhatEverTheFuckThisIs, stats) -> Sample {
Sample::new(
&DdmRouter {
hostname: self.hostname.clone().into(),
sled_idents_macro!(self),
switch_idents_macro!(self),
kind,
stats
}
)
}
}
Then each of the next things just becomes:
samples.push(self.router_quantity(
OriginatedUnderlayPrefixes,
self.router_stats.originated_underlay_prefixes));
I'm guessing the kind is an enum of some sort, but I can't figure it out. I got as far as finding a macro that generated code based on a toml file before bailing out.
There was a problem hiding this comment.
The exact same code is repeated at 70-100, 125-155, 175-205, and a bunch of times in mgd/src/oxstats.rs. A macro that just expanded to that sequence of assignments would be useful.
There was a problem hiding this comment.
yeah, I plugged into what's there, but I'll do a nicer refactor.
| rack_id: $sled_idents.rack_id, | ||
| sled_id: $sled_idents.sled_id, | ||
| sled_model: $sled_idents.model.clone().into(), | ||
| sled_revision: $sled_idents.revision, | ||
| sled_serial: $sled_idents.serial.clone().into(), | ||
| switch_id: $switch_idents.sidecar_id, | ||
| switch_model: $switch_idents.model.clone().into(), | ||
| switch_revision: $switch_idents.revision, | ||
| switch_serial: $switch_idents.serial.clone().into(), | ||
| switch_slot: $switch_idents.slot, | ||
| asic_fab: $switch_idents | ||
| .fab | ||
| .clone() | ||
| .map(|c| c.to_string()) | ||
| .unwrap_or_else(|| $switch_idents.asic_backend.to_string()) | ||
| .into(), | ||
| asic_lot: $switch_idents | ||
| .lot | ||
| .clone() | ||
| .map(|c| c.to_string()) | ||
| .unwrap_or_else(|| $switch_idents.asic_backend.to_string()) | ||
| .into(), | ||
| asic_wafer: $switch_idents.wafer.unwrap_or(0), | ||
| asic_wafer_loc_x: $switch_idents | ||
| .wafer_loc |
There was a problem hiding this comment.
Maybe we can extract all of this logic into a single macro that is called by all the other _counter macros?
Unless I'm reading it wrong, it seems like we're repeating the same logic in each macro
| asic_wafer: $switch_idents.wafer.unwrap_or(0), | ||
| asic_wafer_loc_x: $switch_idents | ||
| .wafer_loc | ||
| .map(|[x, _]| x) | ||
| .unwrap_or(0), | ||
| asic_wafer_loc_y: $switch_idents | ||
| .wafer_loc | ||
| .map(|[_, y]| y) | ||
| .unwrap_or(0), |
There was a problem hiding this comment.
Is .unwrap_or(0) the right thing to do in these calls?
Admittedly I'm unfamiliar with the oxstats side of things, but if 0 is valid then it seems like there would be no way to distinguish between "valid 0" and "hit an error 0".
| level: ConfigLoggingLevel::Debug, | ||
| }); | ||
| let registry = ProducerRegistry::new(); | ||
| let _handle = tokio::spawn(async move { |
There was a problem hiding this comment.
_handle is returned, so it probably shouldn't have an underscore prefix
| asic_fab: $switch_idents | ||
| .fab | ||
| .clone() | ||
| .map(|c| c.to_string()) | ||
| .unwrap_or_else(|| $switch_idents.asic_backend.to_string()) | ||
| .into(), | ||
| asic_lot: $switch_idents | ||
| .lot | ||
| .clone() | ||
| .map(|c| c.to_string()) | ||
| .unwrap_or_else(|| $switch_idents.asic_backend.to_string()) | ||
| .into(), |
There was a problem hiding this comment.
Is it intentional that both fab and lot default to a backend string instead of a different sentinel value?
| oximeter = { git = "https://github.com/oxidecomputer/omicron", branch = "zl/mgd-ddm-meta"} | ||
| oximeter-producer = { git = "https://github.com/oxidecomputer/omicron", branch = "zl/mgd-ddm-meta"} | ||
| oxnet = { version = "0.1.0", default-features = false, features = ["schemars", "serde"] } | ||
| omicron-common = { git = "https://github.com/oxidecomputer/omicron", branch = "main"} | ||
| omicron-common = { git = "https://github.com/oxidecomputer/omicron", branch = "zl/mgd-ddm-meta"} |
There was a problem hiding this comment.
Just noting here that we'll need to follow-up and point this back to main after the corresponding omicron PR is merged.
| <propval name='rack_uuid' type='astring' value='unknown' /> | ||
| <propval name='sled_uuid' type='astring' value='unknown' /> | ||
| <propval name='rack_id' type='astring' value='unknown' /> | ||
| <propval name='sled_id' type='astring' value='unknown' /> | ||
| <propval name='sled_model' type='astring' value='unknown' /> | ||
| <propval name='sled_revision' type='astring' value='unknown' /> | ||
| <propval name='sled_serial' type='astring' value='unknown' /> |
There was a problem hiding this comment.
I'm unfamiliar with the pre-existing implementation, but won't the string "unknown" fail to parse?
Is the assumption here that omicron will update the smf service with valid/parse-able values?
Add additional switch and sled metadata to bfd/bgp/routing/ddm timeseries.
Related: