Multirack: Wicketd API for cluster join config#10465
Conversation
Add 2 new API methods to get and put a minimum configuration required for joining a new rack to an uninitialized cluster. This multirack join configuration is mutually exlcusive with the rss configuration, and so wicketd will only maintain one of them at a time. A wrapper enum, `RssOrMultirackJoinConfig`, was added to support this. A couple of new wrapper types: `BgpAuthKeys` and `SledInventory` were introduced to commonize logic between RSS and Mulitrack Join. RSS behavior should not have changed except in the case when a multirack join configuration has been uploaded and a user requests to get the current RSS config. In the old code, a default struct would be returned, now a `NotFound` is returned. The Join code follows the patterns of the RSS code as much as possible.
| use wicketd_api::SetBgpAuthKeyStatus; | ||
|
|
||
| pub(crate) struct CurrentMultirackJoinConfig { | ||
| inventory: SledInventory, |
There was a problem hiding this comment.
inventory, bootstrap_sleds, and bgp_auth_keys fields exist in both CurrentMultirackJoinConfig and CurrentRssConfig.
I'm wondering if there should be a wrapper struct around RssOrMultirackJoinConfig instead that centralizes those fields. That might help deduplicate things a bit more, but maybe not. The biggest win, if it matters, would be making it easier to maintain the bgp auth keys when the config changes. This could happen if, for example, a customer uploaded BGP keys first and then uploaded the MultirackJoinConfig. I could certainly make it so that we copy the fields across types, to fix this, but at that point I'd probably prefer a wrapper type with those fields.
There was a problem hiding this comment.
Naming is hard, but something like this?
struct RssOrMultirackJoinConfig {
common: RssOrMultirackJoinCommonConfig,
flavor: RssOrMultirackJoinConfigFlavor,
}where RssOrMultirackJoinCommonConfig is a struct with inventory etc and RssOrMultirackJoinConfigFlavor is an enum with just the fields unique to each flavor? I like that idea, assuming (a) the common fields are likely to stay common and (b) we don't want to be annoyed if we need to update them.
There was a problem hiding this comment.
I think this is indeed the way to go. Thanks @jgallagher. I'll take a stab at it.
| pub struct MultirackJoinConfigBaseUserInput { | ||
| /// List of slot numbers only. | ||
| pub bootstrap_slots: BTreeSet<u16>, | ||
| pub rack_network_config: UserSpecifiedRackNetworkConfig, |
There was a problem hiding this comment.
I think this will work for the scenario where racks are interconnected over BGP. However, for the DDM case where racks are either directly plugged into each other or are connected over a switch and are on the same broadcast domain, we need to identify what external switch ports to treat as underlay ports. One way to introduce this could be turning UserSpecifiedPortConfig into an enum like
enum UserSpecifiedPortConfig {
MultirackDdmPortConfig,
SoloRackPortConfig { ... }
}Where the SoloRackPortConfig contains what UserSpecifiedPortConfig currently contains. I think the multi rack DDM ports should be completely self configuring and not require explicit configuration.
There was a problem hiding this comment.
Ah, that's great. This is the turning on all the ports to listen for router announcements/solicitations option that we discussed, right?
There was a problem hiding this comment.
Listening for announcements/solicitations yes, but not necessarily all ports. If we take the above approach, just the ports the user indicates as having the MultirackDdmPortConfig.
There was a problem hiding this comment.
Make sense. Thanks. I had inferred from the lack of a data carrying enum variant on MultirackDdmPortConfig that it was all ports. But that's because I was referring to the UserSpecifiedRackNetworkConfig and not per port config. Poor reading comprehension on my part.
There was a problem hiding this comment.
@rcgoodfellow I'm looking closer at this as I'm about to start implementing, and I have a few questions.
Is the SoloRackPortConfig really only for the initial RSS rack? It seems like racks could have both types of port configs, and likely will.
What if instead we used something like the following:
enum UserSpecifiedPortConfig {
DdmAutoPortConfig,
BgpFullManualConfig
}There was a problem hiding this comment.
I went ahead and added some support for this in bb42a84
| } | ||
|
|
||
| #[derive(Clone, Debug, PartialEq, Eq, Deserialize, Serialize, JsonSchema)] | ||
| #[serde(rename_all = "snake_case", tag = "type")] |
There was a problem hiding this comment.
Is this a breaking change for the wicket TOML file? (That might be okay but will need to be coordinated with support; it might be worth figuring out if we can avoid it though.)
There was a problem hiding this comment.
Yes, this is a breaking change for the wicket TOML file. You have to add type = "manual" for each port now. I'd also prefer to skip making this breaking change now, but I think eventually we'll want to be able to differentiate at the top level like this per port, as suggested by @rcgoodfellow.
I'm definitely open to suggestions here though.
| } | ||
| } | ||
|
|
||
| Ok(()) |
There was a problem hiding this comment.
Should this fail if our_baseboard is None and we couldn't do the check, or if our_baseboard isn't a Baseboard::Gimlet? Edit: it looks like this code just moved; I imagine this is strongly related to the followup work you want to do on Baseboard / BaseboardId?
There was a problem hiding this comment.
Yup, this is going away in the follow up PR, where we use BaseboardId in inventory instead. This was just for deduplication purposes.
| } | ||
|
|
||
| /// Builds a [`PortConfig`] from a [`UserSpecifiedPortConfig`]. | ||
| /// Builds a [`PortConfig`] from a [`wicket_common::rack_setup::UserSpecifiedPortConfig`]. |
There was a problem hiding this comment.
Tiny style nit - can we wrap this to 80 cols?
| } = cfg; | ||
| }) = cfg | ||
| else { | ||
| unimplemented!("DdmAutoPortConfig currently unsupported") |
There was a problem hiding this comment.
I'm a little nervous about merging this because I think it opens the door for someone to panic wicket by trying to upload a TOML file with type = "ddm-auto-port-confg", right?
This depends a lot on the question about TOML compatibility I left above, but assuming we keep the current type = ... change, could we return a table here with that single entry instead of panicking?
There was a problem hiding this comment.
Good point. That's an easy change to make if we keep this style.
| join_config.update_with_inventory_and_bootstrap_peers( | ||
| &inventory, | ||
| &ctx.bootstrap_peers, | ||
| &ctx.log, | ||
| ); | ||
| join_config | ||
| .update(body.into_inner(), ctx.baseboard.as_ref()) | ||
| .map_err(|err| HttpError::for_bad_request(None, err))?; |
There was a problem hiding this comment.
Do these two operations need to be done in the opposite order? If body adds new bootstrap peers, will those have their IPs set from inventory?
| let mut config = ctx.rss_or_multirack_join_config.lock().unwrap(); | ||
|
|
||
| let rss_config = config.rss_config_mut().ok_or_else(|| { | ||
| HttpError::for_client_error( |
There was a problem hiding this comment.
Is this repeated enough we should promote it up to a helper? Something like
config.rss_config_mut_or_conflict("the conflict reason")?or if a method on config doesn't make sense, a private function in this file like
rss_config_mut_or_conflict(&mut config, "the conflict reason")?;?
| use wicketd_api::SetBgpAuthKeyStatus; | ||
|
|
||
| pub(crate) struct CurrentMultirackJoinConfig { | ||
| inventory: SledInventory, |
There was a problem hiding this comment.
Naming is hard, but something like this?
struct RssOrMultirackJoinConfig {
common: RssOrMultirackJoinCommonConfig,
flavor: RssOrMultirackJoinConfigFlavor,
}where RssOrMultirackJoinCommonConfig is a struct with inventory etc and RssOrMultirackJoinConfigFlavor is an enum with just the fields unique to each flavor? I like that idea, assuming (a) the common fields are likely to stay common and (b) we don't want to be annoyed if we need to update them.
| UserSpecifiedPortConfig::Manual(cfg) => { | ||
| Some((SwitchSlot::Switch0, port.as_str(), cfg)) | ||
| } | ||
| UserSpecifiedPortConfig::DdmAutoPortConfig => None, |
There was a problem hiding this comment.
Related to the TOML question above - this means iter_uplinks() silently discards any DdmAutoPortConfig links. That seems like it might end up being surprising - they won't be available to wicket, and wicketd will filter them out when building the RSS config. That latter one seems correct but maybe points to a structural issue - what would it mean for someone to upload an RSS config with an uplink of type DdmAutoPortConfig? Should we only use UserSpecifiedPortConfig for the multirack config toml, and keep using ManualPortConfig for the main RSS path?
There was a problem hiding this comment.
Hmm. I responded to your other comment, but what you are suggesting here seems reasonable. This means that we'd essentially have different types for RSS and Multirack Join configs that we then put into to the underlying enum once received (or not?). This would likely have pretty large ripple effects on this PR, but that's not a big deal.
What I'm not sure about is if this semantically makes sense though. You could picture a customer running RSS with the front ports configured for DDM expecting a multirack config soon after. If we didn't allow that we'd force the user to setup those ports via the API/console. That's also not a big deal, and we'll eventually need to provide that path. I'm just not sure we should eliminate it in RSS. That's how I ended up sharing so much code here.
@rcgoodfellow what do you think?
| let mut config = ctx.rss_config.lock().unwrap(); | ||
| let mut config = ctx.rss_or_multirack_join_config.lock().unwrap(); | ||
| let rss_config = config.rss_config_mut().ok_or_else(|| { | ||
| HttpError::for_not_found(None, "rss config not found".to_string()) |
There was a problem hiding this comment.
Should this return CONFLICT instead of not found? The only way to have no RSS config is if someone explicitly uploaded a multirack config, right?
| *config = Default::default(); | ||
| let mut config = ctx.rss_or_multirack_join_config.lock().unwrap(); | ||
| let rss_config = config.rss_config_mut().ok_or_else(|| { | ||
| HttpError::for_not_found(None, "rss config not found".to_string()) |
There was a problem hiding this comment.
Same question about whether this should be CONFLICT instead of 404
Add 2 new API methods to get and put a minimum configuration required for joining a new rack to an uninitialized cluster. This multirack join configuration is mutually exlcusive with the rss configuration, and so wicketd will only maintain one of them at a time. A wrapper enum,
RssOrMultirackJoinConfig, was added to support this.A couple of new wrapper types:
BgpAuthKeysandSledInventorywere introduced to commonize logic between RSS and Mulitrack Join.RSS behavior should not have changed except in the case when a multirack join configuration has been uploaded and a user requests to get the current RSS config. In the old code, a default struct would be returned, now a
NotFoundis returned. The Join code follows the patterns of the RSS code as much as possible.Notes for reviewers