Skip to content

Multirack: Wicketd API for cluster join config#10465

Open
andrewjstone wants to merge 6 commits into
mainfrom
wicket-multirack-join-support
Open

Multirack: Wicketd API for cluster join config#10465
andrewjstone wants to merge 6 commits into
mainfrom
wicket-multirack-join-support

Conversation

@andrewjstone
Copy link
Copy Markdown
Contributor

Add 2 new API methods to get and put a minimum configuration required for joining a new rack to an uninitialized cluster. This multirack join configuration is mutually exlcusive with the rss configuration, and so wicketd will only maintain one of them at a time. A wrapper enum, RssOrMultirackJoinConfig, was added to support this.

A couple of new wrapper types: BgpAuthKeys and SledInventory were introduced to commonize logic between RSS and Mulitrack Join.

RSS behavior should not have changed except in the case when a multirack join configuration has been uploaded and a user requests to get the current RSS config. In the old code, a default struct would be returned, now a NotFound is returned. The Join code follows the patterns of the RSS code as much as possible.

Notes for reviewers

  1. I expect that the exact configuration information is not quite right. Any feedback here would be appreciated. I did it this way to get something out the door.
  2. I purposefully did not add any support to wicket itself on purpose. This functionality really shouldn't be exposed before multirack is further down the line.

Add 2 new API methods to get and put a minimum configuration required
for joining a new rack to an uninitialized cluster. This multirack join
configuration is mutually exlcusive with the rss configuration, and
so wicketd will only maintain one of them at a time. A wrapper enum,
`RssOrMultirackJoinConfig`, was added to support this.

A couple of new wrapper types: `BgpAuthKeys` and `SledInventory` were
introduced to commonize logic between RSS and Mulitrack Join.

RSS behavior should not have changed except in the case when a multirack
join configuration has been uploaded and a user requests to get the
current RSS config. In the old code, a default struct would be returned,
now a `NotFound` is returned. The Join code follows the patterns of the
RSS code as much as possible.
use wicketd_api::SetBgpAuthKeyStatus;

pub(crate) struct CurrentMultirackJoinConfig {
inventory: SledInventory,
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inventory, bootstrap_sleds, and bgp_auth_keys fields exist in both CurrentMultirackJoinConfig and CurrentRssConfig.

I'm wondering if there should be a wrapper struct around RssOrMultirackJoinConfig instead that centralizes those fields. That might help deduplicate things a bit more, but maybe not. The biggest win, if it matters, would be making it easier to maintain the bgp auth keys when the config changes. This could happen if, for example, a customer uploaded BGP keys first and then uploaded the MultirackJoinConfig. I could certainly make it so that we copy the fields across types, to fix this, but at that point I'd probably prefer a wrapper type with those fields.

Copy link
Copy Markdown
Contributor

@jgallagher jgallagher Jun 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming is hard, but something like this?

struct RssOrMultirackJoinConfig {
    common: RssOrMultirackJoinCommonConfig,
    flavor: RssOrMultirackJoinConfigFlavor,
}

where RssOrMultirackJoinCommonConfig is a struct with inventory etc and RssOrMultirackJoinConfigFlavor is an enum with just the fields unique to each flavor? I like that idea, assuming (a) the common fields are likely to stay common and (b) we don't want to be annoyed if we need to update them.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is indeed the way to go. Thanks @jgallagher. I'll take a stab at it.

pub struct MultirackJoinConfigBaseUserInput {
/// List of slot numbers only.
pub bootstrap_slots: BTreeSet<u16>,
pub rack_network_config: UserSpecifiedRackNetworkConfig,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will work for the scenario where racks are interconnected over BGP. However, for the DDM case where racks are either directly plugged into each other or are connected over a switch and are on the same broadcast domain, we need to identify what external switch ports to treat as underlay ports. One way to introduce this could be turning UserSpecifiedPortConfig into an enum like

enum UserSpecifiedPortConfig {
   MultirackDdmPortConfig,
   SoloRackPortConfig { ... }
}

Where the SoloRackPortConfig contains what UserSpecifiedPortConfig currently contains. I think the multi rack DDM ports should be completely self configuring and not require explicit configuration.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's great. This is the turning on all the ports to listen for router announcements/solicitations option that we discussed, right?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Listening for announcements/solicitations yes, but not necessarily all ports. If we take the above approach, just the ports the user indicates as having the MultirackDdmPortConfig.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. Thanks. I had inferred from the lack of a data carrying enum variant on MultirackDdmPortConfig that it was all ports. But that's because I was referring to the UserSpecifiedRackNetworkConfig and not per port config. Poor reading comprehension on my part.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rcgoodfellow I'm looking closer at this as I'm about to start implementing, and I have a few questions.

Is the SoloRackPortConfig really only for the initial RSS rack? It seems like racks could have both types of port configs, and likely will.

What if instead we used something like the following:

enum UserSpecifiedPortConfig {
    DdmAutoPortConfig,
    BgpFullManualConfig
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and added some support for this in bb42a84

@andrewjstone andrewjstone requested a review from bnaecker May 26, 2026 19:41
@andrewjstone andrewjstone self-assigned this May 30, 2026
}

#[derive(Clone, Debug, PartialEq, Eq, Deserialize, Serialize, JsonSchema)]
#[serde(rename_all = "snake_case", tag = "type")]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a breaking change for the wicket TOML file? (That might be okay but will need to be coordinated with support; it might be worth figuring out if we can avoid it though.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a breaking change for the wicket TOML file. You have to add type = "manual" for each port now. I'd also prefer to skip making this breaking change now, but I think eventually we'll want to be able to differentiate at the top level like this per port, as suggested by @rcgoodfellow.

I'm definitely open to suggestions here though.

}
}

Ok(())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this fail if our_baseboard is None and we couldn't do the check, or if our_baseboard isn't a Baseboard::Gimlet? Edit: it looks like this code just moved; I imagine this is strongly related to the followup work you want to do on Baseboard / BaseboardId?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, this is going away in the follow up PR, where we use BaseboardId in inventory instead. This was just for deduplication purposes.

Comment thread wicketd/src/rss_config.rs
}

/// Builds a [`PortConfig`] from a [`UserSpecifiedPortConfig`].
/// Builds a [`PortConfig`] from a [`wicket_common::rack_setup::UserSpecifiedPortConfig`].
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny style nit - can we wrap this to 80 cols?

} = cfg;
}) = cfg
else {
unimplemented!("DdmAutoPortConfig currently unsupported")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little nervous about merging this because I think it opens the door for someone to panic wicket by trying to upload a TOML file with type = "ddm-auto-port-confg", right?

This depends a lot on the question about TOML compatibility I left above, but assuming we keep the current type = ... change, could we return a table here with that single entry instead of panicking?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. That's an easy change to make if we keep this style.

Comment on lines +197 to +204
join_config.update_with_inventory_and_bootstrap_peers(
&inventory,
&ctx.bootstrap_peers,
&ctx.log,
);
join_config
.update(body.into_inner(), ctx.baseboard.as_ref())
.map_err(|err| HttpError::for_bad_request(None, err))?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these two operations need to be done in the opposite order? If body adds new bootstrap peers, will those have their IPs set from inventory?

let mut config = ctx.rss_or_multirack_join_config.lock().unwrap();

let rss_config = config.rss_config_mut().ok_or_else(|| {
HttpError::for_client_error(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this repeated enough we should promote it up to a helper? Something like

config.rss_config_mut_or_conflict("the conflict reason")?

or if a method on config doesn't make sense, a private function in this file like

rss_config_mut_or_conflict(&mut config, "the conflict reason")?;

?

use wicketd_api::SetBgpAuthKeyStatus;

pub(crate) struct CurrentMultirackJoinConfig {
inventory: SledInventory,
Copy link
Copy Markdown
Contributor

@jgallagher jgallagher Jun 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming is hard, but something like this?

struct RssOrMultirackJoinConfig {
    common: RssOrMultirackJoinCommonConfig,
    flavor: RssOrMultirackJoinConfigFlavor,
}

where RssOrMultirackJoinCommonConfig is a struct with inventory etc and RssOrMultirackJoinConfigFlavor is an enum with just the fields unique to each flavor? I like that idea, assuming (a) the common fields are likely to stay common and (b) we don't want to be annoyed if we need to update them.

UserSpecifiedPortConfig::Manual(cfg) => {
Some((SwitchSlot::Switch0, port.as_str(), cfg))
}
UserSpecifiedPortConfig::DdmAutoPortConfig => None,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to the TOML question above - this means iter_uplinks() silently discards any DdmAutoPortConfig links. That seems like it might end up being surprising - they won't be available to wicket, and wicketd will filter them out when building the RSS config. That latter one seems correct but maybe points to a structural issue - what would it mean for someone to upload an RSS config with an uplink of type DdmAutoPortConfig? Should we only use UserSpecifiedPortConfig for the multirack config toml, and keep using ManualPortConfig for the main RSS path?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I responded to your other comment, but what you are suggesting here seems reasonable. This means that we'd essentially have different types for RSS and Multirack Join configs that we then put into to the underlying enum once received (or not?). This would likely have pretty large ripple effects on this PR, but that's not a big deal.

What I'm not sure about is if this semantically makes sense though. You could picture a customer running RSS with the front ports configured for DDM expecting a multirack config soon after. If we didn't allow that we'd force the user to setup those ports via the API/console. That's also not a big deal, and we'll eventually need to provide that path. I'm just not sure we should eliminate it in RSS. That's how I ended up sharing so much code here.

@rcgoodfellow what do you think?

let mut config = ctx.rss_config.lock().unwrap();
let mut config = ctx.rss_or_multirack_join_config.lock().unwrap();
let rss_config = config.rss_config_mut().ok_or_else(|| {
HttpError::for_not_found(None, "rss config not found".to_string())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this return CONFLICT instead of not found? The only way to have no RSS config is if someone explicitly uploaded a multirack config, right?

*config = Default::default();
let mut config = ctx.rss_or_multirack_join_config.lock().unwrap();
let rss_config = config.rss_config_mut().ok_or_else(|| {
HttpError::for_not_found(None, "rss config not found".to_string())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question about whether this should be CONFLICT instead of 404

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants