Skip to content

Optimizations to the Nexus schema tests#8271

Merged
smklein merged 36 commits intomainfrom
schema-optimization
Mar 27, 2026
Merged

Optimizations to the Nexus schema tests#8271
smklein merged 36 commits intomainfrom
schema-optimization

Conversation

@smklein
Copy link
Copy Markdown
Collaborator

@smklein smklein commented Jun 4, 2025

This PR optimizes several of the Nexus schema tests:

  • It uses a "baseline" version of dbinit.sql to avoid scaling linearly with the number of schema changes. It adds the cargo xtask schema generate-base command to help pull down this baseline.
  • This PR also adds cargo xtask schema old-migrations to assist with cleaning up old migrations that are before the "baseline", and are no longer supported.
  • It changes a few of the schema tests to compare "baseline" -> "latest" versions, rather than applying all schema changes that have ever been created.
 $ cargo nt -p omicron-nexus -- integration_tests::schema
 ...
        PASS [   2.152s] omicron-nexus::test_all integration_tests::schema::dbinit_version_matches_version_known_to_nexus
        PASS [   8.946s] omicron-nexus::test_all integration_tests::schema::nexus_applies_update_on_boot
        PASS [  10.831s] omicron-nexus::test_all integration_tests::schema::compare_table_differing_constraint
        PASS [  10.861s] omicron-nexus::test_all integration_tests::schema::compare_sequence_differing_increment
        PASS [  10.997s] omicron-nexus::test_all integration_tests::schema::compare_index_creation_differing_columns
        PASS [  11.043s] omicron-nexus::test_all integration_tests::schema::compare_view_differing_where_clause
        PASS [  11.075s] omicron-nexus::test_all integration_tests::schema::compare_table_differing_not_null_order
        PASS [  11.111s] omicron-nexus::test_all integration_tests::schema::compare_index_creation_differing_where_clause
        PASS [   8.621s] omicron-nexus::test_all integration_tests::schema::update_since_base_has_idempotent_up
        PASS [   8.008s] omicron-nexus::test_all integration_tests::schema::validate_data_migrations
        PASS [  30.662s] omicron-nexus::test_all integration_tests::schema::nexus_cannot_apply_update_from_unknown_version
        PASS [  23.131s] omicron-nexus::test_all integration_tests::schema::validate_migration_from_base_version
────────────
     Summary [  34.713s] 12 tests run: 12 passed, 604 skipped

Fixes #10005

Comment on lines -1332 to -1333
// Used as a regression test against
// https://github.com/oxidecomputer/omicron/issues/5561
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full admission, I'm dropping this regression test. It's sorta clinging onto the side of this data migration test, but is actually not validating data migration.

It was originally added in #5565, and moved in #7471 , but it's painful to maintain, because it relies on "finding an enum that has been created, dropped, and re-created with the same name", and then applying all updates to the most recent schema so we can issue diesel commands against it.

This cache has been gone for over a year - I think we can re-visit this if/when we ever add back an OID cache.

@smklein smklein marked this pull request as ready for review June 5, 2025 17:24
Base automatically changed from phys-disk-table-for-u.2s-only to main June 5, 2025 18:09
@smklein smklein requested review from davepacheco and sunshowers June 6, 2025 19:40
@david-crespo
Copy link
Copy Markdown
Contributor

Noting that the kv.raft_log.disable_synchronization_unsafe change mentioned in the description was pulled out to #8275.

@smklein
Copy link
Copy Markdown
Collaborator Author

smklein commented Jun 26, 2025

Noting that the kv.raft_log.disable_synchronization_unsafe change mentioned in the description was pulled out to #8275.

Good call, updated the description

`NEW_VERSION` and still appears at the top of the list (logically after the
new version that came in from "main").
* Update the version in `dbinit.sql` to match the new `NEW_VERSION`.
* Re-run `cargo xtask schema generate-previous`.
Copy link
Copy Markdown
Contributor

@david-crespo david-crespo Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious what is supposed to happen (like in CI) if you fail to do this

@davepacheco
Copy link
Copy Markdown
Collaborator

Should we consider just removing old schema versions altogether?

I'm imagining something where:

  • at runtime, instead of starting with schema/crdb/1.0.0/up.sql, we start with a new file called schema/crdb/dbinit-base.sql (initial contents would be copied from schema/crdb/1.0.0/up.sql)
  • immediately after cutting release N, we:
    • copy the dbinit.sql file from release N - 1 to dbinit-base.sql
    • delete all the schema versions (directories and entries from schema_versions.rs) that were present in release N - 1 (because those are incorporated into its dbinit-base.sql)

Concretely, say that today:

  • release 10: contains schema versions 1-12 + dbinit.sql representing version 12
  • release 11: contains schema versions 1-15 + dbinit.sql representing version 15
  • release 12: contains schema versions 1-20 + dbinit.sql representing version 20
  • main: contains schema versions 1-23 + dbinit.sql representing version 23

In this world:

  • release 11:
    • contains dbinit-base.sql that's equivalent to dbinit.sql from version 12 (the one from release 10)
    • contains schema migrations for versions 13 - 15
    • contains dbinit.sql representing version 15
  • release 12:
    • contains dbinit-base.sql that's equivalent to dbinit.sql from version 15 (the one from release 11)
    • contains schema migrations for versions 16-20
    • contains dbinit.sql representing version 20
  • main:
    • contains dbinit-base.sql that's equivalent to dbinit.sql from version 20 (the one from release 12)
    • contains schema migrations for versions 21-23
    • contains dbinit.sql representing version 23

Basically:

  • dbinit.sql is exactly as it is today: it's a human-readable version of the latest schema
  • the migrations are just as they are today and the process for adding one is the same as today
  • the only difference is that we keep updating a "base" file with each release and have the tests use that

I think it's also pretty easy to do mechanically, though probably annoying to automate.

@smklein
Copy link
Copy Markdown
Collaborator Author

smklein commented Jun 30, 2025

Should we consider just removing old schema versions altogether?

I'm imagining something where:

  • at runtime, instead of starting with schema/crdb/1.0.0/up.sql, we start with a new file called schema/crdb/dbinit-base.sql (initial contents would be copied from schema/crdb/1.0.0/up.sql)

  • immediately after cutting release N, we:

    • copy the dbinit.sql file from release N - 1 to dbinit-base.sql
    • delete all the schema versions (directories and entries from schema_versions.rs) that were present in release N - 1 (because those are incorporated into its dbinit-base.sql)

Concretely, say that today:

  • release 10: contains schema versions 1-12 + dbinit.sql representing version 12
  • release 11: contains schema versions 1-15 + dbinit.sql representing version 15
  • release 12: contains schema versions 1-20 + dbinit.sql representing version 20
  • main: contains schema versions 1-23 + dbinit.sql representing version 23

In this world:

  • release 11:

    • contains dbinit-base.sql that's equivalent to dbinit.sql from version 12 (the one from release 10)
    • contains schema migrations for versions 13 - 15
    • contains dbinit.sql representing version 15
  • release 12:

    • contains dbinit-base.sql that's equivalent to dbinit.sql from version 15 (the one from release 11)
    • contains schema migrations for versions 16-20
    • contains dbinit.sql representing version 20
  • main:

    • contains dbinit-base.sql that's equivalent to dbinit.sql from version 20 (the one from release 12)
    • contains schema migrations for versions 21-23
    • contains dbinit.sql representing version 23

Basically:

  • dbinit.sql is exactly as it is today: it's a human-readable version of the latest schema
  • the migrations are just as they are today and the process for adding one is the same as today
  • the only difference is that we keep updating a "base" file with each release and have the tests use that

I think it's also pretty easy to do mechanically, though probably annoying to automate.

Sounds totally reasonable to me. I might tweak some of the names and things, but if we're fine dropping old support beyond the past release, this seems like it would clean up a lot, and let us release a lot of the data migration tests too.

I'll give this a swing. I think I can modify the existing xtask to handle a lot of the functionality of "check out the base dbinit we want".

@smklein
Copy link
Copy Markdown
Collaborator Author

smklein commented Jul 1, 2025

Okay, went ahead and implemented this, using rel/v15/rc1 (aka 189adbb0b39f3c1c4ac2371aaf4cac18c571f9bb) as our "baseline" dbinit.sql -- but it would be very straightforward to select something else as the "oldest-supported version", if we'd like.

@smklein
Copy link
Copy Markdown
Collaborator Author

smklein commented Mar 26, 2026

Okay! This PR should be updated. Of note:

  • I'm still moving the baseline to rel/v15/rc1. We could move it further forward. Happy to do this in a follow-up PR -- the meat of this PR is "building the mechanism", which should make subsequent "base-moving" easier. (edit: tested throwing claude at this for an upgrade to v16, and it aced it first try: Update dbinit-base to rel/v16.1/rc0 #10165)
  • I've updated the docs to include instructions of "how to move forward the baseline". IMO the most manual part of this, for now, is figuring out how to clean up the data migration tests. I'm going to try to make this easier in a follow-up PR (now done in Improve ergonomics of data migration test #10164).

All that said, this should be ready for review!

Copy link
Copy Markdown
Contributor

@mergeconflict mergeconflict left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Comment on lines +3436 to +3449
// Parse the base schema version from the header
for line in content.lines() {
if line.starts_with("-- Schema version:") {
let version_str = line
.strip_prefix("-- Schema version:")
.ok_or_else(|| {
anyhow::anyhow!("Invalid schema version line format")
})?
.trim();
return semver::Version::parse(version_str).with_context(|| {
format!("Failed to parse base schema version: {}", version_str)
});
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this duplicates parse_base_schema_version in dev-tools/schema/src/main.rs. I wonder if there's a reasonable way to have these share code? Possibly not.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulled it into nexus/db-model, since that's a common dependency, and it's a cheap function to compile

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, ok. Wasn't sure offhand if there were any sensible deps in common.

Comment on lines +47 to +58
fn find_workspace_root() -> Result<Utf8PathBuf> {
let current_dir =
std::env::current_dir().context("Failed to get current directory")?;
let current_dir = Utf8PathBuf::try_from(current_dir)
.context("Current directory path is not valid UTF-8")?;

current_dir
.ancestors()
.find(|p| p.join("Cargo.toml").exists() && p.join(".git").exists())
.map(|p| p.to_path_buf())
.ok_or_else(|| anyhow::anyhow!("Could not find workspace root"))
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use cargo locate-project --workspace here.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat; updated to use this in 777a605

Comment on lines +265 to +267
println!(
" - Update EARLIEST_SUPPORTED_VERSION in schema_versions.rs if needed"
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should actually bail here if EARLIEST_SUPPORTED_VERSION != base_version?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this check in 777a605

Comment on lines +866 to +867
#[allow(dead_code)]
log: &'a Logger,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you intend to keep this around? I think it's unused.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

==== Updating the baseline after a release

After each Omicron release, update the baseline to match the previous release.
The "baseline" will become the new "earlist supported CockroachDB schema version",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: earlist

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


let file_content = String::from_utf8(show_output.stdout)?;

// Parse the SCHEMA_VERSION from the file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit fragile, although I can't really imagine a better way to do it. I wonder, if it fails for whatever reason, should we bail rather than returning Ok(None)?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can just make get_schema_version_from_commit return a Result<Version>, and have this case be an error.

Updated in 777a605

@smklein smklein merged commit 74b7742 into main Mar 27, 2026
18 checks passed
@smklein smklein deleted the schema-optimization branch March 27, 2026 22:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

test failed in CI: nexus_applies_update_on_boot

4 participants