DNM: kola testiso folding proof of concept by dustymabe · Pull Request #4511 · coreos/coreos-assembler

dustymabe · 2026-03-29T23:56:13Z

This builds on #4509 and #4510 and shows a way to keep some of the testiso tests defined the way they are today, where there is just a list of "test names" that are translated into the actual tests that get run based on each of the components in the name (i.e. each component of install-media.miniso-install.4k.uefi means something, and the test function picks up on it and does the right thing).

Store the test execution timeout context on RuntimeConfig.TestExecTimeout so that BaseCluster.SSH can enforce the timeout without requiring callers to pass a context.Context through every function signature. The harness sets TestExecTimeout to h.TimeoutContext() when building the RuntimeConfig in runTest(). BaseCluster.SSH uses Start()+Wait() with a select on this context, closing the SSH session when the context is cancelled. This enables us to essentially have timeout checking on every SSH() call we do. Written-by: <anthropic/claude-opus-4-6>

There were two deferred functions and we might as well combine them into one.

In the previous commit we plumbed timeout/cancellation through every SSH command, so we no longer need to set SkipStartMachine and then call mach.Start() inside a RunWithExecTimeoutCheck() here in RunTest.

Some tests don't start their machines via the Harness, but rather directly in the tests via `NewMachine()`. In those cases we were releasing the memory reservation prematurely. Let's asynchronously wait for the machines to be up and then release the memory reservation.

The early return guard that checks t.ReservedMemoryCountMiB == 0 was performed outside the mutex. Since releaseMemoryCount can be called concurrently from both the async memory-release goroutine and the deferred cleanup function, this unsynchronized read could see a stale value under the Go memory model, potentially causing a double-subtract. Move the check inside the mutex to ensure the read of t.ReservedMemoryCountMiB has a proper happens-before relationship with the write that zeroes it. Written-by: <anthropic/claude-opus-4-6>

Maintaining separate QemuMachineOptions and MachineOptions structs led to complications: awkward two-level nesting at call sites, a shadowed Firmware field, and type assertions needed just to pass QEMU-specific options. Move the QEMU-only fields (HostForwardPorts, DisablePDeathSig, OverrideBackingFile, Nvme, Cex) into MachineOptions and delete the QemuMachineOptions type. Add EnsureNoQEMUOnlyOptions() so non-QEMU platforms (aws, azure, gcp, do, esx, openstack) reject these options early with a clear error. Keep NewMachineWithQemuOptions as a thin alias on qemu/qemuiso clusters for backward compatibility. Written-by: <anthropic/claude-opus-4.6>

MultiPathDisk, PrimaryDisk, MinMemory, NumaNodes, AdditionalNics, AppendKernelArgs, AppendFirstbootKernelArgs, and Firmware are only implemented by QEMU-based platforms. Move them below the QEMU-only comment boundary and check them in EnsureNoQEMUOnlyOptions(), removing the now-redundant per-platform rejection checks in the non-QEMU cluster implementations. The remaining cross-platform fields are AdditionalDisks (qemu, qemuiso, azure, gcloud), MinDiskSize (qemu, aws), and InstanceType (azure). Written-by: <anthropic/claude-opus-4.6>

Now that QemuMachineOptions has been folded into MachineOptions, NewMachineWithQemuOptions is just a trivial wrapper around NewMachineWithOptions. Delete it and update all callers to use NewMachineWithOptions through the Cluster interface directly. This removes the need for type assertions to *qemu.Cluster in tests that only used them to access NewMachineWithQemuOptions. Tests that still need the type assertion (luks tang setup, ostree sync, rhcos upgrade) keep it for platform-specific branching logic, but call NewMachineWithOptions through the interface rather than through the concrete type. Written-by: <anthropic/claude-opus-4.6>
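A sketch of the call-site change, using hypothetical, trimmed-down `Cluster`, `Machine`, `QemuCluster`, and `AWSCluster` types in place of the real platform packages. Machine creation goes through the interface; a type assertion remains only where the test genuinely branches on platform.

```go
package main

import "fmt"

// MachineOptions is trimmed to one field for this sketch.
type MachineOptions struct {
	Firmware string
}

// Machine and Cluster are hypothetical, trimmed-down versions of the
// platform interfaces.
type Machine interface {
	ID() string
}

type Cluster interface {
	NewMachineWithOptions(opts MachineOptions) (Machine, error)
}

type machine string

func (m machine) ID() string { return string(m) }

type QemuCluster struct{}

func (*QemuCluster) NewMachineWithOptions(o MachineOptions) (Machine, error) {
	return machine("qemu-" + o.Firmware), nil
}

type AWSCluster struct{}

func (*AWSCluster) NewMachineWithOptions(o MachineOptions) (Machine, error) {
	if o.Firmware != "" {
		return nil, fmt.Errorf("aws: Firmware is a QEMU-only option")
	}
	return machine("aws"), nil
}

// optionsFor keeps a type assertion only because it branches on platform.
func optionsFor(c Cluster) MachineOptions {
	if _, ok := c.(*QemuCluster); ok {
		return MachineOptions{Firmware: "uefi"}
	}
	return MachineOptions{}
}

// startMachine calls NewMachineWithOptions through the interface; no
// assertion to the concrete QEMU type is needed just to pass options.
func startMachine(c Cluster) (Machine, error) {
	return c.NewMachineWithOptions(optionsFor(c))
}

func main() {
	for _, c := range []Cluster{&QemuCluster{}, &AWSCluster{}} {
		m, err := startMachine(c)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(m.ID())
	}
}
```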

Previously, kola spawn only entered the DisablePDeathSig / spawnMachineOptions code path when the platform was "qemu". This guard existed because DisablePDeathSig and the machine options JSON file were QEMU-specific concepts that required a type assertion to *qemu.Cluster. Now that MachineOptions is the single unified type used by all platforms, and non-QEMU platforms call EnsureNoQEMUOnlyOptions() in their NewMachineWithOptions(), the guard is no longer necessary:

- On QEMU: behavior is unchanged.
- On non-QEMU with --remove=false: previously this silently ignored the user's intent (DisablePDeathSig has no meaning outside QEMU) and created the machine via NewMachine() with no options. Now it calls NewMachineWithOptions(), which rejects DisablePDeathSig with a clear error, surfacing the fact that the flag combination is unsupported rather than silently doing the wrong thing.
- On non-QEMU with --machine-options: same improvement; the user gets a clear error if the JSON contains QEMU-only fields.

Written-by: <anthropic/claude-opus-4.6>

…ptions The Test struct duplicated many fields from platform.MachineOptions (MultiPathDisk, AdditionalDisks, PrimaryDisk, MinMemory, MinDiskSize, NumaNodes, AdditionalNics, AppendKernelArgs, AppendFirstbootKernelArgs, InstanceType). The harness then manually copied each one into a MachineOptions before creating machines. Replace all of these with a single MachineOptions field on the Test struct. The harness now uses t.MachineOptions directly. Fields that are not MachineOptions (InjectContainer, ReservedMemoryCountMiB) stay on the Test struct. Update the two test files (multipath.go, network.go) that set these fields in their registrations, and update harness.go external test registration to populate the MachineOptions sub-struct. Written-by: <anthropic/claude-opus-4.6>
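A sketch of the Test struct after the change and of the harness passing `t.MachineOptions` straight through. `Test`, `MachineOptions`, and `fakeCluster` are trimmed-down hypothetical stand-ins; the field types and the "example.multipath" registration are illustrative, not the real multipath.go entry.

```go
package main

import "fmt"

// MachineOptions is trimmed to a few of the fields named above; types are assumptions.
type MachineOptions struct {
	MultiPathDisk    bool
	MinMemory        int
	AdditionalNics   int
	AppendKernelArgs string
}

// Test sketches the register.Test struct after the change.
type Test struct {
	Name                   string
	ClusterSize            int
	MachineOptions         MachineOptions // replaces the duplicated per-field copies
	InjectContainer        bool           // not a machine option; stays on Test
	ReservedMemoryCountMiB int            // not a machine option; stays on Test
}

// fakeCluster records the options each machine was created with.
type fakeCluster struct {
	created []MachineOptions
}

func (c *fakeCluster) NewMachineWithOptions(o MachineOptions) error {
	c.created = append(c.created, o)
	return nil
}

// createMachines sketches the harness side: t.MachineOptions is passed
// straight through instead of being rebuilt field by field.
func createMachines(c *fakeCluster, t *Test) error {
	for i := 0; i < t.ClusterSize; i++ {
		if err := c.NewMachineWithOptions(t.MachineOptions); err != nil {
			return fmt.Errorf("%s: creating machine %d: %w", t.Name, i, err)
		}
	}
	return nil
}

func main() {
	// A hypothetical registration in the style of multipath.go.
	t := &Test{
		Name:        "example.multipath",
		ClusterSize: 2,
		MachineOptions: MachineOptions{
			MultiPathDisk: true,
			MinMemory:     4096,
		},
	}
	c := &fakeCluster{}
	if err := createMachines(c, t); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(len(c.created), c.created[0].MultiPathDisk)
}
```

A new machine option then only needs to be added to MachineOptions; neither Test nor the harness copy code has to change.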

The basic.uefi, basic.uefi-secure, and basic.nvme tests previously set ClusterSize to 0 and manually created machines inside wrapper functions (uefiWithBasicTests, uefiSecureWithBasicTests, nvmeBasicTests) just to pass Firmware/Nvme in MachineOptions. They then manually called ScpKolet and LocalTests. Now that the Test struct carries MachineOptions directly, these tests can set ClusterSize to 1 with the desired MachineOptions and use LocalTests as their Run function. The harness handles machine creation, kolet upload, and native function dispatch automatically. This removes the three wrapper functions, runBasicTests, and the kola/cluster imports that were only needed for them. Written-by: <anthropic/claude-opus-4.6>

openshift-ci · 2026-03-29T23:56:18Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

gemini-code-assist

Code Review

This pull request performs a major refactoring of the kola test harness and platform abstraction layer. It consolidates QEMU-specific machine options into a unified MachineOptions struct and introduces a TestExecTimeout context within the platform's SSH implementation to handle command timeouts automatically. Additionally, it integrates the install-media tests into the standard kola registry and simplifies machine creation across several test suites. I have no feedback to provide as there were no review comments to assess.

This merges mantle/cmd/kola/testiso.go into mantle/kola/tests/install-media/test-install-media.go. It keeps most of the actual testing the same, but switches things so that the tests are just registered with the harness and then called like any other kola test. There is still a lot of work to do to dedupe code with mantle/platform/qemu, but it at least compiles and the tests run.

They can be run as part of the normal kola run now. Let's just hack the Jenkinsfile up a bit to test this out.

dustymabe · 2026-03-29T23:57:28Z

mantle/kola/tests/install-media/test-install-media.go

-	root.AddCommand(cmdTestIso)
+	for _, test := range getAllTests(kola.CosaBuild) {
+		register.RegisterTest(&register.Test{
+			Name:        "install-media." + test,


This is a good opportunity to reclassify these tests as "install-media" tests because not all of them actually test ISO images.

dustymabe · 2026-03-29T23:57:56Z

mantle/kola/tests/install-media/test-install-media.go

+		register.RegisterTest(&register.Test{
+			Name:        "install-media." + test,
+			Description: "The %s install test",
+			Run:         runTestInstall,


The runTestInstall function is mostly what we had in the past. It parses the test name and then dispatches to the right test.

dustymabe · 2026-03-29T23:59:01Z

mantle/kola/tests/install-media/test-install-media.go

+	addNmKeyfile = false
+	enable4k = false
+	enableMultipath = false
+	enableUefi = false
+	enableUefiSecure = false
+	isOffline = false


These globals won't really work if we were to run tests in parallel. @nikita-dubrovskii's PR #4377 doesn't have this problem.

openshift-ci · 2026-04-01T19:02:23Z

@dustymabe: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/images | `b77efce` | link | true | `/test images` |

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci · 2026-04-02T09:45:57Z

PR needs rebase.


dustymabe added 11 commits March 28, 2026 14:42

mantle/kola/harness: combine deferred functions in runTest

4190b80


mantle/platform: drop SkipStartMachine=true behavior

89cdd32


openshift-ci bot added the do-not-merge/work-in-progress label Mar 29, 2026

gemini-code-assist bot reviewed Mar 29, 2026


dustymabe added 2 commits March 30, 2026 08:28

.cci.Jenkinsfile: adjust logic for running testiso tests

b77efce


dustymabe force-pushed the dusty-foldiso-alt branch from ffb0696 to b77efce

dustymabe changed the title from "DNM: kola testios folding proof of concept" to "DNM: kola testiso folding proof of concept" Mar 30, 2026

dustymabe commented Mar 30, 2026


openshift-ci bot added the needs-rebase label Apr 2, 2026

dustymabe wants to merge 13 commits into coreos:main from dustymabe:dusty-foldiso-alt
