Skip to content

kola: add KubeVirt platform support#4507

Open
qinqon wants to merge 4 commits intocoreos:mainfrom
qinqon:kubevirt-kola-platform
Open

kola: add KubeVirt platform support#4507
qinqon wants to merge 4 commits intocoreos:mainfrom
qinqon:kubevirt-kola-platform

Conversation

@qinqon
Copy link
Copy Markdown

@qinqon qinqon commented Mar 26, 2026

Summary

  • Add a new kubevirt platform to kola for testing CoreOS on KubeVirt VMs
  • Enable end-to-end testing of afterburn's KubeVirt metadata provider (ConfigDrive and NoCloud)
  • Extend external test framework to support network data files

Details

Platform implementation

  • API wrapper using sigs.k8s.io/controller-runtime client + kubevirt.io/api types (lightweight alternative to kubevirt.io/client-go)
  • SSH access via WebSocket port-forward through the KubeVirt API server (no virtctl binary dependency)
  • containerDisk support for FCOS images (cosa already builds these)
  • Per-test CloudInitType and NetworkData via MachineOptions

Closes #4508

External test framework

  • network_data.json in test dir → ConfigDrive cloud-init type (OpenStack convention)
  • network-config in test dir → NoCloud cloud-init type (cloud-init convention)
  • Silently ignored on platforms that don't support network data (qemu, aws, etc.)

New CLI flags

--kubevirt-kubeconfig     Path to kubeconfig
--kubevirt-namespace      Kubernetes namespace for VMs
--kubevirt-image          Container disk image pull spec
--kubevirt-cloud-init-type  Cloud-init type: configdrive or nocloud
--kubevirt-memory         VM memory (default 2Gi)
--kubevirt-cpus           VM CPUs (default 2)

New built-in tests

  • fcos.metadata.kubevirt.configdrive — verify afterburn metadata with ConfigDrive
  • fcos.metadata.kubevirt.nocloud — verify afterburn metadata with NoCloud

Test plan

  • go build ./mantle/cmd/kola/... compiles clean
  • go vet passes
  • Tested kola spawn against a kind cluster with KubeVirt — VM boots, SSH works through WebSocket tunnel
  • Tested kola run fcos.metadata.kubevirt.configdrive — PASS
  • Test kola run fcos.metadata.kubevirt.nocloud
  • Test external test with network_data.json file
  • Test external test with network-config file
  • Vendor directory completeness (go mod vendor)

TODO

We may need a mechanism to specify kubevirt VMs inetrfaces and network since that may be needed to propertly tests afterburn network data support.

🤖 Generated with Claude Code

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Mar 26, 2026

Hi @qinqon. Thanks for your PR.

I'm waiting for a coreos member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request integrates KubeVirt as a new platform into the mantle/kola testing framework, enabling VM lifecycle management and SSH port forwarding for KubeVirt-based tests. The changes include adding KubeVirt-specific options, API implementations, and machine/cluster logic. Review feedback highlights several areas for improvement: the VM deletion polling loop should specifically check for NotFound errors to prevent premature termination, rand.Read failures for VM name generation should panic to ensure unique names, and there are opportunities to refactor duplicated code for handling network data and KubeVirt metadata verification to improve conciseness and maintainability. Additionally, the Kubernetes config loading logic could be simplified for better readability.

Comment on lines +179 to +181
if err != nil {
return true, nil // Gone
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current implementation incorrectly assumes that any error from a.client.Get means the VM is gone. This could lead to premature termination of the polling loop if a transient error occurs. You should specifically check for a NotFound error.

You'll need to add an import for k8s.io/apimachinery/pkg/api/errors as apierrors.

    if err != nil {
      if apierrors.IsNotFound(err) {
        return true, nil // Gone
      }
      return false, err // Propagate other errors to stop polling.
    }

Comment on lines +117 to +119
if _, err := rand.Read(b); err != nil {
plog.Errorf("failed to generate a random vmname: %v", err)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If rand.Read fails, the error is logged but the function continues with a zero-filled byte slice. This will result in non-unique VM names and could cause test flakiness. Since this function doesn't return an error, it should panic on failure to prevent creating VMs with predictable names.

Suggested change
if _, err := rand.Read(b); err != nil {
plog.Errorf("failed to generate a random vmname: %v", err)
}
if _, err := rand.Read(b); err != nil {
panic(fmt.Sprintf("failed to generate a random vmname: %v", err))
}

Comment on lines +1393 to +1412
} else if isreg && c.Name() == "network_data.json" {
if cloudInitType != "" {
return fmt.Errorf("found both network_data.json and network-config in %s; use only one", dir)
}
v, err := os.ReadFile(filepath.Join(dir, c.Name()))
if err != nil {
return errors.Wrapf(err, "reading %s", c.Name())
}
networkData = string(v)
cloudInitType = "configdrive"
} else if isreg && c.Name() == "network-config" {
if cloudInitType != "" {
return fmt.Errorf("found both network_data.json and network-config in %s; use only one", dir)
}
v, err := os.ReadFile(filepath.Join(dir, c.Name()))
if err != nil {
return errors.Wrapf(err, "reading %s", c.Name())
}
networkData = string(v)
cloudInitType = "nocloud"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There's some code duplication in handling network_data.json and network-config. You can refactor this to make it more concise and maintainable by combining the logic into a single if block.

Suggested change
} else if isreg && c.Name() == "network_data.json" {
if cloudInitType != "" {
return fmt.Errorf("found both network_data.json and network-config in %s; use only one", dir)
}
v, err := os.ReadFile(filepath.Join(dir, c.Name()))
if err != nil {
return errors.Wrapf(err, "reading %s", c.Name())
}
networkData = string(v)
cloudInitType = "configdrive"
} else if isreg && c.Name() == "network-config" {
if cloudInitType != "" {
return fmt.Errorf("found both network_data.json and network-config in %s; use only one", dir)
}
v, err := os.ReadFile(filepath.Join(dir, c.Name()))
if err != nil {
return errors.Wrapf(err, "reading %s", c.Name())
}
networkData = string(v)
cloudInitType = "nocloud"
} else if isreg && (c.Name() == "network_data.json" || c.Name() == "network-config") {
if cloudInitType != "" {
return fmt.Errorf("found both network_data.json and network-config in %s; use only one", dir)
}
v, err := os.ReadFile(filepath.Join(dir, c.Name()))
if err != nil {
return errors.Wrapf(err, "reading %s", c.Name())
}
networkData = string(v)
if c.Name() == "network_data.json" {
cloudInitType = "configdrive"
} else {
cloudInitType = "nocloud"
}

Comment on lines +91 to +111
func verifyKubeVirtConfigDrive(c cluster.TestCluster) {
opts := platform.MachineOptions{
CloudInitType: "configdrive",
}
_, err := c.NewMachineWithOptions(enableMetadataService, opts)
if err != nil {
c.Fatalf("Unable to create machine: %v", err)
}
verify(c, "AFTERBURN_KUBEVIRT_INSTANCE_ID", "AFTERBURN_KUBEVIRT_HOSTNAME")
}

func verifyKubeVirtNoCloud(c cluster.TestCluster) {
opts := platform.MachineOptions{
CloudInitType: "nocloud",
}
_, err := c.NewMachineWithOptions(enableMetadataService, opts)
if err != nil {
c.Fatalf("Unable to create machine: %v", err)
}
verify(c, "AFTERBURN_KUBEVIRT_INSTANCE_ID", "AFTERBURN_KUBEVIRT_HOSTNAME")
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The functions verifyKubeVirtConfigDrive and verifyKubeVirtNoCloud are nearly identical. To improve maintainability and reduce code duplication, you could extract the common logic into a helper function.

For example:

func verifyKubeVirt(c cluster.TestCluster, cloudInitType string) {
	opts := platform.MachineOptions{
		CloudInitType: cloudInitType,
	}
	_, err := c.NewMachineWithOptions(enableMetadataService, opts)
	if err != nil {
		c.Fatalf("Unable to create machine: %v", err)
	}
	verify(c, "AFTERBURN_KUBEVIRT_INSTANCE_ID", "AFTERBURN_KUBEVIRT_HOSTNAME")
}

func verifyKubeVirtConfigDrive(c cluster.TestCluster) {
	verifyKubeVirt(c, "configdrive")
}

func verifyKubeVirtNoCloud(c cluster.TestCluster) {
	verifyKubeVirt(c, "nocloud")
}

Comment on lines +88 to +99
if opts.Kubeconfig != "" {
config, err = clientcmd.BuildConfigFromFlags("", opts.Kubeconfig)
} else {
// Try in-cluster first, fall back to default kubeconfig
config, err = rest.InClusterConfig()
if err != nil {
loadingRules := clientcmd.NewDefaultClientConfigLoadingRules()
configOverrides := &clientcmd.ConfigOverrides{}
config, err = clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
loadingRules, configOverrides).ClientConfig()
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic for loading the Kubernetes config can be simplified by using clientcmd.NewDefaultClientConfigLoadingRules more directly. This will make the code cleaner and more idiomatic.

 loadingRules := clientcmd.NewDefaultClientConfigLoadingRules()
  if opts.Kubeconfig != "" {
    loadingRules.ExplicitPath = opts.Kubeconfig
  }
  configOverrides := &clientcmd.ConfigOverrides{}
  config, err = clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
    loadingRules, configOverrides).ClientConfig()

qinqon and others added 2 commits March 26, 2026 10:47
Add a new "kubevirt" platform to kola for testing CoreOS on KubeVirt VMs.
This enables end-to-end testing of afterburn's KubeVirt metadata provider,
supporting both ConfigDrive and NoCloud cloud-init types.

Platform implementation:
- API wrapper using controller-runtime client + kubevirt.io/api types
  (lightweight alternative to kubevirt.io/client-go)
- SSH access via WebSocket port-forward through the KubeVirt API server
  (no virtctl binary dependency)
- containerDisk support for FCOS images (cosa already builds these)
- Per-test CloudInitType and NetworkData via MachineOptions

External test framework:
- Detect network_data.json in test dir -> ConfigDrive cloud-init type
- Detect network-config in test dir -> NoCloud cloud-init type
- File naming follows real-world conventions (OpenStack / cloud-init)
- Error if both files present in the same test directory
- Silently ignored on platforms that don't support network data

New CLI flags:
  --kubevirt-kubeconfig, --kubevirt-namespace, --kubevirt-image,
  --kubevirt-cloud-init-type, --kubevirt-memory, --kubevirt-cpus

New tests:
  fcos.metadata.kubevirt.configdrive
  fcos.metadata.kubevirt.nocloud

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add vendored dependencies for the new KubeVirt kola platform:
- kubevirt.io/api v1.8.0
- sigs.k8s.io/controller-runtime v0.23.3
- k8s.io/client-go v0.35.3
- k8s.io/apimachinery v0.35.3
- k8s.io/api v0.35.3
- github.com/gorilla/websocket v1.5.4

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@qinqon qinqon force-pushed the kubevirt-kola-platform branch from 9eedd3d to 65687fd Compare March 26, 2026 09:48
@travier
Copy link
Copy Markdown
Member

travier commented Mar 26, 2026

Let's use #4508 to track this. Parent issue in FCOS: coreos/fedora-coreos-tracker#1432

qinqon and others added 2 commits March 26, 2026 12:28
Add a workflow that builds a Fedora CoreOS KubeVirt OCI archive
image using coreos-assembler. Runs on pull requests and can be
triggered manually via workflow_dispatch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cosa init fails when the working directory is not empty. Use a
separate temp directory for the cosa build and copy artifacts
back for upload.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Mar 31, 2026

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

kola: Add support for Kubevirt

2 participants