Add support for qemu virtual machines using KVM #423

Draft
IkerGalardi wants to merge 8 commits into seL4:main from IkerGalardi:kvm-support

@IkerGalardi

IkerGalardi commented Feb 26, 2026

This PR adds support for running Microkit-based operating systems on QEMU with KVM enabled.

The main issue with the QEMU platform under KVM is that it drops the kernel in EL1 instead of the expected EL2. This PR therefore builds a custom kernel for this platform with hypervisor support disabled.

The loader seems to work fine, but jumping to the kernel causes an Instruction Abort exception. Here is the log:

LDR|ERROR: loader trapped exception: Synchronous (Current Exception level with SP_ELx)
    esr_el1: 0x0000000086000004
    ec: 0x00000021 (Instruction Abort taken without a change in Exception level)
    il: 0x0000000000000001
    iss: 0x0000000000000004
    far: 0xffffff8040000000
    reg: 0x00000000: 0x0000000040041000
    reg: 0x00000001: 0x0000000000211274
    reg: 0x00000002: 0x0000000000000000
    reg: 0x00000003: 0x0000000000000000
    reg: 0x00000004: 0xffffff8040000000
    reg: 0x00000005: 0xffffffffffffffff
    reg: 0x00000006: 0x0000000000c5183d
    reg: 0x00000007: 0x0000000000001124
    reg: 0x00000008: 0x0000001485100510
    reg: 0x00000009: 0x0000000000000000
    reg: 0x0000000a: 0x0000000000007830
    reg: 0x0000000b: 0x000000000000000d
    reg: 0x0000000c: 0x0000000000000030
    reg: 0x0000000d: 0x0000000000000030
    reg: 0x0000000e: 0xffffff8040000000
    reg: 0x0000000f: 0x0000000000000030
    reg: 0x00000010: 0x0000000000000030
    reg: 0x00000011: 0x0000000070003298
    reg: 0x00000012: 0x0000000000000000
    reg: 0x00000013: 0x0000000070004000
    reg: 0x00000014: 0x00000000700032f0
    reg: 0x00000015: 0x0000000070003550
    reg: 0x00000016: 0x0000000000000100
    reg: 0x00000017: 0x000000007000c008
    reg: 0x00000018: 0x0000000070003510
    reg: 0x00000019: 0x00000000700034f8
    reg: 0x0000001a: 0x000000007000c008
    reg: 0x0000001b: 0x0000000070005f70
    reg: 0x0000001c: 0x0000000000000000
    reg: 0x0000001d: 0x0000000000000000
    reg: 0x0000001e: 0x0000000000000000
    reg: 0x0000001f: 0x0000000000000000
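
For readers following the log above, the `esr_el1` value can be split into its fields by hand. The sketch below is only an illustration: `decode_esr` and the `FAULT_STATUS` table are hypothetical helpers written for this note, not part of the loader. The field positions (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, fault status code in the low six ISS bits) follow the Arm architecture definition of ESR_ELx.

```python
# Fault status codes for translation faults (low six bits of the ISS
# for instruction and data aborts). Only the codes needed here are listed.
FAULT_STATUS = {
    0b000100: "translation fault, level 0",
    0b000101: "translation fault, level 1",
    0b000110: "translation fault, level 2",
    0b000111: "translation fault, level 3",
}


def decode_esr(esr):
    """Split a 32-bit ESR_ELx value into EC, IL, ISS and fault status.

    Hypothetical helper for illustration; not taken from the loader.
    """
    iss = esr & 0x1FFFFFF            # ISS: bits [24:0]
    fsc = iss & 0x3F                 # fault status code: ISS bits [5:0]
    return {
        "ec": (esr >> 26) & 0x3F,    # exception class: bits [31:26]
        "il": (esr >> 25) & 0x1,     # instruction length: bit [25]
        "iss": iss,
        "fsc": FAULT_STATUS.get(fsc, "other (%#x)" % fsc),
    }


fields = decode_esr(0x86000004)
print(hex(fields["ec"]), fields["il"], hex(fields["iss"]), fields["fsc"])
# prints: 0x21 1 0x4 translation fault, level 0
```

Under this reading, the abort is a level 0 translation fault, meaning the walk failed at the top-level table for the faulting address in `far`.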

@Ivan-Velickovic
Collaborator

Ivan-Velickovic commented Mar 23, 2026

@IkerGalardi you should include the full logs for the loader; it's not clear to me that the loader is working properly. There is most likely an issue with the initial virtual address space that is set up by the loader.

I also would imagine that the smc calls in the loader would need to be replaced with hvc calls when using KVM, as I had to do something similar in the past.

@IkerGalardi
Author

IkerGalardi commented Mar 23, 2026

Here are the full logs:

LDR|INFO: disabling MMU (if it was enabled)
LDR|INFO: PSCI version is 1.1
LDR|INFO: altloader for seL4 starting
LDR|INFO: flags:
LDR|INFO: kernel:      entry:   0xffffff8040000000
LDR|INFO: root server: physmem: 0x0000000040241000 -- 0x00000000406cc000
LDR|INFO:              virtmem: 0x0000000000200000 -- 0x000000000068b000
LDR|INFO:              entry  : 0x0000000000211274
LDR|INFO: region: 0x00000000   addr: 0x0000000040000000   size: 0x0000000000241000   offset: 0x0000000000000000   type: 0x0000000000000001
LDR|INFO: region: 0x00000001   addr: 0x0000000040241000   size: 0x00000000000086e8   offset: 0x0000000000241000   type: 0x0000000000000001
LDR|INFO: region: 0x00000002   addr: 0x000000004024a6e8   size: 0x0000000000015428   offset: 0x00000000002496e8   type: 0x0000000000000001
LDR|INFO: region: 0x00000003   addr: 0x0000000040260b10   size: 0x00000000000100b0   offset: 0x000000000025eb10   type: 0x0000000000000001
LDR|INFO: region: 0x00000004   addr: 0x0000000040271000   size: 0x000000000005b3cc   offset: 0x000000000026ebc0   type: 0x0000000000000001
LDR|INFO: region: 0x00000005   addr: 0x00000000402cd000   size: 0x00000000003ff000   offset: 0x00000000002c9f8c   type: 0x0000000000000001
LDR|INFO: copying region 0x00000000
LDR|INFO: copying region 0x00000001
LDR|INFO: copying region 0x00000002
LDR|INFO: copying region 0x00000003
LDR|INFO: copying region 0x00000004
LDR|INFO: copying region 0x00000005
LDR|INFO|CPU0: active CPUs to start: 0x00000001
LDR|INFO|CPU0: enabling MMU
LDR|INFO|CPU0: CurrentEL=EL1
LDR|INFO|CPU0: enabling MMU
LDR|INFO|CPU0: jumping to kernel

LDR|ERROR: loader trapped exception: Synchronous (Current Exception level with SP_ELx)
    esr_el1: 0x0000000086000004
    ec: 0x00000021 (Instruction Abort taken without a change in Exception level)
    il: 0x0000000000000001
    iss: 0x0000000000000004
    far: 0xffffff8040000000
    reg: 0x00000000: 0x0000000040041000
    reg: 0x00000001: 0x0000000000211274
    reg: 0x00000002: 0x0000000000000000
    reg: 0x00000003: 0x0000000000000000
    reg: 0x00000004: 0xffffff8040000000
    reg: 0x00000005: 0xffffffffffffffff
    reg: 0x00000006: 0x0000000000c5183d
    reg: 0x00000007: 0x0000000000001124
    reg: 0x00000008: 0x0000001485100510
    reg: 0x00000009: 0x0000000000000000
    reg: 0x0000000a: 0x0000000000007830
    reg: 0x0000000b: 0x000000000000000d
    reg: 0x0000000c: 0x0000000000000030
    reg: 0x0000000d: 0x0000000000000030
    reg: 0x0000000e: 0xffffff8040000000
    reg: 0x0000000f: 0x0000000000000030
    reg: 0x00000010: 0x0000000000000030
    reg: 0x00000011: 0x0000000070003298
    reg: 0x00000012: 0x0000000000000000
    reg: 0x00000013: 0x0000000070004000
    reg: 0x00000014: 0x00000000700032f0
    reg: 0x00000015: 0x0000000070003550
    reg: 0x00000016: 0x0000000000000100
    reg: 0x00000017: 0x000000007000c008
    reg: 0x00000018: 0x0000000070003510
    reg: 0x00000019: 0x00000000700034f8
    reg: 0x0000001a: 0x000000007000c008
    reg: 0x0000001b: 0x0000000070005f70
    reg: 0x0000001c: 0x0000000000000000
    reg: 0x0000001d: 0x0000000000000000
    reg: 0x0000001e: 0x0000000000000000
    reg: 0x0000001f: 0x0000000000000000

The issue could be the page tables, but still, the kernel entry being at 0xffffff8040000000 feels kinda strange.

As for the smc instruction, the SMC Calling Convention states that hypervisors catch and emulate smc calls, so it should be fine™.

@Indanz

Indanz commented Mar 23, 2026

> The issue could be the page tables, but still, the kernel entry being at 0xffffff8040000000 feels kinda strange.

On non-HYP, the kernel uses the upper address range for its page table configured by TTBR1_EL1. It uses TTBR0_EL1 to configure the user space tables. For HYP, when running in EL2, there is no corresponding TTBR1_EL2, only a TTBR0_EL2. However, it is not shared with user space like TTBR0_EL1 and TTBR1_EL1 are.
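
The split between the two EL1 translation table base registers can be sketched as follows. This is a minimal illustration, assuming a 48-bit virtual address range for both halves and no top-byte-ignore tagging; `VA_BITS` and `ttbr_for` are hypothetical names introduced for this note.

```python
VA_BITS = 48  # assumption: both halves of the address space use 48-bit VAs


def ttbr_for(va):
    """Return which EL1 base register translates a 64-bit virtual address.

    Hypothetical helper: addresses whose upper bits are all zero go
    through TTBR0_EL1, addresses whose upper bits are all one go through
    TTBR1_EL1, and anything in between is outside both ranges.
    """
    top = va >> VA_BITS               # the 16 bits above the translated range
    if top == 0x0000:
        return "TTBR0_EL1"
    if top == 0xFFFF:
        return "TTBR1_EL1"
    return "unmapped (translation fault)"


print(ttbr_for(0xFFFFFF8040000000))   # kernel entry from the loader log
print(ttbr_for(0x0000000000211274))   # root server entry from the loader log
# prints: TTBR1_EL1
# prints: TTBR0_EL1
```

So the kernel entry address in the log falls in the upper range, which is why it looks unusual next to the physical and user-space addresses.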

This board differs from the standard qemu_virt_aarch64 board in that
the entry point is in EL1 instead of EL2. This is necessary due to the
lack of nested virtualization support in the KVM subsystem.

Signed-off-by: IkerGalardi <contacto.ikergalardi@gmail.com>
Previously the defaults were always applied last, meaning that if a
board tried to specialize a kernel build parameter that the default
configuration also set, it would get overwritten by the default
configuration. This patch applies the specialization to the default
configuration instead of doing it the other way around.

Signed-off-by: IkerGalardi <contacto.ikergalardi@gmail.com>
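
The ordering problem described in the commit message above can be sketched in a few lines. This is only an illustration of the idea, not the code from the PR: `build_kernel_config`, the option names, and their values are assumptions chosen for the example.

```python
def build_kernel_config(defaults, board_overrides):
    """Start from the defaults, then let the board specialize them.

    Hypothetical helper: applying the board options last means a board
    can override any default, which is the behavior the patch describes.
    """
    config = dict(defaults)          # copy so the defaults stay untouched
    config.update(board_overrides)   # board-specific values win
    return config


# Illustrative option names and values only.
defaults = {"KernelArmHypervisorSupport": True, "KernelDebugBuild": False}
board = {"KernelArmHypervisorSupport": False}

print(build_kernel_config(defaults, board))
# prints: {'KernelArmHypervisorSupport': False, 'KernelDebugBuild': False}
```

Merging in the opposite order (`board` first, `defaults` last) would silently restore the default value, which is the bug the commit message describes.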
Signed-off-by: IkerGalardi <contacto.ikergalardi@gmail.com>
EL1 software cannot access interrupt group registers on the GIC
distributor.

Signed-off-by: IkerGalardi <contacto.ikergalardi@gmail.com>
The el1_mmu_disable function starts by pushing both x29 and x30
(frame pointer and link register) onto the stack. On exit, before the
RET instruction, it popped four values instead of just the two it had
pushed, eating into the stack frame of the caller.

This was probably copy-pasted from el2_mmu_disable, which pushes x27,
x28, x29 and x30 and pops all of them. The two functions are otherwise
identical, but the EL1 version for some reason does not push x27 and
x28.

Signed-off-by: IkerGalardi <contacto.ikergalardi@gmail.com>
There was a missing ldp instruction popping the x27 and x28 values from
the stack.

Signed-off-by: IkerGalardi <contacto.ikergalardi@gmail.com>
@IkerGalardi
Author

Makes sense.

I've seen that the page tables used for the upper VA range are stored in the boot_lvlX_upper variables. Those variables are populated by the microkit tool (I assume so after seeing references to them in the function aarch64_setup_pagetables in loader.rs).

The lvl0 descriptors are all set to 0, so it makes sense that there is an instruction abort.

(gdb) p /x boot_lvl0_upper 
$5 = {0x0 <repeats 512 times>}
(gdb) p /x boot_lvl1_upper 
$6 = {0x0, 0x7000a003, 0x0 <repeats 510 times>}
(gdb) p /x boot_lvl2_upper 
$7 = {0x40000711, 0x40200711, 0x40400711, 0x40600711, 0x40800711, 0x40a00711, 0x40c00711, 0x40e00711, 0x41000711, 0x41200711, 0x41400711, 
  0x41600711, 0x41800711, 0x41a00711, 0x41c00711, 0x41e00711, 0x42000711, 0x42200711, 0x42400711, 0x42600711, 0x42800711, 0x42a00711, 
  0x42c00711, 0x42e00711, 0x43000711, 0x43200711, 0x43400711, 0x43600711, 0x43800711, 0x43a00711, 0x43c00711, 0x43e00711, 0x44000711, 
  0x44200711, 0x44400711, 0x44600711, 0x44800711, 0x44a00711, 0x44c00711, 0x44e00711, 0x45000711, 0x45200711, 0x45400711, 0x45600711, 
  0x45800711, 0x45a00711, 0x45c00711, 0x45e00711, 0x46000711, 0x46200711, 0x46400711, 0x46600711, 0x46800711, 0x46a00711, 0x46c00711, 
  0x46e00711, 0x47000711, 0x47200711, 0x47400711, 0x47600711, 0x47800711, 0x47a00711, 0x47c00711, 0x47e00711, 0x48000711, 0x48200711, 
  0x48400711, 0x48600711, 0x48800711, 0x48a00711, 0x48c00711, 0x48e00711, 0x49000711, 0x49200711, 0x49400711, 0x49600711, 0x49800711, 
  0x49a00711, 0x49c00711, 0x49e00711, 0x4a000711, 0x4a200711, 0x4a400711, 0x4a600711, 0x4a800711, 0x4aa00711, 0x4ac00711, 0x4ae00711, 
  0x4b000711, 0x4b200711, 0x4b400711, 0x4b600711, 0x4b800711, 0x4ba00711, 0x4bc00711, 0x4be00711, 0x4c000711, 0x4c200711, 0x4c400711, 
  0x4c600711, 0x4c800711, 0x4ca00711, 0x4cc00711, 0x4ce00711, 0x4d000711, 0x4d200711, 0x4d400711, 0x4d600711, 0x4d800711, 0x4da00711, 
  0x4dc00711, 0x4de00711, 0x4e000711, 0x4e200711, 0x4e400711, 0x4e600711, 0x4e800711, 0x4ea00711, 0x4ec00711, 0x4ee00711, 0x4f000711, 
  0x4f200711, 0x4f400711, 0x4f600711, 0x4f800711, 0x4fa00711, 0x4fc00711, 0x4fe00711, 0x50000711, 0x50200711, 0x50400711, 0x50600711, 
  0x50800711, 0x50a00711, 0x50c00711, 0x50e00711, 0x51000711, 0x51200711, 0x51400711, 0x51600711, 0x51800711, 0x51a00711, 0x51c00711, 
  0x51e00711, 0x52000711, 0x52200711, 0x52400711, 0x52600711, 0x52800711, 0x52a00711, 0x52c00711, 0x52e00711, 0x53000711, 0x53200711, 
  0x53400711, 0x53600711, 0x53800711, 0x53a00711, 0x53c00711, 0x53e00711, 0x54000711, 0x54200711, 0x54400711, 0x54600711, 0x54800711, 
  0x54a00711, 0x54c00711, 0x54e00711, 0x55000711, 0x55200711, 0x55400711, 0x55600711, 0x55800711, 0x55a00711, 0x55c00711, 0x55e00711, 
  0x56000711, 0x56200711, 0x56400711, 0x56600711, 0x56800711, 0x56a00711, 0x56c00711, 0x56e00711, 0x57000711, 0x57200711, 0x57400711, 
  0x57600711, 0x57800711, 0x57a00711, 0x57c00711, 0x57e00711, 0x58000711, 0x58200711, 0x58400711, 0x58600711, 0x58800711, 0x58a00711, 
  0x58c00711, 0x58e00711...}

The kernel entry is at 0xffffff8040000000, so the first-level translation goes through entry 0x1ff of the base table. I set that entry by hand in gdb with set variable boot_lvl0_upper[0x1ff] = 0x7000b003, and the kernel now boots fully.
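
The table indices for the kernel entry address can be checked with a short calculation. The sketch below assumes a 4 KiB translation granule with four levels of 512 entries each; `table_indices`, `is_table_descriptor` and `next_table_address` are hypothetical helpers written for this note, not functions from the loader or the tool.

```python
INDEX_MASK = 0x1FF                   # 9 index bits per level, 512 entries
LEVEL_SHIFTS = (39, 30, 21, 12)      # lvl0..lvl3, assuming a 4 KiB granule
ADDR_MASK = 0x0000FFFFFFFFF000       # descriptor output address, bits [47:12]


def table_indices(va):
    """Return the lvl0..lvl3 table indices used to translate va."""
    return [(va >> shift) & INDEX_MASK for shift in LEVEL_SHIFTS]


def is_table_descriptor(desc):
    """True when the low two bits mark a valid pointer to a next-level table."""
    return (desc & 0b11) == 0b11


def next_table_address(desc):
    """Physical address of the next-level table named by a table descriptor."""
    return desc & ADDR_MASK


print([hex(i) for i in table_indices(0xFFFFFF8040000000)])
print(is_table_descriptor(0x7000B003), hex(next_table_address(0x7000B003)))
# prints: ['0x1ff', '0x1', '0x0', '0x0']
# prints: True 0x7000b000
```

These indices line up with the gdb dump: entry 1 of boot_lvl1_upper is populated, entry 0 of boot_lvl2_upper maps physical 0x40000000, and only entry 0x1ff of boot_lvl0_upper was missing.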

My test setup fails with a data fault on the serial transmit virtualizer, but this might be unrelated to the tool.

MON|ERROR: received message 0x00000006  badge: 0x0000000000000002  tcb cap: 0x000000000000000b
MON|ERROR: faulting PD: serial_virt_tx
Registers: 
pc : 0x0000000000204ce0
sp: 0x00007ffffffffd40
spsr : 0x0000000020000040
x0 : 0x0000000000000000
x1 : 0x0000000000000000
x2 : 0x0000000000000000
x3 : 0xffffffffffffffff
x4 : 0x0000000000201fc4
x5 : 0x000000000000001b
x6 : 0x0000000000000000
x7 : 0x0000000000000000
x8 : 0x0000000000000000
x16 : 0x0000000000000000
x17 : 0x0000000000000000
x18 : 0x0000000000000000
x29 : 0x00007ffffffffd50
x30 : 0x0000000000204e7c
x9 : 0x0000000000000000
x10 : 0x0000000000000000
x11 : 0x0000000000000000
x12 : 0x0000000000000000
x13 : 0x0000000000000000
x14 : 0x0000000000000000
x15 : 0x0000000000000000
x19 : 0x00007ffffffffe50
x20 : 0x0000000000000000
x21 : 0x0000000000207000
x22 : 0x000000000020b2a0
x23 : 0x0000000000000000
x24 : 0x0000000000000000
x25 : 0x0000000000000000
x26 : 0x0000000000000000
x27 : 0x0000000000000000
x28 : 0x0000000000000000
tpidr_el0 : 0x0000000000000000
tpidrro_el0 : 0x0000000000000000
MON|ERROR: VMFault: ip=0x0000000000204ce0  fault_addr=0x0000000000000000  fsr=0x0000000092000006  (data fault)
MON|ERROR:   ec: 0x00000024  Data Abort from a lower Exception level   il: 1   iss: 0x00000006
MON|ERROR:   dfsc = translation fault, level 2 (0x00000006)
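
The `fsr` value in the monitor output carries one more detail worth reading: whether the faulting access was a read or a write. The sketch below is a hedged illustration; `describe_data_abort` is a hypothetical helper, and it relies only on the architectural layout of the data abort syndrome (fault status code in ISS bits 5:0, WnR in ISS bit 6).

```python
def describe_data_abort(fsr, fault_addr):
    """Summarize a data abort syndrome value (hypothetical helper)."""
    ec = (fsr >> 26) & 0x3F          # exception class: bits [31:26]
    iss = fsr & 0x1FFFFFF            # ISS: bits [24:0]
    dfsc = iss & 0x3F                # data fault status code: ISS bits [5:0]
    wnr = (iss >> 6) & 0x1           # 1 = write, 0 = read
    access = "write" if wnr else "read"
    return "ec=%#x dfsc=%#x %s at %#x" % (ec, dfsc, access, fault_addr)


print(describe_data_abort(0x92000006, 0x0))
# prints: ec=0x24 dfsc=0x6 read at 0x0
```

A read from address 0 would be consistent with a pointer that was never initialized in serial_virt_tx, though that is an inference from the log and not something it confirms.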

I'll try to fix the table generator in the loader.rs file, but I'm not very familiar with Rust, so it might take me a while to fix 😅.

@midnightveil
Contributor

This is as per the documentation given above the function,
just for some reason only lvl0_lower ever had anything placed
in it. This makes the code dual-use for hypervisor/not.

Signed-off-by: Julia Vassiliki <julia.vassiliki@unsw.edu.au>
seL4#493

Signed-off-by: Julia Vassiliki <julia.vassiliki@unsw.edu.au>
@IkerGalardi
Author

The EL1 page table generation works on my webserver example, but it still faults on the serial transmit virtualizer.

Once I figure out what's going on there, I'll mark this PR ready for review.


4 participants