pocs/linux/kernelctf/CVE-2026-23111_cos/docs/exploit.md (608 additions; large diff not rendered)

pocs/linux/kernelctf/CVE-2026-23111_cos/docs/novel-techniques.md

# Novel Techniques

While exploiting CVE-2026-23111, we identified two novel exploit techniques: an effective bypass of physical ASLR through linear physical memory scanning, and a method to pivot a SLUB object UAF into a page-level UAF via an invalid `kfree`. These techniques enabled a full exploit chain from an unbalanced refcount decrement on a Netfilter chain to root code execution without requiring any kernel address leak.

## Effective bypass of physASLR

Physical KASLR (physASLR) randomizes the physical base address at which the kernel image is loaded. This is intended to prevent attackers from predicting where kernel data structures reside in physical memory. However, once an attacker has an arbitrary physical read/write primitive, physASLR can be bypassed with a simple linear scan, without requiring any information leak.

In our exploit, we know the offset of `core_pattern` from the kernel physical base on the target COS image (`0x3fb3440`). The kernel physical base is aligned to a large boundary. We scan candidate base addresses at 16 MB (0x1000000) intervals:

- [exploit/cos-121-18867.294.100/exploit.c#L927](../exploit/cos-121-18867.294.100/exploit.c#L927)
```c
#define CORE_PATTERN_PHYS_ADDR 0x3fb3440
int cnt = 0;
while (1) {
aar(pipe_arr[pt_idx], tmp_buf,
CORE_PATTERN_PHYS_ADDR + cnt * 0x1000000, pt_addr, PG_SIZE);
char *core_pattern_addr = (char *)(pt_addr + (CORE_PATTERN_PHYS_ADDR & 0xfff));
if (!strcmp(core_pattern_addr, "core")) {
// found kernel physical base at offset cnt * 0x1000000
break;
}
cnt++;
}
```

Each iteration reads one page at a candidate physical address using our `aar()` (arbitrary address read) function, which works by rewriting a PTE to map a user-space virtual address to the target physical address. If the page contains the expected `"core"` string at the correct offset, we have found the kernel image.

On typical configurations with a few GB of physical memory, this scan completes within a small number of iterations (at most 256 for 4 GB of RAM at 16 MB intervals). Each attempt is a simple memory read with no side effects that would trigger detection or instability.

This demonstrates that physASLR provides limited protection against attackers with physical memory access. Once an attacker can read/write arbitrary physical addresses (e.g., through page table corruption), the randomized base can be trivially recovered through sequential probing. Unlike virtual KASLR bypass techniques that typically require information leaks via side channels, OOB reads, or sprayed kernel pointers, this approach requires no information leak at all, only the ability to read physical memory.

To the best of our knowledge, this is the first kernelCTF submission to bypass physASLR using a linear physical memory scan.

## Exploiting unexpected behavior of invalid address `kfree`

When `kfree()` is called on a pointer whose underlying slab page has already been freed and reassigned to a non-slab use (e.g., pipe buffer), the slab allocator can inadvertently free the page back to the page allocator:

- [mm/slab_common.c:kfree()](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/slab_common.c?h=v6.6)
```c
void kfree(const void *object)
{
struct folio *folio;
struct slab *slab;
struct kmem_cache *s;
// ...
folio = virt_to_folio(object); // [1] resolve physical page
if (unlikely(!folio_test_slab(folio))) {
free_large_kmalloc(folio, (void *)object); // [2] non-slab path
return;
}
slab = folio_slab(folio);
s = slab->slab_cache;
__kmem_cache_free(s, (void *)object, _RET_IP_); // [3] slab path
}
```

At [1], `kfree()` resolves the pointer to its physical page. If the page is no longer marked as a slab page, it takes the non-slab path at [2], which calls `free_large_kmalloc()`:

- [mm/slab_common.c:free_large_kmalloc()](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/slab_common.c?h=v6.6)
```c
void free_large_kmalloc(struct folio *folio, void *object)
{
unsigned int order = folio_order(folio);

if (WARN_ON_ONCE(order == 0)) // [4]
pr_warn_once("object pointer: 0x%p\n", object);
// ...
__free_pages(folio_page(folio, 0), order); // [5]
}
```

The function warns if the folio order is 0 [4], but does not prevent the free. `__free_pages()` [5] frees the underlying page back to the page allocator regardless. This means `kfree()` on a stale pointer whose page has been repurposed to a non-slab use will trigger a warning but still free the page, turning a SLUB object UAF into a page UAF.

In our exploit, we achieve this through two stages of cross-cache reclamation (`kmalloc-cg-128` -> `kmalloc-16` -> pipe page). First, the freed `nft_chain` object (`kmalloc-cg-128`) is reclaimed by `unix_address` objects (`kmalloc-16`). The `unix_address->refcnt` field at offset 0 overlaps with `nft_chain->use`, so the vulnerability's UAF decrement corrupts the refcount, leading to a premature `kfree()` of the `unix_address` while a server socket still holds a dangling reference.

We then drain the `kmalloc-16` slab and spray pipe pages to reclaim the freed pages. The dangling `unix_address` pointer now points into a pipe buffer page, which is no longer a slab page. When we close the remaining server socket, `unix_sock_destructor()` calls `kfree()` on this dangling pointer. Since the page is no longer a slab page, `folio_test_slab()` [1] fails and `kfree()` takes the non-slab path [2], freeing the pipe page back to the page allocator. A page table spray then reclaims it, creating a pipe buffer / page table overlap that provides arbitrary physical read/write.

This technique generalizes: any situation where a UAF access corrupts a refcount field of a cross-cache overlapping object can be pivoted into a page-level primitive via the resulting double free. The premature free effectively "promotes" a SLUB object UAF to a page UAF, bypassing slab-level mitigations such as `CONFIG_RANDOM_KMALLOC_CACHES`. This is particularly useful in scenarios where the attacker does not have enough control over the freed object's content to exploit it at the slab level (e.g., no heap/KASLR leak for traditional slab-based techniques). By escalating to the page level, we gain physical AARW without ever needing a kernel address leak.

Interestingly, this technique is prevented on the [mitigation instance](https://github.com/thejh/linux/tree/4c5b4a60a8f52798223807f76442e96d9eb15046) by `CONFIG_SLAB_VIRTUAL`, though likely as an unintended side effect rather than a deliberate defense against this specific attack pattern. With `CONFIG_SLAB_VIRTUAL`, slab objects are accessed through a dedicated virtual address range (`SLAB_DATA_BASE_ADDR`), decoupled from the underlying physical pages. When a slab page is freed via `__free_slab()`, the PTEs for the slab's virtual address range are cleared:

- [mm/slub.c:__free_slab() (CONFIG_SLAB_VIRTUAL)](https://github.com/thejh/linux/blob/4c5b4a60a8f52798223807f76442e96d9eb15046/mm/slub.c)
```c
#ifdef CONFIG_SLAB_VIRTUAL
static void __free_slab(struct kmem_cache *s, struct slab *slab)
{
// ...
for (i = 0; i < pages; i++) {
ptep_clear(...); // clear PTE for each slab page
}
// queue for TLB flush and physical page deallocation
}
#endif
```

After our cross-cache step (chain freed from `kmalloc-cg-128`, page returned to page allocator, reclaimed by `kmalloc-16`), the stale chain pointer's virtual address is no longer mapped. Any UAF access through this pointer, such as the `nft_data_release()` decrement on `chain->use`, would fault or be rejected by `virt_to_slab()`, which checks `is_slab_addr()` and validates slab metadata:

- [mm/slab.h:virt_to_slab() (CONFIG_SLAB_VIRTUAL)](https://github.com/thejh/linux/blob/4c5b4a60a8f52798223807f76442e96d9eb15046/mm/slab.h)
```c
static inline struct slab *virt_to_slab(const void *addr)
{
struct slab *slab, *slab_head;

if (!is_slab_addr(addr)) // reject if not in slab virtual range
return NULL;

slab = (struct slab *)virt_to_slab_raw((unsigned long)addr);
slab_head = slab->compound_slab_head;

if (CHECK_DATA_CORRUPTION(!is_slab_meta(slab_head),
"compound slab head out of meta range: %p", slab_head))
return NULL;

return slab_head;
}
```

The decoupling of slab virtual addresses from physical pages means that cross-cache reuse does not preserve the old pointer's validity, which happens to block the UAF field overlap that our technique relies on. The original design goal of `CONFIG_SLAB_VIRTUAL` is to prevent cross-cache attacks, but this incidental invalidation of stale pointers also closes off the escalation path from SLUB-level UAF to page-level UAF.

To the best of our knowledge, this is the first kernelCTF submission to exploit invalid `kfree` behavior to pivot a SLUB UAF into a page-level UAF.
pocs/linux/kernelctf/CVE-2026-23111_cos/docs/vulnerability.md

# Vulnerability

A use-after-free vulnerability was found in the Linux kernel's Netfilter nf_tables subsystem (`net/netfilter/nf_tables_api.c`). An inverted genmask check in the `nft_map_catchall_activate()` abort path leads to a chain refcount underflow and a use-after-free of the `nft_chain` object, which can be leveraged for local privilege escalation (LPE).

## Requirements to trigger the vulnerability:
- Capabilities: `CAP_NET_ADMIN` is required to access the Netfilter subsystem.
- Kernel configuration: the Netfilter nf_tables configs (e.g., `CONFIG_NETFILTER`, `CONFIG_NF_TABLES`) must be enabled. These are generally enabled by default (e.g., in `x86_64_defconfig`).
- Are user namespaces needed?: Yes. `CAP_NET_ADMIN` is not normally granted to unprivileged users, so we use an unprivileged user namespace to obtain it.

## Commit which introduced the vulnerability
- This vulnerability was introduced in Linux v6.4 by commit [628bd3e49cba1c066228e23d71a852c23e26da73](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=628bd3e49cba1c066228e23d71a852c23e26da73)
- This commit introduced the inverted genmask check in the `nft_map_catchall_activate()` abort path that causes the chain refcount underflow and use-after-free in nf_tables.

## Commit which fixed the vulnerability
- This vulnerability was fixed in Linux v6.19, with commit [f41c5d151078c5348271ffaf8e7410d96f2d82f8](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f41c5d151078c5348271ffaf8e7410d96f2d82f8)
- This commit moves map element reference dropping from the set `.destroy` phase to the preparation phase to prevent refcount imbalance and spurious EBUSY errors in nf_tables.

## Affected kernel versions
- Linux v6.4 through v6.19 are affected by this vulnerability.

## Affected component, subsystem
- net/netfilter (nf_tables)

## Cause (UAF, BoF, race condition, double free, refcount overflow, etc)
- Use-after-free

## Which syscalls or syscall parameters are needed to be blocked to prevent triggering the vulnerability? (If there is any easy way to block it.)
- Block the syscalls used to reach the Netfilter nf_tables subsystem (e.g., `socket` and `sendmsg` on Netlink sockets).
- Blocking the syscalls used to create unprivileged user namespaces (e.g., `clone`, `unshare`) also reduces the attack surface, since nf_tables requires `CAP_NET_ADMIN`.
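Unprivileged user namespaces can also be disabled system-wide instead of filtering syscalls; the exact knob varies by distribution (a sketch, assuming root):

```shell
# Upstream sysctl: forbid creation of any new user namespaces.
sysctl -w user.max_user_namespaces=0

# Debian/Ubuntu-patched kernels additionally offer:
sysctl -w kernel.unprivileged_userns_clone=0
```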
Makefile

CC = g++
SRCS := ./exploit.cpp
TARGETS := exploit exploit_debug
LIBMNL_DIR = $(realpath ./)/libmnl_build
LIBNFTNL_DIR = $(realpath ./)/libnftnl_build
LIBXDK_DIR = $(realpath ./)/libxdk_build

CFLAGS = -w -static -Wall -fpermissive
LIBS = -L$(LIBMNL_DIR)/install/usr/local/lib -L$(LIBNFTNL_DIR)/install/usr/local/lib -L$(LIBXDK_DIR)/lib -lnftnl -lmnl -lkernelXDK -lkeyutils
INCLUDES = -I$(LIBMNL_DIR)/install/usr/local/include -I$(LIBNFTNL_DIR)/install/usr/local/include -I$(LIBXDK_DIR)/include

all: exploit

exploit : libmnl-build libnftnl-build libxdk-build target_db.kxdb
$(CC) $(CFLAGS) $(SRCS) -o $@ $(INCLUDES) $(LIBS)

exploit_debug: CFLAGS += -g -DDEBUG
exploit_debug: libmnl-build libnftnl-build libxdk-build target_db.kxdb
$(CC) $(CFLAGS) $(SRCS) -o $@ $(INCLUDES) $(LIBS)

libmnl-build : libmnl-download
tar -C $(LIBMNL_DIR) -xvf $(LIBMNL_DIR)/libmnl-1.0.5.tar.bz2
cd $(LIBMNL_DIR)/libmnl-1.0.5 && ./configure --enable-static
cd $(LIBMNL_DIR)/libmnl-1.0.5 && make -j`nproc`
cd $(LIBMNL_DIR)/libmnl-1.0.5 && mkdir -p ../install && make DESTDIR=`realpath ../install` install

libnftnl-build : libmnl-build libnftnl-download
tar -C $(LIBNFTNL_DIR) -xvf $(LIBNFTNL_DIR)/libnftnl-1.2.1.tar.bz2
cd $(LIBNFTNL_DIR)/libnftnl-1.2.1 && PKG_CONFIG_PATH=$(LIBMNL_DIR)/install/usr/local/lib/pkgconfig ./configure --enable-static
cd $(LIBNFTNL_DIR)/libnftnl-1.2.1 && C_INCLUDE_PATH=$(C_INCLUDE_PATH):$(LIBMNL_DIR)/install/usr/local/include LD_LIBRARY_PATH=$(LD_LIBRARY_PATH):$(LIBMNL_DIR)/install/usr/local/lib make -j`nproc`
cd $(LIBNFTNL_DIR)/libnftnl-1.2.1 && mkdir -p ../install && make DESTDIR=`realpath ../install` install

libmnl-download :
mkdir -p $(LIBMNL_DIR)
wget -P $(LIBMNL_DIR) https://netfilter.org/projects/libmnl/files/libmnl-1.0.5.tar.bz2


libnftnl-download :
mkdir -p $(LIBNFTNL_DIR)
wget -P $(LIBNFTNL_DIR) https://netfilter.org/projects/libnftnl/files/libnftnl-1.2.1.tar.bz2

libxdk-build :
mkdir -p $(LIBXDK_DIR)
wget -O $(LIBXDK_DIR)/libxdk-v0.1.tar.gz https://github.com/google/kernel-research/releases/download/libxdk/v0.1/libxdk-v0.1.tar.gz
tar -C $(LIBXDK_DIR) -xzf $(LIBXDK_DIR)/libxdk-v0.1.tar.gz

target_db.kxdb :
wget -O target_db.kxdb https://storage.googleapis.com/kernelxdk/db/kernelctf.kxdb

.PHONY: all libmnl-build libnftnl-build libxdk-build libmnl-download libnftnl-download clean
clean:
rm -f $(TARGETS)
if [ -d $(LIBMNL_DIR)/libmnl-1.0.5 ]; then cd $(LIBMNL_DIR)/libmnl-1.0.5 && make DESTDIR=`realpath ../install` uninstall; fi
if [ -d $(LIBNFTNL_DIR)/libnftnl-1.2.1 ]; then cd $(LIBNFTNL_DIR)/libnftnl-1.2.1 && make DESTDIR=`realpath ../install` uninstall; fi
rm -rf $(LIBMNL_DIR)
rm -rf $(LIBNFTNL_DIR)
rm -rf $(LIBXDK_DIR)
rm -f target_db.kxdb