Skip to content

vmaffione/bpfhv

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

350 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

=== Structure of this repository ===
The driver/ directory contains the bpfhv guest driver for
Linux kernels (>= 4.18):
    - bpfhv.c: driver source code

The proxy/ directory contains the implementation of an external
backend process associated to the QEMU bpfhv-proxy network backend
(-netdev bpfhv-proxy).
The external backend provides the queue processing functionalities
for a single bpfhv device that belongs to a QEMU VM. In other words,
QEMU only implements the control functionalities of the device, while
RX and TX queues are processed by the external process. Similarly
to vhost-user, QEMU and the external backend process use a dedicated
control channel to exchange some information needed for the packet
processing tasks (e.g. guest memory map, addresses of each TX and RX
queue, file descriptors for notifications, etc.).
Files:
    - backend.c: main file, implements the control protocol and
                 two packet processing loops (the first one is
                 a poll() event loop, whereas the second one uses
                 busy-wait;
    - sring.[ch]: hv implementation of a device which uses a minimal
                  descriptor format, with no support for offloads (and
                  reduced per-packet overhead);
    - sring_progs.c: eBPF programs for the sring device;
    - sring_gso.[ch]: hv implementation of a device which uses an
                      extended descriptor format, supporting checksum
                      offloads and TCP/UDP segmentation offloads;
    - sring_gso_progs.c: eBPF programs for the sring_gso device
    - vring_packed.[ch]: hv implementation of the packed virtqueue
                         in the VirtIO 1.1 specification;
    - vring_packed_progs.c: eBPF programs for the vring_packed device;
    - start-qemu.sh: an example script to start a QEMU VM with a
                     bpfhv device peered with a bpfhv-proxy network
                     backend;
    - start-proxy.sh: an example script to start the external backend
                      process and configure the backend network device
                      (e.g. a TAP interface or a netmap port);


=== Some advantages of bpfhv ===
    - Have doorbells on separate pages (configurable stride)
    - Provider can evolve the medatata header (e.g., virtio-net)
      to balance between the needs of FreeBSD and Linux.
      (virtio-net is good for Linux, but not for FreeBSD).
    - Virtio 1.1 vs 1.0 (while 0.95 is still around). This is a
      sign that there is a need for evolution and compatibility
      problems.
    - You can define a metadata format (e.g. virtio-net header)
      that fits the specific hardware NIC features used by the
      cloud provider.
    - Let the provider inject code to encrypt/decrypt the payload,
      together with the hardcoded key. The encrypt/decrypt routines
      can be helper functions that take as argument the OS packet
      pointer and the key.
    - Simplification of device paravirtualization. Fixed datapath
      ABI means that you need to be backward compatible. Look at
      virtio implementation in Linux 4.20: it needs to support
      both split and packet ring --> complex, error prone, less
      efficient.
    - Change virtual switch and backend under the hood (tap,
      netmap, other).
    - Adapt to changing workloads.


=== TODOs (driver) ===
    - Let BPFHV_MAX_TX_BUFS and BPFHV_MAX_RX_BUFS be variable.
      This requires of course reshaping the layout of the
      context data structures.
    - Try to replace dma_map_single() with dma_map_page() on
      the RX datapath ? Not sure this is relevant.
    - What if the eBPF program needs to modify the SG layout,
      e.g., for encapsulation or encryption? This would require
      changing the paddr/vaddr/len in the buffer descriptors,
      and DMA mapping and unmapping... So maybe we should ask
      the eBPF program to DMA map/unmap so, that it can do
      that after encapsulation or encryption (i.e. once
      the SG layout is stable).


=== TODOs (qemu) ===
    - Replace cpu_physical_memory_[un]map() with dma_memory_[un]map()
      and the MemoryRegionCache library. This should be only necessary
      if the guest platform has an IOMMU.
      Code in virtqueue_pop() and virtqueue_push().

    - let backend.ops.init fail (vring packed < 2^15)
    - move vring_packed generic code at the top of the files

About

Adaptive network paravirtualization for continuous Cloud Provider evolution

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages