I would like to make a contribution to the Linux kernel. The first ideas that come to mind are to create a new hardware driver or to add functionality to an existing one. drivers/staging may be a good place to look. Another option is to revive patches that were never accepted, although that could lead to some nasty surprises.

Unfortunately, I can’t think of an easy way to determine how often a particular driver or feature is used. One way would be to count how often each devicetree compatible property appears in the kernel. There are, however, many devicetrees stored outside the upstream kernel. Another option may be to look at board-specific configuration files in Buildroot or Yocto.

The top ten most common compatible strings in arm64 devicetrees are as follows:

Count  Compatible
118    "gpio-leds"
118    "operating-points-v2"
127    "arm,armv8-timer"
133    "arm,pl061", "arm,primecell"
151    "arm,cortex-a72"
182    "simple-bus"
223    "cache"
306    "arm,cortex-a53"
410    "fixed-clock"
865    "regulator-fixed"

Note that this is not truly representative of the usage of these drivers in the arm64 devicetrees since they’re often aliased and included in multiple files where each file may represent a device.
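
A count like this can be approximated with a short script. The following is a minimal sketch in Python, not necessarily how the numbers above were produced. The function names are hypothetical, and the regular expression is a simplification: it does not skip comments or expand preprocessor macros.

```python
import re
from collections import Counter
from pathlib import Path

# Matches `compatible = "a", "b";` including values that span several lines.
# The lookbehind avoids matching property names that merely end in "compatible".
COMPATIBLE_RE = re.compile(r'(?<![\w,-])compatible\s*=\s*([^;]+);')


def compatibles_in(source):
    """Yield each compatible property value found in devicetree source text."""
    for match in COMPATIBLE_RE.finditer(source):
        # Pull out the quoted strings and rejoin them in a canonical form so
        # that differences in whitespace or line breaks do not split counts.
        strings = re.findall(r'"([^"]*)"', match.group(1))
        yield ", ".join(f'"{s}"' for s in strings)


def count_compatibles(root):
    """Count compatible values across all .dts and .dtsi files under root."""
    counts = Counter()
    for path in Path(root).rglob("*.dts*"):
        counts.update(compatibles_in(path.read_text(errors="replace")))
    return counts
```

Calling count_compatibles("arch/arm64/boot/dts").most_common(10) from the root of a kernel tree would then list the ten most frequent values. As noted above, shared include files are counted once per file rather than once per board.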

Toy experimental driver

Another approach may be to implement a toy driver just to experiment with new features in the kernel. On August 26, 2021, Jonathan Corbet of LWN presented The Kernel Report, partially to test the infrastructure for the 2021 Linux Plumbers Conference. In that presentation he touched on numerous interesting features, but the most memorable for me were io_uring and dot2k.

io_uring

io_uring was mainlined in Linux 5.1 in 2019. It is an interface that provides submission and completion queue rings that are shared between the kernel and userspace to avoid copies and excessive system calls. The sharing is achieved safely using memory ordering and barrier techniques. This can replace private IO offload thread pools.

It is very interesting to think of a single-producer, single-consumer ring buffer as a synchronization mechanism. To share memory between userspace and the kernel, the intuitive approach would be to share locks, but that would add the significant overhead of extra system calls.

The head and tail of the ring buffers are stored as 32-bit integers that are only ever incremented. Since the buffer likely has fewer than 2^32 elements, a mask is applied to discard the bits above the size of the buffer, which requires the buffer size to be a power of 2. Keeping the unmasked counters has the added benefit that the difference between tail and head shows how far the producer is ahead of the consumer, so a full ring can be told apart from an empty one.
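
The index arithmetic can be modeled in a few lines of Python. This is a single-threaded sketch of the counting scheme only, with a hypothetical class name. It is not the kernel's implementation and leaves out the memory barriers that make the real rings safe.

```python
U32_MASK = 0xFFFFFFFF  # emulate 32-bit unsigned wraparound


class SpscRing:
    """Ring buffer with free-running 32-bit head and tail counters."""

    def __init__(self, entries):
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of 2")
        self.entries = entries
        self.mask = entries - 1       # keeps only the low index bits
        self.slots = [None] * entries
        self.head = 0                 # advanced by the consumer
        self.tail = 0                 # advanced by the producer

    def used(self):
        # The unsigned difference stays correct across counter wraparound.
        return (self.tail - self.head) & U32_MASK

    def push(self, item):
        if self.used() == self.entries:
            return False              # full: producer would pass the consumer
        self.slots[self.tail & self.mask] = item
        self.tail = (self.tail + 1) & U32_MASK
        return True

    def pop(self):
        if self.used() == 0:
            return None               # empty
        item = self.slots[self.head & self.mask]
        self.head = (self.head + 1) & U32_MASK
        return item
```

Because the counters are never reduced to the buffer size, head == tail always means empty and a difference equal to the number of entries always means full, with no slot wasted to tell the two apart.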

The submission side has an extra layer of indirection: the submission queue ring holds indexes into an array, which in turn contains indexes into the submission queue entries. That is done so that applications can embed request units inside their own data structures.
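
A simplified model of that indirection, again as a Python sketch with hypothetical names rather than the real io_uring structures: the ring carries only indexes, so the application can fill entry slots in any order and decide separately in which order they are submitted.

```python
class SubmissionQueue:
    """Toy model: a ring of indexes that point into a separate entry array."""

    def __init__(self, entries):
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of 2")
        self.mask = entries - 1
        self.sqes = [None] * entries   # request slots owned by the application
        self.array = [0] * entries     # ring contents: indexes into sqes
        self.head = 0                  # advanced by the consumer (kernel side)
        self.tail = 0                  # advanced by the producer (application)

    def submit(self, sqe_index):
        """Producer side: publish the index of a filled-in entry."""
        if ((self.tail - self.head) & 0xFFFFFFFF) > self.mask:
            return False               # ring is full
        self.array[self.tail & self.mask] = sqe_index
        self.tail = (self.tail + 1) & 0xFFFFFFFF
        return True

    def consume(self):
        """Consumer side: follow the index to the actual entry."""
        if self.head == self.tail:
            return None                # nothing submitted
        sqe_index = self.array[self.head & self.mask]
        self.head = (self.head + 1) & 0xFFFFFFFF
        return self.sqes[sqe_index]


sq = SubmissionQueue(4)
sq.sqes[0] = "read"
sq.sqes[2] = "fsync"
sq.submit(2)   # submission order is independent of slot order
sq.submit(0)
```

Here the consumer sees "fsync" before "read", even though "read" occupies the lower slot.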

For completion the kernel updates the tail and userspace the head. For submission it is the reverse.

Kernel implementation

In the kernel it is implemented in fs/io_uring.c, include/linux/io_uring.h and tools/io_uring/ (see IO_URING in MAINTAINERS). A small thread pool implementation, io-wq, replaces kernel workqueues.

The only references to io_uring outside include, arch and tools are the following:

On linux-block/for-next a Broadcom driver (formerly LSI Corporation) has been updated (see mpt3sas).

Resources

Jens Axboe wrote a fantastic summary of io_uring in 2019, Efficient IO with io_uring. A higher-level description with more context can be found in How io_uring and eBPF Will Revolutionize Programming in Linux, which naturally leads to an LWN article titled BPF meets io_uring.