I would like to make a contribution to the Linux kernel. The first ideas that come to mind are to create a new hardware driver or to add functionality to an existing one; `drivers/staging` may be a good place to look. Another option is to revive patches that were never accepted, although that could lead to some nasty surprises.
Unfortunately, I can’t imagine an easy way to determine how often a particular driver or feature is used. One way would be to look at the usage of `compatible` devicetree properties in the kernel. There are, however, many devicetrees stored outside the upstream kernel. Another option may be to look at board-specific configuration files in Buildroot or Yocto.
The top ten most common `compatible` strings in arm64 devicetrees are as follows:
| Count | Compatible |
| --- | --- |
| 865 | “regulator-fixed” |
| 410 | “fixed-clock” |
| 306 | “arm,cortex-a53” |
| 223 | “cache” |
| 182 | “simple-bus” |
| 151 | “arm,cortex-a72” |
| 133 | “arm,pl061”, “arm,primecell” |
| 127 | “arm,armv8-timer” |
| 118 | “operating-points-v2” |
| 118 | “gpio-leds” |
Note that these counts are not truly representative of how often the corresponding drivers are used, since arm64 devicetrees are often aliased and included from multiple files, where each file may represent a device.
Another approach may be to implement a toy driver just to experiment with new features in the kernel. On 8/26/21 Jonathan Corbet of LWN presented The Kernel Report, in part to test the infrastructure for the 2021 Linux Plumbers Conference. In that presentation he touched on numerous interesting features, but the most memorable for me were `io_uring` and `dot2k`.
io_uring
`io_uring` was mainlined with 5.1 in 2019. It is an interface that provides submission and completion queue rings shared between the kernel and userspace to avoid copies and excessive system calls. The sharing is achieved safely using memory ordering and barrier techniques. This can replace private IO offload thread pools.
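To make that concrete, here is a minimal sketch of a single read issued through `io_uring` using the liburing helper library rather than the raw `io_uring_setup()`/`io_uring_enter()` system calls. The file name and buffer size are arbitrary, and it assumes liburing is installed and the kernel is new enough to support `IORING_OP_READ` (5.6+); error handling is kept to a bare minimum.

```c
/* Minimal io_uring read via liburing; build with -luring.
 * Illustrative sketch only, not production error handling. */
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	char buf[4096];
	int fd;

	fd = open("/etc/hostname", O_RDONLY);      /* arbitrary example file */
	if (fd < 0 || io_uring_queue_init(8, &ring, 0) < 0)
		return 1;

	sqe = io_uring_get_sqe(&ring);             /* grab a submission queue entry */
	io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);

	io_uring_submit(&ring);                    /* one syscall submits the batch */
	if (io_uring_wait_cqe(&ring, &cqe) == 0) { /* wait for the completion */
		printf("read returned %d bytes\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);     /* advance the completion ring head */
	}

	io_uring_queue_exit(&ring);
	close(fd);
	return 0;
}
```

liburing hides the shared-ring bookkeeping and barriers; the rest of this section looks at how that bookkeeping works underneath.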
It is very interesting to think of a single-producer, single-consumer ring buffer as a synchronization mechanism. To share memory between userspace and the kernel, the intuitive approach would be to share locks, but that would incur the significant overhead of system calls.
The head and tail of the ring buffers are stored as 32-bit integers that are only ever incremented. Since the buffer almost certainly has fewer than 2^32 elements, a mask strips away the bits above the size of the buffer, which requires the buffers to be a power of two in size. This has the added benefit that the unsigned difference between tail and head tells you whether the producer has caught up with the consumer.
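Here is a small userspace sketch of that technique, kept deliberately generic: it is not the kernel’s actual `io_uring` structures, just a single-producer/single-consumer ring whose 32-bit indices only grow and are masked into a power-of-two array, with the unsigned difference `tail - head` giving the number of queued items.

```c
/* SPSC ring buffer sketch: 32-bit indices that only increase,
 * masked into a power-of-two array. Illustrative only. */
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 256u                    /* must be a power of two */
#define RING_MASK (RING_SIZE - 1)

struct ring {
	uint32_t head;                    /* consumer position, increments forever */
	uint32_t tail;                    /* producer position, increments forever */
	int slots[RING_SIZE];
};

static bool ring_push(struct ring *r, int value)
{
	if (r->tail - r->head == RING_SIZE)  /* unsigned difference = queued items */
		return false;                /* full: producer caught the consumer */
	r->slots[r->tail & RING_MASK] = value;
	r->tail++;                           /* real shared code publishes with a release barrier */
	return true;
}

static bool ring_pop(struct ring *r, int *value)
{
	if (r->tail == r->head)
		return false;                /* empty */
	*value = r->slots[r->head & RING_MASK];
	r->head++;                           /* likewise a release store when shared */
	return true;
}
```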
The submission side has an extra layer of indirection: the submission queue ring does not hold the entries themselves but indexes into an array, which in turn indexes into the submission queue entries. That is done so that applications can embed request units inside their own programs.
For completion, the kernel updates the tail and userspace updates the head; for submission it is the reverse.
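Putting the two ideas together, the submission path from userspace looks roughly like the sketch below. The field names and the `sq_view` wrapper are simplified for illustration; in the real ABI these pointers come from `mmap()`ing the ring regions set up by `io_uring_setup(2)`, and the release store stands in for the kernel-style barrier.

```c
/* Submission-side sketch with simplified, illustrative field names. */
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>
#include <linux/io_uring.h>               /* struct io_uring_sqe, IORING_OP_NOP */

struct sq_view {                           /* hypothetical wrapper, not a kernel struct */
	_Atomic uint32_t *tail;            /* shared tail, written by userspace */
	uint32_t *array;                   /* indirection: ring slot -> SQE index */
	uint32_t mask;                     /* ring_entries - 1 */
	struct io_uring_sqe *sqes;         /* the SQE array itself */
};

static void sq_push_nop(struct sq_view *sq)
{
	uint32_t tail = atomic_load_explicit(sq->tail, memory_order_relaxed);
	uint32_t index = tail & sq->mask;  /* reuse the slot index as the SQE index */
	struct io_uring_sqe *sqe = &sq->sqes[index];

	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = IORING_OP_NOP;       /* simplest possible request */
	sqe->user_data = 0x42;             /* echoed back in the completion */

	sq->array[index] = index;          /* the extra layer of indirection */

	/* Release store: the kernel must observe the filled SQE and array
	 * slot before it observes the new tail value. */
	atomic_store_explicit(sq->tail, tail + 1, memory_order_release);
}
```

On the completion side the roles flip: the kernel publishes new entries by advancing the tail, and userspace consumes them and advances the head.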
In the kernel it is implemented in `fs/io_uring.c`, `include/linux/io_uring.h` and `tools/io_uring/` (see `IO_URING` in `MAINTAINERS`). A small thread pool implementation, `io-wq`, replaces kernel workqueues.
The only references to `io_uring` outside `include`, `arch` and `tools` are the following:
- `drivers/scsi/megaraid/megaraid_sas_base.c` specifies a module parameter named `poll_queues` that sets the number of queues to be used for `io_uring` poll mode
- `net/unix/scm.c:unix_get_socket()` will pass a file pointer to `io_uring_get_socket()` if the pointer fails a couple of checks
- `struct io_uring_task` is a member of the first task, `init_task`; the containing `struct task_struct` is defined in `linux/sched.h`
- `kernel/fork.c` clears the aforementioned member of `task_struct` in `copy_process()` and calls `io_uring_free()`
On `linux-block/for-next` a Broadcom (formerly LSI Corporation) driver has been updated (see `mpt3sas`).
Jens Axboe wrote a fantastic summary of `io_uring` in 2019, Efficient IO with io_uring. A higher-level description with more context can be found in How io_uring and eBPF Will Revolutionize Programming in Linux, which naturally leads to an LWN article titled BPF meets io_uring.