I would like to make a contribution to the Linux kernel. The first ideas that come to mind are to create a new hardware driver or add more functionality to an existing one. `drivers/staging` may be a good place to look. Another option is to pick up patches that were never accepted, although that could lead to some nasty surprises.

Unfortunately, I can't imagine an easy way to determine how often a particular driver or feature is actually used. One way would be to look at the usage of `compatible` devicetree properties in the kernel, though many devicetrees are stored outside the upstream kernel. Another option may be board-specific configuration files in Buildroot or Yocto.
The top ten most common `compatible` strings in arm64 devicetrees are as follows:

| Count | Compatible |
|---|---|
| 118 | "gpio-leds" |
| 118 | "operating-points-v2" |
| 127 | "arm,armv8-timer" |
| 133 | "arm,pl061", "arm,primecell" |
| 151 | "arm,cortex-a72" |
| 182 | "simple-bus" |
| 223 | "cache" |
| 306 | "arm,cortex-a53" |
| 410 | "fixed-clock" |
| 865 | "regulator-fixed" |
Note that this is not truly representative of how often these drivers are used, since arm64 devicetree files are often aliased and included in multiple other files, each of which may represent a different device.
Another approach may be to implement a toy driver just to experiment with new features in the kernel. On 8/26/21 Jonathan Corbet of LWN presented The Kernel Report, in part to test the infrastructure for the 2021 Linux Plumbers Conference. In that presentation he touched on numerous interesting features, but the most memorable for me were `io_uring` and `dot2k`.
## io_uring
`io_uring` was mainlined with 5.1 in 2019. It is an interface that provides submission and completion queue rings shared between the kernel and userspace to avoid copies and excessive system calls. The sharing is achieved safely using memory ordering and barrier techniques. This can replace private IO offload thread pools.
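To get a feel for the submission/completion model from userspace, here is a minimal sketch using the liburing helper library (a userspace wrapper, not part of the kernel sources discussed here; the file path is an arbitrary choice of mine). It submits a single read and waits for its completion through the shared rings.

```c
/* Minimal io_uring read, via liburing. Build with: gcc demo.c -luring */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <liburing.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	char buf[4096];
	int fd;

	if (io_uring_queue_init(8, &ring, 0) < 0)	/* 8-entry SQ/CQ rings */
		return 1;

	fd = open("/etc/hostname", O_RDONLY);		/* arbitrary example file */
	if (fd < 0)
		return 1;

	/* Grab a submission queue entry and describe the read request. */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);

	/* One system call submits the request... */
	io_uring_submit(&ring);

	/* ...and one wait collects the completion from the CQ ring. */
	if (io_uring_wait_cqe(&ring, &cqe) == 0) {
		printf("read returned %d\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}

	close(fd);
	io_uring_queue_exit(&ring);
	return 0;
}
```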
It is very interesting to think of a single-producer, single-consumer ring buffer as a synchronization mechanism. To share memory between userspace and the kernel, the intuitive approach would be to share locks, but that would require the significant overhead of system calls.
The head and tail of the ring buffers are stored as free-running 32-bit integers that are only ever incremented. Since the buffer has far fewer than 2^32 elements, a mask is applied to drop the bits above the buffer size, which requires the buffers to be a power of two in size. Keeping the indexes free-running has the added benefit that comparing head and tail reveals how full the ring is, so the producer can tell whether it has caught up with the consumer.
The submission side has an extra layer of indirection: the submission queue ring holds indexes into an array of submission queue entries. That is done so that applications can embed the request units inside their own program structures.
For completion, the kernel updates the tail and userspace the head; for submission it is the reverse.
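To make the index arithmetic concrete, here is a toy userspace sketch of the same scheme (my own illustration, not kernel code): free-running 32-bit indexes, a power-of-two ring size, and a mask to pick the slot. The real rings additionally rely on memory barriers, since producer and consumer run concurrently on opposite sides of the kernel boundary.

```c
/* Toy single-producer/single-consumer ring with free-running 32-bit
 * indexes and a power-of-two size, as io_uring's rings use. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define RING_SIZE 8u			/* must be a power of two */
#define RING_MASK (RING_SIZE - 1)

struct ring {
	uint32_t head;			/* consumer increments */
	uint32_t tail;			/* producer increments */
	int slots[RING_SIZE];
};

static bool ring_push(struct ring *r, int value)
{
	/* tail - head is the number of entries in flight, correct even
	 * after the 32-bit indexes wrap, because both wrap together. */
	if (r->tail - r->head == RING_SIZE)
		return false;		/* full: producer caught the consumer */
	r->slots[r->tail & RING_MASK] = value;
	r->tail++;			/* publish only after the slot is written */
	return true;
}

static bool ring_pop(struct ring *r, int *value)
{
	if (r->tail == r->head)
		return false;		/* empty */
	*value = r->slots[r->head & RING_MASK];
	r->head++;
	return true;
}

int main(void)
{
	struct ring r = { 0 };
	int v;

	for (int i = 0; i < 10; i++)
		if (!ring_push(&r, i))
			printf("ring full at %d\n", i);
	while (ring_pop(&r, &v))
		printf("popped %d\n", v);
	return 0;
}
```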
In the kernel it is implemented in `fs/io_uring.c`, `include/linux/io_uring.h` and `tools/io_uring/` (see `IO_URING` in `MAINTAINERS`). A small thread pool implementation, `io-wq`, replaces kernel workqueues. The only references to `io_uring` outside `include`, `arch` and `tools` are the following:
- `drivers/scsi/megaraid/megaraid_sas_base.c` specifies a module parameter named `poll_queues` that sets the number of queues to be used for `io_uring` poll mode
- `net/unix/scm.c:unix_get_socket()` will pass a file pointer to `io_uring_get_socket()` if the pointer fails a couple of checks
- `struct io_uring_task` is a member of the first task, `init_task`; the `struct task_struct` is defined in `linux/sched.h`
- `kernel/fork.c` clears the aforementioned member of `task_struct` in `copy_process()` and calls `io_uring_free()`
On `linux-block/for-next` a Broadcom driver (formerly LSI Corporation) has been updated (see `mpt3sas`).
Jens Axboe wrote a fantastic summary of `io_uring` in 2019, Efficient IO with io_uring. A higher-level description with more context can be found in How io_uring and eBPF Will Revolutionize Programming in Linux, which naturally leads to an LWN article titled BPF meets io_uring.