Peripheral access to system memory without going through the processor. The processor interfaces with a DMA controller, which generates an interrupt when the peripheral operation is complete. The operation may be requested by the host or initiated asynchronously by the peripheral.
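As a minimal sketch of a host-requested transfer, consider a hypothetical memory-mapped DMA controller with source, destination, length and control registers. The register names and offsets below are invented for illustration; a real controller defines its own layout.

```c
#include <stdint.h>

#define DMA_SRC   0x00  /* source physical address */
#define DMA_DST   0x04  /* destination physical address */
#define DMA_LEN   0x08  /* transfer length in bytes */
#define DMA_CTRL  0x0c  /* bit 0: start, bit 1: interrupt on completion */

static volatile uint32_t *dma_regs; /* assumed already mapped by the driver */

static void dma_start(uint32_t src, uint32_t dst, uint32_t len)
{
    dma_regs[DMA_SRC / 4]  = src;
    dma_regs[DMA_DST / 4]  = dst;
    dma_regs[DMA_LEN / 4]  = len;
    dma_regs[DMA_CTRL / 4] = 0x3;  /* start + enable completion interrupt */

    /* The processor is now free to do other work; the interrupt handler
     * runs when the peripheral finishes the transfer. */
}
```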

The core problem that DMA is trying to solve is bus management. In the classic example the DMA controller manages ownership of the bus. Really the notion of a DMA controller is outdated and the real scope of the problem is bus design.

In a bus mastering system, or burst mode, the peripheral is granted control of the memory bus for an extended period. Alternatively, in cycle stealing mode the DMA controller requests the bus from the processor for each word (or small block) that the peripheral device wants to transfer, returning control to the processor in between.

DMA controllers may also provide scatter/gather functionality where a buffer scattered in memory appears contiguous to a device. This reduces the number of operations that the processor needs to perform.
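A descriptor chain is one common way scatter/gather is implemented. The layout below is hypothetical, but most controllers follow the same pattern of one descriptor per physically contiguous chunk.

```c
#include <stdint.h>

/* Hypothetical scatter/gather descriptor: one per contiguous chunk.
 * The chain makes a buffer that is scattered in memory appear
 * contiguous to the device. */
struct sg_descriptor {
    uint64_t phys_addr;  /* physical address of this chunk */
    uint32_t length;     /* chunk length in bytes */
    uint64_t next;       /* physical address of the next descriptor, 0 = end */
};

/* The driver builds one descriptor per chunk (often per page), hands the
 * physical address of the first descriptor to the controller, and the
 * hardware walks the chain without further processor involvement. */
```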

They may also support memory-to-memory operations.

In PCI there is no physical DMA controller or third-party DMA. Instead each device may become a bus master (i.e. first-party DMA). Multi-master configurations are less deterministic and require arbitration.
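In Linux, first-party DMA is enabled from the driver's probe routine. A hedged sketch using the standard pci_set_master() and DMA-mask calls; the function name is a placeholder and error handling is trimmed for brevity.

```c
#include <linux/pci.h>
#include <linux/dma-mapping.h>

static int example_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
    int err;

    err = pci_enable_device(pdev);
    if (err)
        return err;

    /* Allow the device to initiate memory cycles as a bus master. */
    pci_set_master(pdev);

    /* Tell the DMA layer which addresses the device can generate. */
    err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
    if (err)
        return err;

    return 0;
}
```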

Types of addresses

User space processes each have their own virtual memory. The kernel deals directly with the physical addresses of the processor. Except in small processors, a memory management unit (MMU) does the translation between virtual and physical addresses.

The kernel technically has its own address space, known as logical addresses, which are normally just a fixed offset from physical addresses. Additionally, there are kernel virtual addresses, a superset that includes logical addresses as well as mappings such as those created by vmalloc().
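A sketch of the distinction in Linux terms: kmalloc() returns a logical address, so converting it to a physical address is simple arithmetic via virt_to_phys(), while vmalloc() returns a virtual address with no such linear relationship. Illustrative only.

```c
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/printk.h>
#include <linux/io.h>

void address_examples(void)
{
    void *log  = kmalloc(PAGE_SIZE, GFP_KERNEL);  /* kernel logical address */
    void *virt = vmalloc(PAGE_SIZE);              /* kernel virtual address */
    phys_addr_t phys;

    if (log) {
        phys = virt_to_phys(log);  /* valid: logical -> physical is arithmetic */
        pr_info("logical %p maps to physical %pa\n", log, &phys);
        kfree(log);
    }

    /* virt_to_phys(virt) would be wrong: vmalloc memory is only reachable
     * through the page tables, not through the linear map. */
    vfree(virt);
}
```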

Why is virtual address space split between user-space and the kernel? The portion of virtual address space used by the kernel includes kernel code and the remainder is used to map physical memory. How do logical addresses relate to this situation?

A physical memory address, in a system with 4096-byte pages, consists of 12 low bits that define an offset into a page and the remaining bits that specify the page.
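For example, splitting an address into its page frame number and page offset is just a shift and a mask:

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)  /* 4096 bytes */

static uint64_t pfn(uint64_t phys)    { return phys >> PAGE_SHIFT; }
static uint64_t offset(uint64_t phys) { return phys & (PAGE_SIZE - 1); }

/* Example: 0x12345678 -> page frame 0x12345, offset 0x678. */
```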

The global address space in a processor is defined by a memory map. For example, the Zynq UltraScale+ MPSoC uses a hierarchy of inclusive address maps.1 These include a 32-bit (4 GB) map, a 36-bit (64 GB) map and a 40-bit (1 TB) map. DDR Low (2 GB) is placed in the first map, while DDR High (32 GB) is placed in the second. PCIe is divided between all three.

Bus addresses often match the physical addresses used by the processor, but they can differ when a bridge or IOMMU remaps them.

The ZynqMP uses ARM's IOMMU, called the System MMU (SMMU). The SMMU isolates peripherals to protect system memory and provides translation for devices with limited addressing capability.
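The Linux streaming DMA API hides whether an IOMMU/SMMU sits in the path: the returned dma_addr_t is whatever address the device must use, translated or not. A sketch of mapping a buffer for device-bound traffic; dev and buf are assumed to come from the surrounding driver.

```c
#include <linux/dma-mapping.h>
#include <linux/errno.h>

static int send_buffer(struct device *dev, void *buf, size_t len)
{
    dma_addr_t dma = dma_map_single(dev, buf, len, DMA_TO_DEVICE);

    if (dma_mapping_error(dev, dma))
        return -ENOMEM;

    /* Program 'dma' into the device, start the transfer, wait for it ... */

    dma_unmap_single(dev, dma, len, DMA_TO_DEVICE);
    return 0;
}
```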

The x86 architecture does not itself define an IOMMU; one is provided by Intel VT-d and AMD-Vi, which are designed to give virtual machines direct access to peripherals such as Ethernet controllers, graphics cards and persistent storage.

See the Heterogeneous Memory Management (HMM) documentation in the kernel for more information.

PCIe

A PCIe link is the physical connection between two devices and consists of lanes; each lane is a differential signal pair in each direction. A x32 link therefore requires 128 signals: 2 signals per differential pair * 2 directions * 32 lanes. There is no separate clock signal; instead the receiver recovers the clock from transitions in the data stream using a PLL.
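The signal count is easy to sanity-check; a trivial example of the arithmetic above:

```c
#include <stdio.h>

/* Each lane is a differential pair (2 signals) in each direction (x2). */
static unsigned signals_for_lanes(unsigned lanes)
{
    return 2 /* diff pair */ * 2 /* directions */ * lanes;
}

int main(void)
{
    printf("x32 link: %u signals\n", signals_for_lanes(32)); /* prints 128 */
    return 0;
}
```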