The function call stack is a LIFO data structure made up of stack frames, each of which holds the return address into the calling function as well as the callee's automatic (local) variables. This design allows functions to be reentrant and to be called recursively. Note that many calling conventions pass some parameters through registers rather than on the stack.
The most confusing part of most descriptions of a call stack is the abstraction. For example, stating that when a function completes “it exits by popping its locals off the stack and returns to the caller”. What does this mean practically in terms of how a computer operates? In practice nothing is erased: the stack pointer register is simply moved back past the locals, and a return instruction loads the saved return address into the program counter.
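As a concrete illustration, here is a minimal Rust sketch (the function name is just illustrative) that prints the address of a local variable in each recursive call; every call gets its own frame, so every `local` lives at a distinct stack address:

```rust
fn recurse(n: u32) {
    let local = n; // lives only in this call's stack frame
    // Printing the address of the local shows that each call has its
    // own slot on the stack.
    println!("frame {}: &local = {:p}", n, &local);
    if n > 0 {
        recurse(n - 1); // pushes a new frame with its own `local`
    }
}

fn main() {
    recurse(3);
}
```

On most platforms the printed addresses decrease with each call, because the stack grows downward; when each call returns, its frame is "popped" simply by moving the stack pointer back up.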
In Rust a panic is observed by the parent thread when it joins the panicking child (it can also be caught explicitly with std::panic::catch_unwind). Rust optimizes for the case that unwinding occurs rarely: less state is kept ready to unwind, which keeps the non-panicking path cheap but makes an actual unwind more costly than in languages like Java or C++ where exceptions are routine. Rust's unwinding is also not compatible with other languages' exception mechanisms, so panics must be caught at the FFI boundary rather than allowed to propagate across it. Unwinding is implemented using the invoke LLVM instruction, which calls a function and, if that call unwinds, transfers control to a designated landing pad.
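A small sketch of both behaviors, observing a child's panic via join and stopping unwinding explicitly with catch_unwind (the messages are arbitrary):

```rust
use std::thread;

fn main() {
    // A panic unwinds only the thread it occurs on.
    let handle = thread::spawn(|| {
        panic!("boom");
    });

    // The parent observes the child's panic as an Err from join().
    match handle.join() {
        Ok(()) => println!("child finished normally"),
        Err(_) => println!("child panicked"),
    }

    // Unwinding can also be stopped explicitly, e.g. just inside an
    // FFI boundary, with catch_unwind.
    let result = std::panic::catch_unwind(|| {
        panic!("stopped here");
    });
    assert!(result.is_err());
}
```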
There may not always be a stack unwinder available (e.g. when writing a kernel); in such environments Rust is typically built with the panic = "abort" strategy. What is a system stack unwinder? It is the platform library, such as libunwind or libgcc's _Unwind_* routines, that walks the stack frame by frame using unwind tables and runs each frame's cleanup code.
With a mutex, a thread is put to sleep if the lock is already held, while with a spinlock the thread polls until the lock is released. A hybrid mutex polls for a short period before putting the thread to sleep; a hybrid spinlock may lower the thread's priority the longer it spins.
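A minimal spinlock sketch in Rust (the type and method names are illustrative), built on an atomic flag that lock() busy-polls:

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};

/// A minimal spinlock: lock() busy-polls on an atomic flag instead of
/// putting the thread to sleep.
pub struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    pub const fn new() -> Self {
        SpinLock { locked: AtomicBool::new(false) }
    }

    pub fn lock(&self) {
        // Spin until we succeed in flipping the flag false -> true.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            hint::spin_loop(); // hint to the CPU that we are busy-waiting
        }
    }

    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}

fn main() {
    static LOCK: SpinLock = SpinLock::new();
    LOCK.lock();
    // ... critical section ...
    LOCK.unlock();
}
```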
Context switching is the act of storing the state of an operating system process or thread so that it can later be restored and execution resumed. A classic example is when the operating system handles an interrupt.
A lightweight context switch occurs when switching between user-space threads (e.g. green threads), since only minimal context, typically the registers and the stack pointer, needs to be saved.
System calls change the processor execution mode to a more privileged level. This mode switch is a feature of the processor that the operating system leverages, whereas context switching is a component of the operating system itself.
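To make the mode switch concrete, here is a hedged sketch that invokes the write system call directly with the syscall instruction, which is what transfers the CPU into kernel mode; it assumes x86-64 Linux (where syscall number 1 is write) and will not compile or run elsewhere:

```rust
use std::arch::asm;

fn main() {
    let msg = b"written by a raw syscall\n";
    let ret: isize;
    unsafe {
        asm!(
            "syscall",                      // switch to kernel mode
            inlateout("rax") 1isize => ret, // syscall number 1 = write
            in("rdi") 1usize,               // fd 1 = stdout
            in("rsi") msg.as_ptr(),         // buffer pointer
            in("rdx") msg.len(),            // buffer length
            out("rcx") _,                   // kernel clobbers rcx (saved RIP)
            out("r11") _,                   // and r11 (saved RFLAGS)
            options(nostack),
        );
    }
    assert_eq!(ret as usize, msg.len());
}
```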
A memory barrier enforces the order of memory operations at the hardware level, which is critical in multiprocessor and multithreaded environments; it is necessary because modern processors execute instructions out of order. Note that compiler reordering optimizations are a distinct concern and may require separate protection (a compiler fence).
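In Rust, std::sync::atomic::fence restrains both compiler and hardware reordering (compiler_fence restrains only the compiler). A sketch of the classic flag-and-data handoff using release/acquire fences (the statics and values are illustrative):

```rust
use std::sync::atomic::{fence, AtomicBool, AtomicU32, Ordering};
use std::thread;

static DATA: AtomicU32 = AtomicU32::new(0);
static READY: AtomicBool = AtomicBool::new(false);

fn main() {
    let producer = thread::spawn(|| {
        DATA.store(42, Ordering::Relaxed);
        // Release fence: the DATA store cannot be reordered after the
        // READY store, by either the compiler or the hardware.
        fence(Ordering::Release);
        READY.store(true, Ordering::Relaxed);
    });

    let consumer = thread::spawn(|| {
        while !READY.load(Ordering::Relaxed) {}
        // The acquire fence pairs with the release fence above, so the
        // consumer is guaranteed to observe DATA == 42.
        fence(Ordering::Acquire);
        assert_eq!(DATA.load(Ordering::Relaxed), 42);
    });

    producer.join().unwrap();
    consumer.join().unwrap();
}
```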
Since processor cores have increased in speed at a faster rate than main memory, small caches have been added between the cores and main memory. For example, an L1 cache of 2 to 64 KB may have a latency of 4 cycles, an L2 cache of 256 KB a latency of 11 cycles, and an 8 MB L3 cache shared between cores a latency of 39 cycles, while an access to system memory may require 107 cycles.
A cache line is the unit of data transferred between the cache and main memory at a time; it is typically 64 bytes.
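One practical consequence is false sharing: two variables that merely happen to share a cache line force cores to bounce that line back and forth. A hedged Rust sketch (the struct name and iteration counts are illustrative) that keeps two counters on separate lines by aligning each to 64 bytes:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// Aligning each counter to 64 bytes places them on separate cache
// lines, so the two threads below never contend for the same line.
#[repr(align(64))]
struct PaddedCounter(AtomicU64);

static A: PaddedCounter = PaddedCounter(AtomicU64::new(0));
static B: PaddedCounter = PaddedCounter(AtomicU64::new(0));

fn main() {
    let t1 = thread::spawn(|| {
        for _ in 0..1_000_000 {
            A.0.fetch_add(1, Ordering::Relaxed);
        }
    });
    let t2 = thread::spawn(|| {
        for _ in 0..1_000_000 {
            B.0.fetch_add(1, Ordering::Relaxed);
        }
    });
    t1.join().unwrap();
    t2.join().unwrap();
    println!("{} {}", A.0.load(Ordering::Relaxed), B.0.load(Ordering::Relaxed));
}
```

In production code a ready-made wrapper such as crossbeam's CachePadded is the usual way to get this alignment.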