The Matrix as an Operating System

Blast from the past

Mar 08, 2024

Years ago, in 2009, while working as a FreeBSD kernel developer (think Unix), I was deeply immersed in how operating systems work, and wrote this piece comparing the Matrix (from the 1999 movie and its sequels) to an operating system.

I’d like to save this text from digital oblivion and so I’m reproducing it here.

Operating system design, implementation and maintenance is one of my fascinations, so ever since I first saw the Matrix movies I've thought that some of the concepts in them can be related to familiar concepts in operating systems:
The Matrix world: a running operating system, with userland (the "common" world, in which people live), and the kernel (the "Matrix" proper, the meta). Apparently it's a pretty buggy OS...
People: processes, both kernel processes and user processes. There's a big distinction between normal, "unprivileged" people, and daemons with root privileges - "agents". Root daemons can open privileged ports, kill random processes, manage memory, etc.
Matrix: the kernel. It looks like a message passing kernel, not necessarily a microkernel (though there are some microkernel aspects, such as the abundance of kernel processes, strict separation of duty between them, and the already mentioned message passing). The kernel manages all processes, and performs operations on their behalf (such as keeping them alive, servicing them and recycling them). But there's an apparent security defect: some userland processes can (because of a bug) transfer and execute parts of their programs in the kernel space. Only certain syscalls are affected (the "phones"), and this kind of privilege escalation garbles the userland process's return stack, such that if the process receives a signal, it segfaults and is garbage collected (if you're killed in the Matrix, you're dead for real).
Oracle: the process (task) scheduler. Has all the numbers from process monitoring (resource usage) and knows in advance (broadly) how to schedule them to run to their optimum.
Agents: system monitoring / intrusion detection / prevention system (IDS / IPS) with heuristic operation. Most of them have a kernel part (kernel module) but are basically daemons run with superuser privileges in the userland. They are tasked with finding and killing processes which attempt to violate system security.
The trainman: kernel-userland gateway / message passing queue. You've got to go through him if you want to validly pass data between userland and kernel. You also might be stuck in the queue forever.
The Merovingian: networking / IPC stack. It's his business to know everything going on between processes. Has a bug manifesting in occasional input / output data corruption.
Vampires / ghosts: compatibility shims for older API / KPI versions. Their code is rudimentary and, for historical reasons, interfaces with parts of the kernel that normal processes shouldn't touch (i.e. they have lots of layering violations).
The Architect: kernel’s monitoring infrastructure (or a hypervisor), tasked with monitoring processes, killing those that wedge and restarting those that crashed. Since it's a realtime high-availability OS, the debugging and monitoring infrastructure has the absolute highest priority and is "blessed" to be infallible (thus, to limit the possibility of error, is very limited in its complexity). It's been misconfigured to be overzealous, does availability checking too often, taking too many resources, and so interferes with the normal operation of the operating system.
Keymaster: security / privilege subsystem. It's stable, but unfortunately relies on the VM system and the IPC system which are buggy, and can be exploited by processes to gain more privileges from him.
THE PLOT: There's a design bug between the VM (virtual memory) system, the process management system and the scheduler, manifesting under high system load (lots of processes, high memory pressure). It is a compound error, which results in at least three things:
Memory pages can get corrupted or misassigned to processes that don't own them. Since kernel and userland share memory, processes on either side can end up with memory pages from the other, revealing sensitive data and opening the way for privilege escalation. Mixing up the memory pages bypasses address space protection between the processes.
The IPC subsystem, bad as it already is, gets even worse when its data structures get corrupted or the memory load gets so high it deadlocks waiting for buffers.
The system monitor (Architect) goes berserk, killing and restarting processes in a loop, unaware that it makes things worse by building up additional memory pressure and process load, eventually greatly helping to spread the VM page corruption between the processes.
Agent Smith: privileged IPC daemon with part of it implemented as a kernel module. It's so closely tied to the kernel module part that it shares data structures with it without sanity checking. Once it was killed by another privileged process, but it was in the middle of a syscall, so when the monitor restarted it, the corruption which had already been done to its process descriptor resulted in most of its program being executed in the kernel context. It continued to work in this corrupted state for a long time, wedged in a loop, erroneously tagging processes as security breaches and overwriting some of their memory pages with its own.
Neo: initially a userland network server process. The VM corruption resulted in it being assigned both superuser privileges and high priority (CPU time). Eventually, its executable memory pages got mixed up with those of the IDS process Smith, but its data pages did not. Before long it also started killing processes, including Smith and his corrupted copies.
THE ENDING: process Smith eventually tries to kill the scheduler process, but since it's itself scheduled by it, cannot do so reliably. The system gets wedged because the scheduler cannot perform its tasks anymore, including interrupt servicing, but the part of Smith's code in the scheduler's memory (which happens to also be the part shared with process Neo) still runs. So there are only two processes running, and they are both trying to kill each other. Meanwhile, since interrupts are no longer being serviced, the hardware watchdog timer expires and raises an NMI, which wakes up the monitoring system. It decides the system is in a critical state and proceeds to kill all processes, then restarts them to bring the system up again. The End.
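For readers who haven't met a watchdog timer, here is a rough toy model of the mechanism in the ending. It is only a sketch: the `Watchdog` class, the `simulate` helper, the tick-based clock and the message string are all made up for illustration and do not correspond to any real kernel API.

```python
class Watchdog:
    """Toy watchdog: fires if it is not kicked within `timeout` ticks."""

    def __init__(self, timeout, on_expire):
        self.timeout = timeout
        self.on_expire = on_expire  # hypothetical stand-in for raising an NMI
        self.remaining = timeout
        self.fired = False

    def kick(self):
        # A healthy system resets the countdown regularly.
        self.remaining = self.timeout

    def tick(self):
        # Advances independently of the software being watched.
        if self.fired:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            self.fired = True
            self.on_expire()


def simulate(total_ticks, wedged_at, timeout=3):
    """Run the toy system; it stops kicking the watchdog at tick `wedged_at`."""
    events = []
    dog = Watchdog(timeout, lambda: events.append("NMI: kill and restart all processes"))
    for t in range(total_ticks):
        if t < wedged_at:
            dog.kick()  # the scheduler is still servicing interrupts
        dog.tick()
        if dog.fired:
            return t, events
    return None, events
```

Calling `simulate(10, wedged_at=4)` models a system that wedges at tick 4; the watchdog then expires a couple of ticks later, while a system that never wedges never triggers it.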
Post mortem analysis: There appears to be an inherent flaw in the design of the operating system, especially in the virtual memory, IPC and monitoring subsystems, resulting in a global memory corruption among processes and critical failure of address space protection for a small number of processes.
Recommendation: More fine-tuning is needed to settle on the proper process priorities and to reduce priority inversions and imbalance. The VM system probably needs to be rewritten and the IDS replaced with a less resource-intensive version. The system monitor needs to be modified so that it does not start extensive operations when the system load is above a threshold.
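The last recommendation could look something like the sketch below. The function name `monitor_cycle`, the 0-to-1 load scale and the default threshold of 0.8 are assumptions chosen for illustration, not part of the original analysis.

```python
def monitor_cycle(crashed, load, load_threshold=0.8):
    """Decide which crashed processes to restart in this monitoring cycle.

    Returns (restart_now, deferred). When the load is above the threshold,
    nothing is restarted, so the monitor does not add memory pressure and
    process load to an already struggling system.
    """
    if load > load_threshold:
        return [], list(crashed)
    return list(crashed), []
```

For example, `monitor_cycle(["smith", "neo"], load=0.95)` defers both restarts, while the same call with `load=0.3` restarts them immediately.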
There! An interpretation of The Matrix without involving "free will" in any way.
