A Brief Introduction to the Linux Graphics Stack

Introduction

This article introduces the basic concepts of the Linux graphics stack. It aims to help a new user or developer form a useful mental model of a graphics-enabled system, to aid troubleshooting. It should not be mistaken for a rigorous historical or technical reference (but bug reports are still welcome).

This article is based on a slideshow I originally wrote and presented in 2014. It’s now rewritten as a blog post at a reader’s request.

This is only a first draft, and currently unlisted at the homepage. Thus the question: How did you find this article??????

1D Graphics

Teletypes

In the beginning, there were teletypewriters (ttys), invented around the turn of the 20th century for telegraph communication. By the 1960s, they had reached a highly developed form - the original serial devices. When computers became popular around this time, they were naturally adopted by Unix as the system console.

This is the origin of the print keyword in programming languages, and of line-based editors like ed and ex.

The Unix kernel uses the device file /dev/ttyX to represent a teletypewriter - it’s a simple device, just a serial port for reading and writing characters, either printable characters or control characters.

Terminals

Soon, teletypes were replaced by terminals. The earliest terminals were nothing but teletype simulators using a CRT screen, and were thus nicknamed glass ttys. But they did save a lot of dead trees.

Within a few years, next-gen smart terminals appeared. They made full-screen editors like vi possible. In addition to simulating teletypes, they provided features like moving the cursor, drawing lines, highlighting text, putting a character at location (x, y), and so on. These features were encoded as sequences of commands beginning with the ASCII control character ESC, thus named escape sequences.

Soon there was an explosion of incompatible escape sequences. Unix programmers maintained a database to manage the situation: BSD used termcap, and System V used terminfo. Applications (e.g. the Unix shells sh, csh, or bash) call the standard library, and the library picks the corresponding escape sequences from the database. A higher-level library, curses (later ncurses), was also developed for drawing lines, boxes, windows, and buttons. This was the first-gen UI library.
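
To make the mechanism concrete, here is a minimal sketch (assuming a VT100/ANSI-compatible terminal) that emits a few escape sequences by hand - exactly the kind of detail termcap/terminfo and curses exist to hide:

/* ansi_demo.c - drive an ANSI/VT100-compatible terminal with raw escape
 * sequences. Real programs should use (n)curses/terminfo instead. */
#include <stdio.h>

int main(void)
{
    printf("\033[2J");        /* ESC [ 2 J    : clear the whole screen           */
    printf("\033[10;5H");     /* ESC [ r;c H  : move the cursor to row 10, col 5 */
    printf("\033[1;31m");     /* ESC [ 1;31 m : bold text, red foreground        */
    printf("Hello, smart terminal");
    printf("\033[0m\n");      /* ESC [ 0 m    : reset all attributes             */
    return 0;
}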

Linux Virtual Terminal and Console

After microcomputers with video cards became the standard, terminals all but disappeared. Instead, their display functionality is implemented by terminal emulators - in other words, software that emulates whatever operations the terminal hardware supported, then renders the visual output as text or images on the screen.

The Linux console we are all familiar with (the one you see by pressing Control + Alt + F{1..6}, or during boot time with scrolling logs) is, in this sense, nothing but yet another terminal emulator, not unlike xterm or gnome-terminal. It’s just so useful that it runs in kernel space.

This subsystem is called the Virtual Terminal (VT). When the kernel option CONFIG_VT is enabled, it provides:

  1. Teletype devices, such as /dev/tty1, not backed by a real serial port but existing purely for emulation.

  2. Basic software implementation of a VT100/ANSI terminal.

  3. Bitmap font rendering, including custom fonts.

Now the question is, how does the Virtual Terminal itself get its video output? Through another subsystem called the console.

The original console implementation used the VGA text mode on IBM PC compatibles and was named vgacon. It still exists, but it’s rarely used today: it has very limited functionality (e.g. low resolution) and only works on x86. It was superseded by fbcon in the early 2000s, which renders the visual output as 2D images on top of a Linux framebuffer, using the framebuffer subsystem.

This brings us to the realm of 2D graphics.

2D Graphics

Linux Framebuffer

Generally speaking, all 2D video hardware provides a framebuffer: a piece of memory, usually located in VRAM. Pixel data written to this memory region is displayed as a 2D image on the screen.

Without kernel infrastructure, the only way to display graphics is to control the hardware from userspace as root, directly manipulating its I/O ports, registers, and memory (a recurring theme, as we’re going to see). It’s a duplication of work, a security hazard, and doesn’t allow multiple programs to coexist.

Enter the Linux framebuffer. This subsystem is an abstraction layer over the video card’s framebuffer, and enables the simplest form of video driver in Linux, fbdev. A fbdev driver is basically responsible for:

  1. Initializing the graphics hardware, such as setting up I/O ports, mapping memory, and programming registers.

  2. Setting the resolution, color depth, refresh rate, and other related hardware modes. This is called modesetting, and we’ll return to this topic later.

  3. Pushing pixels to the video card for display, including optional fast paths for very basic 2D acceleration.

A driver can be as short as a hundred lines of code, which attracted many contributors. The drivers available in Linux are an exhibition of obscure, historical, and embedded graphics hardware - such as atafb, a driver for m68k Atari computers, which is still receiving bugfixes as of 2022. There is even a driver for the Nokia 5110/3310 1.5-inch monochrome LCD. At the very least, you get vesafb, which uses VESA graphics modes and works on all x86 VGA cards (with reduced performance and functionality).

After finishing the driver, the first thing you notice is that it gets used by Linux’s fbcon subsystem, which is responsible for rendering the output of the Virtual Terminal. So you get the system console automatically, perhaps in native 16:9 resolution, with boot messages and even the Linux Tux logo. Instead of staring at a black screen, at this point you can already do some meaningful work on this computer.

Another benefit is that any program can draw 2D images by opening the device /dev/fbX and copying a bunch of pixels to it; the kernel and the driver handle the rest. Using the framebuffer was the easiest way to display images on command-line systems with no desktop environment - you could even watch a video.
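
For illustration only (this is the legacy interface, not how modern programs should draw), here is a minimal sketch that paints the whole screen through /dev/fb0, assuming a 32 bits-per-pixel mode and permission to open the device; run it from a text console, not from inside X or Wayland:

/* fbfill.c - fill the legacy framebuffer device with a solid color. */
#include <fcntl.h>
#include <linux/fb.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/fb0", O_RDWR);
    if (fd < 0) { perror("open /dev/fb0"); return 1; }

    struct fb_var_screeninfo var;   /* resolution, bits per pixel, ... */
    struct fb_fix_screeninfo fix;   /* line length (stride) in bytes, ... */
    ioctl(fd, FBIOGET_VSCREENINFO, &var);
    ioctl(fd, FBIOGET_FSCREENINFO, &fix);

    size_t size = (size_t)fix.line_length * var.yres;
    uint8_t *fb = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (fb == MAP_FAILED) { perror("mmap"); return 1; }

    /* Copy a bunch of pixels; the kernel and the driver handle the rest.
     * Assumes 32 bpp (e.g. XRGB8888). */
    for (uint32_t y = 0; y < var.yres; y++)
        for (uint32_t x = 0; x < var.xres; x++)
            *(uint32_t *)(fb + y * fix.line_length + x * 4) = 0x00336699;

    sleep(3);
    munmap(fb, size);
    close(fd);
    return 0;
}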

[Diagram: fbdev]

Some embedded system programmers found it so useful that they wanted to push it even further. The DirectFB project aimed to implement a fully-functional graphics environment directly on top of a 2D framebuffer; at one point you could run a Qt WebKit browser on it.

As fun as it sounds, the fbdev driver framework is now deprecated for reasons we’ll soon see. It has been in maintenance-only mode since around 2013. Writing new fbdev drivers is strongly discouraged and will not be accepted by any maintainer.

X Window

Moving up in the graphics stack, how does the X Window System fit into this picture? Long story short, the X server works like the kernel, and the X client works like a userspace application. It’s the X server that manages the keyboard, mouse, and video hardware.

The X server can be further divided into two parts. Device Independent X (DIX) is the main X server and contains perhaps 90% of the code. Device drivers, on the other hand, are located in Device Dependent X (DDX), mainly as 2D video drivers. This is where all the xf86-video-* packages belong.

In the early 2000s, Linux (and other Unix kernels) didn’t have the infrastructure or API for fine-grained graphics hardware control from userspace, such as graphics initialization, modesetting or sending drawing commands. So the entire graphics driver stack resides solely in the userspace X server, almost as another operating system running in parallel. The kernel was completely out of the loop.

[Diagram: X11 2D]

To be fair, there was an xf86-video-fbdev fallback driver, which basically ran the entire X server directly on top of a Linux framebuffer without raw hardware access, with reduced performance and functionality.

But for most users, when the X server started, it would obtain total control of the video card as root, reinitialize the hardware, reprogram all the registers, and take over the video card. The X server could even run the VBIOS of the card through its own 16-bit x86 emulator. At a bare minimum, a driver implements code for pushing pixels to the video card, providing the X server with an abstract framebuffer, similar to fbdev. Optionally, a driver also implements all the video functions there are in the world, such as monitor EDID detection, power management, 2D acceleration, multi-screen support, etc.

3D Graphics

Kernel

DRM

With the emergence of 3D graphics and GPU computing, the original approach became unsustainable: the X server is no longer the only userspace program that needs video card access. This necessitated an infrastructure in the kernel, the Direct Rendering Manager (DRM), to provide an interface for userspace programs to allocate buffers in VRAM, send instructions to the GPU’s command queues and ring buffers, and perform other related low-level operations.

These operations are exposed to userspace as ioctl() operations on a device file in /dev/dri (which arguably should have been named /dev/drm). It’s worth noting that they are not standardized, and depend on the GPU’s hardware details.

The kernel is also responsible for necessary management and multiplexing, such as graphics memory management, resource locking, distinguishing privileged and unprivileged operations, to ensure the video hardware is used in a safe manner.

As a result, applications can take care of their own rendering without worrying about the global state of the GPU. Pure GPGPU compute workloads without graphics are also supported within this framework.
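
As a small taste of this interface, here is a minimal sketch that opens a DRM device node and asks the kernel which driver sits behind it. It uses libdrm, the thin userspace wrapper around these ioctl()s; the device path, permission to open it, and the presence of libdrm are assumptions about your system:

/* drmver.c - ask the kernel which DRM driver sits behind /dev/dri/card0.
 * Build (assuming libdrm is installed):
 *   cc drmver.c -o drmver $(pkg-config --cflags --libs libdrm) */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <xf86drm.h>

int main(void)
{
    int fd = open("/dev/dri/card0", O_RDWR);
    if (fd < 0) { perror("open /dev/dri/card0"); return 1; }

    /* drmGetVersion() is a thin wrapper around the DRM_IOCTL_VERSION ioctl. */
    drmVersionPtr v = drmGetVersion(fd);
    if (v) {
        printf("driver: %s %d.%d.%d (%s)\n", v->name,
               v->version_major, v->version_minor, v->version_patchlevel,
               v->desc);
        drmFreeVersion(v);
    }
    close(fd);
    return 0;
}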

KMS

Simultaneously, people realized that eliminating direct hardware access also requires an infrastructure for Kernel Mode-Setting (KMS) in the same drivers. Although the original motivation came from the 3D graphics stack, it later became the standard for all video hardware.

A reexamination revealed several problems of userspace modesetting:

  1. Counterproductive code duplication. Modesetting on a modern video card is difficult: the system needs to handle many low-level tasks, including monitor EDID detection, power management, DisplayPort and HDMI initialization, hotplugging, and multi-screen support, among others. The kernel ought to provide it as a basic service to userspace.

  2. A serious security hazard. The X server running as root is obviously a huge security vulnerability.

  3. A safety hazard, creating usability problems. Since the application essentially takes over the video card beyond the kernel’s control, incorrect or conflicting modesetting can leave the graphics hardware in an inconsistent state. An X server crash can be catastrophic, causing a total system lockup. Similar problems exist when the kernel crashes - the graphics simply hang. The kernel doesn’t know how to reset the video card to show a console, so it’s impossible to display a kernel panic message on the screen (Control + Alt + F{1..6} was only possible with cooperation from the X driver). From this perspective, the infamous Windows Blue Screen of Death is actually a great feature and shows Windows had a mature graphics stack - its equivalent was not even technically possible on Linux until the 2010s.

  4. Screen flickering due to unnecessary hardware reinitialization. Some found it an irritating problem, but it’s the least of the issues.

Kernel Mode-Setting solves these problems by performing modesetting operations within the kernel and providing a unified API to userspace, implemented by device drivers in the KMS subsystem. These operations are exposed to userspace as ioctl() operations on a device file in /dev/dri. Unlike DRM, KMS operations are vendor-neutral and standardized.
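
Here is a minimal sketch of the userspace side of KMS: enumerate a card’s display connectors and print whether a monitor is attached, again via libdrm (the device path and libdrm itself are assumptions):

/* kmsinfo.c - list KMS connectors and their first (usually preferred) mode.
 * Build: cc kmsinfo.c -o kmsinfo $(pkg-config --cflags --libs libdrm) */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

int main(void)
{
    int fd = open("/dev/dri/card0", O_RDWR);
    if (fd < 0) { perror("open /dev/dri/card0"); return 1; }

    drmModeRes *res = drmModeGetResources(fd);  /* CRTCs, encoders, connectors */
    if (!res) { fprintf(stderr, "not a KMS-capable device?\n"); return 1; }

    for (int i = 0; i < res->count_connectors; i++) {
        drmModeConnector *conn = drmModeGetConnector(fd, res->connectors[i]);
        if (!conn)
            continue;
        printf("connector %u: %s, %d modes\n", conn->connector_id,
               conn->connection == DRM_MODE_CONNECTED ? "connected"
                                                      : "disconnected",
               conn->count_modes);
        if (conn->count_modes > 0)  /* the preferred mode is usually first */
            printf("  %dx%d @ %u Hz\n", conn->modes[0].hdisplay,
                   conn->modes[0].vdisplay, conn->modes[0].vrefresh);
        drmModeFreeConnector(conn);
    }
    drmModeFreeResources(res);
    close(fd);
    return 0;
}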

Overall

With DRM and KMS implemented, this became the basic kernel architecture for graphics - all video drivers provide DRM and KMS functionality via both subsystems, and all userspace access to the graphics hardware goes through the DRM/KMS subsystems in the kernel.

[Diagram: DRM & KMS]

The KMS subsystem also provides a framebuffer compatibility layer, so that legacy components on top of the fbdev subsystem continue to function. The most important user is fbcon in the kernel itself, which is responsible for displaying the Virtual Terminal, so you get boot messages and command-line access.

This is why one shouldn’t ever write a new fbdev driver again: DRM/KMS can do everything fbdev can, while none of the KMS features, such as monitor management (essential even on embedded systems), are supported by fbdev.

[Diagram: fbdev compat]

Userspace

Mesa

The userspace driver is the next piece of the puzzle. The hardware only provides low-level, hardware-specific operations, accessible via the DRM subsystem. These operations are not universal; it’s the responsibility of a userspace driver to use them accordingly to implement 3D rendering.

This is where Mesa comes into play.

Mesa has both hardware-independent and hardware-dependent parts. It’s crucial to the graphics stack, offering three key features. First, it provides hardware-specific DRI drivers - the userspace graphics drivers that translate OpenGL operations into hardware commands, which are then executed via DRM. Thus, developing a new graphics driver requires close cooperation between Linux and Mesa developers.

[Diagram: Mesa]

Next, it provides several core libraries. The most important one is libGL, which implements the full OpenGL API and is what all 3D programs link against. libGL itself is hardware-independent; how OpenGL is actually rendered depends on which DRI driver you’re using. Finally, it offers a high-quality software OpenGL renderer, LLVMpipe, which serves both as a reference implementation and as a fallback when no 3D video card is available.

X Window

Finally, the X Window System also needed to be extended to enable X applications to use the new 3D capabilities.

GLX & EGL

OpenGL itself is a platform-independent standard, and it doesn’t specify any binding to the window system. A window system must have a way to allow an application to open and manage an OpenGL context, so all the rendering can take place within it. Windows and Mac OS X provided WGL and AGL for this purpose. Likewise, X Window provides the GLX extension.

This involves modifying the X protocol, the X server, and the X client. The X protocol is extended to allow the X server and X client to exchange GLX messages; the X server gets a new module called libglx.so which implements this protocol; and finally, the X client (application) must also speak GLX, which is provided by Mesa as the libGLX library.

More recently, an alternative standard, EGL, also became available. It was originally created during the work on OpenGL ES, an OpenGL variant for embedded applications, but soon got used by conventional OpenGL as well. Its main advantage is being a cross-platform standard independent of the window system. EGL was first adopted by Wayland, but it’s technically possible for X applications to use this API as well.
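
As a rough sketch of what the client side looks like, the program below opens an X window and creates an OpenGL context through GLX (using the old glXChooseVisual-style API for brevity; modern code would use GLX 1.3 FBConfigs or EGL). A working GLX setup with the Xlib and GL headers installed is assumed:

/* glxmin.c - minimal GLX client: open a window, create an OpenGL context,
 * clear it, and report the renderer in use.
 * Build: cc glxmin.c -o glxmin -lGL -lX11 */
#include <GL/gl.h>
#include <GL/glx.h>
#include <X11/Xlib.h>
#include <X11/Xutil.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    if (!dpy) { fprintf(stderr, "cannot open display\n"); return 1; }

    /* Ask GLX for a visual suitable for double-buffered RGBA rendering. */
    int attrs[] = { GLX_RGBA, GLX_DOUBLEBUFFER, GLX_DEPTH_SIZE, 24, None };
    XVisualInfo *vi = glXChooseVisual(dpy, DefaultScreen(dpy), attrs);
    if (!vi) { fprintf(stderr, "no suitable GLX visual\n"); return 1; }

    /* Create an X window whose visual matches the GLX visual. */
    Colormap cmap = XCreateColormap(dpy, DefaultRootWindow(dpy), vi->visual,
                                    AllocNone);
    XSetWindowAttributes swa = { .colormap = cmap, .event_mask = ExposureMask };
    Window win = XCreateWindow(dpy, DefaultRootWindow(dpy), 0, 0, 640, 480, 0,
                               vi->depth, InputOutput, vi->visual,
                               CWColormap | CWEventMask, &swa);
    XMapWindow(dpy, win);

    /* Wait until the window is actually on screen. */
    XEvent ev;
    do { XNextEvent(dpy, &ev); } while (ev.type != Expose);

    /* The last argument requests direct rendering if available. */
    GLXContext ctx = glXCreateContext(dpy, vi, NULL, True);
    glXMakeCurrent(dpy, win, ctx);
    printf("OpenGL renderer: %s\n", (const char *)glGetString(GL_RENDERER));

    glClearColor(0.1f, 0.2f, 0.4f, 1.0f);
    glClear(GL_COLOR_BUFFER_BIT);
    glXSwapBuffers(dpy, win);
    sleep(3);

    glXMakeCurrent(dpy, None, NULL);
    glXDestroyContext(dpy, ctx);
    XCloseDisplay(dpy);
    return 0;
}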

DRI

3D graphics is rendered directly on the GPU into its own buffer, managed by the kernel’s DRM infrastructure, without intervention or knowledge from the X server. When the rendered graphics needs to be displayed within an X application, the X server must first be informed about the existence of these kernel buffers, so they can be associated with a drawable object managed by the X server (such as a window or a pixmap). Therefore, the X server gained a DRI extension to accomplish this task. The DRI2 extension was used for many years before DRI3 replaced it.

DRI3 is a simple protocol which does essentially two things: convert a kernel DMA buffer to an X pixmap, or convert an X pixmap to a kernel DMA buffer.

In the general sense, the term Direct Rendering Infrastructure is a catch-all that refers to the DRM subsystem in the kernel, the DRI driver in Mesa, and the extended X server; in the narrow sense, it’s only an X extension.

Finally, the original DDX drivers are still used in the X server for 2D rendering. The only difference is that hardware initialization is now performed through KMS, and 2D framebuffer management and acceleration are done using DRM. Also, for 3D DRI to function, a bit of supporting code is required.

[Diagram: Mesa]

Glamor & xf86-video-modesetting

The separate, and arguably legacy, 2D DDX drivers in the X server were recognized as a maintenance burden.

The solution is twofold. First, just as it was possible to start the X server on top of a legacy Linux fbdev framebuffer using xf86-video-fbdev, with the new standardized KMS subsystem it’s now possible to write a universal DDX driver for all video cards; this driver is named xf86-video-modesetting. The second change is a generic OpenGL-based 2D acceleration framework for the X server, Glamor. Glamor automatically translates X’s 2D rendering operations into OpenGL; the X server itself then loads libGL and the 3D DRI driver, and lets Mesa handle the rest. xf86-video-modesetting uses Glamor as its acceleration backend.

[Diagram: fbdev compat]

This is how a graphics-enabled system functions.

Loose Ends

Direct Rendering and Indirect Rendering in X Window

Typically, 3D graphics works via direct rendering, using the procedures described above. But it’s also possible to use indirect rendering with the X Window System.

There’s a common misconception that direct rendering is hardware-accelerated while indirect rendering is done by the software renderer. This is wrong. How indirect rendering works is actually a bit tricky.

First, the GLX extension in X is not just a protocol for setting up OpenGL contexts so that 3D rendering can take place. Optionally, it’s also capable of streaming OpenGL commands from the application to the remote X server and letting the X server do the rendering there - usually on localhost, but possibly on a separate machine. This is called indirect rendering, and it demonstrates X’s network transparency.

In direct rendering, the DRI driver gets loaded by the application, on the client side. When an application calls libGL, it activates Mesa, and Mesa loads the DRI driver, which is then used to communicate with the Linux kernel to render 3D graphics directly on the hardware. All on the client side; the X server does not intervene.

In indirect rendering, Mesa does something sneaky under the hood. Instead of loading a DRI driver and doing the real rendering, it starts passing OpenGL commands to the X server via the GLX protocol. To the application it’s still the same libGL, but the behavior is entirely different.
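
You can observe the difference from a client program: glXCreateContext() takes a flag requesting direct or indirect rendering, and glXIsDirect() tells you what you actually got. A small sketch (note that modern X servers often disable indirect GLX entirely unless started with +iglx, so the indirect request may fail or be overridden):

/* directcheck.c - create one direct and one (requested) indirect GLX context
 * and report what the implementation actually gave us.
 * Build: cc directcheck.c -o directcheck -lGL -lX11 */
#include <GL/glx.h>
#include <X11/Xlib.h>
#include <X11/Xutil.h>
#include <stdio.h>

static void report(Display *dpy, XVisualInfo *vi, Bool want_direct)
{
    /* Fourth argument: True asks for direct rendering, False for indirect. */
    GLXContext ctx = glXCreateContext(dpy, vi, NULL, want_direct);
    if (!ctx) {
        printf("requested %s: context creation failed\n",
               want_direct ? "direct" : "indirect");
        return;
    }
    printf("requested %s, got %s rendering\n",
           want_direct ? "direct" : "indirect",
           glXIsDirect(dpy, ctx) ? "direct" : "indirect");
    glXDestroyContext(dpy, ctx);
}

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    if (!dpy) { fprintf(stderr, "cannot open display\n"); return 1; }

    int attrs[] = { GLX_RGBA, GLX_DOUBLEBUFFER, None };
    XVisualInfo *vi = glXChooseVisual(dpy, DefaultScreen(dpy), attrs);
    if (!vi) { fprintf(stderr, "no suitable GLX visual\n"); return 1; }

    report(dpy, vi, True);
    report(dpy, vi, False);

    XCloseDisplay(dpy);
    return 0;
}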

Hardware-Accelerated Indirect Rendering

When the X server is asked to do indirect rendering, it can use whatever driver it has loaded. Usually that’s a software renderer, but it can also be a real 3D DRI driver. Yes, confusingly, the X server also has the ability to load its own copy of the DRI driver to do rendering on the server side. This feature is called AIGLX (Accelerated Indirect GLX). When the X server is backed by an actual 3D driver, the indirect rendering path can actually have acceptable performance.

One use is viewing 3D graphics locally from a remote machine. Another, historically more important, use was compositing window managers, a la Compiz. Before DRI2 was implemented, DRI1 had no ability to redirect a window to an offscreen buffer, so the indirect rendering code path got abused: if we stop applications from running OpenGL by themselves and force-redirect everything to the X server instead, we gain full control over the rendering process, and it becomes possible to redirect the results to a buffer. Then we use AIGLX to get acceptable performance.

AIGLX

Many people check whether their 3D acceleration works by observing this line in the X server log:

AIGLX: Loaded and initialized /usr/lib/dri/i915_dri.so

So they believe the 3D DRI driver is loaded and executed by the X server. This is incorrect: this copy of the DRI driver is only used when an application passes OpenGL commands to the X server for indirect rendering. This rarely happens on a desktop system (and the aging GLX protocol only supports streaming OpenGL 1.4, so it doesn’t work with modern applications). It’s theoretically possible for AIGLX to fall back to the software renderer without the system losing 3D acceleration for direct rendering.

But because whether X can successfully load the 3D DRI driver is strongly correlated with whether a client-side application can successfully load the same DRI driver via Mesa, watching for the AIGLX message still works as a diagnostic method.

Software-Only Direct Rendering

Conversely, it’s entirely possible to have direct rendering with a software renderer, so that all 3D rendering is done by Mesa in software on the client side. The rendered buffers are then communicated back to X11 via DRI.

[Diagram: Direct Rendering with LLVMpipe]

I don’t yet understand how this works under the hood. How is it possible for Mesa to pass the software-rendered result to X11 via DRI? DRI accepts DMA-BUF memory, and usually it’s mapped to the GPU. Does Mesa simply pass a pointer to RAM instead?

But as a matter of fact, on the machine I’m currently using to write this, I get:

direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Mesa/X.org (0xffffffff)
    Device: llvmpipe (LLVM 11.0.0, 256 bits) (0xffffffff)
    Version: 20.3.5
    Accelerated: no
    Video memory: 2927MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 4.5
    Max compat profile version: 3.1
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.2
OpenGL vendor string: Mesa/X.org
OpenGL renderer string: llvmpipe (LLVM 11.0.0, 256 bits)

Wayland

In the good old days before 3D, the architecture of X11 was clean and simple. The X client tells the X server, “draw this rectangle, display that text.” The X server calls the driver to draw it. Later, it tells the client, “the mouse has moved.” Network transparency was another added benefit: if the X client talks to the server via TCP/IP, the application automatically gets displayed on another machine far away. Mind you, these are real windows, not some screen-grabbing knock-offs - as hardcore X fans may say.

Unfortunately, proponents of Wayland argue that X’s architecture is no longer a perfect fit for a modern desktop system.

Windows are no longer combinations of simple geometric shapes like rectangles or squares; they are compositions of 2D images, pre-rendered by the applications themselves using their own 2D drawing libraries like Cairo. Thus, X’s own drawing functions are useless. Similarly, fonts are now rendered with Unicode and internationalization support, hinting, and sub-pixel anti-aliasing by the applications themselves using a separate library, like FreeType. X’s font rendering became useless. Most desktop environments use a compositing window manager, which redirects the contents of all windows to offscreen buffers; the windows are then redrawn with their own borders and smooth animations, with new positions calculated and managed by the window manager itself. X’s own windowing features are useless.

X’s network transparency is in name only, and largely abandoned in practice. On a typical desktop machine, nearly all content bypasses the X server via the Direct Rendering Infrastructure. Applications render almost everything themselves using the local GPU by calling the kernel; X only gets a pointer to the buffer when the rendering is done. Even back in the 2D era, the Shared Memory Extension (SHM) was used for passing the pointer of a pre-rendered 2D image directly to the X server without its intervention.

Yes, one can technically still redirect a 2D or 3D application over the network. But in practice, plain X11 forwarding only works with simple 2D windows, or 3D applications with limited graphical activity. Even the fastest network cannot support a 3D program that updates millions of points per second to show a full-range, full-motion animation.

Interestingly, the VirtualGL developers made an amazing hack to solve this problem: for 2D, we still use native X11 forwarding, but for 3D, we redirect all graphics to offscreen framebuffers, then intercept the libGL calls from the application to know when the graphics have changed. We then read the rendered results back as 2D images, compress and transport them over the network, and paint them as X pixmaps at their original locations in the window on the remote machine. I’ve used this many times and found it a gamechanger. However, technically it’s just screen grabbing with extra steps, isn’t it… Even more so if you consider that the 2D content handled by X itself is also mostly pre-rendered pixels and images.

The result is that nearly all subsystems in X11 have outlived their usefulness - or at least this is the argument made by Wayland proponents. My understanding (disclaimer: I do not use Wayland) is that the basic idea behind Wayland is: if everything is already rendered by the applications themselves, all we need is a minimalistic server which does only two things:

  1. Process user inputs and pass them to applications.

  2. Act as a compositing window manager.

As this article does not describe or show window managers and other components of a desktop environment in detail, the block diagram doesn’t look too different from the X Window System’s.

[Diagram: fbdev compat]

Deprecating Linux’s Virtual Terminal?

We’ve already seen that Linux’s Virtual Terminal is nothing special, just yet another terminal emulator that happens to be useful enough to run in the kernel. Furthermore, at this point it’s basically a legacy system.

When you see the text from tty1 of the Linux console on your screen, what you’re actually seeing is:

  1. A teletypewriter.

  2. …which is a compatibility layer created by a video terminal.

  3. …which is a compatibility layer created by Linux’s Virtual Terminal subsystem.

  4. …which is a compatibility layer created by the fbcon subsystem, to convert text to images on a 2D framebuffer.

  5. …which is an abstraction layer created by the fbdev subsystem.

  6. …which is a compatibility layer created by the DRM subsystem.

  7. …which performs the actual rendering using the 3D GPU.

Furthermore, the built-in Virtual Terminal is not even a great one. It only implements a very limited DEC VT102 terminal. It has no international keyboard support, and does not support Unicode fonts. Graphics-wise, it has only limited modesetting support, extremely limited hardware acceleration, and no font anti-aliasing.

If the Virtual Terminal is just a terminal emulator, logically it should be possible to replace it with something else in userspace (keeping an old-fashioned VT around for troubleshooting, if you wish). There was already a terminal emulator called fbterm with quite a lot of features, but it was still based on the legacy fbdev subsystem.

Since KMS/DRM became the standard, displaying 3D graphics is no longer the exclusive right of a window system; any program, even one running bare-metal from the command line, should be able to do 3D as well as any X or Wayland application. Thus, back in 2014 when I was writing the original slideshow, there was an ambitious project called kmscon: a terminal emulator built on top of KMS/DRM, aiming to be a complete replacement for Linux’s Virtual Terminal for daily use. It had full terminal support from VT220 to VT510, full internationalization support via fontconfig and libxkbcommon, hardware-accelerated rendering, and even multi-seat capability.

This project was soon picked up by systemd, and plans were made to make systemd-consoled the standard Linux console for systemd-based systems. This move obviously generated a flood of new rants about systemd - “Oh no! They’re coming after our console!!” But I genuinely think such a console is useful from the perspective of modernizing the graphics stack.

Unfortunately (or fortunately), due to the lack of developer interest, systemd-consoled never made it, and the kmscon project seems to have been mostly forgotten at this point.

See Also

Historical

Recent