Nvidia on Ubuntu
I wanted to run Ollama, a local AI platform, and make sure my GPU was fully utilized, since GPUs are the hardware best suited to the vector calculations involved. And I have a 'decent' GPU: an Nvidia GeForce RTX 4060 (the best you could get at the time). In trying to install the latest Nvidia driver, I set off on a week-long journey of learning, frustration and perseverance, discovering the inner workings of Ubuntu 24.04, Xorg, the Linux kernel and kernel modules, DRM, Secure Boot, initramfs and more.
I still do not have the Nvidia driver loaded, even after 40+ reboots and attempts. Instead I'm using the Nouveau driver, but at least I have a working system, and I believe I've finally figured out what needs to be done to disable Nouveau and install Nvidia. That is a project I might tackle shortly. I'm documenting what I encountered on this journey so that I can pick it back up at the right time.
X.org
Here is the Xorg.log file from my first boot into a clean system having no Nvidia drivers.
Nouveau
The X server is loading the nouveau driver, which comes from the package xserver-xorg-video-nouveau.
From the package description:
This driver for the X.Org X server (see xserver-xorg for a further description) provides support for NVIDIA Riva, TNT, GeForce, and Quadro cards.
This package provides 2D support including EXA acceleration, Xv and RandR. 3D functionality is provided by the libgl1-mesa-dri package.
This package is built from the FreeDesktop.org xf86-video-nouveau driver.
Inspection
dpkg can show us what packages are installed with 'nouveau' in the name.
dpkg -l | grep -i nouveau
ii libdrm-nouveau2:amd64 2.4.122-1~ubuntu0.24.04.1 amd64 Userspace interface to nouveau-specific kernel DRM services -- runtime
ii xserver-xorg-video-nouveau 1:1.0.17-2build1 amd64 X.Org X server -- Nouveau display driver
And, lsmod can show us what kernel modules are loaded with 'nouveau' in the name.
lsmod | grep nouveau
nouveau 3096576 68
drm_gpuvm 45056 2 xe,nouveau
drm_exec 12288 3 drm_gpuvm,xe,nouveau
gpu_sched 61440 2 xe,nouveau
drm_ttm_helper 12288 2 xe,nouveau
ttm 110592 4 drm_ttm_helper,xe,i915,nouveau
drm_display_helper 237568 3 xe,i915,nouveau
mxm_wmi 12288 1 nouveau
i2c_algo_bit 16384 3 xe,i915,nouveau
video 77824 3 xe,i915,nouveau
wmi 28672 4 video,wmi_bmof,mxm_wmi,nouveau
modinfo tells us details about the kernel module, including the dependencies.
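As a minimal sketch of that inspection (the helper name and the choice of fields are illustrative assumptions, not taken from the original session), `modinfo -F` prints a single field, so the module's on-disk path and its dependency list can be pulled out separately. The guard keeps the snippet harmless on a machine without modinfo.

```shell
# Print selected modinfo fields for a kernel module (default: nouveau).
# Assumption: "filename" and "depends" are just two example fields.
show_module_fields() {
    module="${1:-nouveau}"
    if ! command -v modinfo >/dev/null 2>&1; then
        echo "modinfo not found"
        return 0
    fi
    for field in filename depends; do
        # -F prints only the named field; empty if the module is unknown
        printf '%s: %s\n' "$field" "$(modinfo -F "$field" "$module" 2>/dev/null)"
    done
}

show_module_fields nouveau
```

Running plain `modinfo nouveau` instead prints every field at once, which is easier to read but harder to script against.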
What files does the nouveau driver install?
dpkg -L xserver-xorg-video-nouveau
/.
/usr
/usr/lib
/usr/lib/xorg
/usr/lib/xorg/modules
/usr/lib/xorg/modules/drivers
/usr/lib/xorg/modules/drivers/nouveau_drv.so
/usr/share
/usr/share/bug
/usr/share/bug/xserver-xorg-video-nouveau
/usr/share/doc
/usr/share/doc/xserver-xorg-video-nouveau
/usr/share/doc/xserver-xorg-video-nouveau/README.Debian
/usr/share/doc/xserver-xorg-video-nouveau/changelog.Debian.gz
/usr/share/doc/xserver-xorg-video-nouveau/copyright
/usr/share/man
/usr/share/man/man4
/usr/share/man/man4/nouveau.4.gz
/usr/share/bug/xserver-xorg-video-nouveau/script
NVidia
The installation guide (46 chapters) is at https://download.nvidia.com/XFree86/Linux-x86_64/570.153.02/README/
I've read the whole thing.
Replacing Nouveau
See the common problems chapter (#nouveau) of the NVIDIA driver 570.153.02 README, which basically says to:
- denylist it
- modify your initramfs
- modify Xorg to not load nouveau
Denylist
I tried denylisting the nouveau driver and preventing it from doing modesetting by creating disable-nouveau.conf. However, even with that in place, and even when working from a recovery console, I was unable to install the Nvidia drivers.
I've looked at the initramfs and don't see where it loads nouveau, although I do see that the temporary disable-nouveau.conf file I created is read in.
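For reference, a sketch of what such a disable-nouveau.conf typically contains, assuming the two directives the NVIDIA README suggests (denylist the module and switch off its modesetting). The helper name is hypothetical, and the real target directory is assumed to be /etc/modprobe.d. The dry run writes into a scratch directory so nothing on the system changes.

```shell
# Write the two modprobe directives into <dir>/disable-nouveau.conf.
# Assumption: on a real system <dir> would be /etc/modprobe.d (as root).
write_nouveau_denylist() {
    conf_dir="$1"
    printf '%s\n' \
        'blacklist nouveau' \
        'options nouveau modeset=0' > "$conf_dir/disable-nouveau.conf"
}

# Dry run into a scratch directory.
scratch_dir="$(mktemp -d)"
write_nouveau_denylist "$scratch_dir"
cat "$scratch_dir/disable-nouveau.conf"

# After writing the real file, the initramfs has to be regenerated so the
# early-boot copy of the modprobe configuration includes it:
#   sudo update-initramfs -u
```

The file name only needs to end in .conf; modprobe reads every such file in the directory.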
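Modify Initial Ram Disk
The initial RAM disk is a CPIO archive. On Ubuntu it is usually several archives concatenated (an uncompressed early section followed by a compressed main section), so file only reports the first one: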
file /boot/initrd.img-6.8.0-62-generic
/boot/initrd.img-6.8.0-62-generic: ASCII cpio archive (SVR4 with no CRC)
But how do you examine it? With lsinitramfs.
It lists quite a lot of files, so pipe the output through a pager like less.
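Rather than paging through the whole listing, it can be filtered. The sketch below assumes the image for the running kernel follows Ubuntu's /boot/initrd.img-<version> naming; the helper name is hypothetical. It keeps only entries that mention nouveau or sit under modprobe.d, which is where a denylist file would show up.

```shell
# Filter a file listing (one path per line on stdin) down to entries that
# mention nouveau or live under modprobe.d. "|| true" keeps an empty
# result from counting as a failure.
filter_nouveau_entries() {
    grep -i -E 'nouveau|modprobe\.d' || true
}

# Assumption: image name derived from the running kernel version.
initrd="/boot/initrd.img-$(uname -r)"

if command -v lsinitramfs >/dev/null 2>&1 && [ -r "$initrd" ]; then
    lsinitramfs "$initrd" | filter_nouveau_entries
else
    echo "lsinitramfs or $initrd not available; nothing to list"
fi
```

If the nouveau kernel module itself is packed into the image, its .ko file should appear in this output alongside any modprobe.d entries.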
X.org.conf
I've looked at Xorg, but I'm not sure whether or how it is responsible for requiring nouveau. I can clearly see that the package is installed, though.
dpkg -l | grep -E "xorg|xserver"
ii python3-xkit 0.5.0ubuntu6 all library for the manipulation of xorg.conf files (Python 3)
ii x11-xserver-utils 7.7+10build2 amd64 X server utilities
ii xorg 1:7.7+23ubuntu3 amd64 X.Org X Window System
ii xorg-docs-core 1:1.7.1-1.2 all Core documentation for the X.org X Window System
ii xorg-sgml-doctools 1:1.11-1.1 all Common tools for building X.Org SGML documentation
ii xserver-common 2:21.1.12-1ubuntu1.4 all common files used by various X servers
ii xserver-xephyr 2:21.1.12-1ubuntu1.4 amd64 nested X server
ii xserver-xorg 1:7.7+23ubuntu3 amd64 X.Org X server
ii xserver-xorg-core 2:21.1.12-1ubuntu1.4 amd64 Xorg X server - core server
ii xserver-xorg-input-all 1:7.7+23ubuntu3 amd64 X.Org X server -- input driver metapackage
ii xserver-xorg-input-libinput 1.4.0-1ubuntu24.04.1 amd64 X.Org X server -- libinput input driver
ii xserver-xorg-input-wacom 1:1.2.0-1ubuntu2 amd64 X.Org X server -- Wacom input driver
ii xserver-xorg-legacy 2:21.1.12-1ubuntu1.4 amd64 setuid root Xorg server wrapper
ii xserver-xorg-video-all 1:7.7+23ubuntu3 amd64 X.Org X server -- output driver metapackage
ii xserver-xorg-video-amdgpu 23.0.0-1build1 amd64 X.Org X server -- AMDGPU display driver
ii xserver-xorg-video-ati 1:22.0.0-1build1 amd64 X.Org X server -- AMD/ATI display driver wrapper
ii xserver-xorg-video-fbdev 1:0.5.0-2build2 amd64 X.Org X server -- fbdev display driver
ii xserver-xorg-video-intel 2:2.99.917+git20210115-1build1 amd64 X.Org X server -- Intel i8xx, i9xx display driver
ii xserver-xorg-video-nouveau 1:1.0.17-2build1 amd64 X.Org X server -- Nouveau display driver
ii xserver-xorg-video-qxl 0.1.6-1build1 amd64 X.Org X server -- QXL display driver
ii xserver-xorg-video-radeon 1:22.0.0-1build1 amd64 X.Org X server -- AMD/ATI Radeon display driver
ii xserver-xorg-video-vesa 1:2.6.0-1 amd64 X.Org X server -- VESA display driver
ii xserver-xorg-video-vmware 1:13.4.0-1build1 amd64 X.Org X server -- VMware display driver
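The package list shows what is installed, not what is in use. One way to narrow that down, sketched here under the assumption that the usual sysfs layout (/sys/class/drm/cardN/device/driver) is present, is to ask the kernel which driver is bound to each DRM card. The helper name and its optional sysfs root argument are hypothetical.

```shell
# Print the kernel driver bound to each DRM card.
# Assumption: sysfs exposes class/drm/cardN/device/driver as a symlink
# to the bound driver; the root can be overridden for experimentation.
list_drm_drivers() {
    sysfs_root="${1:-/sys}"
    for dev in "$sysfs_root"/class/drm/card[0-9]*/device; do
        # Connector entries (card0-HDMI-A-1 and so on) have no driver link.
        [ -e "$dev/driver" ] || continue
        card="$(basename "${dev%/device}")"
        driver="$(basename "$(readlink "$dev/driver")")"
        printf '%s: %s\n' "$card" "$driver"
    done
}

list_drm_drivers
```

On a system like the one described here, with both Intel and Nvidia graphics modules loaded, one line per card would be expected, each naming the kernel module that currently owns it.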
Module Signing
On systems with Secure Boot enabled (like mine), you most likely need to sign the module. See Signing NVIDIA Kernel Module. However, I never got an explicit message that signing was a problem, and I did see that the installation process signs the module with a generated key. I assume that the MOK (Machine Owner Key) process hooks into the trust system somehow.
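A small sketch for checking both halves of that question: whether Secure Boot is actually on, and whether a given module carries a signature. It assumes mokutil is installed and that the NVIDIA kernel module is named nvidia; the helper name is hypothetical.

```shell
# Map the output of `mokutil --sb-state` (on stdin) to a single word.
secure_boot_state() {
    if grep -q 'SecureBoot enabled'; then
        echo enabled
    else
        echo "disabled or unknown"
    fi
}

if command -v mokutil >/dev/null 2>&1; then
    state_text="$(mokutil --sb-state 2>/dev/null || true)"
    printf '%s\n' "$state_text" | secure_boot_state
else
    echo "mokutil not installed"
fi

# Show who signed the module; an empty line means it is unsigned.
# Assumption: the module is called "nvidia".
if command -v modinfo >/dev/null 2>&1; then
    modinfo -F signer nvidia 2>/dev/null || echo "nvidia module not found"
fi
```

If the module turns out to be signed with a locally generated key, `mokutil --list-enrolled` shows whether that key has been enrolled yet.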