linux · 2025-03-31

Installing nvidia-container-toolkit

I. Install the NVIDIA driver

1. Confirm that an NVIDIA GPU is present

(If the command prints nothing, the system has not detected an NVIDIA GPU.)

zxm@zxm-pc:~$ lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GF108 [GeForce GT 730] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GF108 High Definition Audio Controller (rev a1)

2. Check whether the kernel has loaded the NVIDIA module
(If the command prints nothing, the driver module is not loaded.)

lsmod | grep nvidia
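
The GPU and kernel-module checks can be combined into one small report. This is only a sketch: the function names `mentions_nvidia` and `report_nvidia_status` are made up for illustration, and it assumes `lspci` and `lsmod` are installed.

```shell
# Hypothetical helper: succeeds if the text on stdin mentions "nvidia"
# (case-insensitive), the same test as `grep -i nvidia`.
mentions_nvidia() {
    grep -qi nvidia
}

# Hypothetical wrapper: report the GPU check and the module check together.
# Assumes lspci and lsmod are available on the system.
report_nvidia_status() {
    if lspci | mentions_nvidia; then
        echo "GPU: detected"
    else
        echo "GPU: not detected"
    fi
    if lsmod | mentions_nvidia; then
        echo "kernel module: loaded"
    else
        echo "kernel module: not loaded"
    fi
}
```

Run `report_nvidia_status` after sourcing the functions. If the GPU is detected but the module is not loaded, the driver still has to be installed.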

3. Check the recommended driver

zxm@zxm-pc:~$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0 ==
modalias : pci:v000010DEd00000F02sv00001462sd00008A9Fbc03sc00i00
vendor   : NVIDIA Corporation
model    : GF108 [GeForce GT 730]
driver   : nvidia-driver-390 - distro non-free recommended
driver   : xserver-xorg-video-nouveau - distro free builtin
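
To avoid copying the package name by hand, the line marked "recommended" can be pulled out of that output. A minimal sketch, assuming the output keeps the `driver : <package> - ... recommended` layout shown above; `recommended_driver` is a hypothetical helper name.

```shell
# Hypothetical helper: read `ubuntu-drivers devices` output on stdin and
# print the package name from the first "driver" line marked "recommended".
recommended_driver() {
    # Fields are: "driver", ":", "<package>", ... so the package is $3.
    awk '$1 == "driver" && /recommended/ { print $3; exit }'
}
```

For example, `ubuntu-drivers devices | recommended_driver` should print the package name, which can then be passed to `apt install`.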

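4. Install the NVIDIA driver

Install (or reinstall) the recommended NVIDIA driver automatically:

sudo ubuntu-drivers autoinstall

Or install a specific version manually, using the version recommended above:

sudo apt install nvidia-driver-390

5. Run nvidia-smi to confirm the driver works (a reboot may be needed first so the new module is loaded)

nvidia-smi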

II. Install nvidia-container-toolkit

1. Download the GPG key from the mirror

curl -fsSL https://mirrors.ustc.edu.cn/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

2. Download the original source list to a temporary location

curl -s -L -o /tmp/nvidia-container-toolkit.list.origin https://mirrors.ustc.edu.cn/libnvidia-container/stable/deb/nvidia-container-toolkit.list

Then use sed to replace the source URL and add the signed-by field:

sed 's#deb https://nvidia.github.io#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://mirrors.ustc.edu.cn#g' /tmp/nvidia-container-toolkit.list.origin > /tmp/nvidia-container-toolkit.list.fixed
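
To see what the substitution does before touching any real file, it can be tried on a single line. This is a sketch: the sample line is an assumption about what the downloaded list contains (a `deb https://nvidia.github.io/...` entry), and the variable names are only illustrative.

```shell
# Assumed sample entry; the real downloaded file may differ.
sample='deb https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /'
keyring=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

# Same substitution as in the step above, applied to the sample line only.
# "#" is the sed delimiter, so the slashes in the URLs need no escaping.
rewritten=$(printf '%s\n' "$sample" |
    sed "s#deb https://nvidia.github.io#deb [signed-by=${keyring}] https://mirrors.ustc.edu.cn#g")

printf '%s\n' "$rewritten"
```

The printed line should keep the original path after the host name, with only the host swapped for the mirror and the `signed-by` option added in brackets.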

3. Copy the modified file into the APT sources directory (requires root)

sudo cp /tmp/nvidia-container-toolkit.list.fixed /etc/apt/sources.list.d/nvidia-container-toolkit.list

4. Inspect the result to confirm it is correct

cat /etc/apt/sources.list.d/nvidia-container-toolkit.list
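
Instead of checking the file by eye, the two things that matter (the `signed-by` option and the mirror host) can be tested with a short script. A sketch only; `check_source_list` is a hypothetical helper, and it assumes every active entry starts with `deb ` at the beginning of the line.

```shell
# Hypothetical helper: succeed only if the file has at least one active
# "deb" line and every such line has signed-by= and the USTC mirror host.
check_source_list() {
    awk '
        /^deb / {
            active++
            if ($0 !~ /signed-by=/ || $0 !~ /mirrors\.ustc\.edu\.cn/) bad++
        }
        END { exit (active > 0 && bad == 0) ? 0 : 1 }
    ' "$1"
}
```

For example: `check_source_list /etc/apt/sources.list.d/nvidia-container-toolkit.list && echo OK`. Commented-out entries (lines starting with `#`) are ignored.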

5. Refresh the package index and install nvidia-container-toolkit

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

III. Usage

1. Configure Docker to use the NVIDIA runtime

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
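
After this step, the Docker daemon configuration should reference the NVIDIA runtime. The check below is a sketch that assumes the configuration lives in `/etc/docker/daemon.json` and names the `nvidia-container-runtime` binary; `docker_has_nvidia_runtime` is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the given Docker daemon config mentions
# nvidia-container-runtime. Defaults to /etc/docker/daemon.json (assumed path).
docker_has_nvidia_runtime() {
    grep -q 'nvidia-container-runtime' "${1:-/etc/docker/daemon.json}"
}
```

For example: `docker_has_nvidia_runtime && echo "NVIDIA runtime configured"`. A plain text match is enough for a quick check; it does not validate the JSON itself.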

2. Run a Docker container with GPU access

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama_1 ollama/ollama:0.5.13-rc6

Or, as a Docker Compose file:

version: "3"

services:
  ollama1:
    image: ollama/ollama:0.5.13-rc6
    container_name: ollama_1
    restart: "no"
    ports:
      - 11434:11434
    volumes:
      - "./.ollama:/root/.ollama"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
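
Once the container is running, a quick way to confirm it can see the GPU is to list the devices from inside it. This is a sketch under assumptions: it uses the container name `ollama_1` from the examples above and assumes `nvidia-smi` is made available inside the container by the toolkit, which may not hold for every image. `gpu_visible_in_container` is a hypothetical helper.

```shell
# Hypothetical helper: list the GPUs visible inside a running container.
# $1 is the container name; `nvidia-smi -L` prints one line per GPU.
# Assumes nvidia-smi is available inside the container.
gpu_visible_in_container() {
    docker exec "$1" nvidia-smi -L
}
```

For example: `gpu_visible_in_container ollama_1`. An error or an empty list suggests the container was started without GPU access.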