Using NVIDIA GPUs with Docker 19.03

  • October 4, 2019
  • Notes

Author: 张首富  Date: 2019-09-06

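Preface

Docker 19.03 was officially released in July 2019. For me, this release has two highlights: 1) Docker no longer needs root privileges to start and run (rootless mode, still experimental in 19.03); 2) enhanced GPU support: to access NVIDIA GPUs inside Docker we no longer need to install nvidia-docker separately (the NVIDIA Container Runtime installed below is still required).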

Install the NVIDIA driver

Confirm that the NVIDIA card is detected:

$ lspci -vv | grep -i nvidia
00:04.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] (rev a1)
        Subsystem: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB]
        Kernel modules: nvidiafb
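For scripting, the same check can be wrapped in a small function. This is a minimal sketch, not part of the original steps, assuming the one-line-per-device output of plain `lspci` (the first line of each device is the same with or without `-vv`); the function name `count_nvidia_gpus` is hypothetical.

```shell
#!/bin/sh
# count_nvidia_gpus (hypothetical helper): reads `lspci` output on stdin and
# prints how many NVIDIA display devices it lists. Only "VGA compatible
# controller" and "3D controller" lines are counted, so an NVIDIA audio or
# USB function on the same card is not counted as a separate GPU.
count_nvidia_gpus() {
    # grep -c prints 0 and exits 1 when nothing matches; `|| true` keeps the
    # function's exit status 0 so it is safe to call under `set -e`.
    grep -Eic '(vga compatible|3d) controller: nvidia' || true
}
```

Usage: `lspci | count_nvidia_gpus`; on a machine like the one above it should print a non-zero count, and `0` means the card is not visible on the PCI bus.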

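Driver installation is not covered in detail here; if you need instructions, see my earlier post on installing a TTS service offline on Ubuntu (ubuntu离线安装TTS服务).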

Install the NVIDIA Container Runtime

$ cat nvidia-container-runtime-script.sh
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey |
    sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list |
    sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
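The `distribution=` line builds the repository path from `/etc/os-release`; on Ubuntu 18.04 it yields `ubuntu18.04`, which matches the URLs in the script output below. A minimal sketch (not part of the original script) that isolates this step so the URL can be checked before anything is added to APT; both helper names are hypothetical, and the default path assumes a standard `/etc/os-release`.

```shell
#!/bin/sh
# nvidia_distribution (hypothetical helper): prints "$ID$VERSION_ID" from an
# os-release file, e.g. "ubuntu18.04". The file is sourced in a subshell so
# its variables do not leak into the calling shell.
nvidia_distribution() {
    (
        . "${1:-/etc/os-release}"
        echo "${ID}${VERSION_ID}"
    )
}

# nvidia_repo_list_url (hypothetical helper): the .list URL that the script
# above passes to curl for a given distribution string.
nvidia_repo_list_url() {
    echo "https://nvidia.github.io/nvidia-container-runtime/$1/nvidia-container-runtime.list"
}
```

Usage: `nvidia_repo_list_url "$(nvidia_distribution)"` prints the URL for the current host, which can be opened with `curl -s -L` to confirm NVIDIA publishes a list for that distribution.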

Run the script:

$ sh nvidia-container-runtime-script.sh
OK
deb https://nvidia.github.io/libnvidia-container/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/$(ARCH) /
Hit:1 http://archive.canonical.com/ubuntu bionic InRelease
Get:2 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64  InRelease [1139 B]
Get:3 https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64  InRelease [1136 B]
Hit:4 http://security.ubuntu.com/ubuntu bionic-security InRelease
Get:5 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64  Packages [4076 B]
Get:6 https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64  Packages [3084 B]
Hit:7 http://us-east4-c.gce.clouds.archive.ubuntu.com/ubuntu bionic InRelease
Hit:8 http://us-east4-c.gce.clouds.archive.ubuntu.com/ubuntu bionic-updates InRelease
Hit:9 http://us-east4-c.gce.clouds.archive.ubuntu.com/ubuntu bionic-backports InRelease
Fetched 9435 B in 1s (17.8 kB/s)
Reading package lists... Done
$ apt-get install nvidia-container-runtime
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  grub-pc-bin libnuma1
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
Get:1 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64  libnvidia-container1 1.0.2-1 [59.1 kB]
Get:2 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64  libnvidia-container-tools 1.0.2-1 [15.4 kB]
Get:3 https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64  nvidia-container-runtime-hook 1.4.0-1 [575 kB]
  ...
Unpacking nvidia-container-runtime (2.0.0+docker18.09.6-3) ...
Setting up libnvidia-container1:amd64 (1.0.2-1) ...
Setting up libnvidia-container-tools (1.0.2-1) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...
Setting up nvidia-container-runtime-hook (1.4.0-1) ...
Setting up nvidia-container-runtime (2.0.0+docker18.09.6-3) ...
$ which nvidia-container-runtime-hook
/usr/bin/nvidia-container-runtime-hook
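The `which` check can be extended to every binary the runtime relies on. A minimal sketch, not part of the original steps; `require_cmds` is a hypothetical helper, and also checking `nvidia-container-cli` (shipped by the `libnvidia-container-tools` package installed above) is an added assumption.

```shell
#!/bin/sh
# require_cmds (hypothetical helper): returns 0 only if every command named
# in its arguments is found on PATH; each missing one is reported on stderr.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `require_cmds nvidia-container-runtime-hook nvidia-container-cli || echo "reinstall nvidia-container-runtime"`.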

Install Docker 19.03

# Step 1: install the required system tools
yum install -y yum-utils device-mapper-persistent-data lvm2
# Step 2: add the Docker CE repository (Aliyun mirror)
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# Step 3: refresh the cache and install Docker CE
yum makecache fast
yum -y install docker-ce-19.03.2
# Step 4: start the Docker service and enable it at boot
systemctl start docker && systemctl enable docker

Verify that the expected Docker version is installed:

$ docker version
Client: Docker Engine - Community
 Version:           19.03.2
 API version:       1.40
 Go version:        go1.12.8
 Git commit:        6a30dfc
 Built:             Thu Aug 29 05:28:55 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.2
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.8
  Git commit:       6a30dfc
  Built:            Thu Aug 29 05:27:34 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
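To check the version in a script rather than by eye, the server version can be read with `docker version --format '{{.Server.Version}}'` and compared against 19.03. A minimal sketch, assuming a `sort` that supports `-V` (GNU coreutils); `version_ge` is a hypothetical helper.

```shell
#!/bin/sh
# version_ge (hypothetical helper): succeeds if version $1 >= version $2.
# Both strings are sorted in version order (sort -V); if the smaller one is
# $2, then $1 is at least $2. Equal versions therefore also succeed.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

Usage: `server=$(docker version --format '{{.Server.Version}}')`, then `version_ge "$server" 19.03 || echo "Docker $server is older than 19.03; --gpus is not available"`.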

Check the --gpus option:

$ docker run --help | grep -i gpus
      --gpus gpu-request               GPU devices to add to the container ('all' to pass all GPUs)
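The same check as a reusable test, for example at the top of a provisioning script. A minimal sketch; `supports_gpus_flag` is hypothetical and only inspects the help text, so it confirms that the CLI knows the flag, not that the NVIDIA runtime works.

```shell
#!/bin/sh
# supports_gpus_flag (hypothetical helper): reads `docker run --help` output
# on stdin and succeeds if it documents the --gpus option.
supports_gpus_flag() {
    # `--` stops grep from parsing "--gpus" as one of its own options.
    grep -q -- '--gpus'
}
```

Usage: `docker run --help | supports_gpus_flag || echo "this docker CLI has no --gpus flag"`.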

Run an Ubuntu container that uses the GPU:

$ docker run -it --rm --gpus all ubuntu nvidia-smi
Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu
f476d66f5408: Pull complete
8882c27f669e: Pull complete
d9af21273955: Pull complete
f5029279ec12: Pull complete
Digest: sha256:d26d529daa4d8567167181d9d569f2a85da3c5ecaf539cace2c6223355d69981
Status: Downloaded newer image for ubuntu:latest
Tue May  7 15:52:15 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.116                Driver Version: 390.116                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   39C    P0    22W /  75W |      0MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
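`--gpus all` exposes every GPU to the container. The flag also accepts a count (for example `--gpus 2`) or specific devices by index or UUID; because a device list contains a comma, Docker's documentation wraps it in an extra pair of double quotes, as in `--gpus '"device=0,1"'`. A minimal sketch of a hypothetical helper that builds that value; the device indices in the usage line are placeholders.

```shell
#!/bin/sh
# gpus_arg (hypothetical helper): builds a value for `docker run --gpus`.
#   no arguments     -> all
#   one or more IDs  -> "device=ID1,ID2,..." (the double quotes are part of
#                       the value, so Docker does not split it at the comma)
gpus_arg() {
    if [ "$#" -eq 0 ]; then
        echo all
        return 0
    fi
    # Join the arguments with commas inside a command substitution (a
    # subshell), so the caller's IFS is left unchanged.
    ids=$(IFS=,; echo "$*")
    printf '"device=%s"\n' "$ids"
}
```

Usage: `docker run -it --rm --gpus "$(gpus_arg 0 1)" ubuntu nvidia-smi -L` should list only GPUs 0 and 1.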

Troubleshooting

Do you see the following error message?

$ docker run -it --rm --gpus all debian
docker: Error response from daemon: linux runtime spec devices: could not select device driver "" with capabilities: [[gpu]].

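The error above means the NVIDIA components have not been registered with Docker correctly, so Docker has no device driver that can handle a GPU request. Usually this means the driver is not properly installed on the host. It can also mean the NVIDIA container tools were installed while the Docker daemon was already running: the daemon only looks for nvidia-container-runtime-hook when it starts, so you need to restart it.

I suggest going back to verify that nvidia-container-runtime is installed, and then restarting the Docker daemon.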
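On a systemd host such as the CentOS setup above, the restart is `sudo systemctl restart docker`. A minimal sketch, not part of the original troubleshooting steps, of a hypothetical helper that recognises this specific error in captured `docker run` output so a script can print that hint; matching on the message text is an assumption and may break if Docker rewords it.

```shell
#!/bin/sh
# is_gpu_driver_error (hypothetical helper): reads docker output on stdin and
# succeeds if it contains the "could not select device driver" error shown
# above, i.e. the daemon has no driver registered for --gpus requests.
is_gpu_driver_error() {
    # -F: match the text literally; -q: only report via the exit status.
    grep -qF 'could not select device driver'
}
```

Usage: `docker run --rm --gpus all ubuntu nvidia-smi 2>&1 | is_gpu_driver_error && echo "verify nvidia-container-runtime, then run: sudo systemctl restart docker"`.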

List GPU devices

$ docker run -it --rm --gpus all ubuntu nvidia-smi -L
GPU 0: Tesla P4 (UUID: GPU-fa974b1d-3c17-ed92-28d0-805c6d089601)
$ docker run -it --rm --gpus all ubuntu nvidia-smi --query-gpu=index,name,uuid,serial --format=csv
index, name, uuid, serial
0, Tesla P4, GPU-fa974b1d-3c17-ed92-28d0-805c6d089601, 0325017070224
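The CSV output is convenient for scripts, for example to look up a GPU's UUID and then pass it to `--gpus device=...`. A minimal sketch; `gpu_uuid_by_index` is hypothetical and assumes the header line and the `, ` separator shown above (adding `noheader` to `--format` would drop the header, so the `NR > 1` skip would then need to go).

```shell
#!/bin/sh
# gpu_uuid_by_index (hypothetical helper): reads the CSV printed by
#   nvidia-smi --query-gpu=index,name,uuid,serial --format=csv
# on stdin and prints the uuid (third column) of the GPU whose index is $1.
gpu_uuid_by_index() {
    # Fields are separated by ", "; NR > 1 skips the header line.
    awk -F', ' -v idx="$1" 'NR > 1 && $1 == idx { print $3 }'
}
```

Usage: `uuid=$(docker run --rm --gpus all ubuntu nvidia-smi --query-gpu=index,name,uuid,serial --format=csv | gpu_uuid_by_index 0)`, then `docker run --rm --gpus "device=$uuid" ubuntu nvidia-smi -L`. The `-it` flags are left out here because the output is piped rather than shown on a terminal.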

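(This was originally marked "to be verified" because I had no GPU machine at the time.) Update: verified. Following the steps above, NVIDIA GPUs can be used successfully inside Docker.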