Automated OpenStack Deployment with kolla-ansible and GPU Passthrough
Feel free to join QQ group 1026880196 for discussion and learning.
1. Configuring GPU passthrough for virtual machines on CentOS 7.x and 8.x
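1. Load the VFIO and KVM modules at boot. /etc/modules is only read on Debian/Ubuntu, so on CentOS list the modules, one per line, in a file under /etc/modules-load.d/ instead, e.g. vim /etc/modules-load.d/vfio.conf (use kvm_amd instead of kvm_intel on AMD hosts):
pci_stub
vfio
vfio_iommu_type1
vfio_pci
kvm
kvm_intel
2. Enable the IOMMU on the KVM host by appending a kernel parameter to GRUB_CMDLINE_LINUX:
#Intel CPUs: intel_iommu=on
#AMD CPUs: iommu=pt (the AMD IOMMU is normally enabled by default once it is turned on in the firmware)
vim /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet intel_iommu=on"
GRUB_DISABLE_RECOVERY="true"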
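Editing the kernel command line by hand is easy to get wrong when the setup is repeated. The function below is a minimal sketch of an idempotent edit: it appends a parameter such as intel_iommu=on to GRUB_CMDLINE_LINUX only if it is not already there. The name add_kernel_arg is hypothetical, and it assumes the variable sits on one line as a plain double-quoted value, as in the stock /etc/default/grub shown above; anything else is reported and left untouched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a kernel parameter to a GRUB_CMDLINE_* variable
# in a grub defaults file, only if it is not already present.
# Assumptions: the variable is on one line with a plain double-quoted value,
# and the parameter contains no "|" or "&" (they are special to sed here).
add_kernel_arg() {
    local file="$1" var="$2" arg="$3" current
    if ! grep -q "^${var}=" "$file"; then
        # Variable missing entirely: add it with just this parameter.
        printf '%s="%s"\n' "$var" "$arg" >> "$file"
        return 0
    fi
    if ! grep -q "^${var}=\".*\"\$" "$file"; then
        echo "add_kernel_arg: ${var} in ${file} is not a simple double-quoted value" >&2
        return 1
    fi
    # Current value without the surrounding quotes.
    current=$(sed -n "s/^${var}=\"\(.*\)\"\$/\1/p" "$file")
    case " $current " in
        *" $arg "*) return 0 ;;  # already present as a whole word
    esac
    if [ -z "$current" ]; then
        sed -i "s|^${var}=.*\$|${var}=\"${arg}\"|" "$file"
    else
        sed -i "s|^${var}=\"\(.*\)\"\$|${var}=\"\1 ${arg}\"|" "$file"
    fi
}
```

For example, add_kernel_arg /etc/default/grub GRUB_CMDLINE_LINUX intel_iommu=on, followed by regenerating GRUB. On Ubuntu the same call would target GRUB_CMDLINE_LINUX_DEFAULT.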
3. Regenerate the GRUB configuration
On EFI systems:
grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
On BIOS (non-EFI) systems:
grub2-mkconfig -o /boot/grub2/grub.cfg
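Which of the two commands in step 3 applies depends on how the machine booted. A small sketch, assuming the CentOS paths used in this guide: the kernel exposes /sys/firmware/efi only when it was started by EFI firmware, so the presence of that directory selects the output path. The function name grub_cfg_path is hypothetical, and its optional argument exists only so it can be tested against a scratch directory.

```shell
#!/usr/bin/env bash
# Print the grub.cfg path for step 3 (CentOS paths from this guide).
# /sys/firmware/efi exists only when the kernel was booted by EFI firmware.
# The optional argument is a hypothetical override used for testing.
grub_cfg_path() {
    local efi_dir="${1:-/sys/firmware/efi}"
    if [ -d "$efi_dir" ]; then
        echo /boot/efi/EFI/centos/grub.cfg
    else
        echo /boot/grub2/grub.cfg
    fi
}
```

Usage: grub2-mkconfig -o "$(grub_cfg_path)".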
4. Blacklist the host graphics and audio drivers so the host does not claim the GPU. Edit vim /etc/modprobe.d/blacklist.conf:
blacklist snd_hda_intel
blacklist amd76x_edac
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
blacklist nvidia
5. Find the GPU's vendor ID and device (product) ID:
yum install pciutils -y
lspci -nn | grep NVIDIA
Example output:
[root@stein-a ~]# lspci -nn | grep NVIDIA
03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104GL [Quadro P4000] [10de:1bb1] (rev a1)
03:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)
The IDs are the [vendor:device] pairs at the end of each line: 10de is NVIDIA, 1bb1 is the GPU and 10f0 is its HDMI audio function.
6. Edit vim /etc/modprobe.d/vfio.conf
# create new: for [ids=***], specify [vendor-ID:device-ID]
options vfio-pci ids=10de:1bb1,10de:10f0
7. Load vfio-pci at boot:
echo 'vfio-pci' > /etc/modules-load.d/vfio-pci.conf
8. Regenerate the initramfs, keeping a backup of the current image:
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
dracut -v /boot/initramfs-$(uname -r).img $(uname -r)
9. Reboot:
reboot
10. Verify that the GPU is bound to vfio-pci:
lspci -nnk -d 10de:1bb1
dmesg | grep -i vfio
[root@stein-a ~]# lspci -nnk -d 10de:1bb1
03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104GL [Quadro P4000] [10de:1bb1] (rev a1)
Subsystem: NVIDIA Corporation Device [10de:11a3]
Kernel driver in use: vfio-pci
Kernel modules: nouveau
[root@stein-a ~]# dmesg | grep -i vfio
[ 2.503115] VFIO - User Level meta-driver version: 0.3
[ 2.515645] vfio_pci: add [10de:1bb1[ffff:ffff]] class 0x000000/00000000
[ 2.515752] vfio_pci: add [10de:10f0[ffff:ffff]] class 0x000000/00000000
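Steps 5 and 6 can be combined. This sketch reads lspci -nn output and builds the ids= list for /etc/modprobe.d/vfio.conf from the [vendor:device] pair at the end of each NVIDIA line, so both the GPU and its audio function are included. The function names are hypothetical, and the code assumes the bracketed ID is the last [xxxx:xxxx] field on each line, as in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for steps 5 and 6.
# vfio_ids_from_lspci: read `lspci -nn` output on stdin, keep NVIDIA lines,
# take the last [vendor:device] pair on each line, drop duplicates while
# keeping the original order, and join the IDs with commas.
vfio_ids_from_lspci() {
    grep -i 'nvidia' \
        | sed -nE 's/.*\[([0-9a-f]{4}:[0-9a-f]{4})\].*/\1/p' \
        | awk '!seen[$0]++' \
        | paste -sd, -
}

# vfio_conf_line: wrap the ID list in the option line used in vfio.conf.
vfio_conf_line() {
    printf 'options vfio-pci ids=%s\n' "$(vfio_ids_from_lspci)"
}
```

Usage: lspci -nn | vfio_conf_line > /etc/modprobe.d/vfio.conf (this overwrites the file).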
2. Configuring GPU passthrough for virtual machines on Ubuntu 18.04
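1. Edit vim /etc/modules and add the following modules (use kvm_amd instead of kvm_intel on AMD hosts):
pci_stub
vfio
vfio_iommu_type1
vfio_pci
kvm
kvm_intel
2. Enable the IOMMU on the KVM host by adding a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT:
#Intel CPUs: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on"
#AMD CPUs: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash iommu=pt" (the AMD IOMMU is normally enabled by default once it is turned on in the firmware)
vim /etc/default/grub
GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=0
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on"
GRUB_CMDLINE_LINUX=""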
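Step 1 can be made repeatable so that running the setup twice does not duplicate lines in /etc/modules. A minimal sketch; ensure_modules is a hypothetical name, and it takes the target file as its first argument followed by the module names.

```shell
#!/usr/bin/env bash
# Hypothetical helper for step 1: append each module name to FILE
# (e.g. /etc/modules on Ubuntu) unless that exact line already exists.
ensure_modules() {
    local file="$1" mod
    shift
    touch "$file"
    # If the file does not end with a newline, add one so the next
    # module name is not glued onto the last line.
    if [ -s "$file" ] && [ -n "$(tail -c1 "$file")" ]; then
        echo >> "$file"
    fi
    for mod in "$@"; do
        # -x: whole line, -F: fixed string (no regex interpretation)
        grep -qxF "$mod" "$file" || echo "$mod" >> "$file"
    done
}
```

Usage: ensure_modules /etc/modules pci_stub vfio vfio_iommu_type1 vfio_pci kvm kvm_intel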
3. Regenerate the GRUB configuration. Ubuntu does not ship grub2-mkconfig or use the CentOS paths; the same command works on both EFI and BIOS systems:
update-grub
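After rebooting with the new kernel parameter, it is worth confirming that the IOMMU is actually active and seeing which devices share an IOMMU group with the GPU: VFIO works at the granularity of IOMMU groups, so a device can only be assigned if the other devices in its group are not in use by host drivers. The kernel lists the groups under /sys/kernel/iommu_groups. This sketch prints one line per device and fails if no groups exist, which usually means the GRUB parameter or the firmware IOMMU (VT-d/AMD-Vi) setting did not take effect. list_iommu_groups is a hypothetical name, and its optional argument exists only for testing against a fake tree.

```shell
#!/usr/bin/env bash
# List the PCI devices in each IOMMU group as "group N: <PCI address>".
# The sysfs root is a hypothetical override for testing; on a real host
# call it without arguments.
list_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}" dev group
    if [ ! -d "$root" ] || [ -z "$(ls -A "$root" 2>/dev/null)" ]; then
        echo "no IOMMU groups under $root (IOMMU not enabled?)" >&2
        return 1
    fi
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue          # glob matched nothing
        group=${dev#"$root"/}              # "N/devices/<address>"
        group=${group%%/*}                 # "N"
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done | sort -V
}
```

Usage: list_iommu_groups. For the card in this guide you would typically expect 0000:03:00.0 and 0000:03:00.1 to appear in the same group; pass an address to lspci -nns to see what each device is.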
4. Blacklist the host graphics and audio drivers so the host does not claim the GPU. Edit vim /etc/modprobe.d/blacklist.conf:
blacklist snd_hda_intel
blacklist amd76x_edac
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
blacklist nvidia
5. Find the GPU's vendor ID and device (product) ID:
apt install pciutils -y
lspci -nn | grep NVIDIA
Example output:
[root@stein-a ~]# lspci -nn | grep NVIDIA
03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104GL [Quadro P4000] [10de:1bb1] (rev a1)
03:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)
6. Edit vim /etc/modprobe.d/vfio.conf
# create new: for [ids=***], specify [vendor-ID:device-ID]
options vfio-pci ids=10de:1bb1,10de:10f0
7. Load vfio-pci at boot:
echo 'vfio-pci' > /etc/modules-load.d/vfio-pci.conf
8. Regenerate the initramfs. Ubuntu builds it with initramfs-tools rather than dracut:
update-initramfs -u
9. Reboot:
reboot
10. Verify that the GPU is bound to vfio-pci:
lspci -nnk -d 10de:1bb1
dmesg | grep -i vfio
root@kvm:~# lspci -nnk -d 10de:1bb1
03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104GL [Quadro P4000] [10de:1bb1] (rev a1)
Subsystem: NVIDIA Corporation GP104GL [Quadro P4000] [10de:11a3]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
root@kvm:~# dmesg | grep -i vfio
[ 3.838714] VFIO - User Level meta-driver version: 0.3
[ 3.846238] vfio-pci 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[ 3.866370] vfio_pci: add [10de:1bb1[ffffffff:ffffffff]] class 0x000000/00000000
[ 3.886375] vfio_pci: add [10de:10f0[ffffffff:ffffffff]] class 0x000000/00000000
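The check in step 10 can be scripted for either distribution. This sketch reads lspci -nnk output on stdin and succeeds only if every listed device reports "Kernel driver in use: vfio-pci". The function name is hypothetical, and the awk program sticks to POSIX features (no interval expressions) so it also runs with mawk, the default awk on Ubuntu.

```shell
#!/usr/bin/env bash
# Hypothetical check for step 10: read `lspci -nnk` output on stdin and
# exit 0 only if every device in it is bound to vfio-pci.
all_bound_to_vfio() {
    awk '
        # Device header lines start with the PCI address, e.g. "03:00.0 VGA ..."
        /^([0-9a-f]+:)?[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]\.[0-7] / { dev = $1; seen[dev] = 1; n++ }
        # Record the bound driver for the most recent device header
        /Kernel driver in use:/ { drv[dev] = $NF }
        END {
            if (n == 0) { print "no devices in input" > "/dev/stderr"; exit 1 }
            bad = 0
            for (d in seen) {
                cur = (d in drv) ? drv[d] : "none"
                if (cur != "vfio-pci") { print d " uses driver " cur > "/dev/stderr"; bad = 1 }
            }
            exit bad
        }'
}
```

Usage: lspci -nnk -d 10de:1bb1 | all_bound_to_vfio && echo "GPU bound to vfio-pci". Repeat with -d 10de:10f0 for the audio function.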
#For a single-node deployment, configure the following on that node.
#For a high-availability deployment, configure it on all three controller nodes.
1. Add the PCI passthrough whitelist: vim /etc/kolla/config/nova/nova-compute.conf
[libvirt]
inject_password = true
cpu_mode = host-passthrough
virt_type = kvm
[pci]
passthrough_whitelist = { "vendor_id": "10de", "product_id": "1bb1" }
(Nova renamed [pci] passthrough_whitelist to [pci] device_spec in the Zed release; the old name is deprecated there.)
2. Edit nova.conf: vim /etc/kolla/config/nova.conf
[DEFAULT]
service_down_time = 120
cpu_allocation_ratio = 4.0
disk_allocation_ratio=1.0
ram_allocation_ratio = 1.0
reserved_host_disk_mb = 4096
reserved_host_memory_mb = 4096
allow_resize_to_same_host = True
remove_unused_base_images = False
image_cache_manager_interval = 0
resume_guests_state_on_host_boot = True
[pci]
alias = { "vendor_id":"10de", "product_id":"1bb1", "device_type":"type-PCI", "name":"quadro-p4000" }
[filter_scheduler]
enabled_filters = RetryFilter, AvailabilityZoneFilter, ComputeFilter, ComputeCapabilitiesFilter, ImagePropertiesFilter, ServerGroupAntiAffinityFilter, ServerGroupAffinityFilter, PciPassthroughFilter
available_filters = nova.scheduler.filters.all_filters
3. Create a GPU flavor that requests one device through the alias:
openstack flavor create --vcpus 4 --ram 8192 --disk 30 --property "pci_passthrough:alias"="quadro-p4000:1" g1.4c.8m.p4000
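The whitelist, the alias and the flavor property only work together if they agree: the alias must name the same vendor and product IDs as the whitelist, and the flavor must reference the alias name. This sketch prints all three from one set of values so they cannot drift apart; nova_pci_snippets and its parameters are hypothetical, and it only prints text. After copying the sections into the files under /etc/kolla/config, kolla-ansible applies them with kolla-ansible -i <inventory> reconfigure, where <inventory> is your inventory file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the three pieces that must agree for Nova PCI
# passthrough, generated from one set of values:
#   - [pci] whitelist for nova-compute.conf (which devices may be passed through)
#   - [pci] alias for nova.conf (a name a flavor can request)
#   - the flavor command requesting COUNT devices through that alias
# Option names follow this guide; on Zed and later the whitelist option is
# called device_spec. Apply edited files with:
#   kolla-ansible -i <inventory> reconfigure
nova_pci_snippets() {
    local vendor="$1" product="$2" name="$3" count="${4:-1}" flavor="${5:-g1.4c.8m.p4000}"
    cat <<EOF
# /etc/kolla/config/nova/nova-compute.conf
[pci]
passthrough_whitelist = { "vendor_id": "${vendor}", "product_id": "${product}" }

# /etc/kolla/config/nova.conf
[pci]
alias = { "vendor_id": "${vendor}", "product_id": "${product}", "device_type": "type-PCI", "name": "${name}" }

# flavor
openstack flavor create --vcpus 4 --ram 8192 --disk 30 --property "pci_passthrough:alias"="${name}:${count}" ${flavor}
EOF
}
```

Usage: nova_pci_snippets 10de 1bb1 quadro-p4000 1 reproduces the values used above.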
3. Installing the NVIDIA driver on CentOS 7.x
1. Check for an NVIDIA GPU:
lspci | grep -i NVIDIA
The output below shows one NVIDIA GPU:
[root@train-all ~]# lspci | grep -i NVIDIA
04:00.0 VGA compatible controller: NVIDIA Corporation GP104GL [Quadro P4000] (rev a1)
04:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
2. Import the ELRepo GPG key:
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
3. Install the ELRepo repository package:
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
4. Install nvidia-detect:
yum install nvidia-detect -y
5. Run nvidia-detect to identify the matching driver package:
nvidia-detect -v
6. Search for the driver packages:
yum search kmod-nvidia
7. Install the driver:
yum install kmod-nvidia.x86_64 -y
8. Check that nouveau is disabled:
lsmod | grep nouveau
No output means nouveau is not loaded; otherwise continue with steps 9 to 11.
9. Create /etc/modprobe.d/blacklist-nouveau.conf:
vi /etc/modprobe.d/blacklist-nouveau.conf
with the following content:
blacklist nouveau
options nouveau modeset=0
10. Regenerate the initramfs:
dracut --force
11. Reboot:
reboot
12. Test:
nvidia-smi
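Steps 8 and 9 can be scripted as two small functions. This is a sketch: nouveau_loaded and write_nouveau_blacklist are hypothetical names. The first reads lsmod output on stdin and matches only the module-name column, so a module that merely lists nouveau under "Used by" is not counted; the second takes the target path as an argument so it can be tried on a scratch file first.

```shell
#!/usr/bin/env bash
# Step 8: succeed if nouveau is loaded, reading `lsmod` output on stdin.
# Only the first column (the module name) is compared.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Step 9: write the nouveau blacklist file (hypothetical path override).
write_nouveau_blacklist() {
    local file="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    printf 'blacklist nouveau\noptions nouveau modeset=0\n' > "$file"
}

# Example (as root on the GPU host):
#   if lsmod | nouveau_loaded; then
#       write_nouveau_blacklist
#       dracut --force    # step 10
#       reboot            # step 11
#   fi
```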