Distributed Storage: Basic Usage of RBD on a Ceph Cluster

  In the previous post we looked at cephx authentication and authorization in a Ceph cluster; see //www.cnblogs.com/qiuhom-1874/p/16748149.html for a refresher. Today we will talk about using the RBD interface of a Ceph cluster.

  RBD is one of the interfaces through which a Ceph cluster exposes storage to the outside. It is built on top of librados, the API of the underlying RADOS storage cluster; in other words, RBD sits on librados and provides storage services outward. From a client's point of view, RBD uses librados to abstract the space of a pool in the RADOS cluster into one or more independent images, and each image looks to the client like a disk. So how does a client actually use the disks that RBD abstracts?

  RBD usage scenarios

  We know a Linux host consists of kernel space and user space. For a Linux host to access an underlying disk, a kernel driver first recognizes the device as a block of storage, and user-space programs then operate on it by calling into the kernel; in the end, only the kernel has the privilege to operate on disks. This means that if a Linux host wants to use a disk abstracted by RBD on top of Ceph, its kernel must contain a module (rbd.ko) able to drive that RBD disk; that kernel module acts as the RBD client.

  Besides the scenario where a Linux host's kernel module drives the RBD-backed disk, there is another one. When creating a KVM virtual machine we can specify how much memory, how many CPUs, what kind of disk to use, and so on; the disk may be backed by a disk image file sitting in some filesystem, i.e. KVM can load a disk image we point it at. By the same logic, how does a KVM virtual machine connect to Ceph to use a disk carved out by RBD? The answer is simple: KVM itself has no ability to talk to Ceph, but it can do so through libvirt, which connects to the Ceph cluster over the RBD protocol and hands the RBD-backed disk image to the KVM guest. The RBD protocol itself is a client/server design; the rbd service is provided by librbd, which, unlike other services, does not listen on any socket. So unlike a Linux host, a KVM virtual machine reaches the Ceph cluster over the RBD protocol through libvirt, while a Linux host uses the kernel module (rbd); one is a user-space client, the other a kernel-space client.

  Whichever client connects in whichever way, data written through RBD to RADOS first goes to the corresponding pool, is split by librados into equally sized objects, and is stored spread across the disks behind the RADOS OSDs. This means we must first create an RBD pool on Ceph and initialize it for RBD before clients can use it.

  RBD management commands

  RBD management covers basic CRUD operations on images (create, delete, modify, list) as well as groups, mirroring, snapshots, the trash and so on, all of which can be handled with the rbd command. Its general format is: rbd [-c ceph.conf] [-m monaddr] [--cluster cluster-name] [-p|--pool pool] [command ...]

  1. Create and initialize an RBD pool

  Create the pool: ceph osd pool create {pool-name} {pg-num} {pgp-num}

  Enable the rbd application: ceph osd pool application enable {pool-name} rbd

  Initialize it for RBD: rbd pool init -p {pool-name}

[root@ceph-admin ~]# ceph osd pool create ceph-rbdpool 64 64
pool 'ceph-rbdpool' created
[root@ceph-admin ~]# ceph osd pool ls
testpool
rbdpool
.rgw.root
default.rgw.control
default.rgw.meta
default.rgw.log
cephfs-metadatpool
cephfs-datapool
erasurepool
ceph-rbdpool
[root@ceph-admin ~]# ceph osd pool application enable ceph-rbdpool rbd
enabled application 'rbd' on pool 'ceph-rbdpool'
[root@ceph-admin ~]# ceph osd pool ls detail
pool 1 'testpool' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 last_change 153 flags hashpspool stripe_width 0 compression_algorithm zstd compression_max_blob_size 10000000 compression_min_blob_size 10000 compression_mode passive
pool 2 'rbdpool' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 142 lfor 0/140 flags hashpspool,selfmanaged_snaps max_bytes 1024000000 max_objects 50 stripe_width 0 application rbd
        removed_snaps [1~3]
pool 3 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 84 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 87 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 5 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 89 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 6 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 91 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 7 'cephfs-metadatpool' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 148 flags hashpspool,pool_snaps stripe_width 0 application cephfs
pool 8 'cephfs-datapool' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 99 flags hashpspool stripe_width 0 application cephfs
pool 10 'erasurepool' erasure size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 last_change 130 flags hashpspool stripe_width 8192
pool 11 'ceph-rbdpool' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 184 flags hashpspool stripe_width 0 application rbd

[root@ceph-admin ~]# rbd pool init -p ceph-rbdpool
[root@ceph-admin ~]# 

  2. Create and view an image

  Command format: rbd create --size <megabytes> --pool <pool-name> <image-name>

[root@ceph-admin ~]# rbd create --size 5G ceph-rbdpool/vol01
[root@ceph-admin ~]# rbd ls -p ceph-rbdpool
vol01
[root@ceph-admin ~]# 

  Get detailed information about a given image

  Command format: rbd info [--pool <pool>] [--image <image>] [--image-id <image-id>] [--format <format>] [--pretty-format] <image-spec>

[root@ceph-admin ~]# rbd info ceph-rbdpool/vol01
rbd image 'vol01':
        size 5 GiB in 1280 objects
        order 22 (4 MiB objects)
        id: 149196b8b4567
        block_name_prefix: rbd_data.149196b8b4567
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        op_features: 
        flags: 
        create_timestamp: Tue Oct  4 00:48:18 2022
[root@ceph-admin ~]# rbd info ceph-rbdpool/vol01 --format json --pretty-format
{
    "name": "vol01",
    "id": "149196b8b4567",
    "size": 5368709120,
    "objects": 1280,
    "order": 22,
    "object_size": 4194304,
    "block_name_prefix": "rbd_data.149196b8b4567",
    "format": 2,
    "features": [
        "layering",
        "exclusive-lock",
        "object-map",
        "fast-diff",
        "deep-flatten"
    ],
    "op_features": [],
    "flags": [],
    "create_timestamp": "Tue Oct  4 00:48:18 2022"
}
[root@ceph-admin ~]# 

  Note: in "size M GiB in N objects", M is the size of the image and N is the number of objects that space is split into (the count is determined by the stripe size, i.e. the size of a single object, 4M by default). "order 22 (4 MiB objects)" is the identifier of the object (stripe) size; valid values range from 12 to 25, corresponding to sizes from 4K to 32M; the object size is 2^order bytes, so order 22 gives 2^22 bytes = 4 MiB, and order 10 would give 1 KiB objects. "id" is the identifier of this image; "block_name_prefix" is the name prefix of the objects belonging to this image; "format" is the image format, where 2 means v2; "features" lists the features enabled on this image as a comma-separated list, such as layering and exclusive-lock; "op_features" lists optional features.
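
  As a quick illustration, the object size (and therefore the order) can be chosen at creation time; a minimal sketch, where volX is just a placeholder image name and --object-size is assumed to be available in this rbd release. rbd info should then report order 23 (8 MiB objects):

rbd create --size 5G --object-size 8M ceph-rbdpool/volX   # 8 MiB objects instead of the 4 MiB default
rbd info ceph-rbdpool/volX                                # check the reported order and object size
rbd rm ceph-rbdpool/volX                                  # remove the placeholder image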

  Image features

  layering: whether cloning is supported;

  striping: whether data striping across data objects is supported;

  exclusive-lock: whether a distributed exclusive lock is supported, limiting access to a single client at a time;

  object-map: whether an object bitmap is supported, used mainly to speed up import, export and used-capacity calculations; depends on exclusive-lock;

  fast-diff: whether fast diffing between snapshots is supported; depends on object-map;

  deep-flatten: whether flattening can also detach the snapshots created on a clone from the parent image;

  journaling: whether journaled I/O is supported, i.e. whether modifications to the image are recorded to journal objects; depends on exclusive-lock;

  data-pool: whether the image's data objects can be stored in an erasure-coded pool, used mainly to place the image's metadata and data in different pools;

  Managing images

  Managing image features

  Since the Jewel release, images enable five features by default: layering, exclusive-lock, object-map, fast-diff and deep-flatten. We can pick the features at creation time with the --image-feature option of rbd create, and the features of an existing image can be changed with rbd feature enable or rbd feature disable.

[root@ceph-admin ~]# rbd feature disable ceph-rbdpool/vol01 object-map fast-diff deep-flatten   
[root@ceph-admin ~]# rbd info ceph-rbdpool/vol01
rbd image 'vol01':
        size 5 GiB in 1280 objects
        order 22 (4 MiB objects)
        id: 149196b8b4567
        block_name_prefix: rbd_data.149196b8b4567
        format: 2
        features: layering, exclusive-lock
        op_features: 
        flags: 
        create_timestamp: Tue Oct  4 00:48:18 2022
[root@ceph-admin ~]# 

  Note: if you connect to the Ceph cluster with the Linux kernel rbd module to use an RBD-backed disk, the object-map, fast-diff and deep-flatten features are not supported by the kernel client, so we disable them here.
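
  To avoid disabling features after the fact on every image meant for the kernel client, the image can simply be created with only the features the kernel supports; a minimal sketch, where volY is just a placeholder name (the default feature set can also be lowered via the rbd_default_features option in ceph.conf, not shown here):

rbd create --size 5G --image-feature layering --image-feature exclusive-lock ceph-rbdpool/volY
rbd info ceph-rbdpool/volY    # features should list only layering, exclusive-lock
rbd rm ceph-rbdpool/volY      # clean up the placeholder image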

  Using the Linux kernel rbd module to connect to the Ceph cluster and use an RBD disk

  1. Install the ceph-common package on the client host

[root@ceph-admin ~]# yum install -y ceph-common
Loaded plugins: fastestmirror
Repository epel is listed more than once in the configuration
Repository epel-debuginfo is listed more than once in the configuration
Repository epel-source is listed more than once in the configuration
Determining fastest mirrors
 * base: mirrors.aliyun.com
 * extras: mirrors.aliyun.com
 * updates: mirrors.aliyun.com
Ceph                                                                       | 1.5 kB  00:00:00     
Ceph-noarch                                                                | 1.5 kB  00:00:00     
base                                                                       | 3.6 kB  00:00:00     
ceph-source                                                                | 1.5 kB  00:00:00     
epel                                                                       | 4.7 kB  00:00:00     
extras                                                                     | 2.9 kB  00:00:00     
updates                                                                    | 2.9 kB  00:00:00     
(1/2): epel/x86_64/updateinfo                                              | 1.0 MB  00:00:08     
(2/2): epel/x86_64/primary_db                                              | 7.0 MB  00:00:52     
Package 2:ceph-common-13.2.10-0.el7.x86_64 already installed and latest version
Nothing to do
[root@ceph-admin ~]# 

  Note: installing the package above requires the ceph and epel repositories to be configured first.

  2. On the Ceph cluster, create a client user for connecting to the cluster and grant it permissions

[root@ceph-admin ~]# ceph auth get-or-create client.test mon 'allow r' osd 'allow * pool=ceph-rbdpool'
[client.test]
        key = AQB0Gztj63xwGhAAq7JFXnK2mQjBfhq0/kB5uA==
[root@ceph-admin ~]# ceph auth get client.test
exported keyring for client.test
[client.test]
        key = AQB0Gztj63xwGhAAq7JFXnK2mQjBfhq0/kB5uA==
        caps mon = "allow r"
        caps osd = "allow * pool=ceph-rbdpool"
[root@ceph-admin ~]# 

  Note: for an rbd client to connect to the Ceph cluster, it first needs read permission on the monitors; to store data on the OSDs it can be granted '*', meaning read and write, but the grant should be restricted to the relevant pool.
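
  As an aside, recent Ceph releases also ship built-in capability profiles that bundle the permissions an RBD client needs; a hedged alternative to the grant above (client.test2 is only an illustrative name and is not used elsewhere in this walkthrough):

ceph auth get-or-create client.test2 mon 'profile rbd' osd 'profile rbd pool=ceph-rbdpool'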

  Export the keyring file of the client.test user and copy it to the client

[root@ceph-admin ~]# ceph --user test -s
2022-10-04 01:31:24.776 7faddac3e700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.test.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2022-10-04 01:31:24.776 7faddac3e700 -1 monclient: ERROR: missing keyring, cannot use cephx for authentication
[errno 2] error connecting to the cluster
[root@ceph-admin ~]# ceph auth get client.test
exported keyring for client.test
[client.test]
        key = AQB0Gztj63xwGhAAq7JFXnK2mQjBfhq0/kB5uA==
        caps mon = "allow r"
        caps osd = "allow * pool=ceph-rbdpool"
[root@ceph-admin ~]# ceph auth get client.test -o /etc/ceph/ceph.client.test.keyring
exported keyring for client.test
[root@ceph-admin ~]# ceph --user test -s      
  cluster:
    id:     7fd4a619-9767-4b46-9cee-78b9dfe88f34
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
    mgr: ceph-mgr01(active), standbys: ceph-mon01, ceph-mgr02
    mds: cephfs-1/1/1 up  {0=ceph-mon02=up:active}
    osd: 10 osds: 10 up, 10 in
    rgw: 1 daemon active
 
  data:
    pools:   10 pools, 464 pgs
    objects: 250  objects, 3.8 KiB
    usage:   10 GiB used, 890 GiB / 900 GiB avail
    pgs:     464 active+clean
 
[root@ceph-admin ~]# 

  Note: a quick clarification: I am using the admin host as the client here, and the local /etc/ceph/ directory already holds the cluster configuration file. A client host needs both the corresponding keyring file and the cluster configuration file to connect to the Ceph cluster. If ceph -s run on the client host with the given user shows the cluster status, the keyring and configuration file are fine.
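
  For a genuinely remote client the same two files just need to be copied over first; a minimal sketch, assuming a reachable host named client-node (a placeholder) that already has ceph-common installed:

scp /etc/ceph/ceph.conf /etc/ceph/ceph.client.test.keyring root@client-node:/etc/ceph/
ssh root@client-node 'ceph --user test -s'    # should print the same cluster status as above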

  3. Map the image on the client

[root@ceph-admin ~]# fdisk -l

Disk /dev/sda: 53.7 GB, 53687091200 bytes, 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000a7984

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     1050623      524288   83  Linux
/dev/sda2         1050624   104857599    51903488   8e  Linux LVM

Disk /dev/mapper/centos-root: 52.1 GB, 52072284160 bytes, 101703680 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/centos-swap: 1073 MB, 1073741824 bytes, 2097152 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

[root@ceph-admin ~]# rbd map --user test ceph-rbdpool/vol01
/dev/rbd0
[root@ceph-admin ~]# fdisk -l

Disk /dev/sda: 53.7 GB, 53687091200 bytes, 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000a7984

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     1050623      524288   83  Linux
/dev/sda2         1050624   104857599    51903488   8e  Linux LVM

Disk /dev/mapper/centos-root: 52.1 GB, 52072284160 bytes, 101703680 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/centos-swap: 1073 MB, 1073741824 bytes, 2097152 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/rbd0: 5368 MB, 5368709120 bytes, 10485760 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4194304 bytes / 4194304 bytes

[root@ceph-admin ~]# 

  Note: with rbd map we specify the user, the pool and the image to connect to the Ceph cluster and map that image as a local disk device.

  View the mapped images

[root@ceph-admin ~]# rbd showmapped
id pool         image snap device    
0  ceph-rbdpool vol01 -    /dev/rbd0 
[root@ceph-admin ~]#

  Note: with this manual, command-line way of connecting to Ceph, the mapping is lost once the client reboots; if we want the rbd disk to be connected automatically at boot, we also need to write the corresponding commands into /etc/rc.d/rc.local and make that file executable.
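
  A minimal sketch of that rc.local approach, reusing the user, pool, image and mount point from this walkthrough:

cat >> /etc/rc.d/rc.local <<'EOF'
rbd map --user test ceph-rbdpool/vol01
mount /dev/rbd0 /mnt
EOF
chmod +x /etc/rc.d/rc.local    # rc.local is not executable by default on CentOS 7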

  Manually unmap the image

[root@ceph-admin ~]# rbd unmap ceph-rbdpool/vol01
[root@ceph-admin ~]# rbd showmapped     
[root@ceph-admin ~]# 

  Resize an image

  Command format: rbd resize [--pool <pool>] [--image <image>] --size <size> [--allow-shrink] [--no-progress] <image-spec>

  Grow: rbd resize [--pool <pool>] [--image <image>] --size <size>

  Shrink: rbd resize [--pool <pool>] [--image <image>] --size <size> [--allow-shrink]

[root@ceph-admin ~]# rbd create --size 2G ceph-rbdpool/vol02
[root@ceph-admin ~]# rbd ls  -p ceph-rbdpool                
vol01
vol02
[root@ceph-admin ~]# rbd ls  -p ceph-rbdpool -l
NAME   SIZE PARENT FMT PROT LOCK 
vol01 5 GiB          2           
vol02 2 GiB          2           
[root@ceph-admin ~]# rbd resize --size 10G ceph-rbdpool/vol02
Resizing image: 100% complete...done.
[root@ceph-admin ~]# rbd ls  -p ceph-rbdpool -l              
NAME    SIZE PARENT FMT PROT LOCK 
vol01  5 GiB          2           
vol02 10 GiB          2           
[root@ceph-admin ~]# rbd resize --size 8G ceph-rbdpool/vol02  
Resizing image: 0% complete...failed.
rbd: shrinking an image is only allowed with the --allow-shrink flag
[root@ceph-admin ~]# rbd resize --size 8G ceph-rbdpool/vol02 --allow-shrink
Resizing image: 100% complete...done.
[root@ceph-admin ~]# rbd ls  -p ceph-rbdpool -l             
NAME   SIZE PARENT FMT PROT LOCK 
vol01 5 GiB          2           
vol02 8 GiB          2           
[root@ceph-admin ~]# 

  Note: when shrinking, the image must not be reduced below the amount of space already in use.

  Delete an image

  Command format: rbd remove [--pool <pool>] [--image <image>] [--no-progress] <image-spec>

[root@ceph-admin ~]# rbd ls  -p ceph-rbdpool -l
NAME   SIZE PARENT FMT PROT LOCK 
vol01 5 GiB          2           
vol02 8 GiB          2           
[root@ceph-admin ~]# rbd rm ceph-rbdpool/vol02
Removing image: 100% complete...done.
[root@ceph-admin ~]# rbd ls  -p ceph-rbdpool -l
NAME   SIZE PARENT FMT PROT LOCK 
vol01 5 GiB          2           
[root@ceph-admin ~]# 

  Note: an image deleted this way is really gone; if it held data you want back, recovery is no longer possible, so this approach is not recommended. RBD provides a trash feature: we can first move the image into the trash, and only delete it from the trash once we are sure it is no longer needed.

  Move an image to the trash

[root@ceph-admin ~]# rbd ls  -p ceph-rbdpool -l
NAME   SIZE PARENT FMT PROT LOCK 
vol01 5 GiB          2           
[root@ceph-admin ~]# rbd trash mv ceph-rbdpool/vol01 
[root@ceph-admin ~]# rbd trash ls ceph-rbdpool
149196b8b4567 vol01
[root@ceph-admin ~]# rbd ls  -p ceph-rbdpool -l      
[root@ceph-admin ~]# 

  Delete an image from the trash

[root@ceph-admin ~]# rbd trash ls ceph-rbdpool                    
149196b8b4567 vol01
149e26b8b4567 vol02
[root@ceph-admin ~]# rbd trash rm --pool ceph-rbdpool --image-id 149e26b8b4567
Removing image: 100% complete...done.
[root@ceph-admin ~]# rbd trash ls ceph-rbdpool
149196b8b4567 vol01
[root@ceph-admin ~]# 

  Note: the command above deletes one specific image from the trash; to empty the trash, use rbd trash purge with the pool specified, which removes all images in that pool's trash.
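
  For example, emptying this walkthrough's pool would look like the line below; run it only if everything in that trash really is disposable:

rbd trash purge ceph-rbdpool    # permanently removes every image in this pool's trash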

  Restore an image from the trash (move it back to its original pool)

[root@ceph-admin ~]# rbd ls -p ceph-rbdpool -l                              
[root@ceph-admin ~]# rbd trash ls ceph-rbdpool                              
149196b8b4567 vol01
[root@ceph-admin ~]# rbd trash restore --pool ceph-rbdpool --image-id 149196b8b4567
[root@ceph-admin ~]# rbd trash ls ceph-rbdpool
[root@ceph-admin ~]# rbd ls -p ceph-rbdpool -l
NAME   SIZE PARENT FMT PROT LOCK 
vol01 5 GiB          2           
[root@ceph-admin ~]#

  Image snapshots

  What is a snapshot? A snapshot can be thought of as a data backup mechanism: a read-only, point-in-time view of an image. Taking one is fast because no data is copied at snapshot time; only when data is later modified does copy-on-write kick in, preserving the original blocks for the snapshot while the new data is written to the image. Data that has never been modified is shared, so reads of unchanged data go to the same objects as before. In short, the snapshot protects the image's contents as they were at a point in time, and only the blocks changed afterwards consume extra space, which is why snapshots are fast to create and usually much smaller than the image itself.

  Command format for creating a snapshot: rbd snap create [--pool <pool>] --image <image> --snap <snap>, or rbd snap create [<pool-name>/]<image-name>@<snapshot-name>

[root@ceph-admin ~]# mount /dev/rbd0 /mnt
[root@ceph-admin ~]# cd /mnt
[root@ceph-admin mnt]# ls
[root@ceph-admin mnt]# echo "hello ceph" >>test.txt
[root@ceph-admin mnt]# ls
test.txt
[root@ceph-admin mnt]# cat test.txt 
hello ceph
[root@ceph-admin mnt]# rbd snap create ceph-rbdpool/vol01@vol01-snap
[root@ceph-admin mnt]# rbd snap list ceph-rbdpool/vol01
SNAPID NAME        SIZE TIMESTAMP                
     4 vol01-snap 5 GiB Tue Oct  4 23:26:09 2022 
[root@ceph-admin mnt]# 

  Note: we mapped vol01 from the ceph-rbdpool pool as a local disk, formatted it, mounted it at /mnt, created a test.txt file under /mnt, and then took a snapshot of the image from the admin side. Note that I/O on the image should be stopped before creating the snapshot, and if the image holds a filesystem, that filesystem must be in a consistent state.
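
  One way to get that consistency without unmounting, assuming the image is still mounted at /mnt as above, is to freeze the filesystem around the snapshot; a brief sketch:

fsfreeze --freeze /mnt                            # flush dirty data and block new writes
rbd snap create ceph-rbdpool/vol01@vol01-snap
fsfreeze --unfreeze /mnt                          # resume I/O on the mounted filesystem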

  On the client, delete the data, then unmount the disk and remove the mapping

[root@ceph-admin mnt]# ls
test.txt
[root@ceph-admin mnt]# cat test.txt 
hello ceph
[root@ceph-admin mnt]# rm -rf test.txt
[root@ceph-admin mnt]# ls
[root@ceph-admin mnt]# cd 
[root@ceph-admin ~]# umount /mnt
[root@ceph-admin ~]# rbd unmap /dev/rbd0
[root@ceph-admin ~]# rbd showmapped
[root@ceph-admin ~]# 

  Roll back to a snapshot

  Command format: rbd snap rollback [--pool <pool>] --image <image> --snap <snap> [--no-progress]

[root@ceph-admin ~]# rbd snap list ceph-rbdpool/vol01
SNAPID NAME        SIZE TIMESTAMP                
     4 vol01-snap 5 GiB Tue Oct  4 23:26:09 2022 
[root@ceph-admin ~]# rbd snap rollback ceph-rbdpool/vol01@vol01-snap
Rolling back to snapshot: 100% complete...done.
[root@ceph-admin ~]# 

  Note: rolling an image back to a snapshot means overwriting the current version of the image with the data from the snapshot, and the time this takes grows with the size of the image.

  Map the image on the client again and mount the disk: has the data been restored?

[root@ceph-admin ~]# rbd map --user test ceph-rbdpool/vol01
/dev/rbd0
[root@ceph-admin ~]# mount /dev/rbd0 /mnt
[root@ceph-admin ~]# cd /mnt
[root@ceph-admin mnt]# ls
test.txt
[root@ceph-admin mnt]#cat test.txt 
hello ceph
[root@ceph-admin mnt]#

  Note: after mounting the disk again, the deleted data is back.

  Limit the number of snapshots

  Command format: rbd snap limit set [--pool <pool>] [--image <image>] [--limit <limit>]

  Clear the limit: rbd snap limit clear [--pool <pool>] [--image <image>]

[root@ceph-admin ~]# rbd snap limit set ceph-rbdpool/vol01 --limit 3
[root@ceph-admin ~]# rbd snap limit set ceph-rbdpool/vol01 --limit 5
[root@ceph-admin ~]# rbd snap limit clear ceph-rbdpool/vol01
[root@ceph-admin ~]# 

  Note: to change the limit, simply set a new one.

  Delete a snapshot

  Command format: rbd snap rm [--pool <pool>] [--image <image>] [--snap <snap>] [--no-progress] [--force]

[root@ceph-admin ~]# rbd snap list ceph-rbdpool/vol01
SNAPID NAME        SIZE TIMESTAMP                
     4 vol01-snap 5 GiB Tue Oct  4 23:26:09 2022 
[root@ceph-admin ~]# rbd snap rm ceph-rbdpool/vol01@vol01-snap
Removing snap: 100% complete...done.
[root@ceph-admin ~]# rbd snap list ceph-rbdpool/vol01         
[root@ceph-admin ~]#

  Note: Ceph OSDs delete data asynchronously, so deleting a snapshot does not free disk space immediately.

  Purge snapshots: to delete all snapshots of an image, use the rbd snap purge command, with the format below

  Command format: rbd snap purge [--pool <pool>] --image <image> [--no-progress]

[root@ceph-admin ~]# rbd snap create ceph-rbdpool/vol01@vol01-snap
[root@ceph-admin ~]# rbd snap create ceph-rbdpool/vol01@vol01-snap2
[root@ceph-admin ~]# rbd snap create ceph-rbdpool/vol01@vol01-snap3
[root@ceph-admin ~]# rbd snap list ceph-rbdpool/vol01             
SNAPID NAME         SIZE TIMESTAMP                
     6 vol01-snap  5 GiB Tue Oct  4 23:43:22 2022 
     7 vol01-snap2 5 GiB Tue Oct  4 23:43:30 2022 
     8 vol01-snap3 5 GiB Tue Oct  4 23:43:32 2022 
[root@ceph-admin ~]# rbd snap purge ceph-rbdpool/vol01
Removing all snapshots: 100% complete...done.
[root@ceph-admin ~]# rbd snap list ceph-rbdpool/vol01 
[root@ceph-admin ~]# 

  Snapshot layering

  Ceph supports creating one or more COW (Copy-On-Write) or COR (Copy-On-Read) clones on top of a block device snapshot. This snapshot layering mechanism provides an extremely fast way to create images: a user can create a base image and a read-only snapshot of it, and then create any number of read/write clones on top of that snapshot, even cloning across multiple levels. In practice, for example, you can create an image for QEMU virtual machines, install a base operating system on it as a template, take a snapshot of it, and then create as many clones of that snapshot as needed to serve as images for different VMs, or modify each clone as required and create further downstream clones from it. An image produced by cloning works almost exactly like a directly created image: it supports reading, writing, cloning, resizing and so on. The only difference is that a clone references a read-only upstream snapshot, and that snapshot must be placed in 'protected' mode. COW is the default type: data is copied into the clone only when it is first written. With COR, data is copied into the clone when it is first read, and subsequent reads and writes operate directly on the objects in the clone. This is somewhat analogous to linked clones versus full clones of virtual machines.
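
  Clones behave as COW by default; switching to COR is a client-side setting. A hedged sketch of the relevant ceph.conf entry, placed on the client that opens the clones:

# ceph.conf on the RBD client
[client]
rbd clone copy on read = true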

  Using layered clones in RBD is very simple: create an image, create a snapshot of it and put that snapshot into protected mode, then clone the snapshot. When creating the clone you specify the source pool, image and snapshot as well as the target pool and image name, so cloning an image can cross pools.

  Snapshot protection command format: rbd snap protect [--pool <pool>] --image <image> --snap <snap>

[root@ceph-admin ~]# rbd snap create ceph-rbdpool/vol01@vol01-snap3
[root@ceph-admin ~]# rbd snap list ceph-rbdpool/vol01              
SNAPID NAME         SIZE TIMESTAMP                
    12 vol01-snap3 5 GiB Tue Oct  4 23:49:25 2022 
[root@ceph-admin ~]# rbd snap protect ceph-rbdpool/vol01@vol01-snap3
[root@ceph-admin ~]#

  Clone a snapshot

  Command format: rbd clone [--pool <pool>] --image <image> --snap <snap> --dest-pool <dest-pool> [--dest <dest>], or rbd clone [<pool-name>/]<image-name>@<snapshot-name> [<pool-name>/]<image-name>

[root@ceph-admin ~]# rbd clone ceph-rbdpool/vol01@vol01-snap3 ceph-rbdpool/image1
[root@ceph-admin ~]# rbd ls ceph-rbdpool
image1
vol01
[root@ceph-admin ~]# rbd ls ceph-rbdpool -l
NAME               SIZE PARENT                         FMT PROT LOCK 
image1            5 GiB ceph-rbdpool/vol01@vol01-snap3   2           
vol01             5 GiB                                  2           
vol01@vol01-snap3 5 GiB                                  2 yes       
[root@ceph-admin ~]# 

  Note: cloning a snapshot ultimately produces an image in the target pool, so we must specify the target pool and image name. Also note that a snapshot must be protected before it can be cloned; otherwise the clone is refused.

  List the children of a snapshot

  Command format: rbd children [--pool <pool>] --image <image> --snap <snap>

[root@ceph-admin ~]# rbd snap list ceph-rbdpool/vol01
SNAPID NAME         SIZE TIMESTAMP                
    12 vol01-snap3 5 GiB Tue Oct  4 23:49:25 2022 
[root@ceph-admin ~]# rbd children ceph-rbdpool/vol01@vol01-snap3
ceph-rbdpool/image1
[root@ceph-admin ~]# 

  Flatten a cloned image

  A cloned image keeps a reference to its parent snapshot. To drop that reference, the data can be copied from the snapshot into the clone, "flattening" the image. The time needed to flatten a clone grows with the image size, and to delete a snapshot that has clone children, those child images must be flattened first. Command format: rbd flatten [--pool <pool>] --image <image> --no-progress

[root@ceph-admin ~]# rbd ls ceph-rbdpool -l         
NAME               SIZE PARENT                         FMT PROT LOCK 
image1            5 GiB ceph-rbdpool/vol01@vol01-snap3   2           
vol01             5 GiB                                  2           
vol01@vol01-snap3 5 GiB                                  2 yes       
[root@ceph-admin ~]# rbd flatten ceph-rbdpool/image1 
Image flatten: 100% complete...done.
[root@ceph-admin ~]# rbd ls ceph-rbdpool -l          
NAME               SIZE PARENT FMT PROT LOCK 
image1            5 GiB          2           
vol01             5 GiB          2           
vol01@vol01-snap3 5 GiB          2 yes       
[root@ceph-admin ~]# rbd children ceph-rbdpool/vol01@vol01-snap3
[root@ceph-admin ~]# 

  Note: after flattening the child image, its parent information is gone and the snapshot no longer lists any children; the flattened child is now an independent image.

  Unprotect a snapshot

  Command format: rbd snap unprotect [--pool <pool>] --image <image> --snap <snap>

[root@ceph-admin ~]# rbd snap list ceph-rbdpool/vol01               
SNAPID NAME         SIZE TIMESTAMP                
    12 vol01-snap3 5 GiB Tue Oct  4 23:49:25 2022 
[root@ceph-admin ~]# rbd snap rm ceph-rbdpool/vol01@vol01-snap3
Removing snap: 0% complete...failed.
rbd: snapshot 'vol01-snap3' is protected from removal.
2022-10-05 00:05:04.059 7f5e35e95840 -1 librbd::Operations: snapshot is protected
[root@ceph-admin ~]# rbd snap unprotect ceph-rbdpool/vol01@vol01-snap3
[root@ceph-admin ~]# rbd snap rm ceph-rbdpool/vol01@vol01-snap3       
Removing snap: 100% complete...done.
[root@ceph-admin ~]# rbd snap list ceph-rbdpool/vol01                 
[root@ceph-admin ~]#

  Note: a protected snapshot cannot be deleted; it must be unprotected first. Likewise, a snapshot referenced by clones cannot be deleted; each of its clones must be flattened before the snapshot can be removed.

  Using an rbd image with KVM

  1. Prepare the KVM environment: on the KVM host, in addition to ceph-common and the epel repository, we also need to install libvirt and qemu-kvm;

yum install qemu-kvm qemu-kvm-tools libvirt virt-manager virt-install

  Note: I am using the admin host as the KVM host here, so ceph-common and the epel repository are already in place; only qemu-kvm, qemu-kvm-tools, libvirt, virt-manager and virt-install need to be installed.

  2. Start the libvirtd daemon

[root@ceph-admin ~]# systemctl start libvirtd    
[root@ceph-admin ~]# systemctl status libvirtd -l
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2022-10-05 01:27:34 CST; 1min 32s ago
     Docs: man:libvirtd(8)
           //libvirt.org
 Main PID: 940 (libvirtd)
    Tasks: 19 (limit: 32768)
   CGroup: /system.slice/libvirtd.service
           ├─ 940 /usr/sbin/libvirtd
           ├─1237 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
           └─1238 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper

Oct 05 01:27:35 ceph-admin.ilinux.io dnsmasq[1232]: listening on virbr0(#3): 192.168.122.1
Oct 05 01:27:35 ceph-admin.ilinux.io dnsmasq[1237]: started, version 2.76 cachesize 150
Oct 05 01:27:35 ceph-admin.ilinux.io dnsmasq[1237]: compile time options: IPv6 GNU-getopt DBus no-i18n IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth nettlehash no-DNSSEC loop-detect inotify
Oct 05 01:27:35 ceph-admin.ilinux.io dnsmasq-dhcp[1237]: DHCP, IP range 192.168.122.2 -- 192.168.122.254, lease time 1h
Oct 05 01:27:35 ceph-admin.ilinux.io dnsmasq-dhcp[1237]: DHCP, sockets bound exclusively to interface virbr0
Oct 05 01:27:35 ceph-admin.ilinux.io dnsmasq[1237]: reading /etc/resolv.conf
Oct 05 01:27:35 ceph-admin.ilinux.io dnsmasq[1237]: using nameserver 192.168.0.1#53
Oct 05 01:27:35 ceph-admin.ilinux.io dnsmasq[1237]: read /etc/hosts - 15 addresses
Oct 05 01:27:35 ceph-admin.ilinux.io dnsmasq[1237]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Oct 05 01:27:35 ceph-admin.ilinux.io dnsmasq-dhcp[1237]: read /var/lib/libvirt/dnsmasq/default.hostsfile
[root@ceph-admin ~]# 

  Note: before starting libvirtd, make sure the kvm module is loaded on the host and that the CPU has hardware virtualization enabled; alternatively, check the startup log above for errors; if there are none, libvirtd is working properly.
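
  The two checks mentioned above can be done roughly like this:

lsmod | grep kvm                      # expect kvm plus kvm_intel or kvm_amd to be loaded
grep -E -c 'vmx|svm' /proc/cpuinfo    # a non-zero count means the CPU exposes VT-x/AMD-V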

  3. Authorize the relevant user account on the Ceph cluster

[root@ceph-admin ~]# ceph auth get-or-create client.libvirt mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=ceph-rbdpool'
[client.libvirt]
        key = AQBIXTxjpeYoAhAAw/ZMROyxd3E0b8i3xlOkgw==
[root@ceph-admin ~]# ceph auth get client.libvirt
exported keyring for client.libvirt
[client.libvirt]
        key = AQBIXTxjpeYoAhAAw/ZMROyxd3E0b8i3xlOkgw==
        caps mon = "allow r"
        caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=ceph-rbdpool"
[root@ceph-admin ~]# 

  4. Import the client.libvirt user information into libvirtd as a secret

    4.1 First create an XML file

[root@ceph-admin ~]# cat client.libvirt-secret.xml
<secret ephemeral='no' private='no'>
        <usage type='ceph'>
                <name>client.libvirt secret</name>
        </usage>
</secret>
[root@ceph-admin ~]# 

    4.2 Create the secret with virsh; the command returns the UUID of the created secret

[root@ceph-admin ~]# virsh secret-define --file client.libvirt-secret.xml 
Secret 92897e61-5935-43ad-abd6-9f97a5652f05 created

[root@ceph-admin ~]# 

  Note: the steps above produce a libvirt secret that will hold the key used to authenticate against Ceph; so far it only contains the type (ceph) and a description.

  5. Import the key of Ceph's client.libvirt user into the secret just created

[root@ceph-admin ~]# virsh secret-set-value --secret 92897e61-5935-43ad-abd6-9f97a5652f05 --base64 $(ceph auth get-key client.libvirt) 
Secret value set

[root@ceph-admin ~]# virsh secret-get-value --secret 92897e61-5935-43ad-abd6-9f97a5652f05 
AQBIXTxjpeYoAhAAw/ZMROyxd3E0b8i3xlOkgw==
[root@ceph-admin ~]# ceph auth print-key client.libvirt
AQBIXTxjpeYoAhAAw/ZMROyxd3E0b8i3xlOkgw==[root@ceph-admin ~]#

  Note: this step binds the key of the Ceph-authorized user to the secret, producing a secret that libvirt can use to authenticate against Ceph; libvirt presents this secret to the Ceph cluster, and it carries the account and key information granted by Ceph.

  6. Prepare the image

[root@ceph-admin ~]# ls 
CentOS-7-x86_64-Minimal-1708.iso  client.abc.keyring            client.libvirt-secret.xml
ceph-deploy-ceph.log              client.admin.cluster.keyring  client.test.keyring
centos7.xml                        client.admin.keyring          client.usera.keyring
[root@ceph-admin ~]# rbd ls ceph-rbdpool
image1
test
vol01
[root@ceph-admin ~]# rbd import ./CentOS-7-x86_64-Minimal-1708.iso ceph-rbdpool/centos7
Importing image: 100% complete...done.
[root@ceph-admin ~]# rbd ls ceph-rbdpool -l
NAME                 SIZE PARENT                         FMT PROT LOCK 
centos7           792 MiB                                  2           
image1              5 GiB                                  2           
test                5 GiB ceph-rbdpool/vol01@vol01-snap3   2           
vol01               5 GiB                                  2           
vol01@vol01-snap3   5 GiB                                  2 yes       
[root@ceph-admin ~]# 

  Note: for convenience in this test I imported the CentOS 7 ISO directly into the ceph-rbdpool pool.
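
  If the source were a disk image in another format (for example qcow2) rather than an ISO, and the local qemu-img build includes rbd support, converting straight into the pool should also work; a hedged sketch with placeholder file and image names:

qemu-img convert -f qcow2 -O raw centos7.qcow2 rbd:ceph-rbdpool/centos7-disk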

  7. Create the VM

[root@ceph-admin ~]# cat centos7.xml 
<domain type='kvm'>
        <name>centos7</name>
        <memory>131072</memory>
        <currentMemory unit='KiB'>65536</currentMemory>
        <vcpu>1</vcpu>
        <os>
                <type arch='x86_64'>hvm</type>
        </os>
        <clock sync="localtime"/>
        <devices>
                <emulator>/usr/libexec/qemu-kvm</emulator>
                <disk type='network' device='disk'>
                        <source protocol='rbd' name='ceph-rbdpool/centos7'>
                                <host name='192.168.0.71' port='6789'/>
                        </source>
                        <auth username='libvirt'>
                                <secret type='ceph' uuid='92897e61-5935-43ad-abd6-9f97a5652f05'/>
                        </auth>
                        <target dev='vda' bus='virtio'/>
                </disk>
                <interface type='network'>
                        <mac address='52:54:00:25:c2:45'/>
                        <source network='default'/>
                        <model type='virtio'/>
                </interface>
                <serial type='pty'>
                        <target type='isa-serial' port='0'>
                                <model name='isa-serial'/>
                        </target>
                </serial>
                <console type='pty'>
                        <target type='virtio' port='0'/>
                </console>
                <graphics type='vnc' port='-1' autoport='yes'>
                        <listen type='address' address='0.0.0.0'/>
                </graphics>
        </devices>
</domain>
[root@ceph-admin ~]# 

  Note: the above is the VM definition file; with the disk device and the other settings defined in it, we can create a VM that matches this definition.

  Create the virtual machine

[root@ceph-admin ~]# virsh define centos7.xml 
Domain centos7 defined from centos7.xml

[root@ceph-admin ~]# 

  List the virtual machines

[root@ceph-admin ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     centos7                        shut off

[root@ceph-admin ~]# 

  Start the virtual machine

[root@ceph-admin ~]# virsh start centos7
Domain centos7 started

[root@ceph-admin ~]# virsh list --all   
 Id    Name                           State
----------------------------------------------------
 2     centos7                        running

[root@ceph-admin ~]# 

  View the virtual machine's disks

[root@ceph-admin ~]# virsh domblklist centos7
Target     Source
------------------------------------------------
vda        ceph-rbdpool/centos7

[root@ceph-admin ~]# 

  Note: here we can see the virtual machine's disk has been attached successfully.

  Check the listening ports on the KVM host: is the VNC port listening?

[root@ceph-admin ~]# ss -tnl
State      Recv-Q Send-Q                    Local Address:Port                                   Peer Address:Port              
LISTEN     0      5                         192.168.122.1:53                                                *:*                  
LISTEN     0      128                                   *:22                                                *:*                  
LISTEN     0      100                           127.0.0.1:25                                                *:*                  
LISTEN     0      1                                     *:5900                                              *:*                  
LISTEN     0      128                                   *:111                                               *:*                  
LISTEN     0      128                                [::]:22                                             [::]:*                  
LISTEN     0      100                               [::1]:25                                             [::]:*                  
LISTEN     0      128                                [::]:111                                            [::]:*                  
[root@ceph-admin ~]# 

  Note: VNC listens on port 5900 of the host; if several virtual machines have VNC enabled, the second one listens on 5901, and so on.
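
  From another machine, any VNC client pointed at the KVM host and that display should reach the console; for example (the host address below is a placeholder for this environment's admin host IP):

vncviewer <kvm-host-ip>:0    # display :0 corresponds to TCP port 5900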

  Connect to the VNC port of the KVM host to check how the virtual machine boots

  Note: the image we just imported into ceph-rbdpool has been loaded by the KVM virtual machine and its contents read successfully. With that, the test of using an rbd image with KVM is complete.