DEVICESCAN failed: glob(3) aborted matching pattern /dev/discs/disc*
- March 31, 2020
- Notes
Problem description
The following error was observed in a production environment:
Mar 28 10:09:36 localhost smartd[9865]: DEVICESCAN failed: glob(3) aborted matching pattern /dev/discs/disc*
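The error is rare on any single machine, but across a large fleet it turns up often enough that it cannot be ignored. Until today I had assumed it was caused by hardware, such as a faulty disk. Today, however, I built a ramdisk OS for PXE boot, and it reproduced the error 100% of the time on machines that were otherwise healthy, so I suspected the cause lay elsewhere.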
smartd(8)
smartd is a daemon that monitors the Self-Monitoring, Analysis and Reporting Technology (SMART) system built into many ATA-3 and later ATA, IDE and SCSI-3 hard drives. The purpose of SMART is to monitor the reliability of the hard drive and predict drive failures, and to carry out different types of drive self-tests. This version of smartd is compatible with ATA/ATAPI-7 and earlier standards. smartd will attempt to enable SMART monitoring on ATA devices (equivalent to smartctl -s on) and polls these and SCSI devices every 30 minutes (configurable), logging SMART errors and changes of SMART Attributes via the SYSLOG interface. The default location for these SYSLOG notifications and warnings is /var/log/messages.
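Analysis
The messages log below shows that when the smartd service started, its scan found no disks; the kernel initialized the disk only afterwards.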
Mar 28 10:09:36 localhost smartd[9865]: Configuration file /etc/smartmontools/smartd.conf was parsed, found DEVICESCAN, scanning devices
Mar 28 10:09:36 localhost smartd[9865]: DEVICESCAN failed: glob(3) aborted matching pattern /dev/discs/disc*
Mar 28 10:09:36 localhost smartd[9865]: In the system's table of devices NO devices found to scan
Mar 28 10:09:36 localhost smartd[9865]: Monitoring 0 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
...
Mar 28 10:09:38 localhost kernel: scsi 0:0:0:0: Attached scsi generic sg0 type 0
Mar 28 10:09:38 localhost kernel: AMD64 EDAC driver v3.4.0
Mar 28 10:09:38 localhost kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11
Mar 28 10:09:38 localhost kernel: ata1.00: Enabling discard_zeroes_data
Mar 28 10:09:38 localhost kernel: sd 0:0:0:0: [sda] 937703088 512-byte logical blocks: (480 GB/447 GiB)
Mar 28 10:09:38 localhost kernel: sd 0:0:0:0: [sda] 4096-byte physical blocks
Mar 28 10:09:38 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
Mar 28 10:09:38 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 28 10:09:38 localhost kernel: ata1.00: Enabling discard_zeroes_data
Mar 28 10:09:38 localhost kernel: ata1.00: Enabling discard_zeroes_data
Mar 28 10:09:38 localhost kernel: sd 0:0:0:0: [sda] Attached SCSI removable disk
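The ordering visible in the log above can also be checked mechanically. The following is a minimal sketch, not part of the original investigation: `scan_order` is a hypothetical helper name, and the two patterns are assumptions taken from the exact message wording shown here, so they may need adjusting for other kernel or smartd versions.

```shell
# Hypothetical helper: report whether smartd's device scan was logged
# before or after the kernel attached the first SCSI disk.
# $1 = path to a syslog-style file such as /var/log/messages.
scan_order() {
    awk '
        # Line number of the first smartd "scanning devices" message.
        /smartd\[[0-9]+\]: .*scanning devices/ && !scan { scan = NR }
        # Line number of the first kernel "Attached SCSI ... disk" message.
        /kernel: sd .*Attached SCSI/ && !disk { disk = NR }
        END {
            if (!scan || !disk)   print "incomplete"
            else if (scan < disk) print "scan-before-disk"
            else                  print "disk-before-scan"
        }
    ' "$1"
}
```

Running `scan_order /var/log/messages` on the failing boot should print `scan-before-disk`. The helper only looks at the first match of each pattern, so a log file that spans several boots would need to be trimmed to the boot of interest first.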
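After boot, the smartd service status and the disk status are as follows. Even after logging in, the service is still not monitoring the disk.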
[root@localhost ~]# systemctl status smartd.service
● smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
   Loaded: loaded (/usr/lib/systemd/system/smartd.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2020-03-28 10:24:35 CST; 45min ago
     Docs: man:smartd(8)
           man:smartd.conf(5)
 Main PID: 8613 (smartd)
   CGroup: /system.slice/smartd.service
           └─8613 /usr/sbin/smartd -n -q never

Mar 28 10:24:35 localhost.localdomain systemd[1]: Started Self Monitoring and Reporting Technology (SMART) Daemon.
Mar 28 10:24:36 localhost.localdomain smartd[8613]: smartd 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-957.el7.x86_64] (local build)
Mar 28 10:24:36 localhost.localdomain smartd[8613]: Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
Mar 28 10:24:36 localhost.localdomain smartd[8613]: Opened configuration file /etc/smartmontools/smartd.conf
Mar 28 10:24:36 localhost.localdomain smartd[8613]: Configuration file /etc/smartmontools/smartd.conf was parsed, found DEVICESCAN, scanning devices
Mar 28 10:24:36 localhost.localdomain smartd[8613]: DEVICESCAN failed: glob(3) aborted matching pattern /dev/discs/disc*
Mar 28 10:24:36 localhost.localdomain smartd[8613]: In the system's table of devices NO devices found to scan
Mar 28 10:24:36 localhost.localdomain smartd[8613]: Monitoring 0 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
[root@localhost ~]# lsblk
NAME MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda    8:0    1 447.1G  0 disk
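The mismatch above (lsblk shows one disk, smartd monitors none) is the symptom worth checking for across many machines. Below is a rough sketch of such a check; `monitored_count` is a hypothetical helper, and it assumes the "Monitoring N ATA/SATA, M SCSI/SAS and K NVMe devices" wording printed by this smartd 6.5 build.

```shell
# Hypothetical helper: read smartd log lines on stdin and print the total
# number of devices reported in the last "Monitoring ..." message.
monitored_count() {
    awk '
        /smartd\[[0-9]+\]: Monitoring [0-9]+ ATA\/SATA/ {
            for (i = 1; i <= NF; i++)
                if ($i == "Monitoring")
                    # Fields: Monitoring N ATA/SATA, M SCSI/SAS and K NVMe devices
                    n = $(i + 1) + $(i + 3) + $(i + 6)
        }
        END { print n + 0 }
    '
}

# Example use on a live system (left commented out in this sketch):
#   systemctl status smartd.service | monitored_count
#   lsblk -d -n -o TYPE | grep -c disk
```

If the first number is 0 while the second is not, smartd most likely ran its scan before the disks appeared, as in the output above.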
Attempted fix
To fix this, I tried delaying smartd so that it starts after the disk has been initialized. The original smartd.service unit file is as follows:
[root@server ~]# systemctl cat smartd.service
[Unit]
Description=Self Monitoring and Reporting Technology (SMART) Daemon
Documentation=man:smartd(8) man:smartd.conf(5)
After=syslog.target

[Service]
EnvironmentFile=-/etc/sysconfig/smartmontools
ExecStart=/usr/sbin/smartd -n $smartd_opts
ExecReload=/bin/kill -HUP $MAINPID
StandardOutput=syslog

[Install]
WantedBy=multi-user.target
[root@localhost FX2010700017L]#
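To make sure smartd.service starts after the disk has been initialized, multi-user.target is added to After= in the unit file, so that smartd starts after terminal initialization. (This system boots into runlevel 3; for runlevel 5 the entry would need to be graphical.target instead.) The ramdisk OS is then repackaged.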
[root@localhost ~]# cat /usr/lib/systemd/system/smartd.service
[Unit]
Description=Self Monitoring and Reporting Technology (SMART) Daemon
Documentation=man:smartd(8) man:smartd.conf(5)
After=syslog.target multi-user.target

[Service]
EnvironmentFile=-/etc/sysconfig/smartmontools
ExecStart=/usr/sbin/smartd -n $smartd_opts
ExecReload=/bin/kill -HUP $MAINPID
StandardOutput=syslog

[Install]
WantedBy=multi-user.target
[root@localhost FX2010700017L]#
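The change above edits the packaged unit file under /usr/lib/systemd/system directly. As an alternative that was not tried in these notes, the same After= entry could probably be supplied through a systemd drop-in, which leaves the packaged file untouched. The sketch below assumes a hypothetical helper name (`write_smartd_dropin`) and a hypothetical drop-in file name (`after-multi-user.conf`).

```shell
# Hypothetical helper: write a drop-in that adds After=multi-user.target
# to smartd.service without editing the packaged unit file.
# $1 = systemd unit directory, e.g. /etc/systemd/system on a live system,
#      or the same path under the unpacked ramdisk root.
write_smartd_dropin() {
    dropin_dir="$1/smartd.service.d"
    mkdir -p "$dropin_dir" || return 1
    # After= entries in a drop-in are added to those in the main unit file.
    printf '[Unit]\nAfter=multi-user.target\n' > "$dropin_dir/after-multi-user.conf"
}
```

On a running system this would be followed by `systemctl daemon-reload`; for the ramdisk OS, the directory would be created inside the image before repackaging.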
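After booting from the new ramdisk OS, the messages log and the smartd service status are shown below. smartd now starts after terminal initialization has completed, and it finds the disk when it scans at service startup. With that, the problem is solved.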
[root@localhost ~]# cat /var/log/messages
...
Mar 28 15:31:43 localhost kernel: scsi 0:0:0:0: Attached scsi generic sg0 type 0
Mar 28 15:31:43 localhost kernel: ata1.00: Enabling discard_zeroes_data
Mar 28 15:31:43 localhost kernel: sd 0:0:0:0: [sda] 937703088 512-byte logical blocks: (480 GB/447 GiB)
Mar 28 15:31:43 localhost kernel: sd 0:0:0:0: [sda] 4096-byte physical blocks
Mar 28 15:31:43 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
Mar 28 15:31:43 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 28 15:31:43 localhost kernel: ata1.00: Enabling discard_zeroes_data
Mar 28 15:31:43 localhost kernel: ata1.00: Enabling discard_zeroes_data
Mar 28 15:31:43 localhost kernel: sd 0:0:0:0: [sda] Attached SCSI removable disk
...
Mar 28 15:31:47 localhost systemd: Reached target Multi-User System.
Mar 28 15:31:47 localhost systemd: Started Self Monitoring and Reporting Technology (SMART) Daemon.
Mar 28 15:31:47 localhost systemd: Started Stop Read-Ahead Data Collection 10s After Completed Startup.
Mar 28 15:31:47 localhost systemd: Starting Update UTMP about System Runlevel Changes...
Mar 28 15:31:47 localhost systemd: Started Update UTMP about System Runlevel Changes.
Mar 28 15:31:47 localhost systemd: Startup finished in 52.156s (kernel) + 8.846s (userspace) = 5min 1.858s.
Mar 28 15:31:47 localhost smartd[42226]: smartd 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-957.el7.x86_64] (local build)
Mar 28 15:31:47 localhost smartd[42226]: Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
Mar 28 15:31:47 localhost smartd[42226]: Opened configuration file /etc/smartmontools/smartd.conf
Mar 28 15:31:47 localhost smartd[42226]: Configuration file /etc/smartmontools/smartd.conf was parsed, found DEVICESCAN, scanning devices
Mar 28 15:31:47 localhost smartd[42226]: Device: /dev/sda, type changed from 'scsi' to 'sat'
Mar 28 15:31:47 localhost smartd[42226]: Device: /dev/sda [SAT], opened
Mar 28 15:31:47 localhost smartd[42226]: Device: /dev/sda [SAT], 480 GB
Mar 28 15:31:47 localhost systemd: Created slice User Slice of root.
Mar 28 15:31:47 localhost systemd-logind: New session 1 of user root.
Mar 28 15:31:47 localhost systemd: Started Session 1 of user root.
Mar 28 15:31:47 localhost smartd[42226]: Device: /dev/sda [SAT], not found in smartd database.
Mar 28 15:31:47 localhost smartd[42226]: Device: /dev/sda [SAT], can't monitor Offline_Uncorrectable count - no Attribute 198
Mar 28 15:31:47 localhost smartd[42226]: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
Mar 28 15:31:47 localhost smartd[42226]: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
...
[root@localhost ~]# systemctl status smartd.service
● smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
   Loaded: loaded (/usr/lib/systemd/system/smartd.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2020-03-28 15:31:47 CST; 15min ago
     Docs: man:smartd(8)
           man:smartd.conf(5)
 Main PID: 42226 (smartd)
   CGroup: /system.slice/smartd.service
           └─42226 /usr/sbin/smartd -n -q never

Mar 28 15:31:47 localhost.localdomain smartd[42226]: Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Opened configuration file /etc/smartmontools/smartd.conf
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Configuration file /etc/smartmontools/smartd.conf was parsed, found DEVICESCAN, scanning devices
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Device: /dev/sda, type changed from 'scsi' to 'sat'
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Device: /dev/sda [SAT], opened
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Device: /dev/sda [SAT], INTEL SSDSCKKB480G8, S/N:PHYH951400VK480K, WWN:5-5cd2e4-1520d72fb, FW:XC311120, 480 GB
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Device: /dev/sda [SAT], not found in smartd database.
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Device: /dev/sda [SAT], can't monitor Offline_Uncorrectable count - no Attribute 198
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
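Besides reading the logs, the ordering that systemd actually loaded can be queried with `systemctl show -p After smartd.service`, which prints a single `After=...` line. The sketch below wraps that check; `ordered_after` is a hypothetical helper name, and the sample input in the comment is illustrative rather than captured from this machine.

```shell
# Hypothetical helper: read `systemctl show -p After <unit>` output on stdin
# and succeed (exit 0) if the unit named in $1 appears in the After= list.
ordered_after() {
    sed -n 's/^After=//p' | tr ' ' '\n' | grep -qxF -- "$1"
}

# Example use on a live system (left commented out in this sketch):
#   systemctl show -p After smartd.service | ordered_after multi-user.target \
#       && echo "smartd is ordered after multi-user.target"
# Illustrative input: After=syslog.target multi-user.target system.slice
```

`systemd-analyze critical-chain smartd.service` is another way to see which units smartd waited for during boot.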