DEVICESCAN failed: aborted matching pattern /dev/discs/disc*

  • March 31, 2020
  • Notes

DEVICESCAN failed: glob(3) aborted matching pattern /dev/discs/disc*

Problem description

In a production environment, the following error showed up:

Mar 28 10:09:36 localhost smartd[9865]: DEVICESCAN failed: glob(3) aborted matching pattern /dev/discs/disc*  

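The error is rare on any single machine, but across a large fleet it appears often enough that it cannot be ignored. Until today I had assumed the cause was hardware, such as a failing disk. Today, however, I built a ramdisk OS for PXE boot, and it reproduces the error 100% of the time on machines that were previously fine, so I now suspect a different cause.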

smartd(8)

smartd is a daemon that monitors the Self-Monitoring, Analysis and Reporting Technology (SMART) system built into many ATA-3 and later ATA, IDE and SCSI-3 hard drives. The purpose of SMART is to monitor the reliability of the hard drive and predict drive failures, and to carry out different types of drive self-tests. This version of smartd is compatible with ATA/ATAPI-7 and earlier standards.

smartd will attempt to enable SMART monitoring on ATA devices (equivalent to smartctl -s on) and polls these and SCSI devices every 30 minutes (configurable), logging SMART errors and changes of SMART Attributes via the SYSLOG interface. The default location for these SYSLOG notifications and warnings is /var/log/messages.

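Analysis

The messages log is shown below. When the smartd service started, it found no disks to scan; the kernel initialized the disk only afterwards.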

Mar 28 10:09:36 localhost smartd[9865]: Configuration file /etc/smartmontools/smartd.conf was parsed, found DEVICESCAN, scanning devices
Mar 28 10:09:36 localhost smartd[9865]: DEVICESCAN failed: glob(3) aborted matching pattern /dev/discs/disc*
Mar 28 10:09:36 localhost smartd[9865]: In the system's table of devices NO devices found to scan
Mar 28 10:09:36 localhost smartd[9865]: Monitoring 0 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
...
Mar 28 10:09:38 localhost kernel: scsi 0:0:0:0: Attached scsi generic sg0 type 0
Mar 28 10:09:38 localhost kernel: AMD64 EDAC driver v3.4.0
Mar 28 10:09:38 localhost kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11
Mar 28 10:09:38 localhost kernel: ata1.00: Enabling discard_zeroes_data
Mar 28 10:09:38 localhost kernel: sd 0:0:0:0: [sda] 937703088 512-byte logical blocks: (480 GB/447 GiB)
Mar 28 10:09:38 localhost kernel: sd 0:0:0:0: [sda] 4096-byte physical blocks
Mar 28 10:09:38 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
Mar 28 10:09:38 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 28 10:09:38 localhost kernel: ata1.00: Enabling discard_zeroes_data
Mar 28 10:09:38 localhost kernel: ata1.00: Enabling discard_zeroes_data
Mar 28 10:09:38 localhost kernel: sd 0:0:0:0: [sda] Attached SCSI removable disk
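The ordering visible in this log can also be checked mechanically. The following is only a rough sketch: the function name scan_order is made up for illustration, and it assumes the log lines arrive in the order they were written, comparing line positions rather than timestamps (both events may fall within the same second).

```shell
#!/bin/sh
# Hypothetical helper: reads syslog-format text on stdin and reports which
# came first, smartd's device scan or the kernel attaching a SCSI disk.
# Assumption: lines appear in the order they were logged.
scan_order() {
    awk '
        # First smartd line announcing the DEVICESCAN pass
        /smartd\[[0-9]+\]: .*scanning devices/ && !scan { scan = NR }
        # First kernel line reporting an attached sd disk
        /kernel: sd .*Attached SCSI/ && !disk { disk = NR }
        END {
            if (!scan || !disk)   print "incomplete"
            else if (scan < disk) print "smartd-first"
            else                  print "disk-first"
        }'
}
```

A possible use is `scan_order < /var/log/messages`; for the boot shown above it should print smartd-first, which is the failing order.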

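After boot, the smartd service status and the disk status are as follows. Even after logging in to the system, the service is still not monitoring the disk.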

[root@localhost ~]# systemctl status smartd.service
● smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
   Loaded: loaded (/usr/lib/systemd/system/smartd.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2020-03-28 10:24:35 CST; 45min ago
     Docs: man:smartd(8)
           man:smartd.conf(5)
 Main PID: 8613 (smartd)
   CGroup: /system.slice/smartd.service
           └─8613 /usr/sbin/smartd -n -q never

Mar 28 10:24:35 localhost.localdomain systemd[1]: Started Self Monitoring and Reporting Technology (SMART) Daemon.
Mar 28 10:24:36 localhost.localdomain smartd[8613]: smartd 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-957.el7.x86_64] (local build)
Mar 28 10:24:36 localhost.localdomain smartd[8613]: Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
Mar 28 10:24:36 localhost.localdomain smartd[8613]: Opened configuration file /etc/smartmontools/smartd.conf
Mar 28 10:24:36 localhost.localdomain smartd[8613]: Configuration file /etc/smartmontools/smartd.conf was parsed, found DEVICESCAN, scanning devices
Mar 28 10:24:36 localhost.localdomain smartd[8613]: DEVICESCAN failed: glob(3) aborted matching pattern /dev/discs/disc*
Mar 28 10:24:36 localhost.localdomain smartd[8613]: In the system's table of devices NO devices found to scan
Mar 28 10:24:36 localhost.localdomain smartd[8613]: Monitoring 0 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
[root@localhost ~]# lsblk
NAME MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda    8:0    1 447.1G  0 disk

Attempted fix

To fix this, I tried delaying smartd so that it starts after the disk has been initialized. The original smartd.service unit file is as follows:

[root@server ~]# systemctl cat smartd.service
[Unit]
Description=Self Monitoring and Reporting Technology (SMART) Daemon
Documentation=man:smartd(8) man:smartd.conf(5)
After=syslog.target

[Service]
EnvironmentFile=-/etc/sysconfig/smartmontools
ExecStart=/usr/sbin/smartd -n $smartd_opts
ExecReload=/bin/kill -HUP $MAINPID
StandardOutput=syslog

[Install]
WantedBy=multi-user.target
[root@localhost FX2010700017L]#

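To make sure smartd.service starts after the disk has been initialized, I added multi-user.target to the After= line of the unit file, so that smartd starts only once terminal initialization has finished. (This system boots to runlevel 3; for runlevel 5, use graphical.target instead.) I then repacked the ramdisk OS.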

[root@localhost ~]# cat /usr/lib/systemd/system/smartd.service
[Unit]
Description=Self Monitoring and Reporting Technology (SMART) Daemon
Documentation=man:smartd(8) man:smartd.conf(5)
After=syslog.target multi-user.target

[Service]
EnvironmentFile=-/etc/sysconfig/smartmontools
ExecStart=/usr/sbin/smartd -n $smartd_opts
ExecReload=/bin/kill -HUP $MAINPID
StandardOutput=syslog

[Install]
WantedBy=multi-user.target
[root@localhost FX2010700017L]#
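The same ordering could probably also be added through a systemd drop-in, without editing the vendor unit under /usr/lib. This is an untested alternative to the edit above, not the method used in this note. The function name write_smartd_dropin, the root argument (the root of the image tree being packed) and the file name 10-after-multi-user.conf are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that adds After=multi-user.target to
# smartd.service. $1 is the root of the target tree (empty for the live
# system); the drop-in file name is an arbitrary choice.
write_smartd_dropin() {
    root=${1:-}
    dir="$root/etc/systemd/system/smartd.service.d"
    mkdir -p "$dir" || return 1
    # After= entries in a drop-in are added to those of the main unit file.
    cat > "$dir/10-after-multi-user.conf" <<'EOF'
[Unit]
After=multi-user.target
EOF
}
```

For an image build this might be called as `write_smartd_dropin /path/to/rootfs` before repacking. On a running system, `write_smartd_dropin ""` followed by `systemctl daemon-reload` should have the same effect from the next start of the service.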

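After booting from the new ramdisk OS, the messages log and the smartd service status are as follows. smartd now starts after terminal initialization has completed and finds the disk during its startup scan. With that, the problem is solved.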

[root@localhost ~]# cat /var/log/messages
...
Mar 28 15:31:43 localhost kernel: scsi 0:0:0:0: Attached scsi generic sg0 type 0
Mar 28 15:31:43 localhost kernel: ata1.00: Enabling discard_zeroes_data
Mar 28 15:31:43 localhost kernel: sd 0:0:0:0: [sda] 937703088 512-byte logical blocks: (480 GB/447 GiB)
Mar 28 15:31:43 localhost kernel: sd 0:0:0:0: [sda] 4096-byte physical blocks
Mar 28 15:31:43 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
Mar 28 15:31:43 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 28 15:31:43 localhost kernel: ata1.00: Enabling discard_zeroes_data
Mar 28 15:31:43 localhost kernel: ata1.00: Enabling discard_zeroes_data
Mar 28 15:31:43 localhost kernel: sd 0:0:0:0: [sda] Attached SCSI removable disk
...
Mar 28 15:31:47 localhost systemd: Reached target Multi-User System.
Mar 28 15:31:47 localhost systemd: Started Self Monitoring and Reporting Technology (SMART) Daemon.
Mar 28 15:31:47 localhost systemd: Started Stop Read-Ahead Data Collection 10s After Completed Startup.
Mar 28 15:31:47 localhost systemd: Starting Update UTMP about System Runlevel Changes...
Mar 28 15:31:47 localhost systemd: Started Update UTMP about System Runlevel Changes.
Mar 28 15:31:47 localhost systemd: Startup finished in 52.156s (kernel) + 8.846s (userspace) = 5min 1.858s.
Mar 28 15:31:47 localhost smartd[42226]: smartd 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-957.el7.x86_64] (local build)
Mar 28 15:31:47 localhost smartd[42226]: Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
Mar 28 15:31:47 localhost smartd[42226]: Opened configuration file /etc/smartmontools/smartd.conf
Mar 28 15:31:47 localhost smartd[42226]: Configuration file /etc/smartmontools/smartd.conf was parsed, found DEVICESCAN, scanning devices
Mar 28 15:31:47 localhost smartd[42226]: Device: /dev/sda, type changed from 'scsi' to 'sat'
Mar 28 15:31:47 localhost smartd[42226]: Device: /dev/sda [SAT], opened
Mar 28 15:31:47 localhost smartd[42226]: Device: /dev/sda [SAT], 480 GB
Mar 28 15:31:47 localhost systemd: Created slice User Slice of root.
Mar 28 15:31:47 localhost systemd-logind: New session 1 of user root.
Mar 28 15:31:47 localhost systemd: Started Session 1 of user root.
Mar 28 15:31:47 localhost smartd[42226]: Device: /dev/sda [SAT], not found in smartd database.
Mar 28 15:31:47 localhost smartd[42226]: Device: /dev/sda [SAT], can't monitor Offline_Uncorrectable count - no Attribute 198
Mar 28 15:31:47 localhost smartd[42226]: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
Mar 28 15:31:47 localhost smartd[42226]: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
...
[root@localhost ~]# systemctl status smartd.service
● smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
   Loaded: loaded (/usr/lib/systemd/system/smartd.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2020-03-28 15:31:47 CST; 15min ago
     Docs: man:smartd(8)
           man:smartd.conf(5)
 Main PID: 42226 (smartd)
   CGroup: /system.slice/smartd.service
           └─42226 /usr/sbin/smartd -n -q never

Mar 28 15:31:47 localhost.localdomain smartd[42226]: Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Opened configuration file /etc/smartmontools/smartd.conf
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Configuration file /etc/smartmontools/smartd.conf was parsed, found DEVICESCAN, scanning devices
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Device: /dev/sda, type changed from 'scsi' to 'sat'
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Device: /dev/sda [SAT], opened
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Device: /dev/sda [SAT], INTEL SSDSCKKB480G8, S/N:PHYH951400VK480K, WWN:5-5cd2e4-1520d72fb, FW:XC311120, 480 GB
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Device: /dev/sda [SAT], not found in smartd database.
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Device: /dev/sda [SAT], can't monitor Offline_Uncorrectable count - no Attribute 198
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
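
Because the problem affects a whole fleet, it may be worth checking the result by script rather than by eye. Here is a small sketch, with the made-up function name monitored_devices. It assumes the smartd summary line has exactly the "Monitoring N ATA/SATA, N SCSI/SAS and N NVMe devices" form seen in the logs above, and it prints the total from the last such line on stdin.

```shell
#!/bin/sh
# Hypothetical helper: print the total number of devices smartd reports as
# monitored, taken from the last "Monitoring ..." summary line on stdin.
# Prints 0 when no summary line is present.
monitored_devices() {
    awk '
        /smartd\[[0-9]+\]: Monitoring [0-9]+ ATA\/SATA, [0-9]+ SCSI\/SAS and [0-9]+ NVMe devices/ {
            for (i = 1; i <= NF; i++)
                if ($i == "Monitoring")
                    # Counts sit 1, 3 and 6 fields after the word "Monitoring"
                    total = $(i + 1) + $(i + 3) + $(i + 6)
        }
        END { print total + 0 }'
}
```

For example, `journalctl -u smartd -b | monitored_devices` or `monitored_devices < /var/log/messages` should print 1 for the fixed boot shown above. A result of 0 on a machine with disks suggests the scan ran too early again.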