DEVICESCAN failed: aborted matching pattern /dev/discs/disc*
- March 31, 2020
- Notes
DEVICESCAN failed: glob(3) aborted matching pattern /dev/discs/disc*
Problem description
In a production environment, the following error appeared:
Mar 28 10:09:36 localhost smartd[9865]: DEVICESCAN failed: glob(3) aborted matching pattern /dev/discs/disc*
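The error is rare on any single machine, but across a large fleet it shows up often enough that it cannot be ignored. Until today I had assumed it was caused by hardware, for example a faulty disk. Today, however, I built a ramdisk OS for PXE boot, and with it the error reproduces 100% of the time on machines that were previously healthy, so I suspect the real cause lies elsewhere.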
smartd(8)
smartd is a daemon that monitors the Self-Monitoring, Analysis and Reporting Technology (SMART) system built into many ATA-3 and later ATA, IDE and SCSI-3 hard drives. The purpose of SMART is to monitor the reliability of the hard drive and predict drive failures, and to carry out different types of drive self-tests. This version of smartd is compatible with ATA/ATAPI-7 and earlier standards. smartd will attempt to enable SMART monitoring on ATA devices (equivalent to smartctl -s on) and polls these and SCSI devices every 30 minutes (configurable), logging SMART errors and changes of SMART Attributes via the SYSLOG interface. The default location for these SYSLOG notifications and warnings is /var/log/messages.
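Analysis
The messages log is shown below. smartd found no disks when it started; the kernel initialized the disk only afterwards (smartd scanned at 10:09:36, sda was attached at 10:09:38).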
Mar 28 10:09:36 localhost smartd[9865]: Configuration file /etc/smartmontools/smartd.conf was parsed, found DEVICESCAN, scanning devices
Mar 28 10:09:36 localhost smartd[9865]: DEVICESCAN failed: glob(3) aborted matching pattern /dev/discs/disc*
Mar 28 10:09:36 localhost smartd[9865]: In the system's table of devices NO devices found to scan
Mar 28 10:09:36 localhost smartd[9865]: Monitoring 0 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
...
Mar 28 10:09:38 localhost kernel: scsi 0:0:0:0: Attached scsi generic sg0 type 0
Mar 28 10:09:38 localhost kernel: AMD64 EDAC driver v3.4.0
Mar 28 10:09:38 localhost kernel: Request for unknown module key 'Mellanox Technologies signing key: 61feb074fc7292f958419386ffdd9d5ca999e403' err -11
Mar 28 10:09:38 localhost kernel: ata1.00: Enabling discard_zeroes_data
Mar 28 10:09:38 localhost kernel: sd 0:0:0:0: [sda] 937703088 512-byte logical blocks: (480 GB/447 GiB)
Mar 28 10:09:38 localhost kernel: sd 0:0:0:0: [sda] 4096-byte physical blocks
Mar 28 10:09:38 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
Mar 28 10:09:38 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 28 10:09:38 localhost kernel: ata1.00: Enabling discard_zeroes_data
Mar 28 10:09:38 localhost kernel: ata1.00: Enabling discard_zeroes_data
Mar 28 10:09:38 localhost kernel: sd 0:0:0:0: [sda] Attached SCSI removable disk
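The ordering seen in this log can also be checked mechanically. The following is a minimal sketch, not part of the original investigation: the helper names are hypothetical, the two marker strings are copied from the log lines above, and the year is an assumption because syslog timestamps do not carry one.

```python
from datetime import datetime

# Marker substrings copied from the log above; helper names are hypothetical.
SCAN_MARKER = "found DEVICESCAN, scanning devices"
DISK_MARKER = "Attached SCSI"


def parse_time(line, year=2020):
    """Parse the leading 'Mar 28 10:09:36' stamp (year assumed, it is not logged)."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def first_time(lines, marker):
    """Return the timestamp of the first line containing marker, or None."""
    for line in lines:
        if marker in line:
            return parse_time(line)
    return None


def scanned_before_disk(lines):
    """True when smartd's scan is logged strictly before the first disk attach."""
    scan = first_time(lines, SCAN_MARKER)
    disk = first_time(lines, DISK_MARKER)
    return scan is not None and disk is not None and scan < disk


sample = [
    "Mar 28 10:09:36 localhost smartd[9865]: Configuration file "
    "/etc/smartmontools/smartd.conf was parsed, found DEVICESCAN, scanning devices",
    "Mar 28 10:09:38 localhost kernel: sd 0:0:0:0: [sda] Attached SCSI removable disk",
]
print(scanned_before_disk(sample))  # prints: True
```

Syslog timestamps here have one-second resolution, so two events logged within the same second cannot be ordered this way; the sketch reports False in that case.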
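After boot, the smartd service status and the disk status are as follows. Even after logging in, the service is still not monitoring the disk: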
[root@localhost ~]# systemctl status smartd.service
● smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
   Loaded: loaded (/usr/lib/systemd/system/smartd.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2020-03-28 10:24:35 CST; 45min ago
     Docs: man:smartd(8)
           man:smartd.conf(5)
 Main PID: 8613 (smartd)
   CGroup: /system.slice/smartd.service
           └─8613 /usr/sbin/smartd -n -q never

Mar 28 10:24:35 localhost.localdomain systemd[1]: Started Self Monitoring and Reporting Technology (SMART) Daemon.
Mar 28 10:24:36 localhost.localdomain smartd[8613]: smartd 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-957.el7.x86_64] (local build)
Mar 28 10:24:36 localhost.localdomain smartd[8613]: Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
Mar 28 10:24:36 localhost.localdomain smartd[8613]: Opened configuration file /etc/smartmontools/smartd.conf
Mar 28 10:24:36 localhost.localdomain smartd[8613]: Configuration file /etc/smartmontools/smartd.conf was parsed, found DEVICESCAN, scanning devices
Mar 28 10:24:36 localhost.localdomain smartd[8613]: DEVICESCAN failed: glob(3) aborted matching pattern /dev/discs/disc*
Mar 28 10:24:36 localhost.localdomain smartd[8613]: In the system's table of devices NO devices found to scan
Mar 28 10:24:36 localhost.localdomain smartd[8613]: Monitoring 0 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
[root@localhost ~]# lsblk
NAME MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda    8:0    1 447.1G  0 disk
Attempted fix
To fix this, I tried delaying smartd so that it starts after the disk has been initialized. The original smartd.service unit file is as follows:
[root@server ~]# systemctl cat smartd.service
[Unit]
Description=Self Monitoring and Reporting Technology (SMART) Daemon
Documentation=man:smartd(8) man:smartd.conf(5)
After=syslog.target

[Service]
EnvironmentFile=-/etc/sysconfig/smartmontools
ExecStart=/usr/sbin/smartd -n $smartd_opts
ExecReload=/bin/kill -HUP $MAINPID
StandardOutput=syslog

[Install]
WantedBy=multi-user.target
[root@localhost FX2010700017L]#
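To make sure smartd.service starts after the disk is initialized, add multi-user.target to the After= line of the unit file, so that smartd starts after the terminals have been initialized (this system boots to runlevel3; for runlevel5, use graphical.target instead). Note that this only delays smartd; it does not wait for the disk itself. Then repackage the ramdisk OS.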
[root@localhost ~]# cat /usr/lib/systemd/system/smartd.service
[Unit]
Description=Self Monitoring and Reporting Technology (SMART) Daemon
Documentation=man:smartd(8) man:smartd.conf(5)
After=syslog.target multi-user.target

[Service]
EnvironmentFile=-/etc/sysconfig/smartmontools
ExecStart=/usr/sbin/smartd -n $smartd_opts
ExecReload=/bin/kill -HUP $MAINPID
StandardOutput=syslog

[Install]
WantedBy=multi-user.target
[root@localhost FX2010700017L]#
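If the ramdisk image is rebuilt by a script, the same edit could be applied programmatically instead of by hand. This is a rough sketch under that assumption: `add_after_target` is a hypothetical helper, the unit text in the example is an abbreviated copy of the file shown above, and the function only handles a unit that has a single After= line in its [Unit] section.

```python
def add_after_target(unit_text, target):
    """Append target to the After= line in the [Unit] section of unit_text.

    Assumes one After= line, as in the unit shown above. Raises ValueError
    if the [Unit] section has no After= line.
    """
    lines = unit_text.splitlines()
    section = None
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped  # remember which section we are in
        elif section == "[Unit]" and stripped.startswith("After="):
            names = stripped[len("After="):].split()
            if target not in names:  # keep the edit idempotent
                names.append(target)
            lines[i] = "After=" + " ".join(names)
            return "\n".join(lines) + "\n"
    raise ValueError("no After= line found in [Unit] section")


# Abbreviated copy of the original unit file, for illustration only.
original = """[Unit]
Description=Self Monitoring and Reporting Technology (SMART) Daemon
After=syslog.target

[Service]
ExecStart=/usr/sbin/smartd -n $smartd_opts
"""
print(add_after_target(original, "multi-user.target"))
```

Running the function twice on the same text leaves a single multi-user.target entry, so it is safe to call on every image rebuild.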
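After booting from the new ramdisk OS, the messages log and the smartd service status are as follows. smartd now starts after terminal initialization has completed and finds the disk during its startup scan. With that, the problem is solved.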
[root@localhost ~]# cat /var/log/messages
...
Mar 28 15:31:43 localhost kernel: scsi 0:0:0:0: Attached scsi generic sg0 type 0
Mar 28 15:31:43 localhost kernel: ata1.00: Enabling discard_zeroes_data
Mar 28 15:31:43 localhost kernel: sd 0:0:0:0: [sda] 937703088 512-byte logical blocks: (480 GB/447 GiB)
Mar 28 15:31:43 localhost kernel: sd 0:0:0:0: [sda] 4096-byte physical blocks
Mar 28 15:31:43 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
Mar 28 15:31:43 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 28 15:31:43 localhost kernel: ata1.00: Enabling discard_zeroes_data
Mar 28 15:31:43 localhost kernel: ata1.00: Enabling discard_zeroes_data
Mar 28 15:31:43 localhost kernel: sd 0:0:0:0: [sda] Attached SCSI removable disk
...
Mar 28 15:31:47 localhost systemd: Reached target Multi-User System.
Mar 28 15:31:47 localhost systemd: Started Self Monitoring and Reporting Technology (SMART) Daemon.
Mar 28 15:31:47 localhost systemd: Started Stop Read-Ahead Data Collection 10s After Completed Startup.
Mar 28 15:31:47 localhost systemd: Starting Update UTMP about System Runlevel Changes...
Mar 28 15:31:47 localhost systemd: Started Update UTMP about System Runlevel Changes.
Mar 28 15:31:47 localhost systemd: Startup finished in 52.156s (kernel) + 8.846s (userspace) = 5min 1.858s.
Mar 28 15:31:47 localhost smartd[42226]: smartd 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-957.el7.x86_64] (local build)
Mar 28 15:31:47 localhost smartd[42226]: Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
Mar 28 15:31:47 localhost smartd[42226]: Opened configuration file /etc/smartmontools/smartd.conf
Mar 28 15:31:47 localhost smartd[42226]: Configuration file /etc/smartmontools/smartd.conf was parsed, found DEVICESCAN, scanning devices
Mar 28 15:31:47 localhost smartd[42226]: Device: /dev/sda, type changed from 'scsi' to 'sat'
Mar 28 15:31:47 localhost smartd[42226]: Device: /dev/sda [SAT], opened
Mar 28 15:31:47 localhost smartd[42226]: Device: /dev/sda [SAT], 480 GB
Mar 28 15:31:47 localhost systemd: Created slice User Slice of root.
Mar 28 15:31:47 localhost systemd-logind: New session 1 of user root.
Mar 28 15:31:47 localhost systemd: Started Session 1 of user root.
Mar 28 15:31:47 localhost smartd[42226]: Device: /dev/sda [SAT], not found in smartd database.
Mar 28 15:31:47 localhost smartd[42226]: Device: /dev/sda [SAT], can't monitor Offline_Uncorrectable count - no Attribute 198
Mar 28 15:31:47 localhost smartd[42226]: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
Mar 28 15:31:47 localhost smartd[42226]: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
...
[root@localhost ~]# systemctl status smartd.service
● smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
   Loaded: loaded (/usr/lib/systemd/system/smartd.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2020-03-28 15:31:47 CST; 15min ago
     Docs: man:smartd(8)
           man:smartd.conf(5)
 Main PID: 42226 (smartd)
   CGroup: /system.slice/smartd.service
           └─42226 /usr/sbin/smartd -n -q never

Mar 28 15:31:47 localhost.localdomain smartd[42226]: Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Opened configuration file /etc/smartmontools/smartd.conf
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Configuration file /etc/smartmontools/smartd.conf was parsed, found DEVICESCAN, scanning devices
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Device: /dev/sda, type changed from 'scsi' to 'sat'
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Device: /dev/sda [SAT], opened
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Device: /dev/sda [SAT], INTEL SSDSCKKB480G8, S/N:PHYH951400VK480K, WWN:5-5cd2e4-1520d72fb, FW:XC311120, 480 GB
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Device: /dev/sda [SAT], not found in smartd database.
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Device: /dev/sda [SAT], can't monitor Offline_Uncorrectable count - no Attribute 198
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
Mar 28 15:31:47 localhost.localdomain smartd[42226]: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
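Across a large fleet, the same check could be automated by reading smartd's "Monitoring ..." summary line rather than inspecting each machine by hand. The sketch below is one possible way to do that; `monitored_devices` is a hypothetical helper, and the pattern is based only on the wording of the smartd 6.5 messages shown in this post.

```python
import re

# Pattern based on the summary line as it appears in the logs above.
MONITORING_RE = re.compile(
    r"Monitoring (\d+) ATA/SATA, (\d+) SCSI/SAS and (\d+) NVMe devices"
)


def monitored_devices(log_text):
    """Return the device total from the last summary line, or None if absent."""
    matches = MONITORING_RE.findall(log_text)
    if not matches:
        return None
    return sum(int(n) for n in matches[-1])


before = ("Mar 28 10:24:36 localhost.localdomain smartd[8613]: "
          "Monitoring 0 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices")
after = ("Mar 28 15:31:47 localhost.localdomain smartd[42226]: "
         "Monitoring 1 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices")
print(monitored_devices(before), monitored_devices(after))  # prints: 0 1
```

A result of 0 on a machine where lsblk lists a disk is the symptom described in this post; None means the log text contained no summary line at all.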