Locating disks in large enclosures containing 90+ drives can be problematic, since their status LEDs are not visible and are often not even activated by ZFS volume errors.

The following applies to tux (270 drives) and noss1, noss2, noss3 (90 drives each). All the necessary dependencies and the script itself have been installed in their respective /root/sysadmin/drive_utils directories.

The ledmon and ledctl utilities can work, but require a block id (/dev/sdXX). The system typically reports only SCSI UUIDs, and block ids are not persistent between reboots, so they have to be regenerated every time a disk intervention is required. Only SCSI UUIDs are persistent.

A zpool status command will typically show errors on the volumes, with SCSI ids. The usual ZFS replacement routines can be applied, but the physical drives need to be positively identified in the enclosures.

The following script lists the SCSI UUID, block id and physical device serial number of every drive on every controller present.

Located in /root/sysadmin/drive_utils as map_drives.sh
********************************************************************************************
#!/bin/bash
# JWS - March 2019 johann.swart@up.ac.za
# Script to find and associate hard drive block ids with their UUIDs and serial numbers.
# For example, ZFS and other system functions always return a UUID in a fault report, but some utilities require a /dev/sdXX id.
# Also, the serial number of the actual drive must be present to confirm its identity.
# The output of this script is quite large on the JBOD and Lustre enclosures, but can be grep'ed with the UUID from the system.
# The drive can then be located by flashing with the ledctl utilities, which require a block id - /dev/sdXX

# Since block ids are not persistent, they have to be generated from scratch each time, before the main loop is run, so this temporary file is datestamped
prefix=$(date +%d%m%Y)
file_name="disks_blkid_$prefix.txt"

# Keep only the ATA disks, take field 7 (the /dev/sdXX node) and strip the leading "/dev/"
lsscsi | grep "/dev/sd" | grep "ATA" | awk '{ print $7 }' | cut -c6- > "$file_name"

# Print the by-id link, block id and serial number of each disk, one blank line between disks
while IFS= read -r block_device
do
    udevadm info -q all -p "/sys/block/$block_device" | awk '
        /DEVLINKS/       { printf "%s\n", $2; }
        /DEVNAME/        { printf "%s\n", $2; }
        /ID_SCSI_SERIAL/ { printf "%s\n\n", $2; }
    '
done < "$file_name"
********************************************************************************************

Running this script will give a lot of output like:
.
.
DEVLINKS=/dev/disk/by-id/scsi-35000cca255de35e9
DEVNAME=/dev/sdjh
ID_SCSI_SERIAL=K1J4G74D

DEVLINKS=/dev/disk/by-id/scsi-35000cca255de1ece
DEVNAME=/dev/sdji
ID_SCSI_SERIAL=K1J482AD

DEVLINKS=/dev/disk/by-id/scsi-35000cca255dbf06f
DEVNAME=/dev/sdjj
ID_SCSI_SERIAL=K1HZGA8D
.
.

which can be searched for a particular UUID with (for example):

./map_drives.sh|grep -A 2 scsi-35000cca255dbf06f
DEVLINKS=/dev/disk/by-id/scsi-35000cca255dbf06f
DEVNAME=/dev/sdjj
ID_SCSI_SERIAL=K1HZGA8D

Once the block id (/dev/sdjj) is obtained, it can be used to flash the status LED:

ledctl locate=/dev/sdjj

and to turn it off:

ledctl locate_off=/dev/sdjj

Note: The map_drives.sh script has to be run every time just before a drive is replaced. Block ids can change over time, so an old listing can point to the wrong drive.

Nobody tells you this, but ledctl and ledmon require OpenIPMI to be installed from the yum repos.

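The grep step above still leaves the block id to be copied by hand. As a minimal sketch, assuming the three-line record layout shown in the sample output, a small helper can print only the block id for a given SCSI id. The function name devname_for_scsi_id is hypothetical and is not part of map_drives.sh.

```shell
#!/bin/bash
# Hypothetical helper, not part of map_drives.sh.
# Reads map_drives.sh output on stdin and prints the /dev/sdXX block id
# that follows the DEVLINKS line ending in the requested SCSI id.
# Assumes the DEVLINKS / DEVNAME / ID_SCSI_SERIAL record layout shown above.
devname_for_scsi_id() {
    local scsi_id="$1"
    awk -v id="$scsi_id" '
        # Compare the last path component of the by-id link with the wanted id.
        /^DEVLINKS=/ { n = split($0, parts, "/"); wanted = (parts[n] == id); next }
        # Inside the wanted record, print the first DEVNAME value only.
        wanted && !found && /^DEVNAME=/ { sub(/^DEVNAME=/, ""); print; found = 1 }
    '
}
```

Possible use, after sourcing the function:

./map_drives.sh | devname_for_scsi_id scsi-35000cca255dbf06f

The printed value can then be passed to ledctl locate= as described above. Nothing is printed if the id is not found, so check for an empty result before calling ledctl.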
Now that the serial number of the drive is known, the drive can be located in an enclosure with this script (pass the serial number as the argument):

locate_serial.sh
************************************************************************************************************
#!/bin/tcsh
# JWS - March 2019 johann.swart@up.ac.za
# Locates a drive on Tux/Lustre in a specific enclosure
# Command line param gives serial number to search for. The serial can be obtained from the map_drives.sh script

set serial = $1
set enclosure = `storcli64 /call/eall/sall show all | grep -B 2 $serial | grep Drive | cut -c9`

echo "Serial: "$serial
echo "Enclosure (0,1,2): "$enclosure
************************************************************************************************************

e.g.:
[root@tux drive_utils]# ./locate_serial.sh K1HZGA8D
Serial: K1HZGA8D
Enclosure (0,1,2): 2
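
To go from a SCSI id in a fault report straight to the serial number that locate_serial.sh expects, the map_drives.sh output can be filtered in the same way as for the block id. This is only a sketch under the assumption that the output keeps the three-line record layout shown earlier. The function name serial_for_scsi_id is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper, not part of the installed scripts.
# Reads map_drives.sh output on stdin and prints the drive serial number
# belonging to the requested SCSI id.
# Assumes each record is DEVLINKS, DEVNAME, ID_SCSI_SERIAL, in that order.
serial_for_scsi_id() {
    local scsi_id="$1"
    awk -v id="$scsi_id" '
        # A new record starts at DEVLINKS; check whether it is the wanted one.
        /^DEVLINKS=/ { n = split($0, parts, "/"); wanted = (parts[n] == id); next }
        # Print the serial of the first matching record only.
        wanted && !found && /^ID_SCSI_SERIAL=/ { sub(/^ID_SCSI_SERIAL=/, ""); print; found = 1 }
    '
}
```

Possible use, after sourcing the function:

./map_drives.sh | serial_for_scsi_id scsi-35000cca255dbf06f

The printed serial can then be given to ./locate_serial.sh as its argument. It is still worth comparing it against the label on the physical drive before pulling it.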