RIP /volume3
Thursday, December 27th, 2018

Yesterday, after about 40K hours of uptime, the HDD behind /volume3 on my NAS died.
It didn’t go “poof”, but its health got bad enough for my NAS to warn me. The advice was plain and simple: back up everything and get rid of the crashed volume.
Fortunately, this was one of the volumes containing less valuable data so I didn’t lose anything important. I’ve also got local and remote backups of the more important things.
Still, losing a disk is never fun and leads to a lot of wasted time. After a few hours, I was able to recover most of the data on the disk, apart from a few files lying across bad sectors.
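For the curious: a plain rsync is enough for that kind of rescue copy, since it keeps going past unreadable files and reports them on stderr instead of aborting. The destination and log paths below are only examples, not what I actually used:

# archive mode, keep attributes; read errors end up in the log instead of stopping the copy
rsync -avh --progress /volume3/ /volume1/rescue-volume3/ 2> /volume1/rescue-errors.log
# anything listed in the log sat on a bad sector and has to come from a backup instead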
Then, just out of curiosity, I wanted to check the disk and try to repair the volume.
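For the “check the disk” part, a SMART report is a good starting point; as far as I can tell smartctl ships with DSM and is available over SSH (here /dev/sdb is the drive that turns out to be the culprit below):

smartctl --health /dev/sdb   # overall PASSED/FAILED self-assessment
smartctl --all /dev/sdb      # full attribute table: look at reallocated and pending sectors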
First, I shut down every service apart from the SSH daemon:
syno_poweroff_task -d
Then, I identified the faulty disk/RAID array using the commands I shared in a previous post:
cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md3 : active raid1 sde3[0]
      3902296256 blocks super 1.2 [1/1] [U]

md6 : active raid1 sdc3[0]
      3902296256 blocks super 1.2 [1/1] [U]

md5 : active raid1 sdf3[0]
      3902296256 blocks super 1.2 [1/1] [U]

md7 : active raid1 sdg3[0]
      3902296256 blocks super 1.2 [1/1] [U]

md2 : active raid1 sda3[0]
      3902296256 blocks super 1.2 [1/1] [U]

md9 : active raid1 sdh3[0]
      7809204416 blocks super 1.2 [1/1] [U]

md8 : active raid1 sdd3[0]
      3902196416 blocks super 1.2 [1/1] [U]

md4 : active raid1 sdb3[0](E)
      1948792256 blocks super 1.2 [1/1] [E]

md1 : active raid1 sda2[0] sdb2[1] sdc2[2] sdd2[3] sde2[4] sdf2[5] sdg2[6] sdh2[7]
      2097088 blocks [8/8] [UUUUUUUU]

md0 : active raid1 sda1[0] sdb1[2] sdc1[4] sdd1[6] sde1[1] sdf1[3] sdg1[5] sdh1[7]
      2490176 blocks [8/8] [UUUUUUUU]

unused devices: <none>
As you can see above, the array in error was md4, backed by the sdb3 partition.
NOTE: I only have single-drive RAID “arrays”.
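By the way, there’s no need to eyeball the whole mdstat output: DSM marks the failing member with an (E) flag, which is easy to grep for:

grep '(E)' /proc/mdstat   # only the array members flagged in error by DSM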
Then I took a look at the md4 array:
mdadm --detail /dev/md4
/dev/md4:
        Version : 1.2
  Creation Time : Sun Sep 8 10:16:10 2013
     Raid Level : raid1
     Array Size : 1948792256 (1858.51 GiB 1995.56 GB)
  Used Dev Size : 1948792256 (1858.51 GiB 1995.56 GB)
   Raid Devices : 1
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Wed Dec 26 21:40:05 2018
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : NAS:4  (local to host NAS)
           UUID : 096b0ec0:3aec6ef5:5f685a2b:5ff95e38
         Events : 7

    Number   Major   Minor   RaidDevice State
       0       8       19        0      active sync   /dev/sdb3
And at the disk:
mdadm --examine /dev/sdb3
/dev/sdb3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 096b0ec0:3aec6ef5:5f685a2b:5ff95e38
           Name : NAS:4  (local to host NAS)
  Creation Time : Sun Sep 8 10:16:10 2013
     Raid Level : raid1
   Raid Devices : 1

 Avail Dev Size : 3897584512 (1858.51 GiB 1995.56 GB)
     Array Size : 3897584512 (1858.51 GiB 1995.56 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 46ed084c:686ee160:5fa3a986:574d1182

    Update Time : Wed Dec 26 21:40:05 2018
       Checksum : 9ffab586 - correct
         Events : 7

    Device Role : Active device 0
    Array State : A ('A' == active, '.' == missing)

udevadm info --query=all --name=/dev/sdb3
P: /devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb/sdb3
N: sdb3
E: DEVNAME=/dev/sdb3
E: DEVPATH=/devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb/sdb3
E: DEVTYPE=partition
E: ID_FS_LABEL=NAS:4
E: ID_FS_LABEL_ENC=NAS:4
E: ID_FS_TYPE=linux_raid_member
E: ID_FS_USAGE=raid
E: ID_FS_UUID=096b0ec0-3aec-6ef5-5f68-5a2b5ff95e38
E: ID_FS_UUID_ENC=096b0ec0-3aec-6ef5-5f68-5a2b5ff95e38
E: ID_FS_UUID_SUB=46ed084c-686e-e160-5fa3-a986574d1182
E: ID_FS_UUID_SUB_ENC=46ed084c-686e-e160-5fa3-a986574d1182
E: ID_FS_VERSION=1.2
E: ID_PART_ENTRY_DISK=8:16
E: ID_PART_ENTRY_NUMBER=3
E: ID_PART_ENTRY_OFFSET=9437184
E: ID_PART_ENTRY_SCHEME=dos
E: ID_PART_ENTRY_SIZE=3897586881
E: ID_PART_ENTRY_TYPE=0xfd
E: ID_PART_ENTRY_UUID=00003837-03
E: MAJOR=8
E: MINOR=19
E: PHYSDEVBUS=scsi
E: PHYSDEVDRIVER=sd
E: PHYSDEVPATH=/devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0
E: SUBSYSTEM=block
E: SYNO_DEV_DISKPORTTYPE=SATA
E: SYNO_INFO_PLATFORM_NAME=cedarview
E: SYNO_KERNEL_VERSION=3.10
E: USEC_INITIALIZED=112532
Then I unmounted the faulty volume and stopped the corresponding RAID array:
umount /volume3
mdadm --stop /dev/md4
After that, I re-created the array:
mdadm -Cf /dev/md4 -e1.2 -n1 -l1 /dev/sdb3 -u096b0ec0:3aec6ef5:5f685a2b:5ff95e38
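Re-creating the array with the same metadata version, RAID level and UUID gives it a fresh superblock; as long as the data offset ends up the same as before (it did here, as the fsck below shows), the file system underneath is untouched. Before going further, a quick check that the array came back clean doesn’t hurt:

mdadm --detail /dev/md4   # State should read "clean" again
cat /proc/mdstat          # and the (E)/[E] markers should be gone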
Finally, I ran a file system check:
fsck.ext4 -v -f -y /dev/mapper/vol3-origin
Where /dev/mapper/vol3-origin was an easy-to-use pointer to the underlying device.
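If you’re not sure which device-mapper node sits behind a given volume (the vol3-origin name is specific to my box), the mapping is easy to list:

mount | grep volume3                   # before unmounting: shows what was mounted on /volume3
ls -l /dev/mapper/                     # all device-mapper nodes created by DSM
dmsetup deps /dev/mapper/vol3-origin   # major:minor of the underlying device(s)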
From the NAS’s point of view, everything is now fine (haha), but of course I can’t trust that disk anymore. Now I just have to wait a few days to get a replacement and set it up.
On the bright side, I’ll take the opportunity to upgrade to a 10-12TB disk (assuming those are compatible with my Synology NAS..) ^_^. That way, the next disaster is already scheduled for today + 40K hours.. ;-)