
RIP /volume3

Thursday, December 27th, 2018

Yesterday, after about 40K hours of uptime, the HDD behind /volume3 on my NAS died.

It didn’t go “poof”, but its health got bad enough for my NAS to warn me. The advice was plain and simple: back up everything and get rid of the crashed volume.

Fortunately, this was one of the volumes containing less valuable data so I didn’t lose anything important. I’ve also got local and remote backups of the more important things.

Still, losing a disk is never fun and leads to a lot of wasted time. After a few hours, I was able to recover most of the data on the disk, apart from a few files lying across bad sectors.
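
For the record, a rescue copy can look roughly like this, assuming rsync is available on the NAS and that /volume1/rescue is a hypothetical destination with enough free space. rsync skips files it cannot read and reports them at the end instead of aborting, which is exactly what you want on a disk with bad sectors:

# Copy the dying volume to a (hypothetical) rescue folder on a healthy volume
rsync -av /volume3/ /volume1/rescue/ 2>/volume1/rescue-errors.log
# rescue-errors.log then lists the files that were lost to bad sectors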

Then, just out of curiosity, I wanted to check the disk and try to repair the volume.

First, I shut down every service apart from the SSH daemon:

syno_poweroff_task -d
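
Before going further, it doesn’t hurt to double-check that nothing still holds the volume open; a minimal check (tool availability on DSM may vary):

# The volume should still be mounted at this point (it gets unmounted later on)
grep volume3 /proc/mounts
# If available, fuser can list processes still holding files open on the volume
fuser -vm /volume3 2>/dev/null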

Then, I identified the faulty disk/RAID array using the commands I shared in a previous post:

cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md3 : active raid1 sde3[0]
      3902296256 blocks super 1.2 [1/1] [U]

md6 : active raid1 sdc3[0]
      3902296256 blocks super 1.2 [1/1] [U]

md5 : active raid1 sdf3[0]
      3902296256 blocks super 1.2 [1/1] [U]

md7 : active raid1 sdg3[0]
      3902296256 blocks super 1.2 [1/1] [U]

md2 : active raid1 sda3[0]
      3902296256 blocks super 1.2 [1/1] [U]

md9 : active raid1 sdh3[0]
      7809204416 blocks super 1.2 [1/1] [U]

md8 : active raid1 sdd3[0]
      3902196416 blocks super 1.2 [1/1] [U]

md4 : active raid1 sdb3[0](E)
      1948792256 blocks super 1.2 [1/1] [E]

md1 : active raid1 sda2[0] sdb2[1] sdc2[2] sdd2[3] sde2[4] sdf2[5] sdg2[6] sdh2[7]
      2097088 blocks [8/8] [UUUUUUUU]

md0 : active raid1 sda1[0] sdb1[2] sdc1[4] sdd1[6] sde1[1] sdf1[3] sdg1[5] sdh1[7]
      2490176 blocks [8/8] [UUUUUUUU]

unused devices: <none>

As you can see above, the array in error was md4, with the associated sdb3 partition.

NOTE: I only have single-drive RAID “arrays”.
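
To double-check which volume actually sits on the failing array before touching anything, something like this should do, assuming dmsetup is available (md devices use major number 9, so a dependency on (9, 4) means md4):

# Which device is mounted on /volume3? On my setup it is a device-mapper volume
df -h /volume3
# And which md array backs that device-mapper volume?
dmsetup deps /dev/mapper/vol3-origin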

Then I took a look at the md4 array:

mdadm --detail /dev/md4
/dev/md4:
        Version : 1.2
  Creation Time : Sun Sep  8 10:16:10 2013
     Raid Level : raid1
     Array Size : 1948792256 (1858.51 GiB 1995.56 GB)
  Used Dev Size : 1948792256 (1858.51 GiB 1995.56 GB)
   Raid Devices : 1
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Wed Dec 26 21:40:05 2018
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : NAS:4  (local to host NAS)
           UUID : 096b0ec0:3aec6ef5:5f685a2b:5ff95e38
         Events : 7

    Number   Major   Minor   RaidDevice State
       0       8       19        0      active sync   /dev/sdb3

And at the disk:

mdadm --examine /dev/sdb3
/dev/sdb3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 096b0ec0:3aec6ef5:5f685a2b:5ff95e38
           Name : NAS:4  (local to host NAS)
  Creation Time : Sun Sep  8 10:16:10 2013
     Raid Level : raid1
   Raid Devices : 1

 Avail Dev Size : 3897584512 (1858.51 GiB 1995.56 GB)
     Array Size : 3897584512 (1858.51 GiB 1995.56 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 46ed084c:686ee160:5fa3a986:574d1182

    Update Time : Wed Dec 26 21:40:05 2018
       Checksum : 9ffab586 - correct
         Events : 7


   Device Role : Active device 0
   Array State : A ('A' == active, '.' == missing)

For good measure, udev’s view of the same partition:

udevadm info --query=all --name=/dev/sdb3
P: /devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb/sdb3
N: sdb3
E: DEVNAME=/dev/sdb3
E: DEVPATH=/devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb/sdb3
E: DEVTYPE=partition
E: ID_FS_LABEL=NAS:4
E: ID_FS_LABEL_ENC=NAS:4
E: ID_FS_TYPE=linux_raid_member
E: ID_FS_USAGE=raid
E: ID_FS_UUID=096b0ec0-3aec-6ef5-5f68-5a2b5ff95e38
E: ID_FS_UUID_ENC=096b0ec0-3aec-6ef5-5f68-5a2b5ff95e38
E: ID_FS_UUID_SUB=46ed084c-686e-e160-5fa3-a986574d1182
E: ID_FS_UUID_SUB_ENC=46ed084c-686e-e160-5fa3-a986574d1182
E: ID_FS_VERSION=1.2
E: ID_PART_ENTRY_DISK=8:16
E: ID_PART_ENTRY_NUMBER=3
E: ID_PART_ENTRY_OFFSET=9437184
E: ID_PART_ENTRY_SCHEME=dos
E: ID_PART_ENTRY_SIZE=3897586881
E: ID_PART_ENTRY_TYPE=0xfd
E: ID_PART_ENTRY_UUID=00003837-03
E: MAJOR=8
E: MINOR=19
E: PHYSDEVBUS=scsi
E: PHYSDEVDRIVER=sd
E: PHYSDEVPATH=/devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0
E: SUBSYSTEM=block
E: SYNO_DEV_DISKPORTTYPE=SATA
E: SYNO_INFO_PLATFORM_NAME=cedarview
E: SYNO_KERNEL_VERSION=3.10
E: USEC_INITIALIZED=112532
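
Since the failing member sits on /dev/sdb, its SMART data can be checked directly too; smartctl is normally present on DSM, though the exact device options may vary by model:

# Overall health self-assessment of the drive (PASSED/FAILED)
smartctl -H /dev/sdb
# Full SMART report; reallocated and pending sector counts are the interesting attributes here
smartctl -a /dev/sdb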

Then I unmounted the faulty volume and stopped the corresponding RAID array:

umount /volume3
mdadm --stop /dev/md4
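
A quick sanity check before re-creating anything, just to make sure the array is really stopped and the volume is gone from the mount table:

# md4 should no longer show up as an active array
grep md4 /proc/mdstat
# and /volume3 should no longer be mounted
grep volume3 /proc/mounts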

After that, I re-created the array in place, with the same metadata version and the original array UUID, which resets the (E) error flag while leaving the data on the partition untouched:

mdadm -Cf /dev/md4 -e1.2 -n1 -l1 /dev/sdb3 -u096b0ec0:3aec6ef5:5f685a2b:5ff95e38
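
Once the array is back, it’s worth verifying that it reuses the expected UUID and that sdb3 is active again before touching the file system (a quick sanity check, not strictly required):

# The UUID should match the one passed to -u above
mdadm --detail /dev/md4 | grep -i uuid
# And sdb3 should be back as the single active member
grep -A 1 md4 /proc/mdstat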

Finally, I ran a file system check:

fsck.ext4 -v -f -y /dev/mapper/vol3-origin

Where /dev/mapper/vol3-origin was just an easy-to-use handle for the underlying device.
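
To wrap up, the volume can either be mounted back by hand or, more simply, the NAS rebooted so that all the services stopped by syno_poweroff_task come back; a minimal sketch of the manual route:

# Mount the repaired volume back and make sure the data is reachable
mount /dev/mapper/vol3-origin /volume3
df -h /volume3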

From the NAS’s point of view, everything is now fine (haha), but of course I can’t trust that disk anymore. Now I just have to wait a few days to get a replacement and set it up.

On the bright side, I’ll take the opportunity to upgrade to a 10-12TB disk (assuming those are compatible with my Synology NAS..) ^_^. That way, the next disaster is already scheduled for today + 40K hours.. ;-)