Recovering a RAID array in “[E]” state on a Synology NAS
Tuesday, May 19th, 2015

WARNING: If you encounter a similar issue, try to contact Synology first; they are ultra responsive and solved my issue in less than a business day (although I’m not an enterprise customer). The commands that Synology provided me, and that I mention below, can wipe away all your data, so you’ve been warned :)
TL;DR: If you have a RAID array in [E] (DiskError) state (a Synology-specific error state), the only option seems to be to re-create the array and run a file system check/repair afterwards (assuming that your disks are fine to begin with).
I recently learned that Synology introduced Docker support in their 5.2 firmware (yay!), but unfortunately for me, just as I was about to try it out, I noticed an ugly ORANGE LED on my NAS where I always like to see GREEN ones…
The NAS didn’t respond at all, so I had no choice but to power it off. I first tried gently, but that didn’t help, so I had to do it the hard way. Once it restarted, another disk had an ORANGE LED, and at that point I understood that I was in for a bit of command-line fun :(
The web interface was pretty clear with me: my Volume2 was Crashed (which didn’t look like good news :o) and couldn’t be repaired (through the UI, that is).
After fiddling around for a while over SSH, I discovered that my NAS had created RAID 1 arrays for me (with one disk in each), which I wasn’t aware of; I actually never wanted to use RAID on my NAS!
I guess it makes sense for beginner users, as it allows them to easily expand capacity/availability without having to know anything about RAID. In my case, though, I wasn’t concerned about availability, and since RAID is not a backup solution (I hope you know why!), I didn’t want it at all; I have proper backups (on & off-site).
Well, in any case I did have a crashed single-disk RAID 1 array, so I had to deal with it anyway… :)
Here’s the output of some commands I ran which helped me better understand what was going on.
The /var/log/messages log showed that something was wrong with the file system:
May 17 14:59:26 SynoTnT kernel: [   49.817690] EXT4-fs warning (device dm-4): ext4_clear_journal_err:4877: Filesystem error recorded from previous mount: IO failure
May 17 14:59:26 SynoTnT kernel: [   49.829467] EXT4-fs warning (device dm-4): ext4_clear_journal_err:4878: Marking fs in need of filesystem check.
May 17 14:59:26 SynoTnT kernel: [   49.860638] EXT4-fs (dm-4): warning: mounting fs with errors, running e2fsck is recommended
...
Running e2fsck at that point didn’t help.
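For reference, a read-only check at the file-system level looks roughly like this (just a sketch; /dev/mapper/vol2-origin is the device backing my Volume2, as shown in the udevadm output near the end of this post, so adjust it to your setup):

# stop the NAS services and unmount the volume first (see the useful commands at the end of this post)
> syno_poweroff_task -d
> umount /volume2
# read-only check (-n answers "no" to all prompts) of the ext4 file system on the volume device
> e2fsck -nvf /dev/mapper/vol2-origin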
A check of the disk arrays gave me more information:
> cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5]
md2 : active raid1 sda3[0]
      3902296256 blocks super 1.2 [1/1] [U]

md6 : active raid1 sdc3[0]
      3902296256 blocks super 1.2 [1/1] [U]

md5 : active raid1 sdf3[0]
      3902296256 blocks super 1.2 [1/1] [U]

md3 : active raid1 sde3[0](E)
      3902296256 blocks super 1.2 [1/1] [E]

md7 : active raid1 sdg3[0]
      3902296256 blocks super 1.2 [1/1] [U]

md4 : active raid1 sdb3[0]
      1948792256 blocks super 1.2 [1/1] [U]

md1 : active raid1 sda2[0] sdb2[2] sdc2[4] sde2[1] sdf2[3] sdg2[5]
      2097088 blocks [8/6] [UUUUUU__]

md0 : active raid1 sda1[0] sdb1[2] sdc1[4] sde1[1] sdf1[3] sdg1[5]
      2490176 blocks [8/6] [UUUUUU__]

unused devices: <none>
As you can see above, the md3 array was active but in a weird [E] state. After Googling a bit, I discovered that the [E] state is specific to Synology, as this guy explains here. Synology doesn’t provide any documentation about this marker; their documentation only says to contact them if a volume is Crashed.
Continuing, I took a detailed look at the md3 array and at the ‘partition’ attached to it, and both looked okay; purely from a classic RAID point of view, everything was alright!
> mdadm --detail /dev/md3
/dev/md3:
        Version : 1.2
  Creation Time : Fri Jul 5 14:59:33 2013
     Raid Level : raid1
     Array Size : 3902296256 (3721.52 GiB 3995.95 GB)
  Used Dev Size : 3902296256 (3721.52 GiB 3995.95 GB)
   Raid Devices : 1
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Sun May 17 18:21:27 2015
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : SynoTnT:3  (local to host SynoTnT)
           UUID : 2143565c:345a0478:e33ac874:445e6e7b
         Events : 22

    Number   Major   Minor   RaidDevice State
       0       8       67        0      active sync   /dev/sde3

> mdadm --examine /dev/sde3
/dev/sde3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 2143565c:345a0478:e33ac874:445e6e7b
           Name : SynoTnT:3  (local to host SynoTnT)
  Creation Time : Fri Jul 5 14:59:33 2013
     Raid Level : raid1
   Raid Devices : 1

 Avail Dev Size : 7804592833 (3721.52 GiB 3995.95 GB)
     Array Size : 7804592512 (3721.52 GiB 3995.95 GB)
  Used Dev Size : 7804592512 (3721.52 GiB 3995.95 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : a2e64ee9:f4030905:52794fc2:0532688f

    Update Time : Sun May 17 18:46:55 2015
       Checksum : a05f59a0 - correct
         Events : 22

    Device Role : Active device 0
    Array State : A ('A' == active, '.' == missing)
See above, all clean!
So at this point I realized that I only had a few options:
- hope that Synology would help me fix it
- try and fix it myself using arcane mdadm commands to recreate the array
- get a spare disk and copy my data to it before formatting the disk, re-creating the shares and putting the data back (booooringgggggg)
To be on the safe side, I saved a copy of the output of each command so that I had at least a record of the initial state of the array. To be honest, at this point I didn’t dare go further, as I didn’t know what re-creating the RAID array could do to my data if I did something wrong (which I probably would have :p).
Fortunately for me, my NAS is still supported, and Synology fixed the issue for me (they connected remotely through SSH). I insisted on getting the commands they used, and here’s what they gave me:
> mdadm -Cf /dev/md3 -e1.2 -n1 -l1 /dev/sde3 -u2143565c:345a0478:e33ac874:445e6e7b
> e2fsck -pvf -C0 /dev/md3
As you can see above, they used mdadm to re-create the array, specifying the same options as those used to create it initially:
- force creation: -Cf
- the 1.2 RAID metadata (superblock) style: -e1.2
- the number of devices (1): -n1
- the RAID level (1): -l1
- the member device: /dev/sde3
- the UUID of the array to create (the same as the one that existed before; see the note right after this list for how to look it up): -u2143565c….
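In case you need to look up that UUID yourself, it is the Array UUID stored in the member device’s superblock, which mdadm --examine prints (as in the output shown earlier):

# the value to pass to -u is the Array UUID of the existing superblock
> mdadm --examine /dev/sde3 | grep 'Array UUID'
     Array UUID : 2143565c:345a0478:e33ac874:445e6e7b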
The second command runs a file system check on the array and repairs errors automatically (-p), forcing the check even if the file system looks clean (-f), with verbose output (-v) and a progress indicator (-C0).
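Once the two commands have run, re-checking the array is a quick way to confirm that the (E) marker is gone; this is just a sanity check, not part of what Synology asked me to run:

# md3 should now show up as [U] instead of [E]
> cat /proc/mdstat
# and it should still carry the original UUID
> mdadm --detail /dev/md3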
And tadaaaa, problem solved. Thanks Synology! :)
As a sidenote, here are some useful commands:
# Stop all NAS services except SSH
> syno_poweroff_task -d

# Unmount a volume
> umount /volume2

# Get detailed information about a given volume
> udevadm info --query=all --name=/dev/mapper/vol2-origin
P: /devices/virtual/block/dm-4
N: dm-4
E: DEVNAME=/dev/dm-4
E: DEVPATH=/devices/virtual/block/dm-4
E: DEVTYPE=disk
E: ID_FS_LABEL=1.42.6-3211
E: ID_FS_LABEL_ENC=1.42.6-3211
E: ID_FS_TYPE=ext4
E: ID_FS_USAGE=filesystem
E: ID_FS_UUID=19ff9f2b-2811-4941-914b-ef8ea3699d33
E: ID_FS_UUID_ENC=19ff9f2b-2811-4941-914b-ef8ea3699d33
E: ID_FS_VERSION=1.0
E: MAJOR=253
E: MINOR=4
E: SUBSYSTEM=block
E: SYNO_DEV_DISKPORTTYPE=UNKNOWN
E: SYNO_KERNEL_VERSION=3.10
E: SYNO_PLATFORM=cedarview
E: USEC_INITIALIZED=395934
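As far as I know there is no clean counterpart to syno_poweroff_task to bring the services back up, so once the repair is done the simplest option is to reboot (several commenters below did the same):

> reboot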
That’s it for today, time to play with Docker on my Synology NAS!
Sebastien, thanks for the howto.
One comment: I had to stop my array first with mdadm --stop /dev/md? before executing the mdadm -Cf
Just recovered mine a few minutes ago :)
Comment by billy — 2015-09-29 @ 20:09
Just had this same issue.
DS212J with a single 3TB disk. Sector errors had moved /dev/md2 into an E state.
Had a backup so no data loss.
Deleted the files that were on the duff sectors, recovered from backups.
These commands helped me no end in resetting the RAID 1 array.
Comment by Neil — 2015-11-03 @ 00:12
well I cheated and added a drive as “hot spare” hoping it would trigger the repair operation
Comment by Paul — 2017-07-27 @ 13:40
Thanks mate, really helpful, as my Synology just did that overnight.
All steps were:
1- syno_poweroff_task -d
2- mdadm --stop /dev/mdX
3- mdadm -Cf /dev/…
4- the e2fsck command didn’t work (error about bad magic number)
5- reboot now
And my Syno just came back without any issue.
Comment by Tbag — 2018-01-28 @ 17:31
I had the same issue as Tbag above; the fsck command failed, I think because of the logical volumes.
But a reboot after the mdadm -Cf command works.
Now I need to find a replacement drive ASAP.
Comment by Peter — 2018-02-23 @ 20:21
thank you so so much
this helped me so much
one thing I changed to get it to work:
1. umount /opt
2. umount /volume1
3. syno_poweroff_task -d
4. mdadm --stop /dev/mdX
5. mdadm -Cf /dev/…
6. cat /proc/mdstat
7. reboot
the --stop was needed
volume1 is back, the E is gone
nice one mate
Comment by jake — 2019-05-04 @ 12:59
I fixed it, too. Thank you!
Isn’t that a bug?
(mdadm --stop /dev/md3; md3 is mine)
Comment by lucktu — 2019-09-16 @ 08:44
Hi All –
I really appreciate this thread – I have found it very valuable but I need some assistance.
I have a RAID 10 with 6 drives. Synology Support looked at the NAS and told me that drives #5 and #6 are bad, and they did not want to repair the volume.
How do I rebuild the RAID 10 array? I’ve followed the above and have the output below. Any advice and guidance would be very much appreciated.
mdadm --detail /dev/md2
/dev/md2:
Version : 1.2
Creation Time : Sun Dec 13 20:45:48 2020
Raid Level : raid10
Array Size : 11706589632 (11164.27 GiB 11987.55 GB)
Used Dev Size : 3902196544 (3721.42 GiB 3995.85 GB)
Raid Devices : 6
Total Devices : 4
Persistence : Superblock is persistent
Update Time : Mon Jan 18 13:06:54 2021
State : clean, FAILED
Comment by Erik — 2021-01-18 @ 20:37
[…] started searching online for anyone else with similar issues and found a ton of dead ends and frankly, misinformation. Including some posts that claim Synology branded RAM is […]
Pingback by Almost Lost It All Again... - UcMadScientist.com — 2021-02-01 @ 03:19