[nmglug] LINUX Software RAID recovery event?
Tim Emerick
timothyemerick at yahoo.com
Fri Jan 11 10:09:23 PST 2008
Have you looked at the individual disks themselves for hardware failures using smartctl? I'm not sure whether mdadm also checks S.M.A.R.T. status or not.
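For example, something along these lines (assuming plain SATA/PATA disks; drives behind some controllers need an extra -d option):

  smartctl -H /dev/sda        # quick overall health verdict
  smartctl -a /dev/sda        # full attribute dump, error log and self-test log
  smartctl -t short /dev/sda  # start a short self-test; read the result later with -a

Reallocated_Sector_Ct, Current_Pending_Sector and the error log are usually the first things worth checking on each member disk.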
----- Original Message ----
From: Aaron <eunichs at boim.com>
To: NMGLUG.org mailing list <nmglug at nmglug.org>
Cc: mike.barksdale at tautechnologies.com; nmlug nmlug <nmlug at nmlug.org>
Sent: Friday, January 11, 2008 9:02:33 AM
Subject: [nmglug] LINUX Software RAID recovery event?
I have some interesting log entries on my fileserver, which is running
software RAID5.
Here is my monitor command:
/sbin/mdadm --monitor --pid-file /var/run/mdadm/monitor.pid --daemonise --scan --syslog
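(For what it's worth, I believe the alert path can be exercised with something like:

  mdadm --monitor --scan --oneshot --test

which should generate a TestMessage event for each array, though I haven't actually tried that here.)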
These are the suspect entries in daemon.log:
Jan 6 01:06:02 gromit mdadm: RebuildStarted event detected on md device /dev/md0
Jan 6 01:16:02 gromit mdadm: Rebuild20 event detected on md device /dev/md0
Jan 6 01:26:02 gromit mdadm: Rebuild40 event detected on md device /dev/md0
Jan 6 01:35:02 gromit mdadm: Rebuild60 event detected on md device /dev/md0
Jan 6 01:45:02 gromit mdadm: Rebuild80 event detected on md device /dev/md0
Jan 6 01:55:07 gromit mountd[2775]: authenticated mount request from penguin.localnet:949 for /media/backups (/media/backups)
Jan 6 01:56:03 gromit mdadm: RebuildFinished event detected on md device /dev/md0
There were some console messages too, but I lost them.
Here is what's in messages:
Jan 6 01:06:02 gromit kernel: md: syncing RAID array md0
Jan 6 01:06:02 gromit kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Jan 6 01:06:02 gromit kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Jan 6 01:06:02 gromit kernel: md: using 128k window, over a total of 195318144 blocks.
Jan 6 01:20:34 gromit -- MARK --
Jan 6 01:40:35 gromit -- MARK --
Jan 6 01:56:02 gromit kernel: md: md0: sync done.
Jan 6 01:56:03 gromit kernel: RAID5 conf printout:
Jan 6 01:56:03 gromit kernel: --- rd:4 wd:4 fd:0
Jan 6 01:56:03 gromit kernel: disk 0, o:1, dev:sda2
Jan 6 01:56:03 gromit kernel: disk 1, o:1, dev:sdb2
Jan 6 01:56:03 gromit kernel: disk 2, o:1, dev:sdc2
Jan 6 01:56:03 gromit kernel: disk 3, o:1, dev:sdd2
Things look OK now AFAIK:
gromit:/var/log# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sda2[0] sdd2[3] sdc2[2] sdb2[1]
585954432 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
unused devices: <none>
Any advice about how to investigate what happened?
Is it possible that the RAID failed in some sense, but then rebuilt itself?
If so, is there a way to figure out whether the failure was limited to one
disk, so I can replace it as a preventive measure?
Let me know if there are other commands you would like to see to determine
the state of my RAID, or pointers to documentation on useful commands for
such an investigation.
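In case it helps, here is what I was planning to poke at next; I haven't verified that all of these exist on my kernel/distro, so treat them as guesses:

  cat /sys/block/md0/md/sync_action    # idle, or a check/resync/recover in progress
  cat /sys/block/md0/md/mismatch_cnt   # mismatches found by the last check, if the kernel exposes it
  grep -ri mdadm /etc/cron.d /etc/cron.weekly /etc/cron.monthly   # any scheduled array check?

Jan 6 was the first Sunday of the month, so I'm starting to wonder whether this was just a distro-scheduled consistency check (some packages ship a checkarray cron job) rather than a real failure and rebuild, but I haven't confirmed that.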
aaron
P.S. This is interesting from the SW RAID HOWTO:
6.4 Simulating data corruption
RAID (be it hardware or software) assumes that if a write to a disk
doesn't return an error, then the write was successful. Therefore, if
your disk corrupts data without returning an error, your data *will*
become corrupted. This is of course very unlikely to happen, but it is
possible, and it would result in a corrupt filesystem.

RAID cannot and is not supposed to guard against data corruption on the
media. Therefore, it doesn't make any sense either, to purposely corrupt
data (using dd, for example) on a disk to see how the RAID system will
handle that. It is most likely (unless you corrupt the RAID superblock)
that the RAID layer will never find out about the corruption, but your
filesystem on the RAID device will be corrupted.

This is the way things are supposed to work. RAID is not a guarantee of
data integrity, it just allows you to keep your data if a disk dies
(that is, with RAID levels above or equal to one, of course).
****** Would RAID6 be able to recover from one corrupted disk? ******
P.P.S.: ----------------------------------
Here are some commands that may help:
gromit:/var/log# mdadm --query /dev/md0
/dev/md0: 558.81GiB raid5 4 devices, 0 spares. Use mdadm --detail for more detail.
gromit:/var/log# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.03
Creation Time : Wed May 16 01:56:05 2007
Raid Level : raid5
Array Size : 585954432 (558.81 GiB 600.02 GB)
Device Size : 195318144 (186.27 GiB 200.01 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Fri Jan 11 08:17:20 2008
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : a0685bc3:1827f180:5decc0b6:e8019d73
Events : 0.20
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/sda2
1 8 18 1 active sync /dev/sdb2
2 8 34 2 active sync /dev/sdc2
3 8 50 3 active sync /dev/sdd2
gromit:/var/log# mdadm --query /dev/sda2
/dev/sda2: is not an md array
/dev/sda2: device 0 in 4 device active raid5 /dev/md0. Use mdadm --examine for more detail.
gromit:/var/log# mdadm --query /dev/sdb2
/dev/sdb2: is not an md array
/dev/sdb2: device 1 in 4 device active raid5 /dev/md0. Use mdadm --examine for more detail.
gromit:/var/log# mdadm --query /dev/sdc2
/dev/sdc2: is not an md array
/dev/sdc2: device 2 in 4 device active raid5 /dev/md0. Use mdadm --examine for more detail.
gromit:/var/log# mdadm --query /dev/sdd2
/dev/sdd2: is not an md array
/dev/sdd2: device 3 in 4 device active raid5 /dev/md0. Use mdadm --examine for more detail.
gromit:/var/log# mdadm --examine /dev/sdd2
/dev/sdd2:
Magic : a92b4efc
Version : 00.90.00
UUID : a0685bc3:1827f180:5decc0b6:e8019d73
Creation Time : Wed May 16 01:56:05 2007
Raid Level : raid5
Device Size : 195318144 (186.27 GiB 200.01 GB)
Array Size : 585954432 (558.81 GiB 600.02 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Fri Jan 11 08:17:20 2008
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : 41218f0c - correct
Events : 0.20
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 3 8 50 3 active sync /dev/sdd2
0 0 8 2 0 active sync /dev/sda2
1 1 8 18 1 active sync /dev/sdb2
2 2 8 34 2 active sync /dev/sdc2
3 3 8 50 3 active sync /dev/sdd2
gromit:/var/log# mdadm --examine /dev/sdc2
/dev/sdc2:
Magic : a92b4efc
Version : 00.90.00
UUID : a0685bc3:1827f180:5decc0b6:e8019d73
Creation Time : Wed May 16 01:56:05 2007
Raid Level : raid5
Device Size : 195318144 (186.27 GiB 200.01 GB)
Array Size : 585954432 (558.81 GiB 600.02 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Fri Jan 11 08:17:20 2008
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : 41218efa - correct
Events : 0.20
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 2 8 34 2 active sync /dev/sdc2
0 0 8 2 0 active sync /dev/sda2
1 1 8 18 1 active sync /dev/sdb2
2 2 8 34 2 active sync /dev/sdc2
3 3 8 50 3 active sync /dev/sdd2
gromit:/var/log# mdadm --examine /dev/sdb2
/dev/sdb2:
Magic : a92b4efc
Version : 00.90.00
UUID : a0685bc3:1827f180:5decc0b6:e8019d73
Creation Time : Wed May 16 01:56:05 2007
Raid Level : raid5
Device Size : 195318144 (186.27 GiB 200.01 GB)
Array Size : 585954432 (558.81 GiB 600.02 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Fri Jan 11 08:17:20 2008
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : 41218ee8 - correct
Events : 0.20
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 1 8 18 1 active sync /dev/sdb2
0 0 8 2 0 active sync /dev/sda2
1 1 8 18 1 active sync /dev/sdb2
2 2 8 34 2 active sync /dev/sdc2
3 3 8 50 3 active sync /dev/sdd2
gromit:/var/log# mdadm --examine /dev/sda2
/dev/sda2:
Magic : a92b4efc
Version : 00.90.00
UUID : a0685bc3:1827f180:5decc0b6:e8019d73
Creation Time : Wed May 16 01:56:05 2007
Raid Level : raid5
Device Size : 195318144 (186.27 GiB 200.01 GB)
Array Size : 585954432 (558.81 GiB 600.02 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Fri Jan 11 08:17:20 2008
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : 41218ed6 - correct
Events : 0.20
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 0 8 2 0 active sync /dev/sda2
0 0 8 2 0 active sync /dev/sda2
1 1 8 18 1 active sync /dev/sdb2
2 2 8 34 2 active sync /dev/sdc2
3 3 8 50 3 active sync /dev/sdd2
gromit:/var/log#
_______________________________________________
nmglug mailing list
nmglug at nmglug.org
https://nmglug.org/mailman/listinfo/nmglug