[nmglug] LINUX Software RAID recovery event?

Tim Emerick timothyemerick at yahoo.com
Fri Jan 11 10:09:23 PST 2008


Have you looked at the individual disks themselves for hardware failures using smartctl?  I'm not sure whether mdadm also looks at S.M.A.R.T. status.
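
For example, something like this should show each drive's overall health and let you run a short self-test (assuming the smartmontools package is installed; substitute your own device names):

smartctl -H /dev/sda              # overall health assessment
smartctl -a /dev/sda              # full SMART report, including error and self-test logs
smartctl -t short /dev/sda        # start a short self-test
smartctl -l selftest /dev/sda     # read the self-test results afterwards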

----- Original Message ----
From: Aaron <eunichs at boim.com>
To: NMGLUG.org mailing list <nmglug at nmglug.org>
Cc: mike.barksdale at tautechnologies.com; nmlug nmlug <nmlug at nmlug.org>
Sent: Friday, January 11, 2008 9:02:33 AM
Subject: [nmglug] LINUX Software RAID recovery event?

I have some interesting log entries on my fileserver, which is running
software RAID5.

Here is my monitor command:

/sbin/mdadm --monitor --pid-file /var/run/mdadm/monitor.pid --daemonise \
    --scan --syslog
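
(Aside: I believe mdadm --monitor can also mail alerts directly if you pass it an address, something along these lines -- the address here is just a placeholder:)

/sbin/mdadm --monitor --pid-file /var/run/mdadm/monitor.pid --daemonise \
    --scan --syslog --mail root@localhost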

These are the suspect entries in daemon.log:

Jan  6 01:06:02 gromit mdadm: RebuildStarted event detected on md device
/dev/md0
Jan  6 01:16:02 gromit mdadm: Rebuild20 event detected on md device /dev/md0
Jan  6 01:26:02 gromit mdadm: Rebuild40 event detected on md device /dev/md0
Jan  6 01:35:02 gromit mdadm: Rebuild60 event detected on md device /dev/md0
Jan  6 01:45:02 gromit mdadm: Rebuild80 event detected on md device /dev/md0
Jan  6 01:55:07 gromit mountd[2775]: authenticated mount request from
penguin.localnet:949 for /media/backups (/media/backups)
Jan  6 01:56:03 gromit mdadm: RebuildFinished event detected on md
device /dev/md0

There were some console messages too, but I lost them.

Here is /var/log/messages:

Jan  6 01:06:02 gromit kernel: md: syncing RAID array md0
Jan  6 01:06:02 gromit kernel: md: minimum _guaranteed_ reconstruction
speed: 1000 KB/sec/disc.
Jan  6 01:06:02 gromit kernel: md: using maximum available idle IO
bandwidth (but not more than 200000 KB/sec) for reconstruction.
Jan  6 01:06:02 gromit kernel: md: using 128k window, over a total of
195318144 blocks.
Jan  6 01:20:34 gromit -- MARK --
Jan  6 01:40:35 gromit -- MARK --
Jan  6 01:56:02 gromit kernel: md: md0: sync done.
Jan  6 01:56:03 gromit kernel: RAID5 conf printout:
Jan  6 01:56:03 gromit kernel:  --- rd:4 wd:4 fd:0
Jan  6 01:56:03 gromit kernel:  disk 0, o:1, dev:sda2
Jan  6 01:56:03 gromit kernel:  disk 1, o:1, dev:sdb2
Jan  6 01:56:03 gromit kernel:  disk 2, o:1, dev:sdc2
Jan  6 01:56:03 gromit kernel:  disk 3, o:1, dev:sdd2
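
(While a resync is in flight, I think you can watch it from /proc/mdstat, and the sysfs entry below should say what kind of sync it is -- this assumes a reasonably recent 2.6 kernel with sysfs mounted:)

cat /proc/mdstat                    # shows a progress bar during a resync/rebuild
cat /sys/block/md0/md/sync_action   # idle, check, resync, recover, ...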


Things look OK now AFAIK:

gromit:/var/log# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sda2[0] sdd2[3] sdc2[2] sdb2[1]
      585954432 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
    
unused devices: <none>

Any advice about how to investigate what happened?
Is it possible that the RAID failed in some sense, but then rebuilt itself?
If so, is there a way to figure out whether the failure was limited to one
disk, so I can replace it as a preventive measure?

Let me know if there are other commands you would like to see to determine
the state of my RAID, or pointers to documentation on useful commands
for such an investigation.
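
One thing I noticed: this started at 01:06 on a Sunday morning, and Jan 6 was the
first Sunday of the month.  Could it simply have been a scheduled check?  I vaguely
recall that Debian's mdadm package ships a monthly cron job (/etc/cron.d/mdadm
calling /usr/share/mdadm/checkarray) that reads the whole array and shows up in
the logs much like a rebuild.  If that guess is right, something like this should
confirm it, and should also show whether all four member superblocks still agree:

cat /etc/cron.d/mdadm     # scheduled checkarray job, if present
# all four disks should report the same Events count and Update Time
mdadm --examine /dev/sd[abcd]2 | egrep 'Update Time|Events|State'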


                aaron

P.S.  This is interesting from the SW RAID HOWTO:


    6.4 Simulating data corruption

RAID (be it hardware or software) assumes that if a write to a disk
doesn't return an error, then the write was successful. Therefore, if
your disk corrupts data without returning an error, your data /will/
become corrupted. This is of course very unlikely to happen, but it is
possible, and it would result in a corrupt filesystem.

RAID cannot and is not supposed to guard against data corruption on the
media. Therefore, it doesn't make any sense either, to purposely corrupt
data (using dd for example) on a disk to see how the RAID system will
handle that. It is most likely (unless you corrupt the RAID superblock)
that the RAID layer will never find out about the corruption, but your
filesystem on the RAID device will be corrupted.

This is the way things are supposed to work. RAID is not a guarantee for
data integrity; it just allows you to keep your data if a disk dies
(that is, with RAID levels at or above one, of course).


****** Would RAID6 be able to recover from one corrupted disk? ******
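
(Related: I gather you can ask md to verify parity across the whole array without
rebuilding anything, via sysfs.  A non-zero mismatch count afterwards would at least
tell you that something disagrees, even though it can't say which disk is wrong:)

echo check > /sys/block/md0/md/sync_action   # read-only parity check of the whole array
cat /sys/block/md0/md/mismatch_cnt           # 0 after the check means parity was consistent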

P.P.S: ----------------------------------

Here are some commands that may help:

gromit:/var/log# mdadm --query /dev/md0
/dev/md0: 558.81GiB raid5 4 devices, 0 spares. Use mdadm --detail for
more detail.
gromit:/var/log# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Wed May 16 01:56:05 2007
    Raid Level : raid5
    Array Size : 585954432 (558.81 GiB 600.02 GB)
    Device Size : 195318144 (186.27 GiB 200.01 GB)
  Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Jan 11 08:17:20 2008
          State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
  Spare Devices : 0

        Layout : left-symmetric
    Chunk Size : 64K

          UUID : a0685bc3:1827f180:5decc0b6:e8019d73
        Events : 0.20

    Number  Major  Minor  RaidDevice State
      0      8        2        0      active sync  /dev/sda2
      1      8      18        1      active sync  /dev/sdb2
      2      8      34        2      active sync  /dev/sdc2
      3      8      50        3      active sync  /dev/sdd2
gromit:/var/log# mdadm --query /dev/sda2
/dev/sda2: is not an md array
/dev/sda2: device 0 in 4 device active raid5 /dev/md0.  Use mdadm
--examine for more detail.
gromit:/var/log# mdadm --query /dev/sdb2
/dev/sdb2: is not an md array
/dev/sdb2: device 1 in 4 device active raid5 /dev/md0.  Use mdadm
--examine for more detail.
gromit:/var/log# mdadm --query /dev/sdc2
/dev/sdc2: is not an md array
/dev/sdc2: device 2 in 4 device active raid5 /dev/md0.  Use mdadm
--examine for more detail.
gromit:/var/log# mdadm --query /dev/sdd2
/dev/sdd2: is not an md array
/dev/sdd2: device 3 in 4 device active raid5 /dev/md0.  Use mdadm
--examine for more detail.
gromit:/var/log# mdadm --examine /dev/sdd2
/dev/sdd2:
          Magic : a92b4efc
        Version : 00.90.00
          UUID : a0685bc3:1827f180:5decc0b6:e8019d73
  Creation Time : Wed May 16 01:56:05 2007
    Raid Level : raid5
    Device Size : 195318144 (186.27 GiB 200.01 GB)
    Array Size : 585954432 (558.81 GiB 600.02 GB)
  Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Fri Jan 11 08:17:20 2008
          State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
  Spare Devices : 0
      Checksum : 41218f0c - correct
        Events : 0.20

        Layout : left-symmetric
    Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    3      8      50        3      active sync  /dev/sdd2

  0    0      8        2        0      active sync  /dev/sda2
  1    1      8      18        1      active sync  /dev/sdb2
  2    2      8      34        2      active sync  /dev/sdc2
  3    3      8      50        3      active sync  /dev/sdd2
gromit:/var/log# mdadm --examine /dev/sdc2
/dev/sdc2:
          Magic : a92b4efc
        Version : 00.90.00
          UUID : a0685bc3:1827f180:5decc0b6:e8019d73
  Creation Time : Wed May 16 01:56:05 2007
    Raid Level : raid5
    Device Size : 195318144 (186.27 GiB 200.01 GB)
    Array Size : 585954432 (558.81 GiB 600.02 GB)
  Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Fri Jan 11 08:17:20 2008
          State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
  Spare Devices : 0
      Checksum : 41218efa - correct
        Events : 0.20

        Layout : left-symmetric
    Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    2      8      34        2      active sync  /dev/sdc2

  0    0      8        2        0      active sync  /dev/sda2
  1    1      8      18        1      active sync  /dev/sdb2
  2    2      8      34        2      active sync  /dev/sdc2
  3    3      8      50        3      active sync  /dev/sdd2
gromit:/var/log# mdadm --examine /dev/sdb2
/dev/sdb2:
          Magic : a92b4efc
        Version : 00.90.00
          UUID : a0685bc3:1827f180:5decc0b6:e8019d73
  Creation Time : Wed May 16 01:56:05 2007
    Raid Level : raid5
    Device Size : 195318144 (186.27 GiB 200.01 GB)
    Array Size : 585954432 (558.81 GiB 600.02 GB)
  Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Fri Jan 11 08:17:20 2008
          State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
  Spare Devices : 0
      Checksum : 41218ee8 - correct
        Events : 0.20

        Layout : left-symmetric
    Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    1      8      18        1      active sync  /dev/sdb2

  0    0      8        2        0      active sync  /dev/sda2
  1    1      8      18        1      active sync  /dev/sdb2
  2    2      8      34        2      active sync  /dev/sdc2
  3    3      8      50        3      active sync  /dev/sdd2
gromit:/var/log# mdadm --examine /dev/sda2
/dev/sda2:
          Magic : a92b4efc
        Version : 00.90.00
          UUID : a0685bc3:1827f180:5decc0b6:e8019d73
  Creation Time : Wed May 16 01:56:05 2007
    Raid Level : raid5
    Device Size : 195318144 (186.27 GiB 200.01 GB)
    Array Size : 585954432 (558.81 GiB 600.02 GB)
  Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Fri Jan 11 08:17:20 2008
          State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
  Spare Devices : 0
      Checksum : 41218ed6 - correct
        Events : 0.20

        Layout : left-symmetric
    Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    0      8        2        0      active sync  /dev/sda2

  0    0      8        2        0      active sync  /dev/sda2
  1    1      8      18        1      active sync  /dev/sdb2
  2    2      8      34        2      active sync  /dev/sdc2
  3    3      8      50        3      active sync  /dev/sdd2
gromit:/var/log#



_______________________________________________
nmglug mailing list
nmglug at nmglug.org
https://nmglug.org/mailman/listinfo/nmglug


