[nmglug] software raid more fragile than plain partitions?!??

Gary Sandine gars at laclinux.com
Wed May 16 08:38:57 PDT 2007


Hi,

On Wed, May 16, 2007 at 05:57:07AM -0600, Aaron wrote:
> Is it possible that LINUX software RAID (1,4,5,6) is actually a LOT
> less reliable than a plain old partition, or perhaps even LVM?

I use software RAID a lot, and redundant RAID is definitely more
reliable than a single partition.  Compared with LVM, it depends
on what the underlying physical volumes (PVs) are: if the PVs are
bare disks or partitions, then redundant RAID is better; if the PVs
are themselves redundant RAID block devices, then RAID without LVM
might actually be better, depending on one's experience working
with LVM.
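
To make the layering concrete, here is a rough sketch of what I
mean, using made-up device names (/dev/sda1, /dev/sdb1) that you
would adjust for your own hardware:

    # mirror two partitions with md (software RAID 1)
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

    # optionally layer LVM on top of the redundant md device
    pvcreate /dev/md0
    vgcreate vg0 /dev/md0
    lvcreate -L 20G -n data vg0
    mkfs.ext3 /dev/vg0/data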

> I have had some corrupted 'dirty' RAID volumes, TWO at a time...
> enough to kill a RAID5.

I don't understand this.  As I recall, older mdadm labeled an
array "dirty" (look at "State" in mdadm -D output) simply because
it was in use.  If it was "dirty" and a hard shutdown occurred, the
RAID would have to resync (at least for RAID 5 and maybe RAID 1,
the two levels I have used a lot).  Now it's true that if you lose
two components (disks, partitions, RAID block devices, whatever) of
a software RAID 5 array without a hot spare, that array is likely
toast, and you will have to work carefully to repair it or recover
data from it.  Are you making your software RAID 5 array out of
other software RAID block devices?  I don't understand that
statement.
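
If you want to see what md thinks happened, something like the
following is what I would run (the array and partition names are
just examples):

    # per-array detail, including the "State" line and any failed devices
    mdadm -D /dev/md0

    # per-member superblock info; compare event counts after a failure
    mdadm --examine /dev/sda1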

> Journaling filesystems do a pretty good job of limiting damage
> from dirty shutdowns.
> I am afraid that software RAID (mdadm) may not have this property.

It might have to resync; that's all.  A journaling file system
protects the file system structure, while mdadm protects against
disk failure.
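
After an unclean shutdown I just keep an eye on /proc/mdstat while
the resync runs; one way to do that:

    # refresh the kernel's RAID status every couple of seconds
    watch -n 2 cat /proc/mdstat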

> Is it possible that a simple trip over the power cord could cause
> MULTIPLE parts of a RAID to become 'non-fresh', causing
> one to lose the entire RAID?   Even if a fsck on the overlying
> filesystems might recover most if not all of the data?

Not that I know of, unless there is underlying hardware (disk)
failure, or unless a mistake is made using mdadm at the command
line.
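
If a power loss does leave members marked non-fresh and the array
refuses to assemble on its own, mdadm can usually be told to force
assembly from the freshest members.  This is the careful
command-line work I mentioned above (read the mdadm man page first;
the device names below are only examples):

    # stop the partially assembled array first
    mdadm --stop /dev/md0

    # force assembly from the members with the highest event counts
    mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1

    # check the file system (read-only) before mounting
    fsck -n /dev/md0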

> How much might I have to spend on a hardware RAID to reduce
> the likelihood of this problem?
> How many hours might I spend learning about detailed operations
> of hardware RAID to recover from such a partial corruption?
> 
> I am really questioning whether or not RAID makes sense at all.
> Perhaps I should stick to asynchronous mirroring for robustness,

When used properly, software or hardware RAID makes great sense in
many situations.

Regards,
-- 
Gary Sandine <gars at laclinux.com>



