Thu 04 November 2010
By bcl
tags: Fedora Linux RAID Backup
Failed hard drives are inevitable. Especially when the drive in
question was manufactured on November 27, 2001. You know the time
has come to replace it when your log files start filling up with
errors like this:
Oct 28 03:53:05 cat kernel: res 51/40:00:fc:33:4e/00:00:00:00:00/e0 Emask 0x9 (media error)
Oct 29 16:06:46 cat smartd[24427]: Device: /dev/sdb [SAT], FAILED SMART self-check. BACK UP DATA NOW!
Failure is inescapable. Everything fails eventually: computers, people,
electronics. It is the only constant in life; the only question is when.
In my case this 40GB drive had served me well in multiple computers and as part
of a RAID5 array for my Linux Journal article. In its final installation
it was part of a 2 disk RAID1 in cat, my webserver. cat runs Fedora 13 and a
minimal set of software for serving up my webpages, including this blog. cat
was built using spare parts; its job isn't hard and its space requirements aren't
large. Good logging and reporting are important because they help you anticipate
the impending doom. On my systems I run the smartd daemon to monitor drive
health, as well as epylog to parse all my logfiles and email me nightly results.
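As a rough illustration of the kind of check that log monitoring gives you, here is a minimal sketch that pulls the failing device name out of smartd lines like the one quoted above. The function name failed_smart_devices and the log path in the usage line are my assumptions, not part of the original setup, and epylog does far more than this.

```shell
# Print each device that smartd has reported as failing its SMART self-check.
# Reads syslog-style lines on stdin and prints one device path per line.
failed_smart_devices() {
    sed -n 's/.*smartd\[[0-9]*]: Device: \([^ ,]*\).*FAILED SMART self-check.*/\1/p' | sort -u
}

# Usage (log path is an assumption): failed_smart_devices < /var/log/messages
```

An empty result just means no such line was found in the input; it says nothing about the drive's health beyond that.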
cat was set up running Fedora 13 on 2 drives, each with 3 partitions: /boot, /
and swap. / was set up as a 2 disk RAID1, and /boot was actually
/boot and /boot2 because at the time I was unsure whether GRUB could boot
from a RAID (yes, it can, and that's another post entirely). The partitioning
looked like this:
[root@cat ~]# parted -l
Model: ATA Maxtor 5T040H4 (scsi)
Disk /dev/sda: 41.0GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number Start End Size File system Name Flags
1 1049kB 525MB 524MB ext4 boot
2 525MB 2622MB 2097MB linux-swap(v1)
3 2622MB 41.0GB 38.4GB raid
When the errors showed up I jumped over to Amazon Prime and found a pretty good
deal on a pair of Seagate 500GB drives.
I had them the next day, but didn't have time to start the process of swapping
them in and expanding the storage. Instead I marked the failing drive as faulty
in the array using mdadm --manage --set-faulty /dev/md0 /dev/sdb3, and removed
the references to its /boot partition from /etc/fstab. I have
good nightly backups of the system and smartctl was reporting that the
remaining drive was running fine. The system is pretty much read-only, so
nightly backups were sufficient to provide a good restore point in case the
final drive failed.
The replacement plan was to hook up the 2 new drives, which
use SATA instead of IDE, add them to the existing array and let
mdraid sync the data over from the old drive. At that point I would
have 3 drives in the array, all using 40GB of their partitions. I would then
remove the old drive and grow the filesystem on the new drives to
take up all 500GB. Sometimes plans actually do work. The old drives
were EIDE and I had 2 SATA ports on the motherboard, which I confirmed by
using dmidecode to grab the motherboard's model number and looking
it up online. The only glitch there was that I had to enable the
SATA controller in the BIOS before the drives were recognized. I used
parted to split each new drive into 3 partitions. They look
like this when finished:
[root@cat ~]# parted /dev/sda print
Model: ATA ST3500418AS (scsi)
Disk /dev/sda: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number  Start   End     Size   Type     File system     Flags
 1      1049kB  1000MB  999MB  primary  ext4            boot, raid
 2      1000MB  2000MB  999MB  primary  linux-swap(v1)
 3      2000MB  500GB   498GB  primary                  raid
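The post doesn't record the exact parted commands, so the following is only a sketch of one way to arrive at a layout like the one printed above. The helper names run and partition_new_disk are hypothetical, the start and end values are read off the table, and the function only echoes the commands unless DRY_RUN=0 is set.

```shell
# Echo commands by default; set DRY_RUN=0 to really run them (as root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Partition one new disk into /boot, swap and a large RAID member.
# $1 is the disk, e.g. /dev/sdb. This destroys any existing partition table.
partition_new_disk() {
    disk=$1
    run parted -s "$disk" mklabel msdos
    run parted -s "$disk" mkpart primary ext4 1049kB 1000MB
    run parted -s "$disk" mkpart primary linux-swap 1000MB 2000MB
    run parted -s "$disk" mkpart primary 2000MB 100%
    run parted -s "$disk" set 1 boot on
    run parted -s "$disk" set 1 raid on
    run parted -s "$disk" set 3 raid on
}

# Usage: partition_new_disk /dev/sdb   (prints the plan; nothing is changed)
```

Printing the plan first is deliberate: it is very easy to point parted at the wrong disk when drive letters have just shifted.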
Don't forget to set the boot flag on the /boot partition on
both drives. You never can tell when the BIOS might decide to
boot the other one, and if one fails you want the other to still be
bootable. GRUB can boot from a RAID1 partition as long as it holds a
filesystem it supports, like ext2, ext3 or ext4, and as long as mdraid
metadata v1.0 or earlier is used. This is because that metadata is
written to the end of the partition, so GRUB never sees it. In v1.1
and v1.2 the RAID metadata is written at or near the start of the
partition and GRUB cannot find the filesystem. I set up /boot as
a 2 disk RAID1 like this:
mdadm --create --verbose /dev/md1 --level=raid1 --raid-devices=2 --metadata=1.0 /dev/sdb1 /dev/sdc1
I then copied over the /boot partition from the existing
system:
mkfs.ext4 /dev/md1
mount /dev/md1 /mnt
rsync -avc /boot/ /mnt/   # trailing slashes: copy the contents of /boot, not the directory itself
umount /mnt
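Since the metadata version decides whether GRUB can see the filesystem, it is worth checking it on the finished array. This is a minimal sketch that reads the "super" field from /proc/mdstat-style text; the function names md_metadata_version and metadata_at_end are mine, and falling back to 0.90 when no "super" field is shown is an assumption based on how the kernel prints old-format arrays.

```shell
# Print the mdraid metadata ("super") version of one array.
# $1 is the array name, e.g. md1; /proc/mdstat-style text is read from stdin.
# Arrays in the old 0.90 format show no "super" field, so 0.90 is assumed then.
md_metadata_version() {
    awk -v md="$1" '
        $1 == md && $2 == ":" { found = 1; next }
        found {
            for (i = 1; i < NF; i++)
                if ($i == "super") { print $(i + 1); exit }
            print "0.90"
            exit
        }'
}

# Succeed when the metadata sits at the end of the member partitions
# (formats 0.90 and 1.0), the case described above as bootable by GRUB.
metadata_at_end() {
    case "$(md_metadata_version "$1")" in
        0.90|1.0) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: metadata_at_end md1 < /proc/mdstat && echo "md1 looks OK for /boot"
```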
Next is adding the new large partitions to the existing array. I
physically removed the failed drive so that it couldn't cause any
problems and added the new partitions like so:
[root@cat ~]# mdadm --manage /dev/md0 --add /dev/sdb3
mdadm: added /dev/sdb3
[root@cat ~]# mdadm --manage /dev/md0 --add /dev/sdc3
mdadm: added /dev/sdc3
mdraid immediately begins to sync the data from the 40GB drive over
to one of the new drives. Since it is a 2 drive array it leaves the
other partition as a spare. There is no need to create a filesystem
on the new partitions because they are being written with the data
from the old drive, which includes the filesystem. /proc/mdstat
looked like this during the sync:
[root@cat ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdc1[1] sdb1[0]
      975860 blocks super 1.0 [2/2] [UU]

md0 : active raid1 sdc3[3](S) sdb3[2] sda3[0]
      37458876 blocks super 1.1 [2/1] [U_]
      [>....................]  recovery =  0.6% (247296/37458876) finish=22.5min speed=27477K/sec
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>
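If you would rather script the waiting than eyeball the progress bar, the percentage can be pulled out of that output. A minimal sketch, with md_recovery_percent being a name I made up; it only looks at the first recovery or resync line it finds, so it assumes one array is syncing at a time.

```shell
# Print the percentage from the first recovery or resync progress line
# in /proc/mdstat-style text read from stdin; prints nothing when idle.
md_recovery_percent() {
    sed -nE 's/.*(recovery|resync) *= *([0-9.]+)%.*/\2/p' | head -n 1
}

# Usage: md_recovery_percent < /proc/mdstat
```

mdadm also has a --wait option that blocks until a running sync finishes, which may be simpler when no progress figure is needed.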
When that sync finished I manually failed the old 40GB drive with mdadm
--manage /dev/md0 --fail /dev/sda3, waited for the data to be synced to
the other new drive, and then removed the old drive from the array with
mdadm --manage /dev/md0 --remove /dev/sda3. At this point only 40GB
of each 498GB partition was being used. It would have worked just fine like
that, but it seemed like such a waste that I wanted to resize it. But first I
made sure I could boot the system with just the 2 new drives and their RAID1
/boot partition.
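The fail, wait, remove sequence can be sketched as a small function. The names run and retire_member are hypothetical, and it echoes the commands unless DRY_RUN=0 is set. One caveat I am not sure about in every case: mdadm --wait may return straight away if the rebuild onto the spare has not started yet, so checking /proc/mdstat by hand before pulling the old drive is still wise.

```shell
# Echo commands by default; set DRY_RUN=0 to really run them (as root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Fail one member, wait for the spare to rebuild, then remove the member.
# $1 is the array (e.g. /dev/md0), $2 the member to retire (e.g. /dev/sda3).
retire_member() {
    array=$1
    member=$2
    run mdadm --manage "$array" --fail "$member"
    run mdadm --wait "$array"
    run mdadm --manage "$array" --remove "$member"
}

# Usage: retire_member /dev/md0 /dev/sda3   (prints the three commands)
```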
That's when I goofed.
I had grabbed the UUID values (unique values that help Linux find the right
partition to mount) and updated my /etc/fstab with the new values. I also
updated the swap entries with their new UUID values (printed when you run
mkswap). You can always see the UUID of a partition by running blkid
/dev/sdX or blkid /dev/md0. We used to refer to the drives in
/etc/fstab using their device names, like /dev/sda1, but changes in how
drives are detected mean that they may not always get the same letter
assignment. The UUID is tied to the filesystem rather than to the drive letter,
so you get the filesystem you expect. No more nasty surprises when you plug in
a USB drive and reboot only to find the BIOS changed the drive order on you.
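For reference, a UUID-based fstab line can be built like this. It is only a sketch: fstab_entry is a made-up helper, and the "defaults 1 2" options are an assumption that suits an ordinary ext4 mount, not swap.

```shell
# Print an /etc/fstab line that mounts a filesystem by UUID.
# $1 = UUID, $2 = mount point, $3 = filesystem type.
# The options and the dump/pass fields ("defaults 1 2") are assumptions.
fstab_entry() {
    printf 'UUID=%s %s %s defaults 1 2\n' "$1" "$2" "$3"
}

# Usage: fstab_entry "$(blkid -s UUID -o value /dev/md1)" /boot ext4
```

The usage line prints the entry rather than appending it, so it can be reviewed before it goes anywhere near /etc/fstab.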
Oh, back to the goof. Well, in my excitement to see if GRUB really would boot
the RAID1 /boot partition I had neglected to actually write GRUB to the MBR
of the new drives. This caused the system to, well, not boot. The fix was
simple: slap the old 40GB drive back in, use its MBR to boot and then write GRUB
to the new drives using the grub shell:
[root@cat ~]# grub
root (hd0,0)
setup (hd1)
root (hd0,0)
setup (hd2)
The root line should match what is in /etc/grub.conf, and setup (hdX) tells
GRUB to write to that drive, which may have a different number when booting
without the old drive installed.
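A commonly used variation, which is not what I did above, is to tell the grub shell to treat each disk in turn as hd0 so the numbering question goes away. The sketch below only generates the command text; grub_install_commands is a made-up name, and it assumes /boot is the first partition on every disk. Feeding the output to grub --batch is what would actually write the boot sectors.

```shell
# Print GRUB (legacy) shell commands that install the boot loader on each
# disk given as an argument, mapping every disk to (hd0) in turn.
# Assumes /boot is partition 1 on each disk, i.e. (hd0,0).
grub_install_commands() {
    for disk in "$@"; do
        printf 'device (hd0) %s\nroot (hd0,0)\nsetup (hd0)\n' "$disk"
    done
    printf 'quit\n'
}

# Usage (as root): grub_install_commands /dev/sdb /dev/sdc | grub --batch
```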
Next is resizing things. You need to resize the RAID container and then resize
the filesystem. The first time I tried this I ran into the "Bitmap must be
removed before size can be changed" error, which sounds a bit ominous when you
aren't expecting it. What it means is that the write-intent bitmap the array
uses to track what has been synced needs to be removed. It isn't big enough for
the new size anyway. To do that you run mdadm --grow /dev/md0 --bitmap none,
which then allows you to actually grow it with mdadm --grow /dev/md0 --size max.
This will take a while. How long depends on things like drive
controller speed, CPU speed, drive speed and who knows what else. In my case it
took about 3 hours. You can monitor the progress by watching /proc/mdstat
using watch -n 20 cat /proc/mdstat.
When that is finished you want to add the bitmap back to the array, which is
done by running mdadm --grow /dev/md0 --bitmap internal. Now we are ready
to resize the filesystem. Back in the old days (cough) you had to reboot into a
rescue disk and run things like this on an unmounted filesystem. Those days are
long gone. We just need to run resize2fs /dev/md0 and sit back and watch it
grow. You can monitor it with all the normal filesystem utilities; df -h shows
the new size in realtime.
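Put together, the whole grow sequence can be sketched as follows. The names run and grow_array are hypothetical, the mdadm --wait step is my addition in place of watching /proc/mdstat, and nothing is executed unless DRY_RUN=0 is set.

```shell
# Echo commands by default; set DRY_RUN=0 to really run them (as root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Grow a RAID1 array to fill its members, then grow the filesystem on it.
# $1 is the array, e.g. /dev/md0.
grow_array() {
    array=$1
    run mdadm --grow "$array" --bitmap none       # bitmap must go before resizing
    run mdadm --grow "$array" --size max          # use all space on the members
    run mdadm --wait "$array"                     # block until the resync is done
    run mdadm --grow "$array" --bitmap internal   # put the bitmap back
    run resize2fs "$array"                        # grow the filesystem online
}

# Usage: grow_array /dev/md0   (prints the five commands)
```

As with any scripted version of a risky job, read the printed plan and have the backups ready before flipping DRY_RUN.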
The last step, as it should be with any filesystem changes, is to run a
filesystem check. Run touch /forcefsck and reboot, and the check will be
handled at boot time.
I have to thank the many resources found via Google, but especially this
HowtoForge article on replacing disks in a RAID1 array, and the
kernel.org wiki entry on Growing a RAID.
(Note: this is what worked for me, in my setup. Yours will be
different and this information may or may not work for you. Make
sure you have good backups before doing anything with your
filesystems.)
UPDATE: I think my original title was dumb. I've changed it.