Using RAID to Escape Disaster

Hard drive failures are inevitable, especially when the drive in question was manufactured on November 27, 2001. You know the time has come to replace it when your log files start filling up with errors like this:

Oct 28 03:53:05 cat kernel:         res 51/40:00:fc:33:4e/00:00:00:00:00/e0 Emask 0x9 (media error)
Oct 29 16:06:46 cat smartd[24427]: Device: /dev/sdb [SAT], FAILED SMART self-check. BACK UP DATA NOW!

Failure is inescapable. Everything fails eventually: computers, people, electronics. It is the only constant in life, and the only question is when. In my case, this 40GB drive had served me well in multiple computers and as part of a RAID5 array for my Linux Journal article. In its final installation it was one half of a two-disk RAID1 in cat, my webserver. cat runs Fedora 13 and a minimal set of software for serving my web pages, including this blog. cat was built from spare parts; its job isn't hard and its space requirements aren't large.

Good logging and reporting are important because they help you anticipate the impending doom. On my systems I run the smartd daemon to monitor drive health, and epylog to parse all my log files and email me the results nightly.
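To make the idea of watching logs for drive trouble concrete, here is a minimal Python sketch that picks disk warnings out of a list of log lines. It is not how epylog or smartd work internally. The function name find_disk_warnings and the pattern list are my own illustration, and the two patterns are simply lifted from the log lines quoted earlier, so a real system would likely need more of them.

```python
import re

# Patterns taken from the two log lines quoted in this post. This is an
# assumed starting list, not a complete set of disk failure messages.
WARNING_PATTERNS = [
    re.compile(r"FAILED SMART self-check"),
    re.compile(r"\(media error\)"),
]


def find_disk_warnings(lines):
    """Return the log lines that match any of the warning patterns."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in WARNING_PATTERNS)
    ]


if __name__ == "__main__":
    sample = [
        "Oct 28 03:53:05 cat kernel:         res 51/40:00:fc:33:4e/00:00:00:00:00/e0 Emask 0x9 (media error)",
        "Oct 29 16:06:46 cat smartd[24427]: Device: /dev/sdb [SAT], FAILED SMART self-check. BACK UP DATA NOW!",
        # Made-up harmless line, included only to show that it is skipped.
        "Oct 29 16:07:01 cat example[1]: nothing to see here",
    ]
    for hit in find_disk_warnings(sample):
        print(hit)
```

Run against a real log, you would pass an open file object instead of the sample list, since iterating over a file yields its lines one at a time.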