A few (ahem) years ago I wrote an article for Linux Journal on building a
RAID system. While that exact system no longer exists, I do still have a RAID5
setup that I use with BackupPC to back up
all the systems on my LAN. As I wrote about in my KVM article, I have updated
my main Linux box to Fedora 11. It had been out of backup rotation for about a
year, since I have mostly been using my Mac Mini and everything on the Linux
box was checked out of a remote Subversion
repository. I wanted to archive the old system's backup and add it to the
backup rotation again.
In all my years of using BackupPC I had somehow
missed the archive feature. I've used it to recover files by
writing them to a /tmp/ directory on a remote system or by downloading a
tar of selected files, but hadn't realized that you could also
create a gigantic tar of all the files in the current backup. To
get this set up I had to do several things:
- Add a new DNS alias for the system to write the archive to. I
use the same system that BackupPC runs on for this.
- Add a new host on the 'Edit Hosts' page; I named it the same
thing as the new DNS alias
- Edit the new host's config and in the 'Xfer' page set
'XferMethod' to Archive instead of rsync
- Change the 'ArchiveSplit' option to 1000 (the value is in megabytes) to
split the tar into roughly 1G files that are easy to handle
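BackupPC does the splitting itself (as far as I know it pipes the compressed tar through the external split program set in $Conf{SplitPath}), which is why the pieces end up with suffixes like .aa, .ab and so on. Here is a minimal sketch of the same idea done by hand, assuming tar, bzip2 and split are installed. split_archive and its arguments are hypothetical names for illustration, not part of BackupPC:

```shell
#!/bin/sh
# Hypothetical hand-rolled equivalent of an archive with ArchiveSplit = 1000.
# Not BackupPC's own code: tar a directory, compress the stream with bzip2
# and cut it into pieces of SIZE_MB * 1,000,000 bytes. split names the
# pieces PREFIXaa, PREFIXab, ... in order.
split_archive() {
    src=$1              # directory to archive
    prefix=$2           # output prefix, e.g. host.tar.bz2.
    size_mb=${3:-1000}  # piece size in megabytes, like ArchiveSplit
    tar -cf - "$src" | bzip2 -c | split -b "$((size_mb * 1000000))" - "$prefix"
}

# Example (hypothetical paths):
#   split_archive /home/olduser host.tar.bz2. 1000
```

Because the stream is cut at arbitrary byte offsets, no single piece is a valid archive on its own; the pieces only make sense when concatenated back together in order.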
And presto! I could now dump archives of the backups to the local system and
then burn them to DVD. I also wanted to include a directory of all the files
alongside the archive. Since the tar is split into pieces, you need
to join them together to get a full listing out of them. Because tar was
designed to work with streaming tapes, all you need to do is cat the pieces
into a tar process reading from standard input and redirect its output to a
file. Like this:
cat host.tar.bz2.a? | tar tvjf - > ./directory.txt
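This streams all the archive pieces, in order, to tar, which reads them from
standard input and writes the listing to the directory.txt file. (The a?
pattern only matches pieces .aa through .az, so it covers archives of up to 26
pieces.) This can take quite a while.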
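The same pipeline can be wrapped in a small function. This is a sketch: it globs on the prefix with * rather than a?, so an archive with more than 26 pieces (past .az) is still read in alphabetical, and therefore correct, order, and it refuses to run if nothing matches. The function name and its arguments are made up for illustration:

```shell
#!/bin/sh
# Write a verbose table of contents for a tar.bz2 archive that was split
# into pieces sharing a common prefix (split's default suffixes aa, ab, ...
# sort correctly with a plain shell glob).
list_split_archive() {
    prefix=$1    # e.g. host.tar.bz2.  (hypothetical name)
    listing=$2   # file to receive the listing, e.g. ./directory.txt
    set -- "$prefix"*
    if [ ! -e "$1" ]; then
        echo "no pieces matching ${prefix}*" >&2
        return 1
    fi
    # Rejoin the pieces into one bzip2 stream; tar reads it from stdin.
    cat "$@" | tar -tvjf - > "$listing"
}

# Example:
#   list_split_archive host.tar.bz2. ./directory.txt
```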
So at this point in the day I finally had the old system image written to a
couple of DVDs. Now it was time to switch the backup back on and catch up with
the current system image. I added a few new directories to the list to back up
(I usually only back up /etc, /root and /home). These included my new libvirt
virtual images. In all it amounted to about 96G worth of files. The LAN
bottleneck is the 100Mb/s NIC in the backup system. It was pushing around 45Mb/s
for several hours, chugging its way through the backup. Then something strange
happened.
The backup server turned off. No warning, just click. Nothing. I rebooted it,
and it ran its filesystem checks with no problems. I dug through the log files and
there was nothing in them to indicate a problem of any kind. So I restarted the
backup and it ran for about another 30 minutes before doing the same thing.
This system never just dies on me, or at least it hasn't since I put in an
Antec 500W power supply.
I started with the obvious, checking for bus errors in the logs. I ran
memtest86 on it for a bit. Then I took a look at the BIOS health readings. Even
after being relatively idle for 15 minutes the CPU temp was at 63C. Now, this
system is a 2.9GHz Celeron D. The max temp is somewhere around 67C. So I was
probably baking the heck out of the CPU and it was doing a thermal shutdown.
Consumer CPUs like the Celeron just aren't designed for this kind of abuse. But
that never stops me from trying to squeeze every last cent out of a system.
The heatsink had what I'd call a moderate amount of dust on it, but it was
mostly on the top not crammed down in the fins like I have sometimes seen. I
pulled it off and the CPU was glued to it with heatsink grease. I blew out the
dust (canned air is so much fun!), cleaned things off, gave it some new grease
and reinstalled it.
I fired up the backup and again it died after a short period of time. I finally
set up the sensors package on the system and it told the story: it was still
overheating. The fan was only running at about 2,700 rpm, so I swapped in a
spare Tornado fan, cranked it up to its maximum of 5,300 rpm and restarted the
backup. The CPU now maxes out around 57C and the backups all run to completion,
so things seem to be happy.
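Since the box died without leaving anything in the logs, a timestamped temperature log that is flushed to disk is a cheap way to catch this kind of failure in the act. A rough sketch, assuming lm_sensors is installed and sensors-detect has already been run; `sensors -u` prints raw readings as lines like `temp2_input: 57.000`. The function names and the log path are hypothetical, and the chip and sensor names vary from board to board:

```shell
#!/bin/sh
# Print the highest temperature found in `sensors -u` output read from stdin.
# Only tempN_input lines count, so limits (tempN_max) and fan speeds
# (fanN_input) are ignored. Prints 0.0 if no temperature lines are found.
max_temp() {
    awk '$1 ~ /^temp[0-9]+_input:$/ { if ($2 + 0 > max) max = $2 + 0 }
         END { printf "%.1f\n", max }'
}

# Hypothetical logger: append a timestamped reading every 30 seconds and
# sync, so the last value before a sudden thermal shutdown is on disk.
log_temps() {
    logfile=${1:-/var/log/cputemp.log}   # assumed location
    while :; do
        echo "$(date '+%Y-%m-%d %H:%M:%S') $(sensors -u | max_temp)" >> "$logfile"
        sync
        sleep 30
    done
}
```

Starting `log_temps &` before kicking off a big backup leaves the last reading before a shutdown at the end of the log.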
This also reminds me that I really need to blow the dust out of the heatsinks
in the other systems around here -- I don't think I've done that in over a
year. I really should have a regular maintenance schedule instead of waiting
for failures to happen. I guess I need the extra excitement or something.