I have just had an instructive exercise in disaster recovery involving Amanda.
I have been using Amanda for years to back up, first to tape and now to virtual tape (hard drive), and I have been making offsite backups for several years. However, I was running out of room on my main backup drive, a USB external drive. So I ordered three new one terabyte Seagate FreeAgent GoFlex drives, one for my main backup drive, and the other two for offsite use.
I like the drives. Being 2½" drives, they are powered from the host USB port. So no external power supply. They also fit in one's pocket, which makes it easy to transport them to one's offsite location. I also like the fact that they are SATA drives with adapters for various other interfaces, in this case USB. Should I ever want a faster interface, I can get an adapter cable (assuming Seagate still offers them…).
So far, so good. Unfortunately the new main backup drive started going south five days ago. I tried various tricks to get it to straighten out and fly right, but to no avail. Today I finally gave up on it. It died during this morning's backups, leaving 12 GB on the holding disk. Each of the three new drives has an unused 30 GB partition; on the failing drive I couldn't even fsck that.
I put the old main drive back in service. This involved creating a symlink called /media/backs to point to the real backup drive. That done, the actual drive in service is transparent to the offsite backup script. That completes the process I had begun with the symlink /media/offsite for the two (now four) offsite drives.
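The symlink switch can be sketched as a small shell function. This is only an illustration, not the exact commands I ran: the function name repoint_link and the mount point in the usage comment are hypothetical, and the -n option assumes GNU or BSD ln.

```shell
# Repoint a symlink at whichever drive is currently in service.
# Usage: repoint_link TARGET_DIR LINK_PATH
repoint_link() {
    target=$1
    link=$2
    # Refuse to point the link at something that is not an existing directory.
    if [ ! -d "$target" ]; then
        echo "repoint_link: $target is not a directory" >&2
        return 1
    fi
    # -s: symbolic link; -f: replace an existing link;
    # -n: treat an existing link to a directory as a file, so the link
    #     itself is replaced instead of a new link being created inside
    #     the old target.
    ln -sfn "$target" "$link"
}

# Hypothetical usage (the real mount point of the old drive is an assumption):
# repoint_link /media/old-main-drive /media/backs
```

Because the backup scripts only ever see /media/backs, swapping drives is a one-line change.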
I had put the new main backup drive into service only a few days ago. So I had several days' worth of backups on the offsite drives that weren't on the old main backup drive. I was able to copy those from the current offsite drive, e.g.:
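# Work in the configuration's directory on the main backup drive.
cd /media/backs/amanda/DailySet1
# Copy the info and data entries over from the offsite drive.
cp -rp /media/offsite/myob/amanda/DailySet1/info .
cp -rp /media/offsite/myob/amanda/DailySet1/data .
# Empty slots 1-4, then refill them from the offsite drive.
for i in $(seq 1 4) ; do echo $i ; rm slot$i/* ; done
for i in $(seq 1 4) ; do echo $i ; cp -rp /media/offsite/myob/amanda/DailySet1/slot${i}/* slot${i} ; done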
That done, several checks indicate that things are OK, including running amverify and amcheck. Now for the acid test: can I run amflush successfully?
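One more check that could be made, in addition to the Amanda tools, is a plain file comparison between the offsite copy and the restored slots. The following is a minimal sketch, not something I actually ran; the function name verify_copy is hypothetical, and the -q option assumes GNU or BSD diff.

```shell
# Compare a restored directory tree against its source.
# Prints nothing and returns 0 when the trees match; otherwise lists the
# files that differ or exist on only one side and returns non-zero.
# Usage: verify_copy SOURCE_DIR RESTORED_DIR
verify_copy() {
    diff -rq "$1" "$2"
}

# Hypothetical usage with the paths from the copy commands above:
# for i in 1 2 3 4 ; do
#     verify_copy /media/offsite/myob/amanda/DailySet1/slot$i \
#                 /media/backs/amanda/DailySet1/slot$i
# done
```

This only proves that the copy is faithful to the offsite drive; whether the dumps themselves are readable is still a job for the Amanda tools.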
This morning's backups were to slot 5. I should have checked the tape status file, tapelist, before I ran amflush: it had slot 5 marked as used. I should have marked it as unused and recycled it. So for one complete dump cycle slot 5 will have old useless data in it. No great loss.
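The check I skipped could look something like the sketch below. It assumes the common tapelist layout in which each line starts with a datestamp, then the tape label, then a reuse flag; the function name tapelist_entry, the label, and the tapelist path in the usage comment are all hypothetical.

```shell
# Print the tapelist line for one tape label.
# Returns 0 if the label is listed, 1 if it is not.
# Assumes the label is the second whitespace-separated field on each line.
# Usage: tapelist_entry TAPELIST_FILE LABEL
tapelist_entry() {
    awk -v label="$2" '
        $2 == label { print; found = 1 }
        END { exit !found }
    ' "$1"
}

# Hypothetical usage before running amflush (path and label are assumptions):
# tapelist_entry /etc/amanda/DailySet1/tapelist DailySet1-05
```

Looking at the datestamp on that line before flushing would have shown that Amanda already considered the slot written.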
This morning's backup failed while writing a full backup to the virtual tape. Was all of that saved and successfully copied from the holding disk to the replacement virtual tape? The amflush report indicates that all of the dumps were successfully sent to tape, and amrecover was able to pull the complete dump. So, yes, amanda is robust enough to have its storage medium go south in the middle of a long backup and recover successfully.
If the holding area had also been on the backup disk, I would have lost this morning's backups entirely. In this case, that would be no great loss. I could have run amrmtape on the "tape" holding the failed backup, and amanda would have recovered. But keeping the holding area on a separate disk preserves a day's backups, adding to amanda's robustness.
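A quick way to confirm that the holding area and the backup drive really are on different filesystems is to compare their device numbers. This is a minimal sketch assuming GNU stat (the -c option); the function name same_filesystem and the holding directory path in the usage comment are hypothetical.

```shell
# Report whether two paths live on the same filesystem.
# Returns 0 if they do, 1 if they do not, 2 if either path cannot be examined.
# Usage: same_filesystem PATH_A PATH_B
same_filesystem() {
    # %d prints the device number of the filesystem holding the path.
    dev_a=$(stat -c %d "$1") || return 2
    dev_b=$(stat -c %d "$2") || return 2
    [ "$dev_a" = "$dev_b" ]
}

# Hypothetical usage (the holding directory path is an assumption):
# if same_filesystem /var/amanda/holding /media/backs/amanda ; then
#     echo "warning: holding area shares a filesystem with the backup drive"
# fi
```

Note that stat follows the /media/backs symlink, so the check applies to whichever drive is currently in service.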
Conclusions: