Microsoft Exchange 2003 - Disaster Horror Stories
Introduction to Exchange 2003 Disaster Recovery Horror Stories
One of life's truisms is that those who do not learn the lessons of history are destined to repeat the mistakes of the past. So, you may smile at the disasters of others, but do remember to put your own Exchange 2003 server in order.
Exchange 2003 Disaster Horror Stories
There are only two types of administrator: those, like me, who have lost Exchange data, and those who are going to experience disaster recovery - probably very soon.
The two lessons from this disaster are: never position the server room on the top floor, and always employ reliable workmen. It all started when a holiday company had a small problem with a leaking pipe in their roof space. Unfortunately, the emergency plumber turned out to be an out-of-work car salesman. Later, this character would appear on a TV programme called 'Police Trap Rogue Tradesmen on Video'.
Why the plumber drilled a hole in the ceiling directly above the Exchange server was the most baffling part of the case. The balance of opinion favoured incompetence rather than malice. Be that as it may, he bodged a simple stopcock repair. As a result, in the middle of the night, water came cascading out of the header tank. Nature took its course: the water found the hole in the ceiling and poured onto the Exchange server beneath. As you can imagine, the water did the motherboard no good at all, and the server was soon useless.
Incidentally, the cost of all that damage would have paid for a cluster server and a SAN storage unit.
Here is a case with a double dose of bad luck. This particular online-order company had a wonderful disaster recovery plan. Moreover, they tested it every 6 months and it always restored perfectly. The details included a hot standby server, and the backup tapes were stored offsite in the deputy manager's safe. Then one night came a thunderbolt. Lightning literally struck their building, and everything, including the computer room, was destroyed in the ensuing fire.
After the initial shock subsided, everyone took up their disaster recovery roles, just as they had practised. Insurance took care of the recovery costs, and in no time at all, the new servers arrived along with a mobile building and generator. The deputy manager despatched a courier to collect the backup tapes, which his wife retrieved from the safe.
Precisely what happened on that icy road will probably never be known, but the result was that the poor courier slid under a lorry. Fortunately, the courier was not seriously hurt. Not so the tapes; they were flattened under the lorry's wheels.
The master recovery plan even had a section which covered failure of the first backup tape, but it had never been tested. Of course, they had other tapes, but they were over two days old, and for a variety of reasons, they could not recover the intervening transactions from backup. Eventually, they had a team in India key in all the orders manually.
Here is a free tool to monitor your Exchange Server. Download and install the utility, then inspect your mail queues, monitor the Exchange server's memory, confirm there is enough disk space and check the CPU utilization.
This is the real deal - there is no catch. SolarWinds provides this fully-functioning freebie, as part of their commitment to supporting the network management community.
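If you prefer to script a quick health check of your own, the disk-space part of the job is easy to sketch. Here is a minimal illustration in Python - it is not part of any SolarWinds product, and the path and 5 GB threshold are assumptions you would tune for your own Exchange server's mailstore volume.

```python
import shutil

# Minimal disk-space check of the kind a monitoring tool automates.
# The path and the 5 GB threshold are illustrative assumptions only.
def disk_space_ok(path, min_free_gb=5.0):
    usage = shutil.disk_usage(path)      # total, used, free (in bytes)
    free_gb = usage.free / 1024**3
    return free_gb, free_gb >= min_free_gb

free_gb, ok = disk_space_ok(".")
print(f"Free space: {free_gb:.1f} GB - {'OK' if ok else 'LOW, investigate!'}")
```

A scheduled task running a script like this, and emailing you when the check fails, catches the "disk slowly filling with transaction logs" problem before it becomes a disaster story of your own.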
Many disasters have their root cause in human relationship problems. The manager ordered a box of backup tapes and showed the timid backup operator how to load the tape into the DAT drive. On Monday, the backup performed perfectly. However, on Tuesday the operator could not remove Monday's tape. So, being shy but resourceful, she unscrewed the Tippex bottle, painted over Monday's date and wrote Tuesday's date on the label.
This unauthorized procedure went on for months, with the layer of Tippex getting thicker and thicker. You can guess what happened next: the boss wanted to recover a mailstore from backup. Unfortunately, all that was on the tape was yesterday's differential backup. When he looked in the box, he saw 11 tapes still in their cellophane wrappers. As for the last full backup, there was none. Whoops. The sad point of that case was that each party blamed the other, and neither could admit their own part in the fiasco.
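The painful lesson here is that a differential backup only records changes made since the last full backup, so without that full backup it is worthless. This toy Python sketch (hypothetical data, purely to illustrate the dependency) makes the point:

```python
# Toy model: restoring from a differential requires the base full backup.
def restore(full_backup, differential):
    if full_backup is None:
        raise RuntimeError("Cannot restore: no full backup to apply the differential to")
    state = dict(full_backup)       # start from the full backup...
    state.update(differential)      # ...then apply changes made since it
    return state

full = None                                      # the 11 tapes never left the box
diff = {"mailstore.edb": "yesterday's changes"}  # all that was on the Tippexed tape

try:
    restore(full, diff)
except RuntimeError as err:
    print(err)   # the boss's situation in a nutshell
```

Test your restore procedure, not just the backup job: a schedule that writes differentials forever without a periodic full backup will report success every night, right up until the day you need it.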
Do you find that when you have an audience and you tell a good yarn, someone comes back with another twist on your story? Well, following the above case, someone told me a tale about the 'Rambo' operator.
The Rambo operator is the complete opposite of the timid operator, so when the tape refused to eject, Rambo rolled up his sleeves and yanked out the tape, along with the tape drive unit, half the panel and the floppy drive. Well, at least with a Rambo operator you soon know when you have a problem.
Incidentally, if you have a backup horror story, then do write in and let me know.
The 9/11 twin towers disaster is the most awful sight that I have ever seen on television. However, recovering the computer systems went off with hardly a glitch. As time moved on, my thoughts went back to another disaster, on 24 April 1993 in London. In this terrorist attack, an IRA extremist exploded a bomb which destroyed most of the financial institutions in Bishopsgate. This English disaster killed only one person, but it devastated the financial computer community. Everyone was walking around like a zombie because they had lost their data and did not know what to do next.
Rumours started doing the rounds that the banks had no backups. People said that customer services were phoning their clients and asking, 'Excuse me, but er... how much money do you have in your current account?' People laughed at these stories and said, 'If only they had asked me, I would have told them 1 million.' Some believed this story because the bank's next question was, 'Can you send us your last bank statement?'
Well, I told this story for a few years. Then one day I spoke to a fellow from Digital. He listened politely, then said, 'Guy, I was an engineer on that Bishopsgate job, and you were near the truth. Yes, of course they had backups; the manager had a garage full of tapes. However, our problem was that there was no computer in the world capable of restoring the tapes. The tape drives were so ancient that no compatible machine could be found anywhere.'
The moral of this story is: always check your tape hardware when you buy a replacement server. As with many of my other stories, there is a related horror story.
In 1993, I was working for James Capel as a computer operator on a 24-hour shift pattern. The computers were situated in the basement. My colleagues experienced the bomb first hand, and I had to go in the next morning. The back of the building had been moved 9 cm by the blast. A skylight in the computer room was blown out, yet the Tandem computers, which were earthquake-proof, were still processing, even though some were at 45-degree angles. The problem was that the air conditioning had been knocked out, and the internal heat was buckling the disks. So the big red button was hit and power to the room was killed.
Unlike your Bishopsgate story, we had backup tapes from the end of the day, taken prior to running the end-of-day suite of programs. On Saturday morning, these were taken to our DR site about a mile away in Devonshire Square. The tapes were loaded onto the DR suite of computers, and day-end was run from there. To cut a long story short, we were up and running by 8am Monday morning, ready to trade, albeit on a smaller scale, but with no loss of data.
The best feature of this new version of SolarWinds VM Monitor is that it checks Windows Hyper-V. Naturally, it still works with virtual machines on VMware ESX Servers. VM Monitor is a clever desktop tool that not only tests that your server is online, but also displays the CPU and memory utilization for each node.
It's easy to install and configure this virtual machine monitor; all you need is the host server's IP address or hostname and the logon info. Give this virtual machine monitor a try - it's free.
Expect the unexpected. Learn from the disaster recovery mistakes of others. Wherever possible, systematically identify and then eliminate single points of failure from your backup plan.