Fault-tolerant and high-availability software packages have been available for several types of UNIX in the past, but SCO operating systems have generally been ignored. The idea of a high-availability system is usually to have two (or more) similarly configured machines, with continual updates between them to keep their filesystems synchronized. If one machine fails for any reason, another machine immediately takes over the role of the failed machine. For application and Web servers, high availability is a necessity for many corporations, justifying the investment in duplicate hardware and high-availability software. Savoir Technology has developed SavWareHA for SCO OSs specifically to provide this type of availability. (SavWareHA used to be marketed under the name Sentinel.)

SavWareHA is composed of a set of drivers that run on the primary and backup servers. The drivers install on top of existing systems and require no modification of existing drivers or filesystems, which simplifies the task of administrators. The SavWareHA Mirror driver is responsible for keeping the filesystems on the primary and standby machines synchronized (using a mirroring scheme similar to RAID 1), the Net Disk Driver handles the network connection between the two machines, and the SavWareHA Monitor is the routine that handles failover situations. The Monitor daemon operates primarily on the standby machine, sending messages at intervals to the primary machine (the interval is administrator adjustable). If messages are not returned, the standby assumes an active role. While the standby handles logins and requests, the primary machine (if responding) undergoes an fsck, and attempts are made to repair and reload the filesystem (assuming the filesystem was the culprit in the first place).
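The heartbeat-and-takeover behavior described above can be illustrated with a small Python sketch. This is not SavWareHA code: the class name, the `probe` callback, and the `max_missed` threshold are all hypothetical, and the rule that several consecutive unanswered messages trigger the takeover is an assumption made for illustration (the product's actual criteria are not documented here).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StandbyMonitor:
    """Hypothetical model of a standby-side heartbeat monitor (not SavWareHA code)."""
    probe: Callable[[], bool]   # returns True if the primary answered the message
    max_missed: int = 3         # assumed threshold; not taken from the product
    missed: int = 0
    role: str = "standby"

    def tick(self) -> str:
        """Run one heartbeat interval and return the machine's current role."""
        if self.role == "active":
            return self.role    # already took over; switching back is manual here
        if self.probe():
            self.missed = 0     # primary answered, so reset the miss counter
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                self.role = "active"   # standby assumes the active role
        return self.role


def simulate(replies, max_missed=3):
    """Feed a scripted sequence of primary replies to the monitor."""
    it = iter(replies)
    monitor = StandbyMonitor(probe=lambda: next(it), max_missed=max_missed)
    return [monitor.tick() for _ in replies]


if __name__ == "__main__":
    # The primary answers once, then goes silent for three intervals.
    print(simulate([True, False, False, False, True]))
```

In a real deployment the probe would be a network message sent over the TCP/IP link, and the interval between calls to `tick` would correspond to the administrator-adjustable interval mentioned above.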

SavWareHA runs on SCO OpenServer 5, SCO Open Desktop 2.0 or later, and SCO Unix 3.2 or later. The machines involved in a SavWareHA configuration must be connected to each other by a TCP/IP link. SavWareHA consumes only about 1MB of disk space on each of the primary and standby machines, and chews up a modest 60kB of RAM in normal use. Installation proceeds through custom and is uneventful. A kernel rebuild and a reboot are necessary. There are several license types available for SavWareHA, including a demo mode that works for only three days before a reboot is necessary. Mirroring in the demo mode is limited to 200MB. The full registered version has no limitations, of course.

Configuration begins with setting up trusted access between the two machines (which must be done carefully because of potential security problems) and designating the primary and standby roles. The two machines need not be similar in hardware configuration, but the drives should be sufficient to allow proper mapping of filesystems. Obviously, if the two machines are of different capabilities, the more powerful one will be the primary. After the primary and standby are chosen, filesystems are selected for mirroring. Setting up the mirroring takes a while, but most of that time involves no administrator action.

To test SavWareHA, we set up an installation with two Pentium 450MHz servers, each with dual 9.1GB SCSI disks. After installing the software and letting the systems run for a couple of days, we started causing failures in the hard drive cables. When the failures reached a fairly low rate, the standby kicked in automatically, handling user requests transparently. While this went on, SavWareHA tried to repair the hard drive. We stopped mucking with the drive, and after verifying the filesystem the primary came back on line. After another day, we failed one drive completely, and the standby was operational within a few seconds. Finally, we manually switched from primary to standby using a supplied option in the utilities, a feature some administrators may need for hardware upgrades, for example. After shutting down the primary, we restarted the system and manually switched back. No problems were encountered with any of these tests. After every failure, notices were received on the administrator console informing us of SavWareHA’s actions. All interactions with SavWareHA are through a character-based system, which is less attractive than a Motif window but better for remote administration.

With its automatic failover to a hot standby, SavWareHA gives SCO-based servers a capability that works differently from clustering. We were impressed by SavWareHA and its handling of our tests, and expect this product to make SCO servers even more attractive for Web and application server roles. If you have high-availability needs, SavWareHA is worth examining.