RoboMon: System management made easy?
Heroix’ RoboMon is a systems management software package that resides on a Windows NT machine and monitors domains or workgroups for errors and, in many cases, corrects them transparently. RoboMon has been around since 1989 and has now reached version 7. Part of the reason RoboMon works well is that you establish the rules that govern its behavior, so when an error such as an application or sharing problem is encountered, RoboMon already has your instructions for constructing the correct solution. RoboMon does more than correct commonly encountered errors and monitor an enterprise network, but those are the uses for which most administrators will purchase this package.
RoboMon runs under Windows NT, UNIX, and OpenVMS. We tested under Windows NT 4.0 Server and Beta 3 of Windows 2000 Professional (nee Windows NT 5 Server). The test server was an ALR Revolution 2XL sporting two 266MHz Pentium-IIs, 128 MB RAM, and a DPT SmartRaid IV SCSI controller handling four 9GB hard drives. A Fast Ethernet network containing eight Windows NT 4.0 Server application servers, one Windows NT 4.0 Workstation, and four UNIX workstations (a mix of HP-UX, Solaris, and AIX) was the test bed. Thirty Windows 95 and Windows 98 clients used the NT Servers during the test period. (Our version of RoboMon was licensed for only ten servers, hence the rather small test load.) The Windows and Windows NT machines were configured first as a workgroup, then as a domain, with equal results as far as RoboMon was concerned. The UNIX NFS drives were mounted on four different Windows NT Application Servers using WRQ’s Reflection NFS Gateway. We used RoboMon on the network for four weeks, simulating over 8,000 failures of different types using Mercury Interactive’s WinRunner. We also had over 350 user-generated errors in that period.
The RoboMon package contains the software on a CD-ROM and a thin, spiral-bound User Guide that provides enough information to use the package. RoboMon requires Windows NT 3.51 or 4.0; if you are running Windows NT 3.51, you must have Service Pack 5 installed. RoboMon wants at least a 90MHz Pentium and 32MB RAM, but faster CPUs and more memory are highly recommended. It also requires 25MB of disk space and 50MB of free page file space (we had to create extra page space for RoboMon because we had frequent virtual memory warnings with our default setting of 114MB). RoboMon only works with TCP/IP. RoboMon also requires ODBC 3.5 or higher, and will install it if your system lacks it.
Installation is performed from the Administrator account. A license key is required to install RoboMon unless the CD-ROM has been hard-coded with an expiry date for an evaluation, as was our version. During the installation RoboMon asks for an e-mail account to which all event notices are sent. We started using our Administrator account for this purpose, but because of the large number of errors we were forcing on the system to test RoboMon, our mailbox quickly became unwieldy. We moved to a separate mailbox just for the RoboMon event messages and found that approach much more workable. Of course, on a stable network that generates only a few events, any administrator e-mail account may suffice. The installation proceeds quickly through the Autorun procedure, with an HTML page appearing in your default browser to step you through the process. Five minutes later you’re done, and after a reboot RoboMon is active.
Before diving into the rules-based capabilities of RoboMon, a few of the ancillary features are worth noting. First, the event viewer included with RoboMon shows not only events detected by RoboMon but also those from the Windows NT Event Viewer. These can come from both local and remote machines. Events are reported in real time, which can help you head off trouble before it escalates. Also noteworthy is the system performance measurement subsystem, which shows tables and graphs of system performance and network events, useful not only for your own readability but also when this information has to be displayed for non-technical people (when you’re trying to justify server upgrades, for example!). RoboMon doesn’t have to be managed from the server. It allows you to manage all aspects of the software from any machine on the network, which is handy for placing RoboMon on a server in a secure location and then managing it from your desktop.
RoboMon can monitor a number of sources of information gathered from the server and machines attached to the network, as well as the network itself. Monitoring of machines includes their performance information (the CPU’s usage, available memory, and amount of I/O) and disk usage. File and directory usage and bottlenecks can be reported. On the services side, RoboMon watches which services are used and where any slow-downs occur. Database usage can be reported for both Oracle and SQL Server. Access to the Internet Information Server and any pass-throughs to the Internet itself are monitored, as are printer usage and conditions. All of this information is gathered in the background from a number of RoboMon-specific monitoring routines, Windows NT’s event logs and performance counters, and ODBC-compliant database reporting tools.
The administrator’s interface to RoboMon is through the Enterprise Manager window. This window is similar to the NT Explorer. The Enterprise Manager starts and stops any of the rules engines, as well as reports process status and allows changes to a process. Using drag-and-drop actions, rules on one machine can be propagated to other machines. Built into the Enterprise Manager are the Rule Designer (used to define rules logic and actions) and the Solutions Manager (used to modify rule trigger conditions and actions quickly).
The Enterprise Manager window uses the concept of an enterprise as the lowest common denominator for the systems and networks to be monitored. An enterprise may consist of multiple domains or workgroups. The left-hand pane of the Enterprise Manager shows all the machines in the enterprise, with machines groupable by domain or workgroup, or by any other logical groupings you develop; you are not restricted to using the actual domain or workgroup setups. (The Enterprise can be renamed to suit your networks, as can any of the machine groupings.) Under each particular machine name there are four branches for processes, used to hold the pre-built rules. (Supplied default rulesets include Automation, which handles many common NT errors; Exchange, used for Exchange problems; Performance, which works with the Performance Monitor; and SQL Server, which works with SQL Server, surprisingly enough. Beneath each ruleset are further specific rules for each condition to be monitored.) The contents of one or more of the branches can be modified at will without affecting other machines on the network. This approach of showing Enterprise, then domains, machines (which have RoboMon clients running), processes, and then rules is logical and quickly becomes familiar to administrators.
For RoboMon to monitor a machine, it has to be sending data to the RoboMon engine. This is accomplished by using the Enterprise Manager to locate the target machine and then adding a Statistics Builder process for that machine with a single button-click. Once statistics are being recorded for the client, rules can be added. When RoboMon is first installed, only the machine on which RoboMon is installed shows up in the Enterprise Manager. By adding machines one at a time under real or virtual domains, the Enterprise can be populated. You can instruct RoboMon to browse the network for computers already running RoboMon clients, but this is handy only if the RoboMon server is being changed.
The second RoboMon interface is the Event Monitor, which displays real-time network-wide events from both RoboMon and Windows NT. A set of filtering and tailoring tools allows you to modify the events that are displayed, as well as suppress repetitive events and those you know are occurring for which you have no actions. If you don’t want to use a GUI, RoboMon can be controlled through a character-based command-line interface instead. This may be handy for rapid actions conducted remotely over modems, for example. Supporting these two interfaces is a set of utilities, including graphing and reporting tools, a statistics routine, and a configuration manager for the event server.
The heart of RoboMon, as mentioned, is the rules engine. Actually, there are several background-based autonomous rules engines at work, all coupled together by the front-end GUI running in foreground when demanded by the administrator. The rules engines are inference engines that can execute commands and processes when the situation demands. Because of the design of these engines, each engine can operate separately on different target computers. As far as Windows NT is concerned, each of the rules engines is a separate process. Rules set up triggers for particular actions. These triggers can be anything that Windows NT’s Performance Monitor (or a client’s) records. Essentially, if it is measurable on a Performance Monitor, it can be used as a basis for a rule. Alternatively, rules can be based on file, directory, or disk volume activity.
What type of thing makes up a rule? A simple example is monitoring a printer’s queue. If it hits a threshold level (which can be computed based on the printer’s history, not necessarily hard-coded), alerts can be generated for the administrator or print requests can be redirected. Simple bottleneck reports are easy to generate. For example, if access to a CD-ROM jukebox becomes a bottleneck, RoboMon can report the condition. The same applies to database volumes, Internet gateways, and Remote Access Servers. Rules can be programmed for near-continuous activity, or instructed to sleep between checks. For example, when checking printer queues, checking every five or ten minutes should be enough, instead of chewing up resources continually checking the queue.
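RoboMon rules are built in a graphical designer, so the following is not RoboMon syntax. It is only a rough Python sketch of the idea behind a printer-queue rule: a threshold derived from the queue's history rather than hard-coded, with the rule sleeping between checks. The names history_threshold and poll_queue, the "mean plus two standard deviations" formula, and the five-minute default interval are all assumptions made for illustration.

```python
import time
from statistics import mean, stdev


def history_threshold(history, k=2.0, floor=5.0):
    """Hypothetical threshold: mean + k standard deviations of past
    queue lengths, never lower than a fixed floor."""
    if len(history) < 2:
        return floor  # not enough history to compute a deviation
    return max(floor, mean(history) + k * stdev(history))


def poll_queue(read_queue_length, history, checks,
               interval_seconds=300, sleep=time.sleep):
    """Check the queue `checks` times, sleeping between checks.

    read_queue_length is a caller-supplied function (an assumption here)
    that returns the current number of queued print jobs.
    """
    alerts = []
    for i in range(checks):
        length = read_queue_length()
        threshold = history_threshold(history)
        if length > threshold:
            alerts.append(
                f"ALERT: print queue at {length}, threshold {threshold:.1f}"
            )
        history.append(length)  # the new sample becomes part of the history
        if i < checks - 1:
            sleep(interval_seconds)  # idle instead of polling continuously
    return alerts
```

Passing a different sleep function (or a shorter interval) is the equivalent of choosing between near-continuous checking and a relaxed schedule.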
RoboMon installs a set of default rule sets, which are sufficient for many application problems. Of course, the real strength of RoboMon is the ability to customize the rule sets to your requirements. Custom rules do not require programming. Instead, the Rule Designer graphical interface can be used to develop remarkably sophisticated action sequences when an engine encounters a trigger condition.
The easiest way to create new rules is to modify an existing rule. Use the standard Windows menu choices to copy or edit directly from the Enterprise window and open the Rule Designer. There are five page tabs in the Rule Designer that allow for considerable detail about the rules. The Documentation tab is used to describe the rule; the Schedule tab determines when the rule should be executed; the Selections tab specifies what is to be monitored; the Condition tab sets up a true-false test for the rule to trigger; and the Actions tab dictates the actions to be applied when the conditions are evaluated.
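To make the five-part structure concrete, here is a minimal Python sketch that models a rule with one field per Rule Designer tab. This is an illustration only, not RoboMon's internal format: the Rule class, its field names, and the cpu_percent counter are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Sample = Dict[str, float]


@dataclass
class Rule:
    """Hypothetical model of a rule, one field per designer tab."""
    documentation: str                    # Documentation: what the rule is for
    schedule_seconds: int                 # Schedule: how often to evaluate
    selections: List[str]                 # Selections: which measurements to watch
    condition: Callable[[Sample], bool]   # Condition: true-false trigger test
    actions: List[Callable[[Sample], str]] = field(default_factory=list)  # Actions

    def evaluate(self, sample: Sample) -> List[str]:
        """Run every action if the condition holds for the selected values."""
        selected = {name: sample[name] for name in self.selections if name in sample}
        if not self.condition(selected):
            return []
        return [action(selected) for action in self.actions]


# Example rule: notify when CPU usage exceeds 90 percent.
cpu_rule = Rule(
    documentation="Warn when the CPU is saturated",
    schedule_seconds=60,
    selections=["cpu_percent"],
    condition=lambda s: s.get("cpu_percent", 0) > 90,
    actions=[lambda s: f"notify: CPU at {s['cpu_percent']}%"],
)
```

Copying an existing rule and changing one field, as the review suggests, maps naturally onto this shape: the condition or schedule changes while the rest stays put.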
Several different actions can be specified when an event triggers the rule. Notification is the simplest, sending you a message through e-mail, pager, or SNMP traps. A Corrective action has RoboMon execute your programmed steps to alleviate the problem. A Rule Interaction activity can disable, enable, or reschedule other rules based on one particular event, which is particularly useful when you have a cascading series of events that apply more radical steps each time to an increasing problem. Finally, a Variable Manipulation allows you to change the value of a RoboMon variable, affecting other rules dynamically. Rules are set up by specifying each component, one by one, building up the conditions or actions as needed. Through the Rule Designer most tasks that you would want RoboMon to handle can be built without knowing how to program a line of code.
The rules engine in RoboMon is hierarchical, in that you can determine the order of precedence of actions that the engine uses. This can be used simply to alert you to potential problems before taking any corrective action, or to follow a number of more drastic recovery actions in cases of severe problems. All of the data RoboMon gathers is stored in a Microsoft Access database unless you instruct it to use another system like SQL Server.
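The order-of-precedence idea can be pictured as a ladder of actions that grows more drastic each time the same problem is still present. The Python sketch below is a hypothetical illustration of that pattern, not a description of how RoboMon implements it; the EscalationLadder class and the step names are invented for the example.

```python
from typing import List, Optional


class EscalationLadder:
    """Hypothetical ordered list of actions, mildest first."""

    def __init__(self, steps: List[str]):
        if not steps:
            raise ValueError("at least one step is required")
        self.steps = steps
        self.level = 0  # index of the next step to apply

    def on_check(self, condition_true: bool) -> Optional[str]:
        """Return the action for this check, or None if all is well."""
        if not condition_true:
            self.level = 0  # problem cleared: start from the mildest step again
            return None
        # Stay on the most drastic step once the top of the ladder is reached.
        step = self.steps[min(self.level, len(self.steps) - 1)]
        self.level += 1
        return step


ladder = EscalationLadder(["notify administrator", "restart service", "reboot server"])
```

Each call to on_check represents one scheduled evaluation of the rule: the first failure only alerts, and corrective steps follow only if the condition persists.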
While all the development of rules, managing the Enterprise window, and setting up the Enterprise itself sounds like a time-consuming and involved task, it isn’t. In fact, RoboMon is remarkably easy to set up and configure, as well as manage. On our test network, it took less than half an hour to populate the Enterprise and set up the default rule sets. The statistics gathering process has no noticeable effect on the servers, although you will see network traffic increase a little when you are running real-time condition tests. Since most rules are set to operate at intervals instead of continually, this isn’t much of a real problem unless your network is near capacity. Configuring the SNMP components was the most difficult part of setting up RoboMon, but even so, a knowledgeable administrator will spend only a few minutes at this task.
When we first installed RoboMon we didn’t tamper with the default rule sets, finding them to be more than enough for our smallish network’s requirements. Playing around with some trigger conditions and the rule schedules was simple and even fun. Daily administration tasks really involved no more than checking the e-mail account for event notifications, and occasionally browsing the Event Monitor. We ended up leaving the Event Monitor running continually in a corner of our network console. The symbols and colors used in the Event Monitor quickly draw your eye to alerts and potential problems, making on-the-fly administration quite easy.
Messages in the Event Monitor are simple and descriptive, such as “The number of processes on the system is high”. With most of these types of informative messages, the rules don’t do anything more than inform you, but that’s usually enough to let you see how the systems are performing. If this type of warning persists over time, you know to upgrade the problem system or take some load off it. Other events can trigger corrective action. For example, when a network has more than one gateway to the Internet, you can have a rule redirect network users through the second gateway when the first is loaded to capacity.
Statistics are easily gathered from the Event Monitor. We used the statistics to summarize performance results on our servers, as well as the ODBC accesses. After stats were gathered, we graphed the results using the supplied RoboMon Graph routine, providing clear indications of network loads and server performance.
If you get the impression we liked RoboMon, you’re right. Network administration monitoring was easier with RoboMon than with any other package we’ve looked at. The rules engine provides so many options for administrators that corrective action for most critical problems can be applied without you getting involved. RoboMon has been around for many years already, starting from its VMS roots. This release continues to show the strengths of the package, and should ensure RoboMon will be around for many more years.
Up to ten NT machines, $1000 per machine
120 Wells Avenue
Summary: Network-wide monitoring for Windows-based platforms coupled with a clever rules-based corrective action capability make RoboMon one of the best administration tools for Windows NT we’ve seen.