Fault Meets Performance -- Comprehensive Infrastructure Management Part 2: The Solution

  • Written By: Fred Engel
  • Published: March 29, 2002

A View of Historical Trends

Performance management technology emerged in 1995, giving IT managers the ability to see the errors, capacities and behaviors of their devices. While fault management tells IT staffs whether a device is up or down, performance management tells them more about these devices so that they can understand the context of what is going on.

Performance management usually relies on SNMP polling to assess the infrastructure's health. The infrastructure management system periodically polls the various devices, sending them messages to sample their performance and determine whether they are operating within acceptable levels. The devices reply with performance data, which the system stores in a database. The more advanced systems automatically correlate historical trend data and compile it into reports. These reports give IT staffs a clear view of what is going on in the infrastructure.
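As a rough illustration of the polling cycle described above, the following Python sketch samples each device once, stores the result, and reports which devices exceed an acceptable level. It is a minimal sketch under stated assumptions, not how any particular product works: the device readers are plain callables standing in for real SNMP requests, and the names poll_once, open_store, the samples table and the 90 percent threshold are all hypothetical.

```python
import sqlite3
import time
from typing import Callable, Dict, List

# Hypothetical stand-in for an SNMP request: returns a utilization value (0-100).
Reader = Callable[[], float]


def open_store() -> sqlite3.Connection:
    """Create an in-memory database to hold the performance samples."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE samples (device TEXT, ts REAL, value REAL)")
    return db


def poll_once(db: sqlite3.Connection, devices: Dict[str, Reader],
              threshold: float, now: float) -> List[str]:
    """Sample every device once, store each reply, and return the names
    of the devices whose value is above the acceptable threshold."""
    over_threshold = []
    for name, read in devices.items():
        value = read()
        db.execute(
            "INSERT INTO samples (device, ts, value) VALUES (?, ?, ?)",
            (name, now, value),
        )
        if value > threshold:
            over_threshold.append(name)
    db.commit()
    return over_threshold


db = open_store()
# Assumed example devices with fixed readings, for illustration only.
devices = {"router-a": lambda: 42.0, "router-b": lambda: 97.5}
print(poll_once(db, devices, threshold=90.0, now=time.time()))
```

A real system would run this on a schedule and keep the stored samples for the trend reports mentioned above.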

By seeing how devices operate over time, IT managers can get a better view of how they normally run, and can recognize when something is wrong. The truly advanced management systems allow IT managers who see a problem to click on the alarm for that problem and view the important variables that shed light on why the problem occurred. The best solution for ensuring that infrastructures continuously deliver peak performance is to combine both fault and performance management.
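One simple way to express "how a device normally runs" is a statistical baseline built from its history. The sketch below flags a reading that falls far from the historical mean. The three-standard-deviation rule and the function name is_abnormal are assumptions chosen for illustration; the article does not say how any given product defines normal behavior.

```python
from statistics import mean, stdev
from typing import List


def is_abnormal(history: List[float], current: float, k: float = 3.0) -> bool:
    """Return True if `current` lies more than k standard deviations
    from the mean of the historical samples."""
    if len(history) < 2:
        # Not enough history to establish a baseline.
        return False
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all is a deviation.
        return current != baseline
    return abs(current - baseline) > k * spread


history = [40.0, 42.0, 41.0, 39.0, 43.0]  # assumed utilization samples
print(is_abnormal(history, 44.0))
print(is_abnormal(history, 80.0))
```

Longer histories, or separate baselines by hour of day and day of week, would give a more faithful picture of normal behavior than a single mean.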

This is Part 2 of a two-part note.
Part 1 discussed the problem.

Marrying the Two

In the past, deployment of fault and performance technologies gave companies two independent systems. IT staffs tried to link them to get one view of their infrastructure's health. However, in most cases they had to look at two separate machines with separate screens to manually correlate the real-time fault data with the historical performance information to see what problems were arising and why. This took time and often did not work well, forcing IT staffs to focus too much of their day on babysitting infrastructure management software rather than on the company's core business functions.

Recent infrastructure management technology seamlessly marries fault and performance into one platform, giving IT managers one view of both fault and performance information about networks, systems and applications across the entire infrastructure. This technology not only enables IT managers to see in real time when their financial application, for example, is failing; it can also show them why, so they can fix the problem before employees in accounts payable have trouble conducting transactions.
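The correlation that IT staffs once did by hand across two screens can be sketched in a few lines: given a real-time alarm, pull the stored performance samples for the same device from the period leading up to it. This is a hedged sketch only. The Alarm class, the tuple layout of a sample, the function name context_for and the 15-minute window are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed sample layout: (device, timestamp in seconds, metric name, value).
Sample = Tuple[str, float, str, float]


@dataclass
class Alarm:
    device: str
    ts: float      # time the fault was detected, in seconds
    message: str


def context_for(alarm: Alarm, samples: List[Sample],
                window: float = 900.0) -> List[Sample]:
    """Return the samples for the alarmed device taken during the
    `window` seconds up to and including the alarm, oldest first."""
    start = alarm.ts - window
    relevant = [
        s for s in samples
        if s[0] == alarm.device and start <= s[1] <= alarm.ts
    ]
    return sorted(relevant, key=lambda s: s[1])


samples = [
    ("app-server", 100.0, "cpu", 35.0),
    ("app-server", 700.0, "cpu", 88.0),
    ("app-server", 950.0, "cpu", 99.0),
    ("db-server", 940.0, "cpu", 20.0),
]
alarm = Alarm(device="app-server", ts=1000.0, message="application not responding")
for sample in context_for(alarm, samples):
    print(sample)
```

Presenting this history next to the alarm is what lets a manager see not just that the application failed but what was changing beforehand.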

Combining the two technologies helps IT staffs identify and resolve fault and performance issues much faster than if they relied on either real-time or historical trend analysis alone. Now, IT staffs can spend more time actually resolving problems before they affect customers and service rather than having to identify and research them to pinpoint the best approach to resolution. The latest technology also incorporates automatic fixes, so the infrastructure management system can identify a problem, notify the IT manager, and automatically correct the problem before anyone outside the IT department notices it.

What To Look For

Companies looking for integrated fault and performance infrastructure management solutions should look for vendors that have built their legacies in performance first. Why? The technology and research that go into developing a successful performance application are much more complex than those required for fault management.

A performance system needs to have been certified to operate with a broad range of vendors and technologies. It needs to offer a large complement of reports, from at-a-glance troubleshooting reports, to quality of service (QoS) and service level agreement (SLA) reports, to capacity planning reports. It needs a rich Web interface that allows all users to see their information as well. It should also automate all these functions so that the IT staff spends its time looking at information, not programming the system to show the relevant information. Finally, it needs to scale to handle the hundreds of thousands of devices that need to be managed.

A good performance company has spent its research and development dollars on figuring out what people want and need to see and programming that into the basic system. Companies starting from the fault side tend to have under-invested in their performance side. For example, polling, often thought to be an easy problem, is actually a difficult technology to master because you do not want to overwhelm devices with requests, or flood the network, or lose information because lines are noisy.
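Two of the polling concerns named above can be illustrated briefly: spreading requests across the polling interval so that neither the devices nor the network see a burst, and retrying with increasing delays when a reply is lost on a noisy line. This is a sketch under assumptions, not a description of any vendor's poller. The function names, the use of TimeoutError to represent a lost reply, and the delay values are all hypothetical.

```python
import time
from typing import Callable, List, Optional, Tuple


def staggered_offsets(devices: List[str],
                      interval: float) -> List[Tuple[str, float]]:
    """Assign each device a start offset within the polling interval so
    that requests are spread evenly instead of sent all at once."""
    if not devices:
        return []
    step = interval / len(devices)
    return [(device, index * step) for index, device in enumerate(devices)]


def poll_with_retry(read: Callable[[], float], attempts: int = 3,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[float]:
    """Call `read`, retrying after a lost reply (modeled as TimeoutError)
    with a delay that doubles each time. Returns None if every attempt fails."""
    for attempt in range(attempts):
        try:
            return read()
        except TimeoutError:
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return None


# Three devices polled over a 300-second interval start 100 seconds apart.
print(staggered_offsets(["switch-1", "switch-2", "switch-3"], 300.0))
```

Doubling the delay between retries keeps a struggling device from being hit harder just when it is least able to respond, and returning None lets the caller record a gap rather than a misleading value.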

Fault management vendors adopting performance management technology would have to learn these complex tasks and build them into their platforms. Fault management, being the more established technology, is easier to adopt. Fault capabilities such as trap reception, fault handling, de-duplication, and work-flow management have become readily available commodities.
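De-duplication, one of the fault functions listed above, can be sketched as follows: when the same fault arrives repeatedly from the same device within a short period, only the first report is kept. The event layout, the function name deduplicate and the 60-second window are assumptions made for this example.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed event layout: (timestamp in seconds, device, fault type).
Event = Tuple[float, str, str]


def deduplicate(events: Iterable[Event], window: float = 60.0) -> List[Event]:
    """Keep an event only if the same (device, fault) pair has not been
    seen within the previous `window` seconds."""
    last_seen: Dict[Tuple[str, str], float] = {}
    kept: List[Event] = []
    for ts, device, fault in sorted(events):
        key = (device, fault)
        previous = last_seen.get(key)
        if previous is None or ts - previous > window:
            kept.append((ts, device, fault))
        # Record every occurrence, so a continuous storm stays suppressed.
        last_seen[key] = ts
    return kept


events = [
    (0.0, "switch-1", "linkDown"),
    (10.0, "switch-1", "linkDown"),
    (20.0, "switch-2", "linkDown"),
    (200.0, "switch-1", "linkDown"),
]
for event in deduplicate(events):
    print(event)
```

Because the last-seen time is updated on every occurrence, a fault that keeps repeating is reported once and then stays quiet until it has been silent for longer than the window.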

In the end, vendors who seamlessly integrate real-time problem detection with historical performance analysis will not only ensure their clients' infrastructures maintain maximum uptime, but more importantly, they will ensure that every network, application and system delivers peak performance.


About the Author

Fred Engel is a recognized expert in networking technologies, and a member of several Internet Engineering Task Force (IETF) standards committees, which explore and define industry networking standards like RMON and RMON2. He was recently named one of the Massachusetts ECOM Top 10 for 2001, making him a true "pioneer in technology." Engel has also been nominated for Computerworld's Top IT Leaders of the new economy.

As executive vice president and CTO for Concord Communications, Fred invented the network reporting and analysis industry by developing eHealth. Fred works with leading Fortune 1000 organizations, carriers and Internet service providers to deploy turnkey networking performance and analysis solutions. Before joining Concord in 1989, Fred was vice president of engineering for Technology Concepts/Bell Atlantic, where he led the design and implementation of the SS7 800 Database front-ends, now deployed nationwide. His experience also includes Digital Equipment Corp., where he led the company's first TCP/IP deployment.

Fred has taught computer science, statistics and survey research courses at the University of Connecticut, Boston College and Boston University. He frequently speaks at leading networking industry forums, including Networld+Interop, COMDEX, COMNET, the Federal Computer Conference and DECUS, as well as two keynote engagements in France: Network Developments and IEEE International Conference on Networking (ICN).

Fred can be reached at fengel@concord.com.
