Use a Computerized Maintenance Management System to Improve Predictive Maintenance Performance

  • Written By: David Berger
  • Published On: August 6, 2008


Originally Published - February 19, 2004

Speaking with numerous maintenance professionals across multiple industries, invariably the conversation seems to center on how to create an optimal balance of proactive and reactive maintenance and how to get there. The discussions often come disguised in modern-day labels, such as lean, reliability-centered maintenance (RCM), and total productive maintenance (TPM). However, there's no mistaking the underlying issues: companies need to become better at operating a more stable, planned, and predictable maintenance environment at minimal cost.

Although the figures vary, most maintenance managers would agree they don't like playing the role of firefighter, with as many as 40 to 80 percent of requests for maintenance services being unplanned or even emergencies. There's much debate on the subject, but it appears that a reasonable target for the ratio of planned to unplanned maintenance is 80:20. Clearly, many manufacturing environments have far to go to achieve this target. Companies that have moved from a highly reactive environment to a more planned environment notice significant improvements, such as

  • significant reductions in total downtime
  • lower inventory of spare parts required in stores
  • increased production capacity, as fewer machines lie idle or are in the shop
  • reduced space requirements for spare parts and equipment that's down
  • fewer rush orders required
  • fewer quick fixes and fewer mistakes made
  • improved use of maintenance staff
  • less overtime needed to respond to emergencies
  • less stress with a planned shutdown
  • better yield and less scrap, waste, rework, etc.
  • lower total cost of ownership of assets
  • more predictable and stable production scheduling, so that customer responsiveness is improved

For companies moving in the right direction, however, there's still some uncertainty as to the optimal balance of reactive maintenance, preventive maintenance (PM), and predictive maintenance (PdM) for their particular environments. Generally, organizations that are well on their way to building a planned environment do so using a high proportion of PM as opposed to what's perceived as more expensive PdM technologies. Predictive-based technologies, however, are becoming less expensive, and broader technology improvements continue to shift the point of optimal balance. Thus, even the more sophisticated maintenance departments feel the best way to determine the optimum is on a trial-and-error basis.

An enterprise asset management (EAM) or computerized maintenance management system (CMMS) is a useful tool to build an accurate equipment history and provide comprehensive analysis capability. With a realistic history, companies can balance the cost of replacing the equipment versus maintaining it through an optimal mix of reactive, PM, and PdM maintenance. A CMMS can help calculate the total cost of downtime and poor quality as part of the optimal balance calculation. Perhaps surprisingly, not many companies track these costs.

Additionally, the CMMS can help identify the root cause of maintenance-related failure or quality problems so that the frequency of maintenance can be reduced through prevention (such as the training of operators) or condition monitoring (such as a vibration analysis). This is critical to the success of any maintenance program.

Begin with Criticality Analysis

Moving too quickly to either end of the reactive/preventive/predictive continuum can be a costly exercise. For example, monitoring the condition of each and every light bulb in your facility so that they might be replaced just prior to failure is massive overkill. On the other hand, allowing a critical component of an expensive asset to run to failure is unthinkable. By monitoring conditions such as vibration on such assets, maintainers can save millions of dollars in downtime costs. These are obvious examples. With most assets, however, it can be a long and painful process to identify the optimal mix of reactive, PM, and PdM maintenance.

To determine the most cost-effective approach to maintaining an asset, different questions are posed for each component. This is often referred to as criticality analysis and is part of an RCM program. Some of the more important questions include the following: What does this component do? What happens if it fails (for example, no impact versus catastrophic)? What's the most cost-effective maintenance program required (such as reactive, PM, or PdM)?

A few of the more sophisticated CMMS packages will assist in determining and documenting some or all of the answers to these questions. Examples of the type of data collected and analyzed include the following:

  • operating context for the asset being analyzed (e.g., a water cooling system maintains water between forty and forty-five degrees Fahrenheit twenty-four hours a day, seven days a week)

  • functions of the asset (e.g., maintain water temperature and contain water in the tank)

  • possible failures (e.g., water becomes too hot or too cold)

  • possible failure modes or causes (e.g., heat exchanger is fouled, shut-off valve is closed, or pump bearing is fatigued)

  • most probable failure effects for each failure mode (e.g., inefficient heat exchanger results in higher utility cost, extra cooling tower sections in operation, and eventual inability to deliver quality parts)

  • proposed maintenance task for each failure mode that includes using failure history, probability, and costs to compare financial and technical feasibility of reactive, preventive, or predictive actions (e.g., monitor heat exchanger efficiency)

As a result, if the failure of a component is catastrophic from a safety or cost perspective, it may be wise to carefully monitor the conditions that are found to best correlate with and predict the component's failure, such as vibration, temperature, or lubrication viscosity. If the failure of a component has little or no impact, then a run-to-failure or reactive maintenance program is likely the optimal solution. Between these extremes, a PM program may be the most cost-effective choice: a simple time- or meter-based routine offsets the impact of failure.
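The decision rule described above can be expressed as a short function. This is a hypothetical sketch, not a feature of any CMMS: the strategy names, cost inputs, and the idea of comparing annualized costs are illustrative assumptions.

```python
def recommend_strategy(safety_critical: bool,
                       annual_failure_cost: float,
                       pm_annual_cost: float,
                       pdm_annual_cost: float) -> str:
    """Pick a maintenance strategy for one component (illustrative only)."""
    if safety_critical:
        # Catastrophic failures justify condition monitoring regardless of cost.
        return "PdM"
    if annual_failure_cost < min(pm_annual_cost, pdm_annual_cost):
        # Impact is too small to justify prevention: run to failure.
        return "reactive"
    # Otherwise choose the cheaper of the two planned approaches.
    return "PM" if pm_annual_cost <= pdm_annual_cost else "PdM"
```

In practice, the failure history and cost data feeding these inputs would come from the CMMS equipment records discussed earlier.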

Role of CMMS in PdM

As we learn to integrate systems within the plant environment, the role of CMMS continues to change. The CMMS is emerging as a natural focal point of many manufacturing systems. CMMS vendors have been actively merging, acquiring, or forming partnerships with vendors of complementary software. This includes enterprise resource planning (ERP), PdM, human machine interfaces (HMI), manufacturing execution systems (MES), and shop-floor data collection (SFDC). More astute company executives are beginning to see the value in closely coupling the collection and analysis of data to better monitor and control processes, products, assets, and the manufacturing environment.

Many of the more sophisticated CMMS packages can link the collection of data and the control loop to correct a process or prevent the failure of an asset. This involves taking periodic samples from the shop-floor data being collected and comparing it to a predetermined standard. If the process deviates significantly from the standard, action is required. To determine root cause and the appropriate action, specific tools can be employed. This includes Pareto analysis, cause-and-effect diagrams, or simply brainstorming.

Described below are the multiple sources of data, various techniques for analyzing the information, and the means to take corrective action:

1. Data Collection

The first step in implementing a data collection system is to determine what needs to be measured and controlled. For example, if a plant has a number of injection moulding machines, what should be measured to determine whether or not the process is in control and the asset isn't going to fail? Perhaps the wear on a die needs to be measured or the cycle time determined.

  • PdM data: Vibration, lubrication, infrared, and other predictive data can be collected via portable units on a periodic basis and dumped into the CMMS. Alternatively, data can be pulled on a regular basis using permanently affixed shop-floor data collection systems.

  • Condition monitoring data: SFDC systems can pick up PdM information and collect production data, such as production count, number, and type of rejects, etc. This data can be fed directly or via a programmable controller into the CMMS for analysis. Data can also be fed manually, for example, after PM inspections.

  • RCM data: Collection of reliability information assumes your CMMS is capable of recording codes for problem, cause, action, and delay on every corrective or standing work order. A log of these codes for each component and equipment can be kept for analysis.
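The kind of code log this last bullet describes can be modeled very simply. The asset tags and codes below are invented for illustration; a real CMMS would record them on each corrective or standing work order.

```python
from collections import Counter

# Each record: (asset, problem code, cause code, action code, delay hours).
# All values are invented examples of the codes the article describes.
work_orders = [
    ("PUMP-01",  "OVERHEAT",  "BEARING-FATIGUE", "REPLACE",   4.5),
    ("PUMP-01",  "OVERHEAT",  "LOW-LUBE",        "LUBRICATE", 1.0),
    ("PUMP-01",  "VIBRATION", "BEARING-FATIGUE", "REPLACE",   6.0),
    ("VALVE-07", "LEAK",      "SEAL-WEAR",       "REPLACE",   2.0),
]

# Tally causes per asset to feed later reliability analysis.
cause_counts = Counter((asset, cause) for asset, _, cause, _, _ in work_orders)
```

A running tally like this is the raw material for the Pareto and root cause analysis described in the next section.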

2. Data Analysis

  • Control charts: All processes have a natural tendency to vary slightly for any given attribute. These random variations are caused by numerous factors, such as ambient temperature and atmospheric pressure fluctuations. Each factor contributes only a small amount of variability. A normal or bell curve is produced by plotting the random variability. From the curve, an acceptable upper control limit (UCL) and lower control limit (LCL) can be determined (e.g., within two standard deviations of the mean or 95.5 percent of the sample values).

    Another type of variability is called assignable variation because the cause of the variation can be determined and, in turn, eliminated. Factors like tool wear, equipment that's out of adjustment, defective parts, and human error can contribute to assignable variability. To determine whether a process is in control, sample data can be plotted on a control chart. For example, random clustering around the average for a given attribute, such as machine cycle time, signals that a process is in control. If points are plotted outside the UCL and LCL, then the process is considered out of control and root cause analysis is required.
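The UCL/LCL calculation described above can be sketched in a few lines. The two-standard-deviation limits follow the article's example; the baseline cycle-time data are invented.

```python
from statistics import mean, stdev

def control_limits(samples, k=2.0):
    """Return (LCL, centerline, UCL) at k standard deviations.
    k=2 brackets roughly 95.5 percent of a normally distributed sample."""
    m, s = mean(samples), stdev(samples)
    return m - k * s, m, m + k * s

def out_of_control(samples, lcl, ucl):
    """Points plotted outside the limits: candidates for root cause analysis."""
    return [x for x in samples if x < lcl or x > ucl]

# Baseline machine cycle times (seconds) for an in-control process -- invented data.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0]
lcl, center, ucl = control_limits(baseline)
```

New samples that fall outside `lcl` and `ucl` would then trigger the root cause analysis the article describes.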

  • Run charts: Even if all sample points lie within the UCL and LCL, a process may still be exhibiting assignable variation. This may show as patterns of variation, such as cycles, trends, or bias. A run chart plots the sequence of observations with a given attribute. In most cases, patterns can be detected visually.
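One such pattern test can be sketched as follows: flagging a sustained run of strictly increasing or decreasing observations. The run length of six is a common rule of thumb, not a figure from the article.

```python
def has_trend(points, run_length=6):
    """Flag a run of `run_length` consecutive strictly increasing (or
    decreasing) observations -- a sign of assignable variation such as
    tool wear, even when every point lies inside the control limits."""
    up = down = 1
    for prev, cur in zip(points, points[1:]):
        up = up + 1 if cur > prev else 1
        down = down + 1 if cur < prev else 1
        if up >= run_length or down >= run_length:
            return True
    return False
```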

  • Pareto analysis: In the 1800s, Vilfredo Pareto observed that relatively few factors (approximately 20 percent) account for a large percent of problems (around 80 percent). This is commonly known as the 80-20 rule. For example, 80 percent of the machine breakdowns come from 20 percent of the machines, or 80 percent of the breakdowns for a given machine are caused by only 20 percent of all the possible causes.

    A Pareto diagram is a bar chart that shows the number of occurrences by category, arranged in order of frequency. Data can be sourced from all of the collection systems described above, or as a subsequent step to analyzing multiple control or run charts. The highest peak may be a logical starting point for eliminating problems. This isn't a guarantee, however, that you're addressing the root cause.
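As a sketch, the "vital few" causes behind a Pareto diagram can be extracted like this (the cause names are invented):

```python
from collections import Counter

def pareto(failure_causes, cutoff=0.8):
    """Rank causes by frequency and return the vital few that together
    account for `cutoff` (default 80 percent) of all occurrences."""
    counts = Counter(failure_causes)
    total = sum(counts.values())
    vital, cum = [], 0
    for cause, n in counts.most_common():
        vital.append(cause)
        cum += n
        if cum / total >= cutoff:
            break
    return vital

# Invented breakdown log: one entry per failure occurrence.
causes = ["bearing"] * 8 + ["seal"] * 1 + ["misalignment"] * 1
```

Here `pareto(causes)` returns `["bearing"]`, the single cause behind 80 percent of the breakdowns, which would be the logical starting point for root cause analysis.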

  • Cause-and-effect diagram: A useful tool for determining the root cause of a problem is the cause-and-effect diagram, sometimes called the fishbone or Ishikawa diagram, named after the Japanese professor who developed the approach. The fishbone diagram is used to organize logical cause-and-effect relations that are discussed in brainstorming sessions. Each one of the causes can then be analyzed to determine the root cause.

3. Action and Control Loop

Corrective action can be a manual process, such as issuing a work order that prompts a maintenance worker to repair or replace a component part. An even simpler action would be to print out the condition the next time a report is requested. If response time is a key factor, however, neither of these actions may be sufficient. Where processes are more critical and require an automated corrective action, user-definable business rules built into the CMMS are used to trigger more sophisticated actions or events. This may involve actuating an automated procedure via a programmable controller, when a process tracks outside the acceptable control limits (for example, tool replacement on computer numerical control [CNC] equipment).

Alarms triggered by business rules can vary widely. Many CMMS packages have an alarm feature that alerts the user to a given condition. More sophisticated CMMS packages can notify users automatically via e-mail, phones, or even digital beepers.

Sometimes the action required involves the launching of a report, an event (e.g., the issuing of a work order), or even a third-party application (e.g., sending a purchase requisition via the Internet). The most comprehensive CMMS packages have linked condition monitoring to workflow software to define business rules and corresponding actions triggered.

For example, a work crew has missed performing a PM routine many days in a row. As a result, the next scheduled PM is already due. Workflow software and business rules can be used to automatically prevent any further logging of time and materials against work orders until the PM routine is completed. Additionally, workflow software can alert the appropriate planner or supervisor of the condition and possible consequences if the necessary action isn't taken. If action still isn't applied within a user-defined period, the workflow software can continue to escalate the problem by alarming sequentially higher levels of authority.
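The escalation behavior in this example can be sketched as a simple rule. Everything here is a hypothetical illustration: the role names, the two-day escalation interval, and the `notify` stub stand in for whatever a real CMMS workflow engine would provide.

```python
# Chain of authority, lowest level first -- invented role names.
ESCALATION_CHAIN = ["planner", "supervisor", "maintenance manager", "plant manager"]

def notify(role, message):
    """Stand-in for the e-mail/pager notification a real CMMS would send."""
    print(f"ALERT -> {role}: {message}")

def escalate(days_overdue, days_per_level=2):
    """Alert one level higher in the chain for every `days_per_level`
    days a PM routine stays overdue, topping out at the last level.
    Returns the highest role alerted."""
    level = min(days_overdue // days_per_level, len(ESCALATION_CHAIN) - 1)
    for role in ESCALATION_CHAIN[:level + 1]:
        notify(role, f"PM routine overdue by {days_overdue} days")
    return ESCALATION_CHAIN[level]
```

A workflow engine would run this rule on a schedule, alongside the block on further time and material logging described above.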

Thus, the CMMS acts as a central repository for the data collected and records of the action taken. More advanced CMMS software can assist in capturing data online, as well as analyzing, reporting, and acting on the data using the techniques described above.

About the Author

David Berger is with Western Management Consultants and is the founding president of the Plant Engineering and Maintenance Association of Canada. He has extensive experience as an executive of a tier one bank and for a manufacturing company, as well as many years of experience as a partner of a Big Five consulting firm and running his own consulting firm. Berger has been an adjunct professor at the Schulich School of Business in Toronto (Canada) for more than 20 years, teaching operations management for the MBA program. Berger has also done considerable work in the areas of strategy, IT, and process improvement for the manufacturing, financial services, and public sectors. Berger can be reached at

Reprinted with permission from Plant Engineering and Maintenance magazine.
