Enterprise Asset Management Systems and the Aims of Modern Maintenance
Since the late 1980s, enterprise asset management (EAM) vendors throughout the world have pitched their products based partly on the ability to capture, manipulate, and analyze historical failure data. The stated benefits case often includes the ability to highlight the causes of poorly performing assets, to provide the volume and quality of information needed to determine how best to manage those assets, and to inform decisions about end-of-life replacement and other investment points.
Part One of the series Captured by Data.
This benefits case covers the principal drivers for most maintenance managers today and has been used to justify millions of dollars' worth of investment. It has also placed the modern EAM system at the center of corporations that are driving to improve asset performance. On the surface it appears to be a logical approach to problems of asset performance, and companies that use it do, of course, achieve results.
The implementation of these products, when bought for these reasons, often focuses on optimizing processes to capture dynamic data on asset failures, which is then used throughout the system. Maintenance, repair, and overhaul (MRO)-style inventory management algorithms, for example, use this information as one of the key inputs for determining minimum stocking levels, reorder points, and the corresponding reorder quantities. MRO is an acronym widely used in the EAM and enterprise resource planning (ERP) industries, and refers to inventory management from an asset perspective rather than from a production perspective. The difference is that ERP-style inventory management focuses on just-in-time methods, while MRO-style inventory management focuses on just-in-case, or probabilistic, methods.
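Vendors' MRO algorithms differ and are usually proprietary. As a minimal sketch only, assuming that demand for a spare part during the replenishment lead time follows a Poisson distribution whose mean is derived from the historical failure rate, a just-in-case reorder point can be taken as the smallest stock level that covers lead-time demand with a chosen probability (the service level). The function names and parameters below are hypothetical, and reorder quantities are not covered.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """Return P(X <= k) for a Poisson random variable X with the given mean."""
    term = math.exp(-mean)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i  # P(X = i) derived from P(X = i - 1)
        total += term
    return total


def reorder_point(failures_per_year: float, lead_time_days: float,
                  service_level: float) -> int:
    """Hypothetical just-in-case rule: the smallest stock level s such that
    P(demand during the lead time <= s) >= service_level, assuming Poisson
    demand driven by the historical failure rate."""
    if not 0.0 < service_level < 1.0:
        raise ValueError("service_level must be strictly between 0 and 1")
    mean_demand = failures_per_year * lead_time_days / 365.0
    stock = 0
    while poisson_cdf(stock, mean_demand) < service_level:
        stock += 1
    return stock


# A part that fails about four times a year, with a 90-day replenishment lead time.
print(reorder_point(4, 90, 0.95))
print(reorder_point(4, 90, 0.90))
```

The historical failure rate is the key input here: if few failures have been recorded, the estimate feeding `mean_demand`, and therefore the stocking level, is correspondingly uncertain.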
If we want to understand the validity of this line of thinking, we must first explore the aims of maintenance and how asset data can be used to further them.
Maintenance is a term generally used to describe the routine activities that sustain standards of performance throughout the in-service, or operational, part of the asset life cycle. In designing these activities, the maintenance policy designer needs to take account of a range of factors, including the complexities of the operating environment, the resources available for performing maintenance, and the ability of the asset to meet its current performance standards.
In the past, this would have been the extent of the maintenance analyst's role. One of the realities analysts face, however, is that assets are at times under a demand greater than, or extremely close to, their inherent capabilities. As a result, analysts often find themselves recommending and analyzing activities not only in maintenance but also in other areas of asset management, namely asset modification and operations.
Safety and environmental compliance play their part in driving this activity, particularly given the changing legal and regulatory frameworks in these two areas; in some industries they are even the principal drivers. For most businesses, however, the goal remains maximum value from their investment: getting the maximum performance possible from the assets for the least amount spent.
In the original reports and appendices that produced reliability-centered maintenance (RCM), the authors initially defined critical failures as those with an impact on safety. Today the term critical failure is often used to group failures that cause what a company considers high-impact consequences, a definition too variable for a general discussion. For simplicity, critical failure in this paper refers to any failure that causes the asset to perform below the standard required of it. Defining what is an acceptable, or unacceptable, standard of performance is an extremely complicated area, and one that would take several articles to cover in adequate detail.
If an asset management program aims at maximum cost-effectiveness over an asset's life, then it must address the management of critical failures. By definition, such an approach is centered on the reliability of the asset (that is, reliability-centered). Note that within asset management, cost-effectiveness is not merely about low direct costs; it is about the minimum cost for a given level of risk and performance (in other words, maximum value).
In essence, then, the role of the policy designer can be defined as formulating cost-effective asset management programs (routine activities together with one-off procedural and design changes) that maintain standards of performance by reducing the likelihood of critical failures to an acceptable level, or by eliminating them. This is also the essence of modern RCM.
The Data Dilemma
Immediately, we start to see a contradiction between the aims of maintenance and the often-quoted aims of EAM systems. Non-critical failures are those with only low or negligible cost consequences. These are acceptable and can be allowed to occur, so for them a policy based on data capture and later analysis can be used effectively. Over time, enough information will accumulate to allow asset owners and policy designers to determine the correct maintenance policy with a high degree of confidence.
Figure 1. Acceptable and unacceptable failures
However, critical failures, those that cause an asset to underperform, have unacceptable consequences and cannot always be managed in the same way. If a failure has a high operational or economic impact, for example, then allowing it to occur before determining how to manage it is actively counterproductive to the aims of cost-effective asset management. Moreover, recent history reinforces the fact that asset failures can have safety consequences or lead to breaches of environmental regulations. In the US, the Iowa Division of Labor Services, Occupational Safety and Health Bureau, issued a citation and notification of penalty to Cargill Meat Solutions on January 30, 2006. Among a range of required initiatives, the citation called for corrective actions such as establishing a preventive maintenance program and training maintenance personnel to recognize potential failures. This is just one of a number of recent safety events in which maintenance has been flagged as a contributing factor.
So, if our policy for determining how best to manage physical assets is based around data capture, then we are creating an environment that runs counter to the principles of responsible asset stewardship in the twenty-first century.
The underlying theories of maintenance and of reliability are based on the theory of probability and on the properties of distribution functions that have been found to occur frequently, and which play a role in the prediction of survival characteristics. (Resnikoff, H. L. 1978. Mathematical Aspects of Reliability-centered Maintenance. Springfield: National Technical Information Service, US Department of Commerce.)
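One distribution frequently used to describe survival characteristics is the two-parameter Weibull distribution, offered here purely as an illustration rather than as the model of the cited report. With shape parameter beta and scale (characteristic life) eta, its reliability function is R(t) = exp(-(t/eta)^beta) and its hazard, or instantaneous failure rate, is h(t) = (beta/eta)(t/eta)^(beta-1). A beta below 1 gives a falling failure rate, beta equal to 1 a constant one, and beta above 1 a rising, wear-out pattern. A minimal sketch, with hypothetical function names and illustrative parameters:

```python
import math


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """Probability of surviving to age t: R(t) = exp(-(t / eta) ** beta)."""
    if t < 0 or beta <= 0 or eta <= 0:
        raise ValueError("require t >= 0, beta > 0, eta > 0")
    return math.exp(-((t / eta) ** beta))


def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Instantaneous failure rate at age t: h(t) = (beta / eta) * (t / eta) ** (beta - 1)."""
    if t <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("require t > 0, beta > 0, eta > 0")
    return (beta / eta) * (t / eta) ** (beta - 1)


# Illustrative parameters only: a wear-out pattern (beta = 3) with eta = 10,000 hours.
for age in (2_000, 5_000, 10_000):
    print(age, round(weibull_reliability(age, 3.0, 10_000), 3),
          weibull_hazard(age, 3.0, 10_000))
```

At t = eta the reliability is exp(-1), about 0.368, whatever the shape parameter. Estimating beta and eta with any confidence, however, requires a reasonable number of observed failures.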
Critical failures are, by their very nature, serious. When they occur, they are often designed out, a replacement asset is installed, or some other initiative is put in place to ensure that they do not recur. As a result, the volume of data available for analysis is often small, and the ability of statistical analysis to deliver results with a high level of confidence is therefore questionable at best.
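To see why small volumes of failure data undermine statistical confidence, consider estimating a constant failure rate from n failures observed over T operating hours. The sketch below assumes failures follow a Poisson process (a simplification, since many failure modes are age-related) and computes the exact two-sided confidence interval for the rate by bisection on the Poisson cumulative distribution. The function names and example figures are hypothetical.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """Return P(X <= k) for a Poisson random variable X with the given mean."""
    if k < 0:
        return 0.0
    term = math.exp(-mean)  # P(X = 0); adequate for the modest means used here
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return total


def _bisect_decreasing(f, target: float, lo: float, hi: float, iters: int = 200) -> float:
    """Find x in [lo, hi] with f(x) == target, for f decreasing in x."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # still above the target, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def failure_rate_interval(n_failures: int, operating_hours: float,
                          confidence: float = 0.90) -> tuple[float, float]:
    """Exact two-sided confidence interval for a constant failure rate
    (failures per hour), given n_failures observed in operating_hours."""
    alpha = 1.0 - confidence
    # Expand the search bracket until it contains the upper bound.
    hi = 10.0 + 10.0 * n_failures
    while poisson_cdf(n_failures, hi) > alpha / 2:
        hi *= 2
    # Upper bound: the mean at which observing n or fewer failures has probability alpha/2.
    upper = _bisect_decreasing(lambda m: poisson_cdf(n_failures, m), alpha / 2, 0.0, hi)
    # Lower bound: the mean at which observing n or more failures has probability alpha/2.
    if n_failures == 0:
        lower = 0.0
    else:
        lower = _bisect_decreasing(lambda m: poisson_cdf(n_failures - 1, m),
                                   1.0 - alpha / 2, 0.0, hi)
    return lower / operating_hours, upper / operating_hours


for n in (2, 20):
    low, high = failure_rate_interval(n, 20_000)
    print(f"{n} failures: rate between {low:.2e} and {high:.2e} per hour, "
          f"ratio {high / low:.1f}")
```

With only two failures, the upper bound of the 90 percent interval is roughly 18 times the lower bound, so the data are consistent with anything from a rare to a frequent failure mode; the interval narrows only as failures accumulate.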
This fundamental fact of managing physical assets highlights two flaws in the case for capturing data to design maintenance programs. First, collecting failure information for future decisions means managing the asset base in a way that runs counter to the basic aims of modern maintenance management. Second, even if a company were to go down this path, critical failures by their nature would not lend themselves to extensive statistical review.
By establishing an effective, or reliability-centered, maintenance regime, the policy designer is in effect creating a management environment that attempts to reduce failure information, not increase it. The more effective a maintenance program is, the fewer critical failures will occur, and correspondingly the less information will be available to the maintenance policy designer about operational failures (see Mathematical Aspects of Reliability-centered Maintenance, cited above). In other words, the closer a program comes to optimal, the smaller the volume of failure data it generates.
Designing Maintenance Policy
When maintenance policy designers begin to develop a management program, they are almost always confronted with a lack of reliable data on which to base their judgments. In the author's experience, most companies start reliability initiatives with an information base made up of approximately 30 percent hard data and 70 percent knowledge and experience.
One of the leading reasons for this is the nature of critical failures and the response they provoke. There are often other factors as well, such as the processes used to capture data, the consistency of that data, and a tendency to focus effort on areas of little value to the design of maintenance policy. And because EAM technologies change continually, upgrade and changeover projects are common, each offering further ways for data to become diluted.
Figure 2. Corporate knowledge = data + information
There are still other key reasons why data from many EAM implementations are of only limited value. Principal among them is that, even with well-controlled and precise business processes for capturing data, some of the critical failures that will need to be managed may not yet have occurred. An EAM system supporting a maintenance program that is either reactive or unstructured will contribute only a little to a policy development initiative.
At best, such a system may have collected information telling us that faults have occurred, at a heavy cost to the organization, but with few critical failures recorded and limited information about their causes. RCM facilitates the creation of maintenance programs by analyzing the four fundamental causes of critical asset failures:
- poor asset selection (never fit for purpose)
- asset degradation over time (becomes unfit for purpose)
- poor asset operation (operated outside of the original purpose)
- exceptional human errors (generally following the generic error modeling [GEM] principles)
The RCM analyst needs to analyze all reasonably likely failure modes in these four areas to an adequate level of detail. (Reasonably likely is the term used in the RCM standard SAE JA1011 to determine whether a failure mode should be included in an analysis; what is reasonable is defined by the asset owners.) Determining the potential causes of failure in these areas, for a given operating environment, is informed in part by data, but the vast majority of the information will come from other sources.
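One way to keep an analysis organized around the four causes is a simple failure-mode register that records, for each mode, its cause category, where the information came from, and the asset owner's judgment on whether it is reasonably likely. This is a hypothetical sketch: SAE JA1011 does not prescribe any data structure, and the class names, field names, and entries below are assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Cause(Enum):
    """The four fundamental causes of critical failure listed above."""
    POOR_SELECTION = "never fit for purpose"
    DEGRADATION = "becomes unfit for purpose"
    POOR_OPERATION = "operated outside of the original purpose"
    HUMAN_ERROR = "exceptional human error"


@dataclass(frozen=True)
class FailureMode:
    description: str
    cause: Cause
    source: str              # e.g. "EAM history", "operator log", "manufacturer", "interview"
    reasonably_likely: bool  # the asset owner's judgment, not a statistical output


def modes_to_analyze(register: list[FailureMode]) -> dict[Cause, list[FailureMode]]:
    """Group the failure modes judged reasonably likely by their cause category."""
    grouped: dict[Cause, list[FailureMode]] = {cause: [] for cause in Cause}
    for mode in register:
        if mode.reasonably_likely:
            grouped[mode.cause].append(mode)
    return grouped


# Hypothetical entries for a pumping system.
register = [
    FailureMode("Impeller worn by foreign objects", Cause.DEGRADATION, "operator log", True),
    FailureMode("Pump run dry during start-up", Cause.POOR_OPERATION, "interview", True),
    FailureMode("Pump undersized for peak flow", Cause.POOR_SELECTION, "EAM history", False),
]
selected = modes_to_analyze(register)
print({cause.name: len(modes) for cause, modes in selected.items()})
# Where did the information for the selected modes come from?
print(Counter(m.source for modes in selected.values() for m in modes))
```

Tallying the `source` field in this way shows how much of an analysis rests on recorded data and how much on people's knowledge.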
Operators' logs are a strong source of potential signs of failure, and of failures that are often not recorded in the corporate EAM. Equipment manufacturers' guides are also powerful sources of information on failure causes and failure rates. However, all information from a manufacturer needs to be understood in the context of how you are using the asset, and of the manufacturer's (often conservative) estimates. For example, if operational conditions expose your pumping system to random foreign objects, then failure rates for impeller wear can become skewed.
Other sources of empirical data include operational systems such as supervisory control and data acquisition (SCADA) systems (Citect, for example), commercial databanks, user groups, and at times consultant organizations. As with manufacturers' information, you need to understand how these data apply to the operating environment of your assets. And as asset owners demand increasingly advanced technology, items come onto the market with limited test data from operational installations, further complicating the task of designing maintenance through data.
How far an RCM analyst should go to collect empirical data is decided by a combination of the perceived risk (probability × consequence) and, of course, the limits that commercial pressures place on maintenance policy design. Even when all barriers are removed from their path, RCM analysts are often faced with an absence of real operational data on critical failures.
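As a rough illustration of the risk term, assuming each failure mode's annual probability and a single cost-equivalent consequence can be estimated, failure modes can be ranked by probability × consequence to indicate where data-collection effort is most justified. The names and figures below are hypothetical, and collapsing consequences into one cost figure is a simplification: RCM treats safety and environmental consequences separately from economic ones.

```python
from typing import NamedTuple


class FailureModeRisk(NamedTuple):
    name: str
    annual_probability: float  # estimated probability of occurring in a year
    consequence_cost: float    # estimated cost per occurrence


def rank_by_risk(modes: list[FailureModeRisk]) -> list[tuple[str, float]]:
    """Return (name, risk) pairs sorted from highest to lowest risk,
    where risk = probability x consequence."""
    scored = [(m.name, m.annual_probability * m.consequence_cost) for m in modes]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical estimates for illustration only.
modes = [
    FailureModeRisk("Seal leak", 0.20, 5_000),
    FailureModeRisk("Impeller wear", 0.50, 20_000),
    FailureModeRisk("Motor burnout", 0.05, 400_000),
]
for name, risk in rank_by_risk(modes):
    print(f"{name}: {risk:,.0f} per year")
```

In this made-up example the highest-ranked mode is also the rarest, which is exactly the kind of failure for which little operational data is likely to exist.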
The vast majority of the information about how assets are managed, how they can fail, and how they should be managed will come from the people who manage the assets day to day. Potential and historic failure modes, rates of failure, the maintenance actually performed (not what the system says, but what is really done), why a particular task was introduced in the first place, and operational practices and the reasons for them are all elements of information found not so much in data as in knowledge.
This is one of the overlooked side benefits of applying the RCM process: it captures knowledge, not merely data. As the workforce ages, as fewer new entrants choose maintenance over other managerial fields, and as staff become more mobile, the RCM process and the skills of trained RCM analysts provide a structured way to reduce the impact of diminishing experience.
In Part Two of this series, we'll examine the methodology of the RCM process in greater detail.