Beware of Legacy Data - It Can Be Lethal

Featured Author - Jan Mulder - August 23, 2002


The term legacy is mostly used for applications. For example, according to the Foldoc dictionary, legacy is:

A computer system or application program which continues to be used because of the cost of replacing or redesigning it and often despite its poor competitiveness and compatibility with modern equivalents. The implication is that the system is large, monolithic and difficult to modify.

This definition is enough to give you an appetite to get rid of your legacy application. However, every legacy application has a close associate: the legacy data maintained and used through that application. When legacy modernization is planned (for example, replacing the legacy application with an ERP system), the data part is often overlooked. This can be fatal to the new, and often expensive, application. Recently I worked for two organizations, both in the process of replacing their legacy applications with ERP systems. Neither gave enough attention to the associated legacy data, and both experienced severe problems in using the functionality offered by the new application. The legacy data turned out to be real functionality killers.

Legacy Data

One of the companies I worked for was using mainframe-based applications dating back to the seventies. So, who wouldn't want to migrate to a modern, state-of-the-art ERP system? Management decided to switch. But getting rid of the legacy application is one thing; not being troubled by the legacy data in the future is quite another.

In an environment with systems that have been used for so long, it is almost certain that you will experience some or all of the following problems with the data:

  • Lack of understanding: what the heck is meant by columns 197 through 204 in the file? Nobody can tell you. The local expert has retired or was downsized long ago.

  • The presence of data that really shouldn't be in the file or database: the file is supposed to contain only customer records, but over time other categories are usually added, such as employees and test cases.

  • The usual cases of bad data: duplicate records and incorrect data. Note that these are often caused by the limitations of the legacy application. For example, the legacy application might allow only one customer category per customer record, so if a customer belongs to two categories, his data must be entered twice.

  • The effect of all those years: the bug introduced by a programmer back in 1984 and discovered four months later (but not always corrected in the data); the trainee who maintained the system for a few months without really understanding it.

OK, I know what you are thinking: not in our shop! People are very reluctant to admit that there might be problems like these in their systems. I would go so far as to state that there is no information system without some of these abnormalities, simply because there is no perfect software and there are no perfect people working with these systems. The right reaction is not denial but a proactive one: developing a strategy to contain the damage.
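To make the duplicate-record problem concrete: when a legacy application allows only one customer category per record, the same customer ends up stored once per category. The Python sketch below collapses such duplicates into a single record carrying a set of categories. The record layouts, the class and function names, and the matching rule (same normalized name, street and postcode) are all hypothetical assumptions for illustration; real matching usually needs fuzzier rules and human review.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LegacyCustomer:
    """Hypothetical legacy layout: exactly one category per record."""
    name: str
    street: str
    postcode: str
    category: str


@dataclass
class Customer:
    """Hypothetical target layout: one record per customer, any number of categories."""
    name: str
    street: str
    postcode: str
    categories: set[str] = field(default_factory=set)


def _norm(text: str) -> str:
    # Lower-case and collapse runs of whitespace.
    return " ".join(text.lower().split())


def identity_key(rec: LegacyCustomer) -> tuple[str, str, str]:
    # Assumed matching rule: same normalized name, street and postcode
    # (postcode compared without spaces, so "1234 AB" equals "1234AB").
    return (_norm(rec.name), _norm(rec.street), _norm(rec.postcode).replace(" ", ""))


def consolidate(records: list[LegacyCustomer]) -> list[Customer]:
    """Collapse records sharing an identity key; keep the first record's
    name and address and collect all categories."""
    merged: dict[tuple[str, str, str], Customer] = {}
    for rec in records:
        key = identity_key(rec)
        if key not in merged:
            merged[key] = Customer(rec.name, rec.street, rec.postcode)
        merged[key].categories.add(rec.category)
    return list(merged.values())


if __name__ == "__main__":
    legacy = [
        LegacyCustomer("J. Jansen", "Dorpsstraat 1", "1234 AB", "residential"),
        LegacyCustomer("j. jansen ", "Dorpsstraat  1", "1234AB", "small business"),
    ]
    customers = consolidate(legacy)
    print(len(customers))                   # 1
    print(sorted(customers[0].categories))  # ['residential', 'small business']
```

Keying on an exact normalized tuple keeps the sketch simple and predictable; it will miss spelling variants, which is exactly where data quality tools earn their keep.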

Case: A Utility Company

In Europe, there used to be many utility companies, each roughly covering the area of a town or region. The liberalization of the energy market led to an enormous consolidation of these companies into just a few. One of the managers of this company tried to sketch its genealogy for me. He gave up after counting about 25 predecessors: companies that were once independent and are now part of this huge consolidated company. Naturally, each of these predecessors had its own information system, and the more companies were added, the more complex the landscape of information systems became. So, about two years ago the company decided to standardize on SAP for its financial and logistics information systems, and on SAP IS-U, SAP's industry-specific component for utilities, as its customer information and billing system.

A huge project team worked on the SAP implementation, and over time all the old legacy customer information systems were replaced by SAP IS-U. Due to time constraints (and probably a lack of knowledge of the legacy data), the legacy data were migrated into SAP IS-U more or less unchanged.

I worked as an IT architect for this company at that time. One of my tasks was to help other projects that needed customer-related information to interface with SAP IS-U.

An example is a website for a specific group of customers that needed the basic data of those customers in order to have a meaningful dialogue on the website. The technical integration with SAP IS-U succeeded, but the result did not work out well: when a customer started using the website, he would typically see only part of his data.

The reason: lots of customers had multiple customer records, each telling part of the truth about the customer.

The explanation: one customer record might come from the company selling natural gas in his region, a second from yet another company selling electricity, and a third from a former company selling heating. One of the old customer information systems turned out to be a real nightmare: because it was so slow, call center employees used to enter a new customer record whenever somebody called, so that they could at least record a complaint or request in the system. The company got rid of the legacy applications, but not the legacy data. Months after the migration, the organization was still struggling, and many customers were unhappy.
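Once the records that belong to the same customer have been identified, they still have to be merged into one view. The sketch below shows one common approach, field-level survivorship: for each attribute, the value from the most trusted source system wins, and gaps are filled from the other sources. The source names, their priority order, the dict-based record format and the function names are hypothetical; the hard part in practice, deciding which records belong together, is assumed to have been done already.

```python
from typing import Optional

Record = dict[str, Optional[str]]

# Hypothetical predecessor systems, most trusted first (an assumption for this sketch).
SOURCE_PRIORITY = ["electricity", "gas", "heating"]


def _is_empty(value: Optional[str]) -> bool:
    return value is None or value.strip() == ""


def merge_partial_records(records: list[Record]) -> Record:
    """Merge records already matched to the same customer into one view.

    Every record carries a 'source' key naming its system of origin.
    For each attribute, the first non-empty value in priority order wins.
    A source missing from SOURCE_PRIORITY raises ValueError.
    """
    ranked = sorted(records, key=lambda r: SOURCE_PRIORITY.index(r["source"]))
    golden: Record = {}
    for rec in ranked:
        for name, value in rec.items():
            if name == "source":
                continue
            if _is_empty(golden.get(name)) and not _is_empty(value):
                golden[name] = value
            else:
                golden.setdefault(name, value)  # keep the field, even if empty
    return golden


if __name__ == "__main__":
    matched = [
        {"source": "gas", "name": "J. Jansen", "email": None, "phone": "010-1234567"},
        {"source": "electricity", "name": "Jan Jansen", "email": "jan@example.com", "phone": ""},
    ]
    print(merge_partial_records(matched))
    # {'name': 'Jan Jansen', 'email': 'jan@example.com', 'phone': '010-1234567'}
```

Failing loudly on an unknown source is deliberate: a silently ignored predecessor system is precisely how partial customer views survive a migration.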

Case: A Retailer

This retail company used old mainframe applications for ordering, invoicing, and inventory. Because these legacy applications were getting extremely difficult to maintain, the company decided to turn to Armature, an ERP supplier specializing in integrated solutions for the retail industry. The company was wise enough to migrate gradually to the new system: they didn't want to put their main business processes in danger with a big-bang approach, and by migrating piece by piece they hoped to reduce the risks. Data in the mainframe files (articles, suppliers, etc.) were migrated to the Armature database. However, no filtering or cleaning of these records was done. The result: although the ERP system was brand new, right from the start it reflected 25 years of history, complete with all the data entry errors made in the course of those years. The limitations of the legacy application were replicated in the new system, such as duplicate supplier records caused by some obscure limitation of the legacy application.

The effect is that new functionality simply cannot be used: multiple records for the same supplier mean that supplier scoring is unnecessarily complicated or even impossible.

Develop a strategy

A company that is planning to get rid of a legacy application by migrating to a new application (ERP or otherwise) should develop a strategy for dealing with the legacy data. The worst possible approach is simply copying the old data to the new system without giving this process any further attention.

Note that dealing with legacy data is quite commonly the reason for setting up a data warehouse system in the first place. Much can be learned from the methodologies and tools used in the data quality and data warehouse world.

When you are planning to migrate your legacy data to a new environment, you should basically treat it as a data warehouse project. Sensible steps are:

  • Profile the data to be migrated: what is the content of the legacy data stores, what abnormalities can be discovered, and can the data be understood at all? SQL queries can help here, as can specialized tools such as those from Metagenix.

  • Develop an approach for dealing with the abnormalities found: it makes sense to create a work-in-progress area between the legacy application and the new system. This is quite common in the data warehouse world, where it is called the staging area; here the data can be analyzed and, if necessary, their quality improved. If a lot of data have to be migrated (i.e., more than can be handled by human hand and eye), it is worthwhile to look at data quality tools or external parties that can do data checking. Vendors like Trillium or HumanInference are experts in this area.

  • Have a clear picture of the features required of the new application: if supplier management is an issue, the supplier-related data in the new application should make it possible, and the legacy data should be transformed so that they fit this purpose.
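The profiling and staging steps can be sketched in a few lines of Python. The example below profiles one field of a fixed-width legacy file (the mysterious columns 197 through 204 mentioned earlier) and then splits the records into a clean set and a quarantine set that would stay in the staging area for review. Only the column positions come from the text; the rule that the field must be numeric, the sample data and all function names are assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FieldProfile:
    total: int                         # records examined
    blank: int                         # records where the field is empty
    non_numeric: int                   # non-empty values that are not all digits
    top_values: list[tuple[str, int]]  # most frequent values with their counts


def extract(line: str, start: int, end: int) -> str:
    """Return columns start..end (1-based, inclusive) of a fixed-width record.

    Records shorter than `end` simply yield a shorter (possibly empty) value.
    """
    return line[start - 1:end].strip()


def profile_field(lines: list[str], start: int, end: int, top: int = 5) -> FieldProfile:
    """Summarize what actually lives in one fixed-width field."""
    values = [extract(line, start, end) for line in lines]
    counts = Counter(values)
    non_numeric = sum(n for v, n in counts.items() if v and not v.isdigit())
    return FieldProfile(len(values), counts[""], non_numeric, counts.most_common(top))


def stage(lines: list[str], start: int, end: int) -> tuple[list[str], list[str]]:
    """Split records into (clean, quarantine) using a hypothetical rule:
    the field must contain only digits."""
    clean: list[str] = []
    quarantine: list[str] = []
    for line in lines:
        target = clean if extract(line, start, end).isdigit() else quarantine
        target.append(line)
    return clean, quarantine


if __name__ == "__main__":
    # Hypothetical sample: 196 filler characters, then an 8-character field.
    sample = [" " * 196 + v.ljust(8) for v in ["20020823", "20020823", "", "N/A"]]
    print(profile_field(sample, 197, 204))
    clean, quarantine = stage(sample, 197, 204)
    print(len(clean), len(quarantine))  # 2 2
```

Quarantined records are kept rather than dropped, so that someone who understands the business can decide whether they are errors, test cases or data that belong elsewhere.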

User Recommendations

The goal should be to get the most out of the investment in the new system: legacy data should be migrated in such a way as to maximize the use of the new application.

Develop a strategy to handle problems.

Most important: be aware that problems in legacy data are the rule and not the exception.

About the Author

Jan Mulder is a solution architect in the new HP Services organization in the Netherlands. With ten years of experience in system integration, he is the architect of several data warehouse systems. In his opinion, other areas, such as application integration and data migration, could and should learn from data warehouse methods and techniques.

Jan can be reached at
