Beware of Legacy Data - It Can Be Lethal
Jan Mulder - August 23, 2002
Introduction
The term "legacy" is mostly used for applications. For example, according to
the Foldoc dictionary, a legacy system is:
"A computer system or application program which continues to be used because
of the cost of replacing or redesigning it and often despite its poor
competitiveness and compatibility with modern equivalents. The implication
is that the system is large, monolithic and difficult to modify."
This definition is enough to give you an appetite to get rid of your legacy
application. However, every legacy application has a close associate:
the legacy data maintained and used through that application. When
legacy modernization is planned (for example, replacing the legacy
application with an ERP system), the data part is often overlooked. That
can be fatal for the new and often expensive application. Recently I worked
for two organizations, both in the process of replacing their legacy
applications with ERP systems. Neither gave enough attention to the
associated legacy data, and both experienced severe problems in using the
functionality offered by the new application. The legacy data turned out to
be real functionality killers.
Legacy Data
One of the companies I worked for was using mainframe-based applications
that dated back to the seventies. So, who wouldn't want to migrate to a
modern, state-of-the-art ERP system? Management decided to switch. Getting
rid of the legacy application is one thing; not being bothered in the future
by the legacy data is quite another.
In
an environment with systems that have been used for so long, it is almost
certain that you will experience some or all of the following problems
with the data:
- Lack
of understanding: what the heck is meant by columns 197 through
204 in the file? Nobody can tell you. The local expert has retired or
was downsized long ago.
- The presence of data that really shouldn't be in the file or database:
the file is supposed to contain only customer records, but over time
other categories have crept in: employees, test cases, etc.
- The usual cases of bad data: duplicate records, incorrect data. Note
that these are often caused by the limitations of the legacy application.
The legacy application might allow only one customer category per customer
record. So, if a customer belongs to two categories, his data must
be entered twice.
- The
effect of all those years: the bugs introduced by a programmer back
in 1984 and discovered after 4 months (but not always corrected in the
data); the trainee working some months for the company, maintaining
the system without really understanding it.
OK, I know what you are thinking: not in our shop! People are very reluctant
to admit that there might be problems like these in their systems. I would
go so far as to state that there is no information system without some of
these abnormalities, simply because there is no perfect software and there
are no perfect people working with these systems. The right reaction is
not denial but action: develop a strategy to contain the damage.
Case: A Utility Company
In Europe, there used to be lots of utility companies, each roughly covering
the area of a town or region. The liberalization of the energy market led
to an enormous consolidation of these many utility companies into just a
few. One of the managers of the utility company I worked for tried to sketch
the genealogy of the current company for me. He gave up after listing about
25 predecessors: companies that once were independent and now are part of
this huge consolidated company. Naturally, each of these predecessors had
its own information system. The more companies were added, the more complex
the landscape of information systems became. So, about two years ago the
company decided to standardize on SAP for its financial and logistics
information systems and on SAP's industry-specific component, IS-U, as its
customer information and billing system.
A
huge project team worked on the SAP implementation, and over time all
the old legacy customer information systems were replaced by SAP IS-U.
Due to time constraints (and probably lack of knowledge of the legacy
data) the legacy data were more or less unchanged when migrated into the
new SAP IS-U.
I worked as an IT architect for this company at the time. One of my tasks
was to help other projects that needed customer-related information
interface to SAP IS-U.
An
example might be a website for a specific group of customers that needed
the basic data of those customers, to be able to have a meaningful dialogue
on the website. The technical integration with SAP IS-U did succeed. But
it didn't work out well... When a customer started using the website,
he would typically see only part of his data.
The
reason: lots of customers had multiple customer records, each telling
part of the truth about the customer.
The
explanation: one customer record might come from the company selling natural
gas in his region; a second customer record might come from yet another
company selling electricity; the third from a former company selling heating.
One of the old customer information systems turned out to be a real nightmare.
Because that system was so slow, call center employees used to enter a
new customer record when somebody called. That way they could at least
enter a complaint or request in the system. The company got rid of the
legacy applications, but not the legacy data. Months after the migration
the organization was still struggling, resulting in many unhappy customers.
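To make the multiple-records problem concrete, here is a minimal Python sketch of how records from several predecessor systems could be grouped into one view per customer. Everything in it is an assumption for illustration: the field names, the sample records, and the crude matching rule (normalized name plus postcode plus house number) are hypothetical and are not taken from SAP IS-U or from the company's actual data.

```python
from collections import defaultdict

# Hypothetical records as they might arrive from three predecessor systems.
records = [
    {"source": "gas", "name": "J. de Vries", "postcode": "1234 AB",
     "house_no": "12", "product": "natural gas"},
    {"source": "power", "name": "J de Vries ", "postcode": "1234ab",
     "house_no": "12", "product": "electricity"},
    {"source": "heat", "name": "j. de vries", "postcode": "1234 AB",
     "house_no": "12", "product": "heating"},
    {"source": "power", "name": "A. Jansen", "postcode": "5678 CD",
     "house_no": "3", "product": "electricity"},
]


def match_key(record):
    """Build a crude match key: normalized name, postcode, house number."""
    name = "".join(ch for ch in record["name"].lower() if ch.isalnum())
    postcode = record["postcode"].replace(" ", "").upper()
    return (name, postcode, record["house_no"].strip())


def consolidate(records):
    """Group all records that share a match key into one customer view."""
    grouped = defaultdict(list)
    for record in records:
        grouped[match_key(record)].append(record)
    return grouped


customers = consolidate(records)
for key, recs in customers.items():
    # Each consolidated customer now shows every product, not just one part.
    print(key, sorted(r["product"] for r in recs))
```

An exact-match key like this only catches the easy cases. Misspelled names or changed addresses need fuzzier matching, which is where dedicated data quality tooling comes in.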
Case: A Retailer
This retail company used old mainframe applications for ordering, invoicing,
and inventory. Because it was getting extremely difficult to maintain
these legacy applications, the company decided to turn to Armature, an
ERP supplier specializing in integrated solutions for the retail industry.
The company was wise enough to migrate gradually to the new system. They
didn't want to endanger their main business processes with a big-bang
approach; by migrating piece by piece they wanted to reduce the risks.
Data present in the files on the mainframe were migrated to the Armature
database: articles, suppliers, etc. However, no filtering or cleaning of
these records was done. The result: although the ERP system was brand
new, it reflected right from the start 25 years of history, complete with
all the data entry errors made in the course of all those years. The
limitations of the legacy application were replicated in the new system,
for example as duplicate supplier records caused by some obscure limitation
in the legacy application.
The effect is that new functionality simply cannot be used: multiple records
for the same supplier mean that supplier scoring is unnecessarily complicated
or even impossible.
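A small Python sketch may show why duplicate supplier records undermine scoring. The supplier ids, the delivery data, the on-time-rate metric, and the `merge_map` (standing in for the outcome of a cleaning step) are all hypothetical; this is not how Armature computes supplier scores.

```python
from collections import defaultdict

# Hypothetical delivery history: (supplier_id, delivered_on_time).
deliveries = [
    ("S001", True), ("S001", True), ("S001", False),
    ("S417", False), ("S417", False),  # S417 is a duplicate of S001
    ("S002", True), ("S002", True),
]

# Hypothetical result of a cleaning step: duplicate id -> surviving id.
merge_map = {"S417": "S001"}


def on_time_rate(deliveries, merge_map=None):
    """Return the share of on-time deliveries per (canonical) supplier id."""
    merge_map = merge_map or {}
    counts = defaultdict(lambda: [0, 0])  # id -> [on_time, total]
    for supplier_id, on_time in deliveries:
        canonical = merge_map.get(supplier_id, supplier_id)
        counts[canonical][0] += int(on_time)
        counts[canonical][1] += 1
    return {sid: on / total for sid, (on, total) in counts.items()}


raw_scores = on_time_rate(deliveries)               # duplicates left in place
clean_scores = on_time_rate(deliveries, merge_map)  # duplicates merged

print("raw:  ", raw_scores)
print("clean:", clean_scores)
```

With the duplicate left in place, the supplier's history is split over two ids and neither score describes the real supplier. Only after merging does the score cover all five deliveries.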
Develop a Strategy
A company that is planning to get rid of a legacy application by migrating
to a new application (ERP or otherwise) should develop a strategy for
dealing with the legacy data. The worst possible approach is simply copying
the old data to the new system without giving the process any further
attention.
Note that dealing with legacy data is quite commonly the reason for setting
up a data warehouse system. Much can be learned from the methodologies
and tools used in the data quality and data warehouse world.
When
you are planning to migrate your legacy data to a new environment, you
should basically treat it as a data warehouse project. Sensible steps
are:
- Profile the data to be migrated: what is the content of the legacy data
stores, what abnormalities can be discovered, can the data be understood
at all? SQL can help here, as can specialized tools like those of Metagenix.
- Develop an approach for what to do with the abnormalities found: it makes
sense to create a work-in-progress area between the legacy application and
the new system. This is quite common in the data warehouse world, where
it is called the staging area; here the data can be analyzed and, if
necessary, the data quality can be improved. If a lot of data have to
be migrated (i.e., more than can be handled by the human hand and eye),
it is worthwhile to look at data quality tools or external parties that
can do data checking. Vendors like Trillium or HumanInference are experts
in this area.
- Have a clear picture of the features required of the new application:
if supplier management is an issue, the supplier-related data in the
new application should make that possible, and the legacy data should
be transformed so that they fit this purpose.
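The profiling step can be sketched with plain SQL against a staging table. The example below uses Python's built-in `sqlite3` module only to keep it self-contained; the table name `staging_customer`, its columns, and the sample rows are hypothetical, and a real staging area would live in whatever database the project uses.

```python
import sqlite3

# In-memory database standing in for a staging area (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE staging_customer (
        legacy_id TEXT, name TEXT, postcode TEXT, category TEXT
    )
""")
rows = [
    ("1001", "Acme BV", "1234AB", "business"),
    ("1002", "Acme BV", "1234AB", "wholesale"),  # entered twice, two categories
    ("1003", "TEST TEST", None, "business"),     # leftover test case
    ("1004", "P. Bakker", "5678CD", "employee"), # not a customer at all
    ("1005", "Delta NV", "9012EF", "business"),
]
conn.executemany("INSERT INTO staging_customer VALUES (?, ?, ?, ?)", rows)

# Profile 1: how complete is a column?
total, missing_postcode = conn.execute(
    "SELECT COUNT(*), SUM(postcode IS NULL) FROM staging_customer"
).fetchone()

# Profile 2: which values actually occur in the category column?
category_counts = dict(conn.execute(
    "SELECT category, COUNT(*) FROM staging_customer GROUP BY category"
).fetchall())

# Profile 3: which name/postcode combinations occur more than once?
duplicates = conn.execute("""
    SELECT name, postcode, COUNT(*) AS n
    FROM staging_customer
    GROUP BY name, postcode
    HAVING COUNT(*) > 1
""").fetchall()

print("rows:", total, "missing postcodes:", missing_postcode)
print("categories:", category_counts)
print("candidate duplicates:", duplicates)
```

Queries like these do not fix anything by themselves. They produce the list of abnormalities that the strategy then has to address before the data leave the staging area.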
User Recommendations
The goal should be to maximize the return on the investment in the new
system. Legacy data should be migrated in such a way that the new
application can be used to the fullest.
Develop
a strategy to handle problems.
Most
important: be aware that problems in legacy data are the rule and
not the exception.
About the Author
Jan Mulder is a solution architect in the new HP services organization in the
Netherlands. With ten years of experience in system integration, he is
the architect of several data warehouse systems. In his opinion other
areas like application integration and data migration could and should
learn from data warehouse methods and techniques.
Jan can be reached at jan.mulder@hp.com.