
Implementing A CDQM Application

It is impossible to improve that which cannot be measured. A CDQM (Continuous Data Quality Management) tool provides a real-time, up-to-date scorecard to measure data quality within the enterprise. By checking data quality in real-time, "data fires" can be detected when they are just starting, before any real damage has occurred. Most enterprises fight fires with axes, fire hoses, trucks, and hordes of firemen, but the CDQM approach is a smoke detector. It's far less expensive to put a fire out when it's just smoldering, rather than to extinguish a blazing house fire and then remodel the entire house.

Data quality must be a constant commitment. Most companies, when implementing a data quality initiative, look at it as a massive data-cleansing project that scrubs data as part of a system upgrade or new system implementation. That approach is a lot like taking a shower at the beginning of the month and saying, "Now I'm clean!"

Without metrics and constant measurement, there is no way to verify data quality on an ongoing basis and keep the data in the system clean. And if the quality of data is not fully understood, one cannot be confident in the decisions made based upon that data. Considering that US enterprises are expected to spend upwards of $22B by 2005 on business intelligence initiatives, doesn't it make sense to implement a system to track the quality of that data?

This is Part Two of a two-part article on the importance of maintaining data quality, based on the author's experience.

Part One defined the problem that maintaining data quality poses for an enterprise.

When Good Data Goes Bad

The classic "garbage in, garbage out" scenario becomes all too real when there are quality problems with the data on which important decisions are based. At Metagenix, we like to tell the story of a fictional electronic parts manufacturing company we call Huntington Corp. We pieced together Huntington from several real-world companies whose data quality issues almost did them in, and which, for obvious reasons, we cannot name. We like to tell the Huntington story because it so clearly illustrates the need for a continuous data quality tool. Take, for example, the data issues that arose when Huntington acquired a rival company, Systron, and began combining the two companies' customer accounts into one customer database. What started as an innocent attempt to merge this data became a mess that almost spiraled out of control.

The problem began with the two companies' customer account numbers. Huntington's were eight digits long; Systron's were 10 alphanumeric characters long. When Huntington's IT department began merging the account data, it decided that Huntington's eight-digit account number would be the default format. All of Systron's accounts were loaded in the eight-digit format by changing every alpha character to a zero and truncating the 10-character value to eight. This invalidated every Systron account number in one stroke, with all the attendant downstream angst and confusion for employees and customers. Having a continuous data quality tool in place during this ETL process would have prevented this data nightmare.
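To make the account-number failure concrete, here is a minimal sketch of a check that could flag it during the load. The `convert_account` function is a hypothetical reconstruction of the lossy conversion described above, and the sample account values are invented for illustration.

```python
from collections import defaultdict

def convert_account(systron_id: str) -> str:
    """Mimic the lossy conversion: letters become '0', then keep the first 8 characters."""
    digits = "".join(ch if ch.isdigit() else "0" for ch in systron_id)
    return digits[:8]

def find_collisions(systron_ids):
    """Group source IDs by converted ID; return groups where distinct sources collide."""
    groups = defaultdict(list)
    for sid in systron_ids:
        groups[convert_account(sid)].append(sid)
    return {new: olds for new, olds in groups.items() if len(olds) > 1}

# Invented sample data: two different Systron accounts map to the same new number.
sample = ["AB12345678", "CD12345678", "1234567890"]
collisions = find_collisions(sample)
```

A check like this, run before the converted numbers are committed, turns a silent loss of identity into a visible list of conflicting accounts.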

Another problem occurred during the Systron integration when IT combined the two companies' parts and inventory tables. As with the customer account numbers, the formats did not match: Systron used alphanumeric part numbers in separate relational tables for each of its divisions, while Huntington had a single master table with numeric part listings. When Huntington's IT department combined Systron's multiple alphanumeric tables into Huntington's master table, it failed to include the system division number assigned to each division table in the new Huntington part number. As a result, customers who called to order Systron parts using a familiar Systron part number were befuddled by Huntington's new part numbers, a disaster that was not fixed until it was called to IT's attention.

Other data quality issues arose even before the Systron acquisition. Huntington's accounting department was still batch processing journal and other general ledger entries on a nightly and weekly basis. Over time, Huntington's accounting department was spending more and more time researching and reconciling erroneous entries. Though Huntington Corp. accounts were clearly defined in a chart of accounts by department and type of account, the general ledger and other systems were allowing invalid account numbers, thus causing subsequent delays in invoicing and payment processing. A continuous data quality tool set up to check business rules on account numbers would have ensured that account entries were going to valid account numbers.

The same type of problem happened when Huntington installed a new CRM software system. While customer service representatives were doing their best to get the customer data needed to build effective communications with the company's customers, they were frequently not capturing addresses and telephone numbers. When Huntington's marketing or customer relations departments decided to conduct campaigns or send follow-up messages to customers, this customer contact data was missing—unbeknownst to these departments and to the detriment of these campaigns and messages. A continuous data quality tool would have produced an error report by detecting that contact information was not being captured, thereby allowing this information to be obtained in time for these customer communications.

The downside of the data quality issues in these situations is fairly obvious: poor data quality negatively impacts the value of the data used to support decision-making and operations. From something as simple as an incorrect general ledger account that throws off accounts payable analysis and reconciliation, to a missing email address that results in failure to notify a multi-million dollar customer of a backorder, and everything in between, data quality plays a crucial role in the support structure of today's business.

The Solution

Businesses clearly need a framework for implementing and monitoring data quality on a continuous basis as part of any business intelligence initiative. At Metagenix, we have developed this new framework, a continuous data quality management tool to work in concert with enterprise business intelligence and database systems. Built with an easy-to-use interface like simple address checkers, yet developed for robust enterprise use across departments, data warehouses and silos, Metagenix's CDQM tool works by applying specific business rules as a continuous check of data quality. The CDQM framework consists of several interconnected processes, which we have outlined below.

Business Rules Capture Repository and Interface

This system provides a meta-data repository to capture business rules and knowledge about data across the enterprise. Business rule assertions are expressed as algorithms and functions that indicate the validity of data. These assertions can be in the context of a field, a virtual record, or an entire data source. One example of a rule is that zip codes must be five or nine digits for US addresses, and must match the expected city and state fields. Another example is that the schema for the Customers table is not expected to change.
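As an illustration only, the rule assertions mentioned above could be expressed as small functions that return whether the data is valid. The function names and the two-entry ZIP prefix lookup below are assumptions made for this sketch; a real repository would store such rules as meta-data and use a full postal reference dataset.

```python
import re

# Five digits, optionally followed by four more (with or without a hyphen).
ZIP_PATTERN = re.compile(r"\d{5}(-?\d{4})?")

# Hypothetical, deliberately tiny lookup from ZIP prefix to state.
ZIP_PREFIX_TO_STATE = {"787": "TX", "100": "NY"}

def zip_format_ok(record: dict) -> bool:
    """Field-level rule: the ZIP code must be five or nine digits."""
    return ZIP_PATTERN.fullmatch(str(record.get("zip") or "")) is not None

def zip_matches_state(record: dict) -> bool:
    """Record-level rule: the ZIP prefix must agree with the state field."""
    expected = ZIP_PREFIX_TO_STATE.get(str(record.get("zip") or "")[:3])
    return expected is not None and expected == record.get("state")

def schema_unchanged(actual_columns, expected_columns) -> bool:
    """Data-source-level rule: the table's columns must match the recorded schema."""
    return list(actual_columns) == list(expected_columns)
```

The three functions correspond to the three contexts named in the text: a single field, a record spanning several fields, and an entire data source.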

Metrics Repository

This tracks the results of processes within the CDQM system and provides historical information. A complete data quality scorecard can be constructed based upon the information stored in the Metrics Repository. The interface allows a user to view the results, and slice and dice the results data. Depending upon the job function of the user, the interface will provide different views of the scorecard. For example, the CIO might be interested in which systems are generating the most quality problems, while a DBA might be interested in table schema changes that have occurred.
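A rough sketch of how role-specific scorecard views might be derived from stored results follows. The row layout, the rule names, and the `view_for` function are all hypothetical; the article does not describe the actual repository schema.

```python
from collections import Counter

# Hypothetical rows, shaped the way a Metrics Repository might store check results.
results = [
    {"system": "CRM", "rule": "phone_present", "passed": False},
    {"system": "CRM", "rule": "address_present", "passed": False},
    {"system": "CRM", "rule": "phone_present", "passed": True},
    {"system": "GL", "rule": "account_in_chart", "passed": False},
    {"system": "DW", "rule": "schema_unchanged", "passed": False},
]

def failures_by(rows, key):
    """Count failed checks grouped by the given column."""
    return Counter(row[key] for row in rows if not row["passed"])

def view_for(role, rows):
    """Return a role-specific slice of the scorecard."""
    if role == "CIO":
        # Which systems generate the most quality problems?
        return failures_by(rows, "system")
    if role == "DBA":
        # Only schema-change violations, grouped by system.
        schema_rows = [r for r in rows if r["rule"] == "schema_unchanged"]
        return failures_by(schema_rows, "system")
    return failures_by(rows, "rule")
```

Calling `view_for("CIO", results).most_common(1)` would surface the system with the most failed checks in this sample data.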

Event Processor

This is an object interface that allows external applications to communicate events of special interest to the CDQM framework. For example, an ETL job might inform the system that a file of 225,003 records, time-stamped yesterday, was transferred into table CUSTOMERS at 12:03 AM and that the transfer took 14 seconds.

The CDQM framework could then be used to track execution speeds, check sums on the data, and check the timeliness of the transfers. Problems such as loading the same file twice could be immediately recognized. Likewise, recognizing an incremental increase in transfer times over the last month could spur investigation into potential difficulties in the ETL process. The Event Processor is not limited to monitoring data movements; external applications could signal a variety of events to be tracked as part of the data quality monitoring effort.
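The two checks described above, catching a file loaded twice and noticing a creeping increase in transfer times, can be sketched as follows. The `EventLog` class, its method names, and the thresholds are assumptions for illustration, not the product's interface.

```python
import hashlib
from statistics import mean

class EventLog:
    """Hypothetical in-memory stand-in for the Event Processor's bookkeeping."""

    def __init__(self):
        self.seen = set()       # (table, checksum) pairs already loaded
        self.durations = []     # transfer times in seconds, oldest first

    def record_transfer(self, table, payload, seconds):
        """Record one transfer and return a list of detected problems."""
        checksum = hashlib.sha256(payload).hexdigest()
        problems = []
        if (table, checksum) in self.seen:
            problems.append("duplicate load")
        self.seen.add((table, checksum))
        self.durations.append(seconds)
        return problems

    def slowing_down(self, window=3, factor=1.5):
        """True if the last `window` transfers average more than `factor` times the earlier ones."""
        if len(self.durations) < 2 * window:
            return False  # not enough history to compare
        recent = mean(self.durations[-window:])
        baseline = mean(self.durations[:-window])
        return recent > factor * baseline
```

For example, recording the same payload for table CUSTOMERS twice returns an empty list the first time and `["duplicate load"]` the second.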

Transaction Server

This system allows an enterprise to centralize data validation. Instead of multiple applications each implementing their own validation logic, a central set of business rules is used to determine the validity of a datagram.

External applications use an object interface to transmit data to the Transaction Server, which determines the validity of the data according to the rules stored in the repository and returns a result indicating potential problems with the data. At the same time, the Transaction Server updates the Metrics Repository with the available information about the transaction. Thus, decision makers can determine the sources and causes of faulty data and adjust the business processes accordingly. Just as easily, implementing new business rules merely requires adjusting the meta-data in the Business Rules Capture Repository, rather than recoding potentially hundreds of applications that deal with data in a slightly different fashion.
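The flow just described might look like the sketch below, assuming rules are stored as named predicates. The `TransactionServer` class, the rule names, and the sample chart of accounts are hypothetical and stand in for whatever the rules repository actually holds.

```python
from collections import Counter

class TransactionServer:
    """Hypothetical central validator: one rule set shared by every calling application."""

    def __init__(self, rules):
        self.rules = rules        # rule name -> predicate(record) returning bool
        self.metrics = Counter()  # stand-in for updates to the Metrics Repository

    def validate(self, source, record):
        """Return the names of the rules the record violates and log the outcome."""
        failed = [name for name, check in self.rules.items() if not check(record)]
        self.metrics[(source, "checked")] += 1
        for name in failed:
            self.metrics[(source, name)] += 1
        return failed

# Invented chart of accounts and rules echoing the Huntington examples.
CHART_OF_ACCOUNTS = {"1000", "2000", "4000"}
rules = {
    "account_in_chart": lambda r: r.get("account") in CHART_OF_ACCOUNTS,
    "phone_present": lambda r: bool(r.get("phone")),
}
server = TransactionServer(rules)
```

Adding a new business rule here means adding one entry to `rules`; none of the calling applications change, which is the point of centralizing validation.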

Rules Checker

The Rules Checker mirrors the Transaction Server, but operates on a macro level. Instead of providing a real-time service, the Rules Checker is run periodically to verify compliance with the business rules against a variety of data sources. For instance, the Rules Checker might be run every night against all records in the order entry database to verify that all orders reference part numbers that are in the catalog. Another example would be a check each night that the domain of the Customers->Type field is what was expected when compared to the values stored in the repository.

The Rules Checker updates the Metrics Repository, and can also generate events such as an email to a responsible manager when certain rules are violated. Imagine being able to come into the office each morning and receive an email indicating which records were loaded incorrectly last night and what's wrong with them!
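The two nightly checks mentioned above can be sketched against an in-memory SQLite database. The table and column names, the sample rows, and the report format are assumptions for this example; the sketch builds the text of a notification but does not send any email.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE catalog (part_no TEXT PRIMARY KEY);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, part_no TEXT);
CREATE TABLE customers (id INTEGER PRIMARY KEY, type TEXT);
INSERT INTO catalog VALUES ('P-100'), ('P-200');
INSERT INTO orders VALUES (1, 'P-100'), (2, 'P-999');
INSERT INTO customers VALUES (1, 'RETAIL'), (2, 'WHOLESALE'), (3, 'retial');
""")

def orphan_orders(conn):
    """Orders whose part number does not appear in the catalog."""
    return conn.execute("""
        SELECT o.order_id, o.part_no
        FROM orders o
        LEFT JOIN catalog c ON c.part_no = o.part_no
        WHERE c.part_no IS NULL
        ORDER BY o.order_id
    """).fetchall()

def unexpected_domain_values(conn, expected):
    """Values of customers.type that are outside the expected domain."""
    rows = conn.execute("SELECT DISTINCT type FROM customers").fetchall()
    return sorted({row[0] for row in rows} - set(expected))

def build_report(conn, expected_types):
    """Assemble the lines of a morning summary for the responsible manager."""
    lines = [f"Order {oid} references unknown part {part}"
             for oid, part in orphan_orders(conn)]
    lines += [f"Unexpected customer type: {value}"
              for value in unexpected_domain_values(conn, expected_types)]
    return lines

report = build_report(conn, {"RETAIL", "WHOLESALE"})
```

In this sample data the report contains one orphaned order and one misspelled customer type, the kind of summary a manager could receive each morning.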

Scheduler

The Scheduler activates the Rules Checker processes according to schedules and dependencies determined by the user. For example, a user could specify that the check of the master customer name file should be run every night immediately following the successful completion of an ETL job.
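Dependency-driven activation of this kind can be sketched with the standard library's `graphlib` module (Python 3.9 or later). The job names and the `run_schedule` function are hypothetical; each job is modelled as a callable that returns True on success.

```python
from graphlib import TopologicalSorter

def run_schedule(jobs, depends_on):
    """Run jobs in dependency order; skip any job whose prerequisite did not succeed."""
    graph = {name: depends_on.get(name, ()) for name in jobs}
    status = {}
    for name in TopologicalSorter(graph).static_order():
        prerequisites = graph.get(name, ())
        if all(status.get(p) == "ok" for p in prerequisites):
            status[name] = "ok" if jobs[name]() else "failed"
        else:
            status[name] = "skipped"
    return status

# The customer-name check runs only after the ETL load completes successfully.
depends_on = {"check_customer_names": {"etl_load"}}

good_night = run_schedule(
    {"etl_load": lambda: True, "check_customer_names": lambda: True}, depends_on)
bad_night = run_schedule(
    {"etl_load": lambda: False, "check_customer_names": lambda: True}, depends_on)
```

On the night the load fails, the check is marked as skipped rather than run against stale or partial data.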

The Metagenix CDQM framework employing these components is extremely scalable and user-friendly. All interfaces are delivered via a web browser. The repositories are implemented in standard, ODBC-compliant relational databases. The Rules Checker, Transaction Server, and Event Processor are designed from conception as massively parallel, high-performance systems capable of handling the large data volumes involved.

Editor note: The information presented here is the opinion of the author, based on his experience in using a continuous data quality management tool. TEC does not endorse this specific product per se. This article is published by TEC because it contains some useful information for companies concerned with data quality management issues.

CDQM: In Summary

With an ever-increasing dependence on data for near and real-time decision-making and with more connectedness of databases, data warehouses, marts, and silos across the enterprise, we at Metagenix believe the call for CDQM is loud and clear.

Data becomes information when it is used in analysis upon which decisions are made. Data quality problems result in bad information, necessarily leading to bad decisions. At Metagenix, we strongly believe CDQM is the answer to the data quality problem, a problem that our technology will solve.

About the Author

Greg Leman is the CEO of Metagenix, Inc.

Metagenix, Inc. (www.metagenix.com) is a developer of data quality tools. The company builds solutions that allow organizations to monitor data quality throughout the enterprise. The company's latest product, MetaPure, is a state-of-the-art continuous data quality management (CDQM) tool. A new development in data quality analytics, MetaPure takes over where traditional data cleansing tools leave off, providing a real-time insurance policy against poor quality data in CRM, ERP, Business Intelligence, and Supply Chain Management applications across the enterprise.


 
©2013 Technology Evaluation Centers Inc. All rights reserved.