The Hidden Role of Data Quality in E-Commerce Success

  • Written By: Mark E. Atkins
  • Published: January 4, 2003


E-commerce holds the promise of working seamlessly with customers, prospects, suppliers, and partners. That's why companies publish volumes of information on e-commerce sites and even open internal systems to business affiliates. Exchanging information online increases the opportunities for more sales, faster product development and fulfillment, and better relationships.

Or does it?

E-commerce also opens the door to data quality issues that can endanger revenues and relationships. New data enters systems daily from internal sources and the Web, where input is uncontrollable - resulting in inconsistencies and typos in names, addresses, and product numbers. These discrepancies can hamper data integration, generate duplicates, and prevent companies and their business affiliates from getting accurate information. Published data like product price and availability quickly becomes outdated, giving rise to misinformation and, possibly, financial and legal consequences.

Additionally, there's a communications gap when people search internal systems and online catalogs. When searching for an item, many people use terminology that differs from the entries in an e-catalog (for example, laptop versus notebook). At best, this leads to delays and dissatisfaction; at worst, searches fail and prospects click away.

Previously, company representatives limited the exposure of internal data to outsiders and bridged the communications gap. But four trends arising from the nature of e-business expose and aggravate data quality issues, threatening revenue growth, online collaboration and relationships. It's important for businesses to understand these challenges and the resulting risks of misinformation in order to implement the right solutions.

Trends Accentuating Data Quality Issues

Diversity and Volume of Data: In e-commerce, the unpredictable outside world feeds data into your operational infrastructure without controls. This diverse data is often difficult to integrate and to link to related internal information for business analysis.

For example, repeat visitors often vary their personal data - Mark Atkins, M.E. Atkins, or Mark Akins (a typo) - or misrepresent key data (e.g., type in 000-00-0000 Social Security number because they fear privacy violations). These quality issues result in duplicates in the customer database - blocking complete views of the customer and causing one-to-one marketing to fail.
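To make the duplicate problem concrete, here is a minimal Python sketch, assuming a customer table with hypothetical `id`, `name`, and `ssn` fields. It flags Social Security numbers that can never be valid (such as 000-00-0000) so they are not trusted as a match key, and it uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure to pair name variants that an exact comparison treats as different customers. The records, the 0.75 threshold, and the helper names are illustrative assumptions, not a vendor's method.

```python
import re
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical rows: the names echo the variants above, the SSNs are made up.
customers = [
    {"id": 1, "name": "Mark Atkins", "ssn": "521-44-8810"},
    {"id": 2, "name": "M.E. Atkins", "ssn": "000-00-0000"},
    {"id": 3, "name": "Mark Akins", "ssn": "521-44-8810"},
]

SSN_FORMAT = re.compile(r"^(\d{3})-(\d{2})-(\d{4})$")

def is_placeholder_ssn(ssn: str) -> bool:
    """True if the value is malformed or uses a number block that is never issued."""
    m = SSN_FORMAT.match(ssn)
    if m is None:
        return True
    area, group, serial = (int(part) for part in m.groups())
    # Area 000, 666 and 900-999, group 00, and serial 0000 are never assigned.
    return area in (0, 666) or area >= 900 or group == 0 or serial == 0

def name_key(name: str) -> str:
    """Lowercase and strip punctuation, so "M.E. Atkins" becomes "me atkins"."""
    return re.sub(r"[^a-z ]", "", name.lower())

def name_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, name_key(a), name_key(b)).ratio()

def likely_duplicates(rows, threshold: float = 0.75) -> List[Tuple[int, int]]:
    """Pair rows that share a trustworthy SSN or have very similar names."""
    pairs = []
    for i, a in enumerate(rows):
        for b in rows[i + 1:]:
            shared_ssn = a["ssn"] == b["ssn"] and not is_placeholder_ssn(a["ssn"])
            if shared_ssn or name_similarity(a["name"], b["name"]) >= threshold:
                pairs.append((a["id"], b["id"]))
    return pairs

print(len({c["name"] for c in customers}))  # 3: exact comparison sees three customers
print(likely_duplicates(customers))
```

Here Mark Atkins pairs with M.E. Atkins on name similarity and with Mark Akins on the shared SSN, while the 000-00-0000 value is never used as evidence. A transitive grouping step (for example, union-find over the pairs) would then fold all three records into one customer.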

Also, given the speed of electronic exchange, data from buyers and sellers must be quickly integrated, synchronized, and standardized for smooth, accurate transactions. Examples include catalog updates, changes to contract terms or to industry coding schemes such as UNSPSC, and dynamic pricing based on real-time market fluctuations.

Without controls and updates for data quality, companies risk disseminating and acting on misinformation.

Disintermediation: Though eliminating the customer service representative reduces costs, that person bridged the communications gap, translating the corporate data for outsiders. Now when you expose your products through your catalog and search engine, don't assume people know your nomenclature and classification.

A typical experience illustrates the point. I visited a popular website to buy shirts, typed "Egyptian cotton" and got towels. To be more precise, I entered "Egyptian cotton button-down shirt" and got zero matches. Next, I tried "cotton button-dwon shirt" (inadvertently transposing letters). No matches, again. Finally, I typed simply "button-down shirt" and received 20 pages of results! I clicked to a competitor's site.

This is a data quality problem: not because the data was wrong, but because processes couldn't reconcile the different ways people describe the same thing. Multiply this miscommunication by hundreds or thousands of searches per day, and the result is real dollars in lost e-commerce sales! In the B2B context, the stakes are often higher.
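The shirt search can be replayed in a small Python sketch, assuming a hypothetical three-item catalog worded differently from the shopper's queries. It contrasts a brittle exact-phrase search with a tolerant one that corrects each query word against the catalog vocabulary (using the standard library's `difflib.get_close_matches` as a stand-in spelling corrector) and ranks products by the share of query words they contain instead of returning nothing. The SKUs, descriptions, and 0.7 cutoff are illustrative assumptions.

```python
import re
from difflib import get_close_matches
from typing import List, Tuple

# Hypothetical catalog entries, invented for illustration.
CATALOG = {
    "SH-101": "Men's button-down shirt, Egyptian cotton, white",
    "SH-102": "Men's button-down shirt, Oxford cotton, blue",
    "TW-201": "Egyptian cotton bath towel",
}

def tokens(text: str) -> List[str]:
    """Lowercase words only, so "button-down" becomes ["button", "down"]."""
    return re.findall(r"[a-z]+", text.lower())

VOCABULARY = sorted({t for desc in CATALOG.values() for t in tokens(desc)})

def exact_phrase_search(query: str) -> List[str]:
    """The brittle approach: the whole query must appear verbatim."""
    return [sku for sku, desc in CATALOG.items() if query.lower() in desc.lower()]

def correct(token: str) -> str:
    """Replace a misspelled word with the closest catalog word, if one is close enough."""
    match = get_close_matches(token, VOCABULARY, n=1, cutoff=0.7)
    return match[0] if match else token

def tolerant_search(query: str) -> List[Tuple[str, float]]:
    """Rank products by the fraction of (corrected) query words they contain."""
    words = [correct(t) for t in tokens(query)]
    if not words:
        return []
    ranked = []
    for sku, desc in CATALOG.items():
        desc_words = set(tokens(desc))
        score = sum(w in desc_words for w in words) / len(words)
        if score > 0:
            ranked.append((sku, score))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

print(exact_phrase_search("Egyptian cotton button-down shirt"))  # []
print(tolerant_search("cotton button-dwon shirt"))
# [('SH-101', 1.0), ('SH-102', 1.0), ('TW-201', 0.25)]
```

Ranking partial matches, rather than demanding every word, is what turns "zero matches" into a usable result list. A production corrector would need more care than this, since snapping every word to the catalog vocabulary can also "fix" legitimate words the catalog simply lacks.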

Without an intermediary, e-commerce will break down unless automated processes mediate varying contexts and representations. The consequences of disintermediation are apparent in the next two trends as well.

Breadth of User Exposure: E-commerce exposes your data to customers, prospects, partners, and suppliers. If the data is inaccurate, there will be transaction mistakes and business misunderstandings. Even accurate data may be unclear to users whose perspective differs from the one you anticipated.

Consider this product listing. The product number shows the weight of chips in a bag (4, 12, or 24 ounces), and "Package" lists the bags per carton.

Your sales rep (an intermediary) knew that a local grocer who ordered the "small" size actually wanted the 12-ounce bag (117JU-12), 16 bags to a carton. But when the grocer, who doesn't know the product numbers, orders directly online, the data is unclear. If he orders 117JU-04 (which seems to be the "small"), or orders based on the smallest "Package" number, he'll get the wrong size.
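One way to put the sales rep's knowledge back online is to spell out what each product number encodes and to record a customer's own size words. The Python sketch below assumes the number format is a family code, a hyphen, and the bag weight in ounces, as the listing describes; the regular expression, the `describe` and `resolve` helpers, and the "local-grocer" key are hypothetical. Only the 16 bags per carton for 117JU-12 comes from the text.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed format: a product family code, a hyphen, then the bag weight in ounces.
PRODUCT_NUMBER = re.compile(r"^(?P<family>\d+[A-Z]+)-(?P<ounces>\d+)$")

def describe(product_number: str, bags_per_carton: Optional[int] = None) -> str:
    """Spell out what the product number encodes, so buyers need not decode it."""
    m = PRODUCT_NUMBER.match(product_number)
    if m is None:
        raise ValueError(f"unrecognized product number: {product_number!r}")
    text = f"{product_number}: {int(m.group('ounces'))}-ounce bag"
    if bags_per_carton is not None:
        text += f", {bags_per_carton} bags per carton"
    return text

# Hypothetical customer-specific vocabulary capturing what the sales rep knew.
CUSTOMER_TERMS: Dict[Tuple[str, str], str] = {("local-grocer", "small"): "117JU-12"}

def resolve(customer: str, term: str) -> Optional[str]:
    """Map a customer's own word for a size to the product number they mean."""
    return CUSTOMER_TERMS.get((customer, term.strip().lower()))

print(describe("117JU-12", bags_per_carton=16))  # 117JU-12: 12-ounce bag, 16 bags per carton
print(describe("117JU-04"))                      # 117JU-04: 4-ounce bag
print(resolve("local-grocer", "Small"))          # 117JU-12
```

Showing "12-ounce bag" next to the code removes the guesswork, and the per-customer table lets the site honor a buyer's established meaning of "small" without the rep on the line.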

The more numerous your online customers and affiliates, the greater the risk of misinterpretations - which can increase the time and costs of doing business and, even worse, sabotage sales and relationships.

Privacy and Ethical Issues: Even when the operational data on an e-commerce site or extranet is accurate, there are still risks (data quality issues) that can reduce sales, offend people, and create legal liability.

Context also determines quality: even "good" internal data can be inappropriate to expose to the public. Your customer records probably contain information on discounts and negotiated terms. If you simply open your systems to the public, this information could alienate companies that don't receive the discount, violate the confidentiality of your agreements, and reveal information about a company to its competitors.
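A simple safeguard is to publish records through an explicit allow-list of fields rather than opening the internal table directly. This Python sketch assumes a hypothetical customer record whose field names and values are invented for illustration.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical internal customer record; field names are illustrative only.
internal_record: Dict[str, Any] = {
    "account": "ACME-0042",
    "company": "Acme Foods",
    "list_price": 19.99,
    "negotiated_discount": 0.15,
    "contract_terms": "net 60, confidential",
}

# Allow-list: only fields explicitly approved for outside users are published.
PUBLIC_FIELDS: FrozenSet[str] = frozenset({"account", "company", "list_price"})

def public_view(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record containing approved fields only."""
    return {field: value for field, value in record.items() if field in PUBLIC_FIELDS}

print(public_view(internal_record))
# {'account': 'ACME-0042', 'company': 'Acme Foods', 'list_price': 19.99}
```

An allow-list is the safer default: a field added to the internal record later stays private until someone approves it, whereas a deny-list would leak it automatically.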

Ensuring Data Quality

Avoiding data misinterpretations and ensuring the integrity and success of e-commerce initiatives require a continuous data quality process. Software designed for this class of problem must be combined with business rules customized to your data and its intended usage.

Start by ensuring the quality of initial loads when you translate legacy or supplier data into the new formats for e-business processing and searches. Follow this with near real-time data quality filtering to keep catalogs and databases current and clean. Equally important, you must mediate the daily activity that runs against your databases: both user searches and unattended transactions.
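The filtering step can be pictured as a set of business rules applied to every incoming record, with failures routed to review instead of the live catalog. The article does not name specific rules, so the three rules below, the `CatalogRecord` fields, the 30-day freshness window, and the sample batch are all hypothetical, in a minimal Python sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Optional, Tuple

@dataclass
class CatalogRecord:
    sku: str
    description: str
    price: float
    price_as_of: date  # when the supplier last confirmed the price

# A rule returns an error message, or None when the record passes.
Rule = Callable[[CatalogRecord], Optional[str]]

def has_sku(r: CatalogRecord) -> Optional[str]:
    return None if r.sku.strip() else "missing SKU"

def positive_price(r: CatalogRecord) -> Optional[str]:
    return None if r.price > 0 else f"non-positive price {r.price}"

def price_fresh_within(max_age_days: int, today: date) -> Rule:
    """Build a rule that rejects prices older than max_age_days."""
    def rule(r: CatalogRecord) -> Optional[str]:
        age = (today - r.price_as_of).days
        return None if age <= max_age_days else f"price is {age} days old"
    return rule

def filter_records(records: List[CatalogRecord], rules: List[Rule]
                   ) -> Tuple[List[CatalogRecord], List[Tuple[CatalogRecord, List[str]]]]:
    """Split records into those safe to publish and those needing review."""
    accepted, rejected = [], []
    for r in records:
        errors = [msg for msg in (rule(r) for rule in rules) if msg is not None]
        if errors:
            rejected.append((r, errors))
        else:
            accepted.append(r)
    return accepted, rejected

# Made-up batch for illustration.
today = date(2003, 1, 4)
rules = [has_sku, positive_price, price_fresh_within(30, today)]
batch = [
    CatalogRecord("A-100", "Widget", 12.50, date(2002, 12, 20)),
    CatalogRecord("A-200", "Gadget", 0.0, date(2002, 12, 28)),
    CatalogRecord("", "Unlabeled item", 4.50, date(2002, 9, 1)),
]
ok, review = filter_records(batch, rules)
for record, errors in review:
    print(record.sku or "<blank>", errors)
# A-200 ['non-positive price 0.0']
# <blank> ['missing SKU', 'price is 125 days old']
```

Keeping each rule as a small function makes it easy to run the same checks on the initial load and on the near real-time feed, and to add rules specific to your data without touching the filter itself.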

To solve tough problems, you need mathematically based and other advanced technologies that go beyond basic conversion and reformatting, which deal with data only at the syntactic level, to analyze its informational content. Look for these functions:

  1. Context mediation - for determining the business meaning of a word or value from its context or adjacent data values. This usually involves parsing, lexical analysis, and multi-field correlation analysis. Context mediation is a prerequisite for the next functions and is essential when dealing with free-form text and multiple-word fields such as names, addresses, product descriptions, and search queries.

  2. Normalization - for standardizing spelling, abbreviations, and format, and for recognizing word variations and synonyms. This makes it easier to match input data to internal systems and shoppers' searches to your product master. The normalization strategies used for search should also be applied when loading product masters, to create consistent, searchable e-catalogs.

  3. Fuzzy retrieval - for finding data without a precise key (e.g., a product number) or under conditions in which data is inconsistent or missing. An inbound transaction or user search must be joined to an optimal set of matches from thousands, even millions, of choices in the database. Vendor software uses sophisticated database indexing and search optimization strategies to achieve speed and yield.

  4. Fuzzy matching and filtering - for measuring and ranking "possible" matches to get the best one(s) and avoid irrelevant matches. In search engines, this technology returns only the most useful "hits." In non-interactive transactions (EDI/EAI), it applies your business rules and measures of statistical certainty to determine critical relationships within the data, such as duplicates and affiliations (e.g., households, divisions of one company). Fuzzy matching is necessary because normalization can't eliminate all non-standard data.
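How normalization, fuzzy retrieval, and fuzzy matching fit together can be sketched in a few dozen lines of Python, assuming an inbound customer record with no key that must be linked to a small master file. Context mediation is reduced here to simple tokenization. The abbreviation table, master records, field weights, and the 0.9 "match" and 0.75 "review" thresholds are hypothetical business rules, and `difflib.SequenceMatcher` stands in for the statistical scoring a vendor product would use.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from typing import Dict, List, Set, Tuple

# Normalization: hypothetical table mapping variants to one standard form
# (an empty string means "drop this word").
STANDARD_FORMS = {"st": "street", "ave": "avenue", "inc": "", "corp": "", "co": ""}

def normalize(text: str) -> str:
    words = [STANDARD_FORMS.get(w, w) for w in re.findall(r"[a-z0-9]+", text.lower())]
    return " ".join(w for w in words if w)

# Hypothetical master records.
MASTER: Dict[str, Dict[str, str]] = {
    "C1": {"name": "Acme Foods Inc", "address": "12 Main St"},
    "C2": {"name": "Acme Feeds Corp", "address": "400 Oak Ave"},
    "C3": {"name": "Zenith Snacks", "address": "12 Main Street"},
}

def record_words(rec: Dict[str, str]) -> Set[str]:
    return set(normalize(rec["name"] + " " + rec["address"]).split())

# Fuzzy retrieval: an inverted index from normalized word to master IDs, so a
# record without a key is scored only against candidates sharing a word.
INDEX: Dict[str, Set[str]] = defaultdict(set)
for master_id, master_rec in MASTER.items():
    for word in record_words(master_rec):
        INDEX[word].add(master_id)

def candidates(rec: Dict[str, str]) -> Set[str]:
    found: Set[str] = set()
    for word in record_words(rec):
        found |= INDEX.get(word, set())
    return found

# Fuzzy matching: weighted field similarity, then business-rule thresholds.
WEIGHTS = {"name": 0.6, "address": 0.4}

def score(a: Dict[str, str], b: Dict[str, str]) -> float:
    return sum(weight * SequenceMatcher(None, normalize(a[field]), normalize(b[field])).ratio()
               for field, weight in WEIGHTS.items())

def match(rec: Dict[str, str], accept: float = 0.9, review: float = 0.75
          ) -> List[Tuple[str, float, str]]:
    """Rank candidates by score and label each as match, review, or no match."""
    ranked = sorted(((score(rec, MASTER[c]), c) for c in candidates(rec)), reverse=True)
    results = []
    for s, c in ranked:
        verdict = "match" if s >= accept else "review" if s >= review else "no match"
        results.append((c, round(s, 3), verdict))
    return results

print(match({"name": "ACME Fods, Inc.", "address": "12 Main Street"}))  # C1 first, "match"
print(match({"name": "Zenith Snack Co", "address": "12 Main St"}))      # C3 first, "match"
```

The first inbound record still contains a typo ("Fods") after normalization, yet it links to C1 because matching scores similarity instead of demanding equality; this is the point the list makes about normalization alone being insufficient. Scores between the two thresholds would go to a human reviewer rather than being merged automatically.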

What does it cost? Though prices vary, higher-priced solutions yield greater automation and better-quality results. For catalog content, automated processes will dramatically reduce aggregation time and the cost of manual review, and they provide content that is generally more detailed and consistent than what content resellers and content factories (which aggregate your own data) offer.

Software license fees are usually competitive, and some vendors offer pay-as-you-go transactional pricing models. Implementation fees may be the most expensive part. Configuring the software generally takes several weeks and accounts for up to 15 percent of the total cost. For data-intensive sites, such as e-procurement sites or e-marketplaces, the process can take several months and consume a majority of a site's deployment budget.

However, successful e-commerce relies on intelligible, trustworthy content. To achieve this, companies need a complete solution at their back- and front-ends, so they can harness and leverage their data and maximize the return on their e-commerce investment.

About the Author

Mark Atkins has over 30 years of experience building entrepreneurial and Fortune 1000 firms in the computer and financial industries. Since joining Vality in 1990, he has served as Chairman, President, and CEO. Under his leadership, Vality has evolved from a start-up to its current position as the leading company in the data quality and integration market. He is now driving the next level of growth, extending Vality's products and services in the business intelligence, ERP, SCM, EAI, and e-commerce sectors and spearheading major partner initiatives.

He can be reached at

For more information about Vality, consult
