Mainstream Enterprise Vendors Begin to Grasp Content Management Part One: PCM System Attributes


SAP's recent acquisition of the former catalog management vendor A2i and IBM's acquisition of the former product information management (PIM) vendor Trigo may indicate how mainstream enterprise platform and enterprise resource planning (ERP) vendors are approaching enterprise-wide product content management (PCM). Both moves respond to the need for an effective master data management (MDM) system that addresses the widespread challenge of integrating data scattered across multiple systems, physical locations, and diverse trading partners. PCM and PIM would thus be core parts of MDM solutions that manage any kind of master data and integrate seamlessly into a customer's existing enterprise architecture, ideally eliminating all data duplication and making centralized customer, supplier, or product information available to other applications across the organization.

SAP, IBM, and similar mainstream enterprise vendors need to solve the problems inherent in data residing in disparate systems, as enterprises are becoming painfully aware that they must clean up both their structured data and their unstructured content before they can capitalize on more important efforts such as regulatory compliance, globalization, demand aggregation, and supply chain streamlining, to name a few. To that end, these vendors must also be able to integrate emerging radio frequency identification (RFID) data into their software and fully support web services-based provisioning and consumption of data and processes.

Yet the all-encompassing content management solution is still in an ever-evolving design stage, as vendors try to piece together comprehensive systems. As a result, there is a proliferation of, and consequent confusion about, the pertinent terms and acronyms, such as enterprise content management (ECM), product content management (PCM), catalog management, product information management (PIM), records management (RM), product data management (PDM), enterprise data repositories (EDR), document management (DM), knowledge management (KM), web content management (WCM), digital asset management (DAM), enterprise information management (EIM), digital rights management (DRM), document imaging, workflow management (WM) or business process management (BPM), and many more.

Generally speaking, PCM (sometimes also called PIM) refers to a system for managing all types of information about finished products, and it is a further evolutionary step of catalog content management backed by workflow management. It differs, however, from ECM, which focuses more on document management and other unstructured editorial and web content, whereas PCM is more granular, working at the level of individual data elements and focusing on highly structured product content. ECM encompasses many of the above-cited technologies used to capture, manage, store, preserve, and deliver content and documents related to organizational processes. In other words, it allows the management of an organization's unstructured information (e.g., e-mails, photos, spreadsheets, and documents) wherever that information exists: stored in repositories, shuttled across networks, and managed over the course of its existence or life cycle.

This is part one of a three-part note.

Part two will present background information and lessons learned.

Part three will address challenges.

Definition of PCM Systems

Coming back to managing structured, alphanumeric information, a PIM or PCM solution would include the ability to organize a company's product information, regardless of location, into a consolidated system of record, and to synchronize or distribute that information to any business partners that require it. Yet true PCM should mean more than a centralized repository that eliminates data duplication and offers only a limited nugget of functionality. The repository must be capable of storing all product information, and the system must be more than a point solution or an island: it must also offer high-performance access to that information and include tightly integrated functionality that can be used to drive all crucial enterprise initiatives.

First and foremost, the PCM should revolve around a single centralized repository of product information. It should be the "system of record" for all non-transactional product information and organizational intelligence about products, and eliminate data duplication and system redundancy across the enterprise. In effect, it should be the "ERP for product information" containing not only "rich product content", but also other types of related information, such as supplier information, as well as one or more supplier-specific sub-records of sourcing information for each product that allows the PCM to simultaneously drive both sell-side and buy-side initiatives. In other words, the rich product content managed by the PCM must be much more than simply transactional data about each product from the ERP or product master file (e.g., a part number, a description, and a price).
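A minimal sketch of what such a "system of record" entry might look like, assuming a simple in-memory model. The class names, field names, and sample values (`ProductRecord`, `SourcingRecord`, `BRG-6204`, and so on) are hypothetical illustrations and not any vendor's actual schema. The point is that one master record carries both the rich content used on the sell side and one sourcing sub-record per supplier for the buy side.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class SourcingRecord:
    """Buy-side data for one product from one supplier (hypothetical fields)."""
    supplier_id: str
    supplier_part_number: str
    unit_cost: Decimal
    lead_time_days: int


@dataclass
class ProductRecord:
    """One master record per product, holding sell-side and buy-side data."""
    sku: str
    description: str
    list_price: Decimal
    rich_content: dict = field(default_factory=dict)  # specs, text, image refs
    sourcing: list = field(default_factory=list)      # one SourcingRecord per supplier

    def cheapest_source(self):
        """Buy-side view: the lowest-cost supplier, or None without sourcing data."""
        return min(self.sourcing, key=lambda s: s.unit_cost, default=None)

    def sell_side_view(self):
        """Sell-side view: publishable fields only, with sourcing data left out."""
        return {"sku": self.sku, "description": self.description,
                "list_price": self.list_price, **self.rich_content}


# Illustrative sample data, not taken from the article.
bearing = ProductRecord(
    sku="BRG-6204",
    description="Deep groove ball bearing",
    list_price=Decimal("12.50"),
    rich_content={"bore_mm": 20, "outer_diameter_mm": 47},
    sourcing=[
        SourcingRecord("SUP-A", "A-6204", Decimal("7.10"), 14),
        SourcingRecord("SUP-B", "6204-2RS", Decimal("6.85"), 30),
    ],
)
```

Here `bearing.cheapest_source()` serves a procurement question and `bearing.sell_side_view()` serves a catalog, both from the same record, which is the duplication-avoiding property the text describes.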

This brings us to the notion of enterprise publishing (where some PCM systems will overlap with ECM), which aims at reducing the cost of creating, and speeding the deployment of, all the product-related information, including user manuals, sales collateral, and web sites, that makes up the complete product offering. In fact, rich product content must comprise all of the non-transactional product information within an organization, such as detailed parametric data on product specifications; merchandising text, high-resolution images, drawings, diagrams, and Portable Document Format (PDF) files for various marketing and publishing requirements; a classification scheme for organizing the products into a searchable taxonomy of categories and subcategories with category-specific attributes; product relationships to represent selling relationships (such as up-sells, cross-sells, and accessories) and structural relationships (such as assemblies, kits, and bundles); parts usage information; and finally, various product-specific services for leveraging the rich product content, such as hotspot information for illustrated parts catalogs, without the need for a separate system.
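The idea of a taxonomy with category-specific attributes can be sketched as a tree in which each node contributes its own attributes on top of those inherited from its ancestors. This is only an illustration under assumed names (`Category`, `attributes`, `path`) and made-up categories, not a description of how any particular PCM product stores its taxonomy.

```python
class Category:
    """A taxonomy node that adds its own attributes to those of its ancestors."""

    def __init__(self, name, attributes=(), parent=None):
        self.name = name
        self.own_attributes = list(attributes)
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def attributes(self):
        """Attributes that apply here: inherited ones first, then local ones."""
        inherited = self.parent.attributes() if self.parent else []
        return inherited + self.own_attributes

    def path(self):
        """Breadcrumb from the root of the taxonomy down to this node."""
        prefix = self.parent.path() + " > " if self.parent else ""
        return prefix + self.name


# Hypothetical taxonomy of arbitrary depth.
root = Category("Products", ["brand"])
bearings = Category("Bearings", ["bore_mm", "outer_diameter_mm"], parent=root)
ball = Category("Ball bearings", ["seal_type"], parent=bearings)
fasteners = Category("Fasteners", ["thread", "length_mm"], parent=root)
```

With this structure, a ball bearing is described by `brand`, `bore_mm`, `outer_diameter_mm`, and `seal_type`, while a fastener shares only `brand` with it, so each category carries exactly the parametric fields that make sense for its products.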

The term PIM has appeared more frequently of late in discussions of global data synchronization (GDS) and syndication because of a number of market initiatives that act as catalysts for change. For example, many large retailers, including Wal-Mart, Office Depot, The Home Depot, Target, Albertsons, and Safeway, have required their suppliers to synchronize product data via the European article number (EAN)/UCCnet registry and data synchronization services. Other catalysts include the Sunrise 2005 initiative, which seeks to standardize a format for global product identification via a new 14-digit code, and the RFID initiatives in place to bring about the rapid adoption of radio frequency tags on all products, so that they may be more easily tracked through manufacturing and retail environments.
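As a small worked example of what handling a 14-digit identifier involves, the sketch below assumes that the code in question follows the GS1 GTIN-14 structure, in which shorter codes are left-padded with zeros and the last digit is a mod-10 check digit computed with alternating weights of 3 and 1 from the right. The function names are hypothetical and the sample numbers are illustrative only.

```python
def gtin_check_digit(body: str) -> int:
    """Mod-10 check digit for a GTIN body (every digit except the last)."""
    total = 0
    for position, char in enumerate(reversed(body)):
        # The digit next to the check digit gets weight 3, then 1, 3, 1, ...
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10


def to_gtin14(code: str) -> str:
    """Left-pad an 8-, 12-, 13-, or 14-digit code with zeros to 14 digits."""
    if not code.isdigit() or len(code) not in (8, 12, 13, 14):
        raise ValueError(f"not a recognizable product code: {code!r}")
    return code.zfill(14)


def is_valid_gtin(code: str) -> bool:
    """True if the final digit matches the check digit of the padded code."""
    padded = to_gtin14(code)
    return gtin_check_digit(padded[:-1]) == int(padded[-1])
```

For instance, the 12-digit code `036000291452` becomes `00036000291452`, and its check digit remains valid after padding because the added leading zeros contribute nothing to the weighted sum.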

A full-fledged PCM system should additionally have no predetermined notion of the repository structure itself, but rather offer a fully flexible schema that can be tailored to meet the specific requirements of each enterprise and each vertical industry, and that can change over time. The PCM must be more than just a simple database application or end-user application, and more than just a standalone point solution that addresses a single functional requirement (such as UCCnet synchronization, paper print or web-based publishing, or illustrated parts catalogs). Rather, it must be a completely open system with both graphical user interface (GUI) tools for end users and multi-platform application programming interfaces (APIs) for programmatic access (e.g., Java 2 Enterprise Edition [J2EE], Microsoft .NET, eXtensible Markup Language [XML], web services, and simple object access protocol [SOAP]), supporting both content authoring and runtime searching, and providing a horizontal platform for building best-of-breed vertical solutions.

Such a PCM system must also support all the leading middleware application stacks so that it can leverage and integrate with web application servers (WAS), enterprise application integration (EAI) servers, and portal servers. Also, rather than a fixed web-based user interface, it should provide a flexible presentation layer that can be completely customized and tailored to particular organizational requirements and the needs of various vertical markets.

Finally, the PCM should be able to unify and harmonize product information stored within repositories across the enterprise, creating "a single copy of the truth" regardless of where the data resides. That is to say, the PCM must act as a centralized "hub" that plugs PCM functionality and high-performance access to highly structured product information into all enterprise initiatives, not only at the user level but also at the enterprise integration level, for plug-and-play coordination with other extended-ERP solutions, such as customer relationship management (CRM), product lifecycle management (PLM), supplier relationship management (SRM), and supply chain management (SCM), where vendors with broad offerings, like SAP or Oracle, should be glad to oblige their users.
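One simple way to picture the harmonization step is a merge in which each source system contributes the fields it knows, and a fixed precedence order decides which value wins when systems disagree. The sketch below assumes exactly that rule. The source names, the precedence order in `SOURCE_PRIORITY`, and the sample records are all hypothetical, and a real hub would need far richer matching and cleansing logic.

```python
# Hypothetical precedence: earlier entries win when two systems disagree.
SOURCE_PRIORITY = ["erp", "plm", "crm"]


def harmonize(records_by_source):
    """Merge per-system records, keyed by SKU, into one master record per SKU."""
    master = {}
    # Apply the lowest-priority source first so higher-priority values overwrite it.
    for source in reversed(SOURCE_PRIORITY):
        for sku, fields in records_by_source.get(source, {}).items():
            entry = master.setdefault(sku, {})
            for name, value in fields.items():
                # Empty values never overwrite data supplied by another system.
                if value not in (None, ""):
                    entry[name] = value
    return master


# Illustrative input: three systems holding overlapping data on one product.
records = {
    "erp": {"BRG-6204": {"description": "BEARING 6204", "price": 12.5}},
    "plm": {"BRG-6204": {"description": "Deep groove ball bearing",
                         "bore_mm": 20, "price": None}},
    "crm": {"BRG-6204": {"warranty_months": 12}},
}
single_truth = harmonize(records)
```

The resulting record for `BRG-6204` takes its description and price from the hypothetical ERP source, its bore size from PLM, and its warranty period from CRM, giving downstream applications one consolidated view.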

Desired PCM System Attributes

Based on the above discussion, a proper PCM system, such as the one acquired by SAP, should have the following attributes:

  • Powerful product content aggregation and cleansing, management and editing of product information, since the proper PCM system should do more than store data that used to reside in another system. Instead, it must include powerful and extensive capabilities for loading, restructuring, cleansing, normalizing, and transforming source data from a variety of electronic sources, including text, Microsoft Excel, Microsoft Access, structured query language (SQL), and XML for both flat files and relational data.

  • Classification into a taxonomy with category-specific attributes, since not only must the proper PCM system have a completely flexible schema, it must also support multiple classification schemes, user-defined taxonomy hierarchies of arbitrary depth with category-specific attributes, multiple simultaneous taxonomies, and drag-and-drop taxonomy editing capabilities that allow the taxonomy of the fully populated repository to be completely restructured and refined over time.

  • Intelligent image management, since merely storing an image as a binary large object (BLOB), as many systems easily can, is not enough. The proper PCM system must support intelligent image management with an understanding of all of the leading image formats, the ability to automatically transform images for different publishing purposes, and optimized high-performance image access and efficient image caching.

  • Integrated high-performance product search engine, since search mechanisms offered by traditional systems are not precise enough for searching product information. The full-fledged PCM system must hence include a fully integrated multidimensional search engine that is optimized for product search, with support not only for drill-down, parametric, and keyword search, but also for unit-of-measure search, partial or "contains" search, and other types of search. To that end, customers should be able to search for goods without knowing product codes, that is, in a "No part number, no problem" manner.

  • Performance acceleration, with scalability up to millions of products, since traditional enterprise applications, such as ERP or CRM, are not optimized for heavy search and access loads. Similarly, a traditional relational database management system (DBMS) is slow on typical searches against large repositories, so relying on the "naked" DBMS is also a problem. Moreover, databases have not been architected well to manage large binary objects, since rows, columns, and SQL access are not suited to managing objects such as frames of a video or pages of a document. Therefore, a proper PCM system must have a self-optimizing performance acceleration layer that is able to quickly serve up product information to users and other enterprise applications.

Most catalog solutions are simple database applications that layer a thin veneer of functionality over SQL and rely on SQL for all access to the data. SQL works well for retrieving a single record from among thousands or even millions. Yet retrieving, for example, several thousand records from among a few million, while limiting every dimension of the search so that users see only valid selections and valid values, requires a multi-table join.

Also, interactively browsing and sorting search results requires cursors and temporary files, which further cripples the performance of a SQL-based DBMS. One example would be a catalog of thirty thousand bearings with very intricate relationships governing which bearings can be sold with which other bearings; a system is needed to manage and automate those relationships.

  • Cross-media publishing (web and paper or CD-ROM print), since the appropriate PCM system must drive all product content initiatives, including tightly integrated functionality not only for internal PCM, but also for multichannel syndication, deployment of searchable web catalogs, and print solutions for catalogs and other printed publications. The things that people expect in a paper catalog in terms of layout, structure, and tabular orientation of product records should also be deliverable to the web. Additionally, the system should be able to slice and dice a single master catalog, which may contain several million products, into as many customized, virtual private, personalized subset catalogs as necessary, whereby each slice looks like a complete catalog, either to the user on the web or when published to paper.

  • Database-driven print catalogs, since a full-fledged PCM system that supports print catalog publishing must do so in a way that is completely database-driven, meaning it "pushes" product information into the page layouts, rather than simply using the repository to store product information that was first entered directly into the page layout application.

  • The system must support UCCnet synchronization, and also be able to syndicate product information to multiple audiences, transforming it into a variety of industry-standard and user-defined XML and delimited text formats, on an ad hoc and scheduled basis.

  • The system must have an integrated workflow engine that can provide a framework for managing product information in a collaborative environment, and can function standalone or in conjunction with external workflow applications and systems.

  • Cross-platform compatibility, as elaborated above, and enterprise scalability, since the appropriate PCM system must offer an n-tier architecture capable of easily integrating with various deployment architectures, including a full suite of security and encryption services as well as the ability to integrate with leading user directories, such as those based on the lightweight directory access protocol (LDAP). Finally, the PCM system must provide master and slave capabilities to enable a global 24-7 deployment consisting of both staging and publishing servers.
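The drill-down search described in the attribute list, in which users see only valid selections and valid values, can be sketched with a small in-memory inverted index in place of multi-table joins. This is a toy illustration under assumed names (`ProductIndex`, `search`, `valid_values`) and made-up product data; it is not how any particular vendor's performance acceleration layer is implemented.

```python
from collections import defaultdict


class ProductIndex:
    """In-memory inverted index: (attribute, value) -> set of product ids."""

    def __init__(self, products):
        self.products = products
        self.index = defaultdict(set)
        for pid, attrs in products.items():
            for name, value in attrs.items():
                self.index[(name, value)].add(pid)

    def search(self, **selections):
        """Ids of products that match every selected attribute value."""
        matches = set(self.products)
        for name, value in selections.items():
            matches &= self.index.get((name, value), set())
        return matches

    def valid_values(self, attribute, **selections):
        """Values of `attribute` that still yield results under the selections."""
        matches = self.search(**selections)
        return sorted({self.products[pid][attribute]
                       for pid in matches if attribute in self.products[pid]})


# Hypothetical catalog slice.
products = {
    "BRG-6204": {"type": "ball", "bore_mm": 20, "seal": "open"},
    "BRG-6204-2RS": {"type": "ball", "bore_mm": 20, "seal": "rubber"},
    "BRG-6205": {"type": "ball", "bore_mm": 25, "seal": "open"},
    "BRG-NU204": {"type": "roller", "bore_mm": 20, "seal": "open"},
}
idx = ProductIndex(products)
```

After a user picks `type="roller"`, `idx.valid_values("seal", type="roller")` offers only the seal types that actually exist for roller bearings, so the interface never leads to an empty result set. Each refinement is a set intersection over precomputed postings, which is the kind of work the text argues is costly to express as repeated joins.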

This concludes part one of a three-part note.

