Metadata Standards in the Marketplace - Why Do I Care? (And Where Does Godzilla Fit In?)

  • Written By: M. Reed and D. Geller
  • Published: May 16 2000



Metadata ("data about data") is essential for data warehousing. In order to populate a database, extract data, or run a report, more is required than simply raw data. The tools involved must also "understand" the context, or meaning, of the data. This is one of the purposes of metadata. Consider 2.5. This could be an amount in dollars, yen, or euros. It could be a mail stop at a company. It could be the height in meters of a member of the Boston Celtics. Nothing about the number itself can tell you what it means. To interpret an item like this, you need metadata.

Metadata is a description of data that tells you how to interpret it. Sometimes the metadata description can be deduced from the structure or the column names in a database; sometimes it is found in the programs that use it, and sometimes it resides only in comments or text. Frequently it is a combination of these.

When dealing with a single database, the use of metadata is as natural as swimming is to a fish - and just as invisible as the water. However, when dealing with heterogeneous environments, things get much more opaque. As one TEC analyst put it, once you've begun building and maintaining a data warehouse, metadata problems begin to surface "like Godzilla lurching out of Tokyo harbor." Metadata becomes important in a data warehousing context because the value of a data warehouse depends on application suites that tie the source databases to the target data warehouse and produce the reports that give the warehouse its value.

If you write your own suite of programs or can find a single vendor solution that handles all of your needs, you might still be able to get away with ignoring the metadata. But in most cases you'll need to tie together applications from different vendors, possibly including some homegrown ones, to make your data warehouse work. It's at this point that you start paying attention to metadata and, more specifically, metadata standards.

A simplified definition of metadata: One type of metadata defines the meaning of data. It is made up of entities, attributes, and relationships. For example, a table is an entity. It has attributes such as a name. A column is also an entity. It also has attributes such as a datatype. A column is related to a specific table. This is an example of a relationship. Any database has hundreds or thousands of entities, each entity having many attributes and relationships. It is this information that is necessary for any useful manipulation of data.
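The entity/attribute/relationship structure described above can be sketched as a toy metadata model. This is a minimal illustration only; the class and field names are invented for this example and are not drawn from any standard:

```python
from dataclasses import dataclass, field

@dataclass
class Column:
    # A column is an entity; its name and datatype are attributes.
    name: str
    datatype: str

@dataclass
class Table:
    # A table is an entity with a name attribute.
    name: str
    # The relationship "column belongs to table" is captured by containment.
    columns: list = field(default_factory=list)

# A table describing, say, the heights of basketball players:
heights = Table("players", [Column("height_m", "DECIMAL(3,2)")])
print(heights.columns[0].datatype)  # prints DECIMAL(3,2)
```

Without the `datatype` attribute, a value like 2.5 in that column is just a number; the metadata is what tells a tool it is a height in meters.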

This note gives a high-level overview of metadata from the viewpoint of warehouse tools and repositories, and the standards that have been promulgated to support its use and interchange.

Importance to the Customer

Why should an Information Technology manager be interested in standards for metadata? The data warehousing industry is rapidly building "suites" of applications that will allow for:

  1. The population of a data warehouse or data mart

  2. Intelligent reporting and analysis of that data.

In order to do this, the tools must be supplied with metadata about the source systems and the target data warehouse. Most suites on the market are incomplete, so it is common for customers to purchase different tools from different vendors. If each vendor tool has its own metadata store (often referred to as a "repository"), the customer has to supply the same metadata to each tool separately.

Current metadata repositories often rely on static metadata with batch, file-based exchange mechanisms. When tables change, each tool's metadata definitions must be refreshed. As the number of tools and the volatility of the data warehouse grow, this becomes extremely cumbersome.

Ideally, distributed metadata repositories could dynamically exchange information and keep each other synchronized. This would greatly reduce the customer's workload and ensure correct metadata across domains and life cycles. (Note: currently, even tools from the same vendor may not be able to share metadata).
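As a sketch of why synchronization matters, the following compares two hypothetical repositories' column definitions and reports what a refresh would have to propagate. The dictionaries and function name are illustrative assumptions, not any vendor's API:

```python
def metadata_diff(source: dict, target: dict) -> dict:
    """Report columns a stale tool repository would need to add,
    drop, or re-type to match the source system. Purely illustrative."""
    return {
        "add":    {c: t for c, t in source.items() if c not in target},
        "drop":   [c for c in target if c not in source],
        "retype": {c: t for c, t in source.items()
                   if c in target and target[c] != t},
    }

# Source system after a schema change vs. a stale tool repository.
source = {"cust_id": "INTEGER", "region": "CHAR(2)", "revenue": "DECIMAL(12,2)"}
stale  = {"cust_id": "INTEGER", "region": "CHAR(1)"}
print(metadata_diff(source, stale))
```

With batch, file-based exchange, a diff like this must be regenerated and reloaded into every tool after every schema change; dynamically synchronized repositories would make the step unnecessary.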


In order for metadata to be useful, it must be represented in a format the data warehouse tool can understand, and it must be accessible to the tool. Over the years, vendors have used many different methods of metadata representation and storage. Recently, they have formed standards bodies and reduced the number of standards to two. As Richard Soley, Chairman and CEO of the Object Management Group, described it to TEC, "Better two standards than ten". It could certainly be argued that one would be better still, but that does not appear likely at this time.

OMG is the primary standards body in this scenario, and has presented a widely supported standard called CWMI (Common Warehouse Metadata Interchange).

The second standards body involved is the Meta Data Coalition (MDC), to which Microsoft handed over its standard, OIM (Open Information Model).

Much work has been done to bring the standards closer together, and an XMI (XML Metadata Interchange) "bridge" has been written to allow OIM-compliant products to interact with CWMI-compliant products. However, the standards differ substantially in their implementation. The major difference is that OIM is based strictly on Microsoft standards and products, while CWMI is an open standard that will also work on UNIX, mainframes, and other systems.
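For flavor, the fragment below shows how an XML-based interchange file might carry table metadata between tools. The element and attribute names are invented for illustration and do not reproduce the actual XMI or CWMI schemas:

```python
import xml.etree.ElementTree as ET

# Hypothetical interchange payload; real XMI uses a MOF-derived schema.
payload = """
<Metadata>
  <Table name="orders">
    <Column name="order_id" datatype="INTEGER"/>
    <Column name="placed_on" datatype="DATE"/>
  </Table>
</Metadata>
"""

root = ET.fromstring(payload)
for col in root.iter("Column"):
    print(col.get("name"), col.get("datatype"))
```

The appeal of an XML-based bridge is exactly this: any tool with an XML parser can read the metadata, regardless of the platform or vendor that wrote it.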

Glossary of Terms

OMG: Object Management Group. An international organization founded in 1989 to endorse technologies as open standards for object-oriented applications. The consortium now includes over 800 members.

MDC: Meta Data Coalition. A consortium founded in 1995 with close to 50 vendors and end-users whose goal is to provide a tactical solution for metadata exchange.

CORBA: Common Object Request Broker Architecture. A standard from the OMG for communicating between distributed objects (objects are self-contained software modules). CORBA provides a way to execute programs (objects) written in different programming languages running on different platforms (i.e. UNIX, mainframe, Windows), no matter where they reside in the network. The CORBA standard competes to some degree with DCOM, although through COM-CORBA bridging products, both can be used in cooperation.

IIOP: Internet Inter-ORB Protocol. The CORBA messaging protocol used on a TCP/IP network. It allows programs (objects) to be run remotely in a network. IIOP links TCP/IP to CORBA's General Inter-ORB protocol (GIOP), which specifies how CORBA's Object Request Brokers (ORBs) communicate with each other. When a user accesses a Web page that uses a CORBA object, a small Java applet is downloaded into the web browser, which invokes the ORB to pass data to the object, execute the object and get the results back.

DCOM: Distributed Component Object Model. Microsoft's technology for distributed objects. DCOM defines the remote procedure call, which allows those objects to be run remotely over the network. DCOM only functions in a Microsoft Windows environment.

UML: Unified Modeling Language. An object-oriented design language from the OMG. Many design methodologies for describing object-oriented systems were developed in the late 1980s. UML "unifies" the popular methods into a single standard, including Grady Booch's work at Rational Software, James Rumbaugh's Object Modeling Technique and Ivar Jacobson's work on use cases. In only four years, UML has become the software industry's dominant modeling language. UML 1.3 was ratified at OMG's meeting in November 1999. Until June of 1999, Microsoft had their own "flavor" of UML, but they now adhere to the standard.

MOF: Meta Object Facility. This OMG specification provides a set of CORBA interfaces that can be used to define and manipulate a set of interoperable metamodels. MOF is a key to integration of metamodels across domains.

XML: Extensible Markup Language. The World Wide Web Consortium's document format for the Web that is more flexible than the standard HTML (HyperText Markup Language) format. While HTML uses only predefined tags to describe elements within the page, XML allows tags to be defined by the developer of the page.

IDL: Interface Definition Language. A language used to describe the interface to a routine or function. For example, objects in the CORBA distributed object environment are defined by an IDL, which describes the services performed by the object and how the data is to be passed to it.

SGML: Standard Generalized Markup Language. An ISO standard for defining the format of a text document. An SGML document uses a separate Document Type Definition (DTD) file that defines the format codes, or tags, embedded within it.

DTD: Document Type Definition. A language that describes the structure and legal elements of an SGML document. DTDs are also used with XML, and the definitions may be embedded within an XML document or kept in a separate file.
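A small example of a DTD embedded directly in an XML document; the `warehouse` and `table` tags are made up for illustration, which is precisely the point - the developer, not a fixed standard like HTML, defines them:

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!DOCTYPE warehouse [
  <!ELEMENT warehouse (table+)>
  <!ELEMENT table EMPTY>
  <!ATTLIST table name CDATA #REQUIRED>
]>
<warehouse>
  <table name="sales"/>
  <table name="customers"/>
</warehouse>"""

# ElementTree parses past the DOCTYPE (it does not validate against it);
# a validating parser would enforce the declared structure.
root = ET.fromstring(doc)
print([t.get("name") for t in root.findall("table")])  # prints ['sales', 'customers']
```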

XMI: XML Metadata Interchange. A method for two MOF-compliant repositories to exchange information. When the OMG was questioned by TEC about Microsoft's position on support for XMI, Grady Booch, Chief Scientist of Rational Software and one of the developers of UML, stated that they "could not comment publicly". TEC received the same "no comment" response from every member of the OMG that we questioned.

CWMI: Common Warehouse Metadata Interchange. A universal data format for interchange of metadata between data warehouse and business intelligence products. Developed by the Object Management Group in conjunction with a consortium of over 700 vendors.

OIM: Open Information Model. A competing standard to CWMI. Written by Microsoft and turned over to the MDC; co-developed by over 20 companies. Based on standards such as SQL and COM, and used for metadata interchange.

Market Predictions

We believe that the variance between the MDC and the OMG standards will continue to shrink, due to market pressure from major customers who are growing tired of having to hand-craft integration strategies between business intelligence and data warehousing products (probability 80%). A single standard for "plug and play" metadata interchange will be a powerful market force in the future, especially for the vendor who implements it first.

Customers are becoming increasingly frustrated at the level of effort required to integrate different vendors' "best of breed" tools. Very soon they will refuse to shoulder that effort, and vendors will be forced to work together, whether they get along with each other or not.

Vendor Recommendations

Ensure compliance with the standards promulgated by the Object Management Group. Much has been done to ensure interoperability between the two standards (e.g., the XMI bridge between OIM-compliant and CWMI-compliant products), but it is not yet a perfect world.

Some vendors have created proprietary application programming interfaces to allow other vendors to interoperate (for example the Informatica MX2 API), but this limits customers to the use of tools whose vendors have developed the "plug-ins" to the Informatica product. Other vendors have taken a pure XML approach, which is a more open course of action.

User Recommendations

Customers must make their data warehousing vendors carefully articulate what standards have been used in the creation of their product, or whether the product is completely proprietary. As the various tools are chosen, care must be taken to ensure that each additional tool interoperates with the others in the required manner.

In a heterogeneous environment, standards compliance becomes even more important, since the Microsoft solution is Windows-only unless bridging products are used. It is almost unheard of for a major company to have a completely homogeneous environment, since many legacy and mainframe systems are still in use (e.g., IBM MVS, DEC VMS, Adabas, MUMPS). Metadata is therefore extremely important to the customer and must be kept in mind at all times.

