Looking for Mr. Goodsearch
While most of the technical press focuses on unstructured documents, as well as extensible markup language (XML) and hypertext markup language (HTML) search technologies, the unpublicized truth remains that more than 80 percent of today's business applications require interaction with a relational database. As Terry Moriarty noted at the 2004 Software Development West Conference, "the organization is all about the database." The other "skeleton in the data closet," or shameful industry secret, is that there is relatively little search help for the business user when it comes to locating corporate information in databases. And since locating information accounts for 25 to 35 percent of an information user's time, the lack of focus on the database search ultimately results in a productivity blind spot in media and corporate attention.
Much of the reason for this blind spot lies in the double (and sometimes confusing) meaning of the term search when applied to databases: this might mean data, or it might mean metadata (in other words, the business category of the data—such as table and column names—and descriptions). Historically, a search at the database level simply meant a data query, or specifically, a structured query language (SQL), against a table or group of tables. Thus, the traditional technology goal of software providers was to make SQL easier to generate, through wizards, natural language parsers, and so on. "Just ask your business question, and our product will do the query," was the "holy grail," or the highest aim of data access. Likewise, the goal of information technology (IT) trainers and managers was to educate users in developing SQL skills to reduce their reliance on programmers.
What's the Problem?
However, even for users who can compose left outer joins with the best of them, the problem of locating the correct database, table, and column (to query or report against) in a multidatabase enterprise (often with thousands of tables and many thousands of columns) can be formidable. This aspect of the search problem, the "search before the search," is described as the metadata search issue: in which database, schema, table, or column is the information? Business users ask characteristic questions when searching: If I wanted to ask for total commissions by month, where would I look to find my salespersons' commissions? If I wanted to create a report of sales per product, where would I find my customer sales orders? And as an IT analyst, if I wanted to integrate our customer data, where would I find customer addresses?
This metadata search problem is compounded beyond mere size and distribution of data, by wide variations in metadata design and labeling. Varied and arbitrary vocabularies, lexical mangling, and untamed formatting dispersed between table and column names all contribute to the challenge of locating the correct metadata prior to querying, reporting on, or integrating the actual data.
There are standard workarounds for this metadata search problem: (a) stay at a company long enough to become familiar with the location of everything; (b) call a programmer who is intimate with the business database and tables at issue; (c) browse through schema lists or explorers for each database, opening any candidate table that looks suspicious; or (d) query a catalog for all possible combinations of semantics and lexical structures. The metadata demon is even more intimidating if you are a traveling consultant new to a client's data stores (who ya gonna call?).
A Glass Half Full
The introduction of standardized or exemplar metadata, synonym dictionaries, and repositories helped alleviate this pain by centralizing all metadata and providing a more natural language vocabulary. However, there are still downsides to this approach. These dictionaries generally need to be manually constructed and cross-referenced with their respective source databases. With thousands of business elements, this can be a huge task to inaugurate, let alone maintain. And, more importantly, much of the key search problem still remains, namely, how to locate the desired metadata quickly among innumerable elements? A schema explorer in a metadata dictionary is nearly as extensive as its native source counterpart. Phrase or wild card searches still challenge the system and the dictionary designer for their range in lexical and synonym variations.
The goal, of course, is to make the database metadata search as easy and dynamic as a web search, but with the added ability to accommodate the unique problems of database metadata structure. Tools that can help achieve this goal are as important and useful as tools that help abstract the complexities of SQL generation from the end user. Ideally, the final goal of database information retrieval is a dynamic tool (without IT intervention) that completely hides both metadata and SQL from the business users. However, the wide variations of metadata (unlike the syntax of SQL keywords) will challenge this ideal, most likely for some time to come. A program can assist the user by making calculated guesses of correct metadata, but ultimately the human eye and the human brain will need to be involved.
Managers grappling with the complex issues of metadata management need to be aware that aside from the huge and publicized problem of integrating and retrieving dispersed document and XML metadata and data, the issue of easy database metadata searches and data access for end users is also of primary importance. The days of calling programmers (or keepers-of-the-data-keys) to the rescue every time a new business query or report is needed, are drawing to a close (or ought to be). "Where do we keep our supplier contact data?" is a question for system analysts that is one question too many in an age of true database metadata searches.
Mazda's Data Challenge—And Its Solution
This question has been nearly eliminated at Mazda North America, thanks to MetaTrieval Software's natural language metadata search tool, MetaTrieve. Mazda North America is an automotive original equipment manufacturer (OEM) which sells cars and trucks through a dealer network of more than 700 dealers throughout the US and Canada. Mazda's specialty is fun-to-drive vehicles with a sports car orientation. The new CX-7 and MAZDASPEED 6 are representative of Mazda vehicles, and are aptly portrayed in Mazda's "zoom-zoom" commercials, which memorably features a young boy whispering "zoom zoom" as a Mazda drives by.
To support sales and service of its vehicle and parts, Mazda maintains multiple databases from many vendors running on a wide variety of platforms. These include an online transaction processing (OLTP) mainframe database 2 (DB2) to support transactions, and an e-commerce universal database (UDB) database to support MazdaUSA.com. For support of marketing campaigns and reports, Mazda uses an SQL server database (which also supports the KANA eMail response system), and for customer assistance agents, it uses a customer relationship management (CRM) SQL database. A UDB database supports dealer sales and service, an SQL database shared with Ford Motor tracks safety issues, and multiple departmental Access databases support business.
Business analysts, auditors, information technology (IT) analysts, and programmers at Mazda need to generate large numbers of ad hoc queries and reports to support the analysis and planning of business initiates and dealer objectives. The information they need, however, may be distributed and duplicated across many databases, with no standard vocabularies for business elements in tables and columns. The challenge, then, is to locate information throughout the enterprise before querying, reporting, integrating, or warehousing data. The bulk of this challenge lies in the sheer immensity of the steps involved in the task:
- sifting through silos of printouts of database schemas or models
- opening each table in an explorer or schema browser tool
- trying multiple catalog searches against multiple databases
- calling a programmer or database administrator (DBA) who was familiar with the business area
To resolve this challenge, Mazda installed MetaTrieve in 2004. Using MetaTrieve, the business or IT user simply enters the business element to be searched for (for example, vehicle purchase date), and MetaTrieve locates the database, table, and column in seconds. The user now knows where information is stored. A specific task example illustrates a very common data discovery problem and two very common questions in business: Do we have this data in the company? And where is it?
Mazda and its marketing campaign partners decided to execute a dealer service campaign to entice customers to visit authorized Mazda dealers for automotive service. The deadline for the data extract was fast approaching, and the final piece of information to be included in the marketing material—the dealer hours of service—was still missing. No one in the marketing group was sure if the data was present in-house, and if so, which database, table, and column the data resided in. Someone in the company knew, of course, but who was that person? Would it be necessary to call every programmer who worked with service or dealer systems?
A user who had MetaTrieve installed was able to speed things up. The user performed a single search for dealer service hours throughout the enterprise, and found the information in seconds. The campaign file extract was complete and out the door in time for delivery to the consumer.
The World and Everything in It
Now more than ever, business information users have urgent data needs, and high expectations regarding easy, natural language search technologies. The Web and companies like Google and Yahoo! have spoiled them in this regard. Users often assume that this same ease of search should extend to the database and its metadata—and managers expect such a solution to be inexpensive and low maintenance. These assumptions have only recently been fulfilled with natural language query builders and database metadata search technologies. The time for true comprehensive enterprise searching is near.
About the Author
Daniel Oldis (email@example.com) is cofounder and chief technology officer of MetaTrieval Software, LLC. He has worked as a software consultant, teacher, and author for twenty-five years, and has been published in Software Development and Enterprise Systems Journal.