Access to Critical Business Intelligence: Challenging Data Warehouses?
Written By: Predrag Jakovljevic
Published On: July 15 2005
Direct Access Rather Than a DW for Mid-Market?
For a long time, data warehousing was synonymous with business intelligence (BI), to the extent that there is a deeply ingrained belief that BI cannot be conducted without a data warehouse (DW). Indeed, when companies are dealing with a deluge of data, it helps to have a DW, since it offers large corporations the ability to leverage information assets to support enterprise reporting and analysis. DWs also provide a technical solution to the problem of multiple systems, separate data stores, and rapidly expanding historical data, since information is extracted from various transaction-based systems, such as spreadsheets, enterprise resource planning (ERP), supply chain management (SCM), or customer relationship management (CRM) systems, and stored in a central repository where it is transformed, cleaned, and consolidated.
During the 1990s, this model grew to form the basis for an entire data warehousing industry, with supporting hardware, software, and consulting vendors (see The Necessity of Data Warehousing).
Many data warehousing proponents also believe that a transaction database cannot support the concurrent demands of both enterprise systems and BI applications, arguing that business users querying the database will decimate the performance of the entire transactional system. However, a lesser-known detail is that the major database vendors have meanwhile created relational database management systems (RDBMS) that are entirely capable of supporting both functions, yet most enterprise application and BI vendors continue to use only a small fraction of the total functionality available in the source database management systems (DBMS). Namely, their product developers tend to focus on cross-platform design, which leads them to leverage only the limited functionality that the major databases have in common.
Conversely, rather than duplicating these tools, Vanguard tries to optimally leverage them as part of the overall solution by introducing Direct Access, the technological enterprise information integration (EII)-like foundation of its Graphical Performance Series (GPS) BI solution. This solution delivers integrated enterprise information directly to business decision-makers without relying on a DW, thereby potentially saving time, increasing business agility, and reducing costs. The Vanguard GPS solution has since been able to directly access the information stored in enterprise systems, without requiring businesses to move or stage data, or invest in complex and unwieldy data warehousing technology.
Businesses are collecting an ever-expanding amount of data in multiple systems, formats, and locations, making it increasingly difficult to maintain and synchronize a redundant copy of the original data in a central repository. As the complexity of this process increases, the administrative costs go up, and the business value decreases. At the same time, the highly competitive nature of the relational database market drives continuous improvements in the functionality of the source databases.
As the processing power and functionality of the databases improve, the most logical approach might then be to size the database server to manage both the transactional and reporting workloads. Vanguard believes that it is essential to leverage all of the available database capabilities effectively, so the GPS solution includes native data access for each database and takes full advantage of the distinct features of each. The vendor's experience has reportedly shown that properly sizing the database server to support transactional and reporting functions can be far less expensive than building and maintaining a redundant DW, while remaining adequately effective. Further, the IT management costs of tuning the transaction database are likely lower than the cost of the ongoing maintenance required by a DW, which becomes a mission of its own, to the degree that enterprises can even forget the original purpose of the DW.
Part Six of the Business Intelligence Report Status Quo series.
Improving the DW Model
When the DW model was conceived, it was believed that it was necessary to stage data in a warehouse to provide reasonable query performance for users. However, this is a case where query performance can be drastically improved simply by taking advantage of the features in the source databases, since Oracle, IBM, and Microsoft now offer declarative summarization capabilities that automate the process of creating and maintaining summary tables. Called materialized views, materialized query tables, or indexed views, these database features, combined with Vanguard's "summary-aware" metadata feature, can enable users to specify and deploy declarative summarization.
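The idea of routing a query to a precomputed summary can be illustrated with a minimal sketch. It is not Vanguard's implementation: the table names, the `SUMMARIES` lookup, and the `total_by` helper are all hypothetical, and because SQLite has no materialized views, an ordinary table built once stands in for the engine-maintained summary that Oracle, IBM, or Microsoft would provide.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """Create a toy transactional table plus one precomputed summary."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (region TEXT, product TEXT, amount REAL);
        INSERT INTO orders VALUES
            ('East', 'Widget', 100.0),
            ('East', 'Gadget', 50.0),
            ('West', 'Widget', 70.0);
        -- Stand-in for a materialized view: built once here, whereas a real
        -- engine would create and refresh it declaratively.
        CREATE TABLE sales_by_region AS
            SELECT region, SUM(amount) AS total FROM orders GROUP BY region;
    """)
    return conn

# Hypothetical "summary-aware" metadata: grouping columns -> summary table.
SUMMARIES = {("region",): "sales_by_region"}
# Only these columns may appear in generated SQL.
ALLOWED = {"region", "product"}

def total_by(conn, group_cols):
    """Return (source_used, rows), preferring a summary when one matches."""
    key = tuple(group_cols)
    if not key or any(col not in ALLOWED for col in key):
        raise ValueError(f"unsupported grouping: {key}")
    cols = ", ".join(key)
    if key in SUMMARIES:
        source = SUMMARIES[key]
        sql = f"SELECT {cols}, total FROM {source} ORDER BY {cols}"
    else:
        source = "orders"
        sql = (f"SELECT {cols}, SUM(amount) AS total FROM orders "
               f"GROUP BY {cols} ORDER BY {cols}")
    return source, conn.execute(sql).fetchall()
```

A request grouped by region is answered from the small summary table, while a request grouped by region and product falls back to the detail rows. The user asks the same kind of question in both cases and never chooses the table.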
Vanguard makes use of this capability to limit the processing impact of user queries and maximize the performance of the solution. By incorporating declarative summarization directly into their database engines, the major database vendors might have removed, in some instances, a primary justification for data warehousing. The database heavyweights Oracle, IBM, and Microsoft are also devising ways to leverage XML and enable multi-format integration with data other than their own.
In fact, Oracle's BI solution, although featuring data warehousing and extract/transform/load (ETL) capabilities, often tends to avoid unnecessary data consolidation. Historically, companies would dump all their information into a DW, believing that if they put all of their data into a single database/DW they could eliminate the problem of functional silos, and everybody would share information and work from the same playbook.
Yet, the actual finding is too often that when a company builds a DW, the focus becomes one of getting the data from the source systems into the warehouse, whereas users are still left to their own devices to develop reports or write queries, which is a time-consuming process. Thus, the right way to do it might be to keep management reporting in the same system as the transactions themselves, so that the reports carry all the context of why each transaction happened. That approach enables users to drill into the information and explore the metrics/key performance indicators (KPI) all the way down to the very transaction that may have caused the problem. Additionally, by leaving everything in the same system or in the source instance where the transaction occurred, users can drill through to a transaction and fix the error on the spot if something is wrong.
Furthermore, as the Sarbanes-Oxley (SOX) and other reporting requirements hold businesses to higher standards of accuracy and accountability, business transparency is vital, but a DW, by definition, is a copy of the original data records. DWs are constantly being shifted, updated, or consolidated, and they combine data from multiple systems, each with different business rules. By contrast, Vanguard's Direct Access pulls information directly from the transaction systems (the real systems of record), so that the question of what "version" of the data is reported becomes moot.
The issue is also whether there is any point in integrating a large amount of data in a DW when one needs only a small sliver of it. Given that the nature of corporate information is dynamic, trying to keep it replicated and synched in multiple databases when, for example, the company merges with another entity, is impractical, especially if the data is accessed infrequently. Changes to a traditional DW model to bring in new data can take months, whereas EII-like data federating solutions are not as fragile as procedural ETL scripts and can accommodate the necessary changes much more quickly.
Need for Clean Information
Another historical driver for the DW model has been the perceived need to "clean" information prior to making it available for reporting, and many companies are still investing large amounts of time, human resources (HR), and money in cleaning their data and applying consistent terminology to it as a necessary step in building their DW.
As mentioned earlier, some vendors and their customers believe that the transaction systems are "the systems of record" and that DWs create an artificial split between transactions and reporting. Moreover, the more data is copied, duplicated, or modified, the less accurate it becomes, so it might be more efficient and cost-effective to correct data problems at the source than to move and modify databases. In line with the Oracle approach described above, Vanguard's customers reportedly often find that they can drive rapid data quality improvements with the level of visibility that Direct Access brings to their systems, including a single place to maintain information and a more streamlined data management process.
There is also a perception that if business users are given access to enterprise databases and raw query tools, they will create havoc in the system. That is a possibility, unless the BI product developer understands the potential problem and treats it as a business-critical factor. Accordingly, Vanguard has developed the expertise and the business rules specific to the native databases to eliminate the potential for damage to the transaction systems: business users do not have direct control over the structured query language (SQL), so they cannot create invalid queries. To that end, the semantic layer ensures that business users are shielded from the complexity of the underlying systems, and that the enterprise transaction systems can function smoothly.
For enterprises with multiple source databases, Vanguard has found from experience that it can be more efficient to optimize each one separately than to build a central DW, since Direct Access streams data packets to the user in parallel, where they are assembled on the user's PC. The total number of databases that contribute information should not affect cube-loading performance, since each contributes independently.
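The parallel pattern described above can be sketched roughly as follows. This is an illustration only, not Direct Access itself: the in-memory SQLite databases stand in for separate enterprise systems, and `make_source`, `fetch_totals`, and `assemble` are hypothetical names.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def make_source(rows):
    """Create an in-memory database standing in for one enterprise system."""
    # check_same_thread=False lets a worker thread use this connection.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn

def fetch_totals(conn):
    """Run one aggregate query against a single source."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()

def assemble(sources):
    """Query every source in parallel and merge the partial results."""
    totals = {}
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # Each source is queried independently; merging happens on the
        # caller's side, analogous to assembly on the user's PC.
        for partial in pool.map(fetch_totals, sources):
            for region, amount in partial:
                totals[region] = totals.get(region, 0.0) + amount
    return totals
```

Because each source answers its own query, adding another source adds another worker rather than lengthening a single serial load, which is the reasoning behind the claim that the number of contributing databases should not dominate loading time.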
Vanguard believes that the use of a semantic model and integrated middleware enables effective use of the processing power available at each tier (database, middle tier, and client), a design that obviates the need for a central physical DW and provides scalable, integrated information on demand, even from multiple disparate data sources. Demand-driven BI requires a rich semantic layer that "virtualizes" the source databases for the end user, and Vanguard's UIM is thus based on a three-level metadata solution that brings multiple disparate sources together and presents them in a coherent enterprise view:
- The Business Model—drives business user interaction and provides the essential translation of database terminology into functional business language;
- The Runtime Model—accepts user requests, identifies the correct data sources, and generates the necessary queries; and
- The Database Model—accesses the database tables using techniques native to each specific database.
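To make the division of labor among the three levels concrete, here is a minimal sketch under stated assumptions. The dictionaries, the business terms, the table names, and the row-limit syntax chosen for each source are all invented for illustration and do not describe Vanguard's actual metadata.

```python
# Business Model (hypothetical): business term -> (source system, SQL expression).
BUSINESS_MODEL = {
    "Revenue": ("erp", "SUM(amount)"),
    "Region": ("erp", "region"),
    "Open Cases": ("crm", "COUNT(case_id)"),
    "Account": ("crm", "account"),
}

# Database Model (hypothetical): per-source table name and native row-limit style.
DATABASE_MODEL = {
    "erp": {"table": "sales",
            "limit": lambda sql, n: f"{sql} FETCH FIRST {n} ROWS ONLY"},
    "crm": {"table": "cases",
            "limit": lambda sql, n: f"{sql} LIMIT {n}"},
}

def build_query(measure, dimension, max_rows=100):
    """Runtime Model: resolve business terms, pick the source, emit SQL.

    Users supply only business terms, never SQL, so an unknown term is
    rejected before anything reaches a database.
    """
    try:
        m_source, m_expr = BUSINESS_MODEL[measure]
        d_source, d_expr = BUSINESS_MODEL[dimension]
    except KeyError as exc:
        raise ValueError(f"unknown business term: {exc.args[0]}") from None
    if m_source != d_source:
        raise ValueError("terms belong to different sources")
    db = DATABASE_MODEL[m_source]
    sql = f"SELECT {d_expr}, {m_expr} FROM {db['table']} GROUP BY {d_expr}"
    # Apply the source-specific syntax only at the last step.
    return m_source, db["limit"](sql, int(max_rows))
```

Calling `build_query("Revenue", "Region", 10)` yields a statement for the `erp` source, while a term missing from the business model raises an error instead of producing a query, which mirrors the point that users cannot create invalid SQL.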
Having a central repository/DW makes it convenient to "massage" the data and make predictions, but, in today's business environment, information is often scattered throughout several disparate databases, sometimes in different applications, all of which are probably in different physical locations. Hence, other BI/enterprise performance management (EPM) vendors too are stepping in with solutions that can collect information from multiple data sources. Silvon Software, for instance, has evolved its client/server-based Data Tracker product so that it now extracts data out of a transactional enterprise system and loads it into a rejuvenated Web-based solution called Stratum, which can pull from ERP, CRM, even point-of-sale (POS) data, and then integrate and validate it all. Stratum's current modules with built-in BI capabilities include CRM analysis; inventory performance; marketing performance; manufacturing performance; profitability performance; sales performance; and supplier relationship management (SRM) analysis.
Challenging Existing DW Solutions
The above facts and counter-facts have firmed up the belief that, while data warehousing is one means to an end, it is no longer a requirement. Despite the widespread adoption of the data warehousing concept within large organizations, fundamental challenges and problems continue to plague real-world implementations. From a business perspective, a DW is a complex, time-intensive project that requires investments in time, people, and hardware, yet often has an unclear business case.
Data warehousing is also often difficult to tie to direct improvements in the bottom line, which is a handicap in the current business climate, where major IT projects need to show a clear payback within a defined time interval before they can be funded. Once a DW project has begun, the sheer scale and complexity of the task create an element of risk for most businesses, since legacy data from older systems, problems with data extraction and integrity, and problems with content relevance all contribute to the challenge.
Recent moves by some pure-play EII vendors to align themselves with BI counterparts may confirm the connection between the two technologies. Namely, Composite has signed a deal with Cognos to build its EII tools into Cognos' ReportNet query and reporting software, whereas MetaMatrix has forged like technical partnerships with Business Objects and Hyperion Solutions.
As mentioned earlier on, back in 2003, Actuate acquired former EII specialist Nimble to integrate data query federation capabilities into its enterprise-reporting platform. By incorporating Nimble's open, XML-based data integration technology into the platform, Actuate's customers have since found it easier to design BI applications that provide an integrated view of their business. Furthermore, the incorporation of Nimble's capabilities has since enabled the Actuate BI platform to integrate more readily with a broad range of XML-enabled systems. Along similar lines of pulling in data and creating unified views relatively quickly and economically, IBM launched its Information Integrator product in 2004, while the enterprise application integration (EAI) specialist BEA Systems also announced a Liquid Data integration initiative that sits in front of databases and file systems, allowing users to search for data in various locations.
However, Direct Access (or EII technology in a wider context) cannot always be an alternative to a DW, given that these data integration solutions augment historical time-series BI reporting with fresher operational detail, rather than conduct deep, complex analytic processing, such as multi-terabyte queries, which many businesses still need. The technology might be especially useful in situations where users want to get to detailed data that is usually omitted from the DW. Thus, bolting on an EII capability may allow enterprises to supplement DWs with lighter queries running directly against current, or intra-day, data from transactional systems.
Despite the adoption by Vanguard, Oracle, IBM, and a number of small niche EII players such as Certive, MetaMatrix, Avaki Corporation (recently acquired by Sybase), Composite Software, and Ipedo, the approach remains a nascent idea and, consequently, a nascent market, with more proof of concept required. These products feature the ability to map varied data in a single data model and process queries on the fly with relatively fast performance. However, using only EII for BI could make it difficult to deal with business change or analyze historical trends, prospects might still be concerned about safeguards for data quality in data-diverse environments, and the impact of EII on transactional systems is always a real concern (that is, the EII chain is only as fast as its slowest component).
Hence, the technology is still far from mainstream adoption, as opposed to more mature technologies such as ETL tools, database replication, and gateway technologies, as confirmed by IBM's recent purchase of the ETL leader Ascential. In the meantime, the vendors preaching virtual data unification/EII must strive to educate the market and gain a critical mass of customers for the approach. The successful ones might, for the time being, be those that position their tools to complement, rather than replace, conventional data warehousing. Some recent surveys do cite a notable percentage of users mentioning the lack of a centralized DW as a key reason for postponing the adoption of analytic tools, like dashboards, within their companies. Thus, while customers are designing and implementing modern information architectures, they might leverage EII as a stopgap technology to immediately explore data in scattered sources. The market has been somewhat validated by Sybase's acquisition of Avaki, Actuate's acquisition of Nimble, and the partnerships between Cognos and Composite and Business Objects and Ipedo.
This concludes Part Six of a seven-part note.
Part One detailed history and current status.
Part Two looked at contemporary BI tools.
Part Three described what is available.
Part Four presented the BI/CPM market landscape.
Part Five discussed Geac and Point Solution vendors.
Part Seven will make Recommendations.
About the Authors
Olin Thompson is a principal of Process ERP Partners. He has over twenty-five years of experience as an executive in the software industry. Thompson has been called "the Father of Process ERP." He is a frequent author and an award-winning speaker on topics of gaining value from ERP, SCP, e-commerce, and the impact of technology on industry.
He can be reached at Olin@ProcessERP.com
Predrag Jakovljevic is a research director with TechnologyEvaluation.com (TEC), with a focus on the enterprise applications market. He has nearly twenty years of manufacturing industry experience, including several years as a power user of IT/ERP, as well as being a consultant/implementer and market analyst. He holds a bachelor's degree in mechanical engineering from the University of Belgrade, Yugoslavia, and he has also been certified in production and inventory management (CPIM) and in integrated resources management (CIRM) by APICS.