Forgot password?
|
|
|
|
We were unable to sign you in.
Please verify your user name and password and try again. If you do not have a TEC account, register now.
Read Comments The Necessity of Data Warehousing
M. Reed - August 2, 2000

Why the market is necessary

Data warehousing is an integral part of the "information age". Corporations have long known that some of the keys to their future success could be gleaned from their existing data, both current and historical. Until approximately 1990, many factors made it difficult, if not impossible, to extract this data and turn it into useful information. Some examples:

  • Data storage peripherals such as DASD (Direct Access Storage Device) were extremely expensive on a per-megabyte basis. Therefore, much of the needed data was stored offline, typically on magnetic tape.

  • Processing power was very expensive as measured in MIPS (Millions of Instructions per Second). Mainframes had to reserve most of their processing power for day-to-day operations, reports could only be run overnight in batch mode (without interaction from the user).

  • Relational database technology was still in its infancy, and server engines were not powerful enough to support the data loads required.

  • The type of programming that had to be done with third generation languages (3GL's) was tedious and expensive. Fourth generation languages were needed to abstract some of the required coding, but 4GL's were still in their infancy.

Most operational data is stored in what is referred to as an OLTP (On-Line Transaction Processing) system. These systems are specifically designed for high levels of transaction volume with many concurrent users. If the database is relational, it has probably been "normalized" (the process of organizing data in accordance with the rules of a relational database). If the database is non-relational, custom programs have to be written to store and retrieve data from the database. (This is often accomplished with the COBOL programming language). Whether relational or non-relational, the very design that makes an OLTP system efficient for transaction processing makes it inefficient for end-user queries. In the 1980's, many business users referred to their mainframes as "the black hole", because all the information went into it, but none ever came back out - all requests for reports had to be programmed by the Information Systems staff. Only "pre-canned" reports could be generated on a scheduled basis, ad-hoc real-time querying was virtually impossible.

To resolve these issues, data warehousing was created. The theory was to create a database infrastructure that was always on-line, contained all the information from the OLTP systems, including historical data, but structured in such a way that it was fast and efficient for querying. The most common of these schemas (logical and physical database designs) is known as the star schema. A star schema consists of facts (actual business facts) and dimensions (ways of looking at the facts). One simple way to look at a star schema is that it is designed such that the maximum amount of information can be derived from the fewest number of table reads. Another way to reduce the amount of data being read is to pre-define aggregations (summaries of detail data, such as monthly total sales) within the star, since most queries ask questions like "how many were sold last month?"

Data warehousing also led to the development of the concept of metadata management. Metadata is data about data, such as table and column names, and datatypes. Managing metadata makes it possible to understand relationships between data elements and assists in the mapping of source to target fields. (For more information of Metadata see "Metadata Standards in the Marketplace ")

Next came the creation of Extract/Transform/Load (ETL) tools, which made use of the metadata to get the information from the source systems into the data warehouse.

Additional tools, which made use of SQL (Structured Query Language), were developed to give end-users direct access to the data in the warehouse. As time went by, the query tools became user-friendly, and many now have a parser that can turn plain English questions into valid SQL. These end-user tools are now loosely referred to as "business intelligence" tools. In addition, there are other database constructs used to assist business intelligence tools in multi-dimensional analysis of data in the warehouse. These databases are referred to as hypercubes (also known as cubes, multi-dimensional cubes, or MDB's).

Since the early 1990's, data warehouses have become ubiquitous, technology and methodology have been improving, and costs have been decreasing. In 1998, data warehousing was a $28 Billion (USD) industry, and growing at over 10% per year. In addition, a recent survey of top IT executives indicated that data warehousing would be the number one post-Y2K priority. Data warehousing is now recognized as an important way to add business value and improve return on investment, if it is properly planned and implemented.

Selection Issues

Selecting a set of products for a data warehouse effort is complex. The first and most important issue is to ensure that the Extract/Transform/Load tool that is chosen can effectively and efficiently extract the source data from all the required systems.

The selection of the ETL tool requires an understanding of the source data feeds. The following issues should be considered:

  • Many warehouses are built from "legacy" systems that may be difficult to access from the computer network. ETL tools often do not reside on the same machine as the source data.

  • The data structures of the legacy systems may be hard to decompose into raw data.

  • Legacy data is often "dirty" (containing invalid data, or missing data). Care must be taken in the evaluation of the tool to ensure it has an adequate function library for cleansing the data. Depending on the complexity of the cleansing required, a separate tool designed specifically for cleansing and validation may have to be purchased in addition to the ETL tool.

  • The ETL tool should have a metadata ("data about data") repository, which allows the data sources, targets, and transformations to be tracked in an effective manner.

  • The tool should be able to access legacy data without the need for pre-processing (usually with COBOL programs) to get the data into sequential "flat files". This becomes increasingly complex when working with filesystems like VSAM (Virtual Sequential Access Method), and files that contain COBOL Occurs and Re-Defines clauses (repeating groups and conditionally defined fields). It should be noted that a large percentage of the world's data is stored in VSAM files.

  • A final issue is whether the ETL tool moves all the data through its own engine on the way to the target, or can be a "proxy" and move the data directly from the source to the target.

Selection of the business intelligence tool(s) requires decisions such as:

  • Will multi-dimensional analysis be necessary, or does the organization need only generalized queries? Not all warehouse implementations require sophisticated analysis techniques such as data mining (statistical analysis to discover trends in the data), data visualization (graphical display of query results), or multi-dimensional analysis (the so called "slice and dice").

  • Will the architecture be two-tiered or three-tiered? Three-tiered architectures offload some of the processing to an "application server" which sits between the database server and the end-user.

  • Will the tool employ a "push" or a "pull" technology? ("Push" technology publishes the queries to subscribed users, much like Pointcast works, "pull" requires that the user request the query).

  • Will the information be broadcast over a corporate intranet, extranet, or the Internet?

  • How will the organization implement data security, especially if information is being broadcast outside the corporate firewalls?

Average acquisition cost

A minimum tool set, excluding hardware, would include a database engine, ETL tool, and a number of BI tools for the individual users. A reasonable starting point for this set would be in the range of $100,000. Costs vary greatly based on vendor selection, number of seats purchased, software maintenance fees, and many other factors. In addition, consulting services drive the cost up immensely. A full blown, large-scale data warehouse in the terabyte range with hundreds of user licenses could cost tens of millions of dollars to implement and maintain.

Conclusions

Data Warehousing has taken many years to develop into its current form, and is still an immature technology. Improvements in hardware and software have made it possible to build warehouses that are many terabytes (one terabyte is 1,099,511,627,776 bytes) in size. The ability to access current and historical data to examine market trends and to perform business analysis has proven to provide significant return on investment. In today's dynamic marketplace, companies will continue to use data warehousing to derive competitive advantage.


 
comments powered by Disqus


A Retail Sourcing Suite Built on Experience | One Vendor's Quest to Garner a Global Sourcing Ecosystem | Microsoft Dynamics AX 4.0 for Manufacturing Environments | Supplier Relationship Management: Benefits and Challenges | Software as a Service's Functional Catch-up | Software as a Service: Not without Caveats | Application Portfolio Management: Are You Getting the Most from your Enterprise Software? | Driving Factors in The Enterprise Applications Market | Understanding SOA, Web Services, BPM, and BPEL Part Two: BPEL and User Recommendations | Understanding SOA, Web Services, BPM, BPEL, and More Part One: SOA, Web Services, and BPM | Understand J2EE and .NET Environments Before You Choose | Outsourcing 101 - A Primer Part Three: Approaches and Recommendations | Financial Reporting, Planning, and Budgeting As Necessary Pieces of EPM Part Two: Challenges and User Recommendations | Financial Reporting, Planning, and Budgeting As Necessary Pieces of EPM Part One: Executive Summary | Has The BI Market Consolidation Been Crystal-Clearly Actuated? Part Three: Competition and User Recommendations. |
Has The BI Market Consolidation Been Crystal-Clearly Actuated? Part Two: Market Impact | Has The BI Market Consolidation Been Crystal-Clearly Actuated? | BI Market Consolidation Compared to ERP Market Consolidation | Analyse This | BPM Weaves Data And Processes Together For Real-time Revenues | SCE Leaders Partner To See Beyond Their Portfolio Part Three: Challenges and User Recommendations | SCE Leaders Partner To See Beyond Their Portfolios | The Art Of Distributed Development Of Multi-Lingual Three-Tier Internet Applications | The Case of A Boutique Vendor's Benefits of Focus - IRM Corporation | Why Systems Fail - The Dead-end of Dirty Data | Data Conversion in an ERP Environment | Continuous Data Quality Management: The Cornerstone of Zero-Latency Business Analytics Part 2: One Solution | Continuous Data Quality Management: The Cornerstone of Zero-Latency Business Analytics | What Makes Process Process? | Beware of Legacy Data - It Can Be Lethal | A Definition of Data Warehousing | The Next Big Thing or Integration-The Interaction Server Part 2: Possible Solutions | Shall Bifurcated Tack Reverse J.D. Edwards’ Bad Spell? | Enterprise Application Integration - Where Is It Now (And What Is It Now)? Part 2: Where Is It Now? | Enterprise Application Integration - Where Is It Now (And What Is It Now)? Part 1: What Is It Now? | The SOAP Opera Progresses - Helping XML to Rule the World | Can You Add New Life To an Old ERP System? | BEA Systems Announces WebLogic Integration | New Era of Networks Gets Blinded By the NEON | J.D. Edwards' QUEST To End Its String Of Pyrrhic Victories Part 2: The Implications | J.D. Edwards' QUEST To End Its String Of Pyrrhic Victories Part 1: The News | The Application Server War Escalates | EAI Vendor MITEM Integrates Legacy Systems With Siebel | Knosys Seeks Clarity With A Name Change | Computer Associates Jasmineii - When Is A Portal Not Just A Portal? | Hewlett Packard Makes Multiple Moves in Middleware | Where Is ERP Headed (Or Better, Where Should It Be Headed)? Part 2: Product Architecture and Web-Basing | EDI and XML Integration: Vitria Buys XMLSolutions | TIBCO Announces Results That Are 'Better Than Worse Than Expected' | Sagent Improves Its Image With SAS Partnership | Great Product: Too Bad The Architecture Doesn’t Fit | Seagate Software 'Crystallizes' Its New Name: Crystal Decisions | Informatica PowerCenter 5 Enables Enterprise Data Integration | EAI Market Consolidation Continues With Peregrine Acquisition of Extricity | IONA Purchases Netfish Technologies (And Much, Much More) | A New Era Dawns for Sybase | Evolutionary Technologies Does EAI (Always Did, We Just Didn’t Call It That) | Information Builders Did It iWay | GMAC Web-Enables Legacy Data With NEON Systems Shadow Direct | Business Objects Teams With TopTier For Analytics | Sun’s Java Won’t Be In Microsoft’s .NET – Complicate Your Integration? You .BET | Metagenix Reverse Engineers Data Into Information | Hummingbird Smells Nectar In The Corporate Portal Market | Mercator Continues to Suffer Turmoil - Can They Stay on the Map? | Tibco Takes a Pragmatic Approach to Multicasting | Talarian and NextSet Team for B2B Solutions | Informatica Powers Siebel’s New eBusiness Analytics | Implementation Acceleration Using Integration | BEA Systems Has A Broad Vision For E-Business Infrastructures | QueryObject Partners With Cognos | Knosys "in the Kno" With ProClarity 3.0 Analytical Platform | SPSS Has A New ShowCase | Did Sagent Technology Pull the Old 'Pump and Dump'? | Data Mining: The Brains Behind eCRM | Tempest Creates a Secure Teapot | Optum’s ConnectStream: First the Pieces Now the Glue | What’s Up with Computer Associates? | Now the Minnows are Eating the Minnows | Informix Decides to Start Analyzing Websites | EAI - The 'Crazy Glue' of Business Applications | SAS Institute Shoots for the Two-Stop-Shop with new Release of Warehouse Administrator | System Software Suppliers Slip Seriously | EAI Vendor Mercator Drops to a Lower Place on the Map | Syncsort Sigma Manages Database Aggregates | MicroStrategy 7 Hits the Street | CPortals Technologies Aims for the Middle | To BEA or Not to BEA: Is That the Question? | Informix Goes Vertical With Software Vendor ADRM | Evoke Software Releases Axio Data Integration Product | Vignette of an EAI Vendor (So to Speak) | Viador Teams With Business Objects | Applix Still Shows a Presence in the OLAP Market | Information Builders Announces New Release of WebFOCUS | webMethods Gets Active (Software That Is) | Sagent Technology Teams for Telco e-Business | EAI Vendor Active Software Activates Transactions | BMC Software Webs for the DBA | Business Objects Objects Again | Acta Gets Active | Parametric Technology Chills Out With Windchill Info*Engine V4 | Informix XML’s Its Metadata Transport Layer | Metadata Standards in the Marketplace – Why Do I Care? (And Where Does Godzilla Fit In?) | EAI Vendor Extricity Teams with Moai to Automate E-Commerce Systems | Computer Associates Goes E-Business in a Big Way | IBM Moves into Enterprise Application Integration | Sybase Tag-Teams with Informatica | Mercator Software Extends EAI Solutions for Insurance with XML | EAI Vendor CrossWorlds Eases Middleware Customization | Brio Technology Expands Support for WML and XML | Ardent Software: Will Informix Merger Affect their Success? | MicroStrategy Hits a Big Speed Bump on the Information Superhighway | Aspen Follows Good Quarter With Internet Launch | Brio Technology Reports Record Second-QuarterEarnings | Sybase and MicroStrategy Team on Vertical Market Portal Applications | Informatica Conforms to Metadata Standard | Business Objects Outguns Brio Technology in Patent Dispute | Is There Finally a Metadata Exchange Standard on the Horizon? | Datawarehouse Vendors Moving Towards Application Suites | Microstrategy Moves Up with e-Business | Seagate Technology Refocuses its Software Business | The Market Rewards Ardent Software Initiatives | Hummingbird Announces Extraction and Portal Strategy for ERP | Sagent Technology Reports Strong Growth | Oracle8i Release 2 - Ready to Storm the Web | Sterling Software Sees the Light with Eureka:Intelligence | Brio Technology Enters the ETL Market | More Data is Going to the Cleaners | Informix to Acquire Ardent Software-Another Vendor's Attempt at End-to-End Data Warehousing | Informatica Heads for E-Business | Acta Technology Helps Add Business Intelligence Capabilities to Major ERP Vendors | Inprise/Borland Challenges Other Vendors to Open-Source Their Database Code | Informatica Goes Multinational With Support for Unicode | Bus-Tech Speeds up Mainframe DB2 Access | NEON Systems Moves Further into Enterprise Application Integration | Hummingbird Releases Genio 4.0 With Improved Support for Oracle, Business Objects, Cognos, and NCR | Business Objects Launches WebIntelligence Extranet | Analysis of Novell and EAI Vendor Talarian Alliance | Informix Holds Fire Sale on Linux Database | Resistance is Futile: Computer Associates Assimilates yet another Major Software Firm | systemfabrik Releases an EAI Product? | Saga Continues Roll Out of EAI Tools | NCR's Teradata Database Meets Windows 2000. A Match Made in Redmond? | BMC Software Gets Slapped with Class Action Lawsuit | Software Technologies Corporation (STC) Prepares to go Public | SAS/Warehouse 2.0 Goes Live | GE Comes to Lunch. Want to Guess Who the Appetizer Will Be? | Computer Associates Splashes Into the Data Warehousing Market with Platinum Technology Acquisition | Informatica Morphs into Enterprise Decision Support Vendor | Enterprise Application Integration - the Latest Trend in Getting Value from Data |


Use this index to search for white papers related to commonly used search terms A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Others 
Recent Searches
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Others
A: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
B: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
D: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
E: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
F: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
G: 1 2 3 4 5 6 7
H: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
I: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
J: 1 2 3 4 5
K: 1 2 3 4
L: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
M: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
N: 1 2 3 4 5 6 7 8
O: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
P: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Q: 1 2
R: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
T: 1 2 3 4 5 6 7 8 9 10 11 12 13
U: 1 2 3
V: 1 2 3 4
W: 1 2 3 4 5 6 7 8 9 10 11
X: 1
Y: 1
Z: 1
Others: 1 2 3


©2013 Technology Evaluation Centers Inc. All rights reserved. Search powered by Google