Biographical Information

Bill Inmon
Bill Inmon is universally recognized as the "father of the data warehouse." He has over 26 years of database technology management experience and data warehouse design expertise, and has published 36 books and more than 350 articles in major computer journals. His books have been translated into nine languages. He is known globally for his seminars on developing data warehouses and has been a keynote speaker for every major computing association. Before founding Pine Cone Systems, Bill was a co-founder of Prism Solutions, Inc.

Ralph Kimball
Ralph Kimball was co-inventor of the Xerox Star workstation, the first commercial product to use mice, icons, and windows. He was vice president of applications at Metaphor Computer Systems, and founder and CEO of Red Brick Systems. He has a Ph.D. from Stanford in electrical engineering, specializing in man-machine systems. Ralph is a leading proponent of the dimensional approach to designing large data warehouses. He currently teaches data warehousing design skills to IT groups, and helps selected clients with specific data warehouse designs. Ralph is a columnist for Intelligent Enterprise magazine and has a relationship with Sagent Technology, Inc., a data warehouse tool vendor. His book "The Data Warehouse Toolkit" is widely recognized as the seminal work on the subject.

 

To clear up some of the confusion that is rampant in the market, here are some definitions:

Data Warehouse:

Bill Inmon, who is generally credited with coining the term "data warehouse" in 1990, defined it as follows: "A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process".

He defined the terms in the sentence as follows:

  • Subject-oriented: Data that gives information about a particular subject instead of about a company's ongoing operations.

  • Integrated: Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.

  • Time-variant: All data in the data warehouse is identified with a particular time period.

  • Non-volatile: Data is stable in a data warehouse. More data is added, but data is never removed. This enables management to gain a consistent picture of the business.

(Source: "What is a Data Warehouse?" W.H. Inmon, Prism, Volume 1, Number 1, 1995). This definition remains reasonably accurate almost ten years later. However, a single-subject data warehouse is typically referred to as a data mart, while data warehouses are generally enterprise in scope. Also, data warehouses can be volatile. Because a data warehouse requires a large amount of storage (multi-terabyte data warehouses are not uncommon), only a limited number of periods of history are kept in it. For instance, if the warehouse is designed to hold three years of data, then every month the oldest month is "rolled off" the database and the newest month is added.
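The rolling-window retention policy just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. The warehouse simply tracks which monthly periods it holds and is configured to keep 36 of them (three years). The functions `add_months` and `roll_window` are hypothetical names, not part of any product. A real warehouse would also delete or archive the rows belonging to the rolled-off month.

```python
from datetime import date


def add_months(d: date, n: int) -> date:
    """Return the first day of the month that is n months after d's month."""
    total = d.year * 12 + (d.month - 1) + n
    return date(total // 12, total % 12 + 1, 1)


def roll_window(months: list[date], new_month: date, keep: int = 36) -> tuple[list[date], list[date]]:
    """Add new_month, then split off the oldest months so at most `keep` remain.

    Returns (kept, rolled_off), both sorted oldest first.
    """
    updated = sorted(set(months) | {new_month})
    cutoff = max(len(updated) - keep, 0)
    return updated[cutoff:], updated[:cutoff]


# Three years of monthly periods: January 2001 through December 2003 (hypothetical).
start = date(2001, 1, 1)
loaded = [add_months(start, i) for i in range(36)]

# Loading January 2004 rolls January 2001 off the warehouse.
kept, rolled_off = roll_window(loaded, add_months(start, 36))
print(rolled_off)                    # [datetime.date(2001, 1, 1)]
print(kept[0], kept[-1], len(kept))  # 2001-02-01 2004-01-01 36
```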

Ralph Kimball provided a much simpler definition of a data warehouse. In his book "The Data Warehouse Toolkit" (page 310), he defines a data warehouse as "a copy of transaction data specifically structured for query and analysis". This definition provides less insight and depth than Mr. Inmon's, but is no less accurate.

Data Warehousing:

(Figure: Components of Data Warehousing)

Data warehousing is essentially what you need to do to create a data warehouse, and what you do with it afterward. It is the process of creating, populating, and then querying a data warehouse, and it can involve a number of discrete technologies, such as:

  • Source System Identification: In order to build the data warehouse, the appropriate data must be located. Typically, this will involve both the current OLTP (On-Line Transaction Processing) system, where the "day-to-day" information about the business resides, and historical data for prior periods, which may be contained in some form of "legacy" system. Often these legacy systems are not relational databases, so considerable effort may be required to extract the appropriate data.

  • Data Warehouse Design and Creation: This describes the process of designing the warehouse, taking care to ensure that the design supports the types of queries the warehouse will be used for. This is an involved effort that requires both an understanding of the database schema to be created and a great deal of interaction with the user community. Design is often iterative, and the model must be modified a number of times before it stabilizes. Great care must be taken at this stage, because once the model is populated with large amounts of data, some of which may be very difficult to recreate, it cannot easily be changed.

  • Data Acquisition: This is the process of moving company data from the source systems into the warehouse. It is often the most time-consuming and costly effort in a data warehousing project, and it is performed with software products known as ETL (Extract/Transform/Load) tools. There are currently over 50 ETL tools on the market. The data acquisition phase can cost millions of dollars and take months or even years to complete. After the initial load, data acquisition becomes an ongoing, scheduled process that keeps the warehouse current to a predetermined point in time (e.g., the warehouse is refreshed monthly).

  • Changed Data Capture: The periodic update of the warehouse from the transactional system(s) is complicated by the difficulty of identifying which records in the source have changed since the last update. This effort is referred to as "changed data capture". Changed data capture is a field of endeavor in itself, and many products on the market address it. Technologies used in this area include replication servers, publish/subscribe, triggers and stored procedures, and database log analysis.

  • Data Cleansing: This is typically performed in conjunction with data acquisition (it can be part of the "T" in "ETL"). A data warehouse that contains incorrect data is not only useless but also very dangerous. The whole idea behind a data warehouse is to enable decision-making. If a high-level decision is made based on incorrect data in the warehouse, the company could suffer severe consequences, or even fail completely. Data cleansing is a complicated process that validates and, if necessary, corrects the data before it is inserted into the warehouse. For example, the company could have three "Customer Name" entries in its various source systems: one entered as "IBM", one as "I.B.M.", and one as "International Business Machines". Obviously, these are all the same customer. Someone in the organization must decide which form is correct, and the data cleansing tool then changes the others to match that rule. This process is also referred to as "data scrubbing" or "data quality assurance". It can be extremely complex, especially if some of the warehouse inputs come from older mainframe file systems (commonly referred to as "flat files" or "sequential files").

  • Data Aggregation: This process is often performed during the "T" phase of ETL, if it is performed at all. Data warehouses can be designed to store data at the detail level (each individual transaction), at some aggregate level (summary data), or a combination of both. The advantage of summarized data is that typical queries against the warehouse run faster. The disadvantage is that information that may be needed to answer a query is lost during aggregation. The tradeoff must be weighed carefully, because the decision cannot be undone without rebuilding and repopulating the warehouse. The safest decision is to build the warehouse with a high level of detail, but the storage cost can be extreme.
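To make concrete what changed data capture has to work out, the sketch below compares tonight's extract of a source table with last night's. It matches rows by primary key and classifies each key as inserted, updated, or deleted. This snapshot comparison is swapped in for illustration and is not one of the technologies named in the changed data capture bullet. It is easy to show, but it rereads the entire source table on every refresh, which becomes costly for large tables. The function name, table, and rows are hypothetical.

```python
Row = dict[str, str]


def diff_snapshots(previous: dict[int, Row], current: dict[int, Row]) -> dict[str, list[int]]:
    """Classify primary keys as inserted, updated, or deleted between two extracts."""
    inserted = sorted(k for k in current if k not in previous)
    deleted = sorted(k for k in previous if k not in current)
    updated = sorted(k for k in current if k in previous and current[k] != previous[k])
    return {"inserted": inserted, "updated": updated, "deleted": deleted}


# Hypothetical extracts of a source customer table, keyed by customer_id.
last_night = {
    1: {"name": "IBM", "city": "Armonk"},
    2: {"name": "Acme", "city": "Boston"},
    3: {"name": "Globex", "city": "Springfield"},
}
tonight = {
    1: {"name": "IBM", "city": "Armonk"},      # unchanged
    2: {"name": "Acme", "city": "Denver"},     # updated
    4: {"name": "Initech", "city": "Austin"},  # inserted; key 3 was deleted
}

changes = diff_snapshots(last_night, tonight)
print(changes)  # {'inserted': [4], 'updated': [2], 'deleted': [3]}
```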
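The "IBM" example in the data cleansing bullet amounts to a lookup from a normalized spelling to the form the organization has declared correct. This minimal sketch assumes a hypothetical rule table maintained by a data steward. It also uses a deliberately crude normalization that lowercases the name and keeps only letters and digits. Commercial cleansing tools use far richer matching than this.

```python
import re

# Hypothetical rule table maintained by a data steward:
# normalized spelling -> the form the organization decided is correct.
CANONICAL_CUSTOMER_NAMES = {
    "ibm": "International Business Machines",
    "internationalbusinessmachines": "International Business Machines",
}


def normalize_key(name: str) -> str:
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def cleanse_customer_name(name: str) -> str:
    """Return the canonical name when a rule matches; otherwise the trimmed input."""
    return CANONICAL_CUSTOMER_NAMES.get(normalize_key(name), name.strip())


for raw in ["IBM", "I.B.M.", "International Business Machines", " Acme Corp "]:
    print(repr(raw), "->", repr(cleanse_customer_name(raw)))
```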
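The aggregation tradeoff can be seen with a handful of rows. The sketch below uses hypothetical sales data and a hypothetical `summarize` function. A summary by month and region is smaller than the detail, but it can no longer answer a question about a single product. The detail rows still can.

```python
from collections import defaultdict

# Hypothetical detail-level rows, one per transaction: (month, region, product, amount).
DETAIL = [
    ("2004-01", "East", "Widget", 100.0),
    ("2004-01", "East", "Gadget", 50.0),
    ("2004-01", "West", "Widget", 70.0),
    ("2004-02", "East", "Widget", 30.0),
]
MONTH, REGION, PRODUCT, AMOUNT = range(4)


def summarize(rows, key_columns):
    """Sum AMOUNT over rows grouped by the given column positions."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[i] for i in key_columns)] += row[AMOUNT]
    return dict(totals)


# A warehouse that stores only this summary keeps month and region, but not product...
summary = summarize(DETAIL, (MONTH, REGION))
print(summary)
# ...so "Widget sales in January" can be answered only from the detail rows.
print(summarize(DETAIL, (MONTH, PRODUCT))[("2004-01", "Widget")])  # 170.0
```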

Now that the warehouse has been built and populated, it becomes possible to extract meaningful information from it that will provide a competitive advantage and a return on investment. This is done with tools that fall within the general rubric of "Business Intelligence".

Business Intelligence (BI):

Business intelligence is a very broad field that includes technologies such as Decision Support Systems (DSS), Executive Information Systems (EIS), On-Line Analytical Processing (OLAP), Relational OLAP (ROLAP), Multi-Dimensional OLAP (MOLAP), Hybrid OLAP (HOLAP, a combination of MOLAP and ROLAP), and more. BI can be broken down into four broad fields:

  • Multi-dimensional Analysis Tools: Tools that allow the user to look at the data from a number of different "angles". These tools often use a multi-dimensional database referred to as a "cube".

  • Query tools: Tools that allow the user to issue SQL (Structured Query Language) queries against the warehouse and get a result set back.

  • Data Mining Tools: Tools that automatically search for patterns in data. These tools are usually driven by complex statistical formulas. The easiest way to distinguish data mining from the various forms of OLAP is that OLAP can only answer questions you know to ask, while data mining answers questions you didn't necessarily know to ask.

  • Data Visualization Tools: Tools that show graphical representations of data, including complex three-dimensional data pictures. The theory is that the user can "see" trends more effectively in this manner than when looking at complex statistical graphs. Some vendors are making progress in this area using the Virtual Reality Modeling Language (VRML).
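The "cube" used by multi-dimensional analysis tools can be thought of as totals precomputed for every combination of dimensions. This lets the user switch "angles" without rescanning the detail. The toy in-memory sketch below illustrates that idea only and is similar in spirit to the `GROUP BY CUBE` extension offered by several SQL databases. Real MOLAP engines use specialized storage. The data, dimension names, and `build_cube` function are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("month", "region", "product")

# Hypothetical fact rows: dimension values plus a numeric measure.
FACTS = [
    {"month": "2004-01", "region": "East", "product": "Widget", "sales": 100.0},
    {"month": "2004-01", "region": "West", "product": "Gadget", "sales": 40.0},
    {"month": "2004-02", "region": "East", "product": "Gadget", "sales": 60.0},
]


def build_cube(facts, dimensions, measure):
    """Precompute the measure's total for every subset of the dimensions.

    Keys are tuples of (dimension, value) pairs; the empty tuple is the grand total.
    """
    cube = defaultdict(float)
    for r in range(len(dimensions) + 1):
        for dims in combinations(dimensions, r):
            for fact in facts:
                key = tuple((d, fact[d]) for d in dims)
                cube[key] += fact[measure]
    return dict(cube)


cube = build_cube(FACTS, DIMENSIONS, "sales")
# Look at the same data from different "angles" without touching FACTS again.
print(cube[()])                                              # 200.0
print(cube[(("region", "East"),)])                           # 160.0
print(cube[(("month", "2004-01"), ("product", "Gadget"))])   # 40.0
```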

Metadata Management:

Metadata management takes place throughout the entire process of identifying, acquiring, and querying the data. Metadata is defined as "data about data". Consider a column in a table: the datatype of the column (for instance, a string or an integer) is one piece of metadata, and the name of the column is another. The actual value in the column for a particular row is not metadata; it is data. Metadata is stored in a metadata repository and provides extremely useful information to all of the tools mentioned previously. Metadata management has developed into an exacting science that can provide huge returns to an organization. It can help companies analyze the impact of changes to database tables, track the owners of individual data elements ("data stewards"), and much more. It is also required to build the warehouse, since the ETL tool needs to know the metadata attributes of the sources and targets in order to "map" the data properly. The BI tools need the metadata for similar reasons.
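The column example in this section maps directly onto a database catalog. The sketch below uses Python's built-in `sqlite3` module. SQLite's `PRAGMA table_info` returns the metadata (column names and declared types), while an ordinary query returns the data. The `customer` table is hypothetical. A real metadata repository would hold much more, such as source-to-target mappings and data stewards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT NOT NULL)"
)
conn.execute("INSERT INTO customer VALUES (1, 'International Business Machines')")

# Metadata: each column's name and declared datatype, read from SQLite's catalog.
# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
metadata = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(customer)")]

# Data: the value actually stored in the customer_name column for one row.
value = conn.execute(
    "SELECT customer_name FROM customer WHERE customer_id = 1"
).fetchone()[0]

print(metadata)  # [('customer_id', 'INTEGER'), ('customer_name', 'TEXT')]
print(value)     # International Business Machines
conn.close()
```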

Summary:

Data warehousing is a complex field, with many vendors vying for market awareness. The complexity of the technology, the interactions between the various tools, and the high price points of the products require companies to evaluate technology carefully before embarking on a warehousing project. However, the potential for enormous returns on investment and competitive advantage makes data warehousing difficult to ignore.


 
©2013 Technology Evaluation Centers Inc. All rights reserved.