Event Summary
According to an announcement by International Business Machines on Thursday,
December 16, 1999, IBM is working with German telecommunications services company
Deutsche Telekom to assemble the largest data warehouse in the world. When complete,
the warehouse will contain up to 100 terabytes of customer and call records,
to be used for Customer Relationship Management (CRM) applications. The warehouse
will be built by T-Nova, Deutsche Telekom's systems integration subsidiary,
and will use IBM's RS/6000 SP parallel processing servers and IBM's DB2 Universal
Database. The customer expects to have 25 terabytes of data loaded by the third
quarter of 2000.
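To put the announced sizes in concrete terms, the short sketch below converts them to bytes. It assumes the binary convention of 1 terabyte = 2^40 bytes; the helper name `terabytes_to_bytes` is illustrative and does not come from the announcement.

```python
# Assumption: binary convention, 1 terabyte = 2**40 bytes.
BYTES_PER_TB = 2 ** 40


def terabytes_to_bytes(terabytes: int) -> int:
    """Convert a size in terabytes to bytes (hypothetical helper)."""
    return terabytes * BYTES_PER_TB


full_size = terabytes_to_bytes(100)    # planned capacity of the warehouse
initial_load = terabytes_to_bytes(25)  # expected load by the third quarter of 2000

print(f"100 TB = {full_size:,} bytes")    # 109,951,162,777,600
print(f" 25 TB = {initial_load:,} bytes")  # 27,487,790,694,400
```

Under the decimal convention (1 terabyte = 10^12 bytes) the figures would be about 9 percent smaller.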
Market Impact
As a point of reference, 100 terabytes is 109,951,162,777,600 bytes of data. Very
few data warehouses have extended into the multi-terabyte range. The cost of
the hardware associated with an effort of this scale is enormous. Beyond hardware,
the largest cost of a warehouse is the manpower and processing required to
load the transactional data into the warehouse. If this effort is successful,
other large organizations may consider enterprise-scale data warehouses of similar
size. First Union Bank has already announced plans to increase the size of its
customer data warehouse to 27 terabytes by early next year. As part of its
efforts to convince clients to partner with it, IBM ran tests in January 1999
against one terabyte of user data, using DB2 Universal Database on Windows NT
on a 32-node cluster of its Netfinity servers, and set records for
price/performance and power. According to IBM, DB2 exceeded Oracle's query
speed against the same data by a factor of 89. DB2 Universal Database uses a
cost-based optimizer that weighs factors such as I/O and CPU cost to improve
query speed, and can route queries to summary tables to avoid costly join processing.
IBM has invested a great deal to improve the power and scalability of DB2 Universal
Database, and clearly has Oracle looking over its shoulder.
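The idea of a cost-based optimizer routing a query to a summary table can be illustrated with a toy sketch. This is not DB2's optimizer: the `Plan` class, the cost weights, and the page and row estimates are all hypothetical, chosen only to show how comparing estimated I/O and CPU cost can favor a precomputed summary over a join.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A candidate way to answer a query (hypothetical structure)."""
    name: str
    io_pages: int   # estimated disk pages read
    cpu_rows: int   # estimated rows processed


# Assumed weights: one page read costs as much as processing 100 rows.
IO_WEIGHT = 1.0
CPU_WEIGHT = 0.01


def estimated_cost(plan: Plan) -> float:
    """Combine the I/O and CPU estimates into a single cost figure."""
    return IO_WEIGHT * plan.io_pages + CPU_WEIGHT * plan.cpu_rows


def choose_plan(plans: list[Plan]) -> Plan:
    """Pick the candidate with the lowest estimated cost."""
    return min(plans, key=estimated_cost)


# Made-up estimates for a monthly call-volume query.
candidates = [
    Plan("join call records to customers, then aggregate",
         io_pages=5_000_000, cpu_rows=400_000_000),
    Plan("read precomputed monthly summary table",
         io_pages=20_000, cpu_rows=1_500_000),
]

best = choose_plan(candidates)
print(best.name)
```

With these made-up estimates the summary-table plan is chosen, since it avoids scanning and joining the detailed call records.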
User Recommendations
Customers considering large-scale data warehouses should include IBM's DB2 Universal
Database on a short list of databases to be considered. IBM has clearly made
great strides in optimizing its product for multi-terabyte data stores. During
the design phase, developers should take great care to make sure that their
database design and technical architecture are highly scalable, since data warehouses
almost always end up much larger than the customer expected.