Massive Data Requires Massive Measures


From Sun Tzu’s The Art of War:

In the operations of war, where there are in the field a thousand swift chariots, as many heavy chariots, and a hundred thousand mail-clad soldiers, with provisions enough to carry them a thousand Li, the expenditure at home and at the front, including entertainment of guests, small items such as glue and paint, and sums spent on chariots and armor, will reach the total of a thousand ounces of silver per day. Such is the cost of raising an army of 100,000 men.

It is evident from the above quote that the price of war was very high even in ancient times. It comes as no surprise, then, that the ongoing war for a greater presence in the data warehouse and information management space is likewise by no means cheap. In the corporate world, the information explosion has made it difficult for big organizations to collect, clean, store, and analyze such volumes of information. The difficulty stems not only from sheer quantity but also from the number of sources, quality requirements, and the speed at which data is generated.

The amount of information that companies generate is still growing at a constant pace, and the complexity of analyzing this information lies both in its volume and in the way it has to be processed. Processing needs range from simple executive report publishing to complex data mining for fraud detection.

Another factor that adds to this complexity is that the culture of most businesses is changing: data analysis has reached beyond the traditional business analysis areas to the executive and advisory boards, as well as the operational areas of the organization. Potentially, everyone in the organization can contribute to and promote data analysis.

Nowadays, many users are exploring and analyzing information without even knowing it. Moreover, data generation within the corporate world has grown to a point where much of the analysis takes place at the moment data is captured, or with a very small delay: real-time analysis is here.

The Repositioning Game
When companies realized that their traditional data warehouse implementations were not enough to solve their big data analysis problems, they turned their attention to different and more advanced solutions that would help them improve all their analytical processes.

Currently, the data warehouse and “big data” analysis spaces are changing their composition because some business software companies are trying to reposition their presence in the field. Directly or indirectly, almost all major software companies have made expensive moves during the last couple of years to redirect or reinforce their strategies in the data warehousing space and in the analysis of large data volumes. Here’s a quick look at some of these recent events:

EMC and Greenplum
In the data warehouse and business intelligence (BI) area, the acquisition of Greenplum by the information management company EMC rang the bells of war in the field of data warehousing and massive data analysis. With this acquisition, EMC unfolded a strategy to secure a good position in a mature but still growing market. As large companies increasingly need tools to quickly and accurately analyze the enormous amounts of data they generate, software vendors have realized that this market still has strong development potential. EMC is betting on Greenplum’s analytical database to position itself in the data warehouse game.

SAP and Sybase
Certainly, two things were important when SAP decided to acquire Sybase in a transaction worth about $5.8 billion (USD): 1) With this acquisition, SAP reinforced its strategy regarding in-memory analysis technologies and corporate mobility solutions; and 2) Sybase’s analytic server—Sybase IQ—can deliver high-speed solutions and help SAP complete the BI information cycle with competitive advantages. Mobility capabilities will be a very important part of the next generation of SAP’s analytical applications.

IBM and Netezza
On September 20, 2010, IBM announced that it was about to acquire Netezza, a company based in Marlborough, Massachusetts (US). With this cash transaction worth about $1.7 billion (USD), IBM made an important move to establish itself as a big player in the big data field. Netezza holds a privileged position in the data warehouse and analytics arenas, and IBM will be able to take advantage of this leadership and its pre-existing partnership with Netezza to gain more presence in the market. Beyond the “big money” involved in the transaction, this merger will also give IBM indirect access to important clients like eHarmony, Neiman Marcus, and Time Warner, among others. The deal shows that the big data market is positioned to attract the attention of almost all the major software companies and encourage them to invest in its development.

Oracle, Sun, and Exadata
We are still seeing the effects of Oracle’s acquisition of Sun. Among the many aspects of this merger, the one related to the data warehouse stands out. With the introduction of its Exadata Database Machine X2-8, Oracle sets itself up for serious competition in the big data segment. One of Oracle’s strategies is to deliver industry-specific data warehousing solutions with Oracle Exadata Intelligent Warehouse Solutions that serve each industry’s unique requirements. Based on features like scalability, energy efficiency, and performance, Oracle intends to keep pace with, and maintain its leadership position over, other vendors in the market. Still, the question remains: Is Oracle planning a big data warehouse acquisition in the near future?

Microsoft and SQL Server
Microsoft is relying on the release of SQL Server 2008 R2 Parallel Data Warehouse, which contains specific enhancements for data warehousing tasks, to gain presence in the data warehouse market. With this release, Microsoft intends to deliver a product with massively parallel processing capabilities and highly scalable features to compete in the space of big data analysis. Most companies already working with SQL Server can deploy this new version to gain its new set of functionality. Also, through partnerships with HP and IBM, SQL Server can be delivered as a complete data warehouse appliance that combines all the necessary hardware and software for a fast deployment process. With this release, Microsoft can tap the enormous installed base of Microsoft platforms and databases to expand in the data warehouse segment.

Teradata and SAS
One approach that can work for software companies in the big data space is to form partnerships that bring added value to the table. In this case, two information giants, one in the data warehouse space and the other in the BI space, are combining efforts to deliver solutions that run SAS analytic capabilities on the parallel processing infrastructure of Teradata’s data warehouse. One of the most interesting challenges with partnerships like this is to deliver a product that is integrated tightly enough for the combination to be truly transparent to the end user. The combination of Teradata and SAS aims to provide real integration of data warehouse capabilities with state-of-the-art BI analytical functions.

The War Rages On
As a final thought, it is worth mentioning that the war for big data is far from over. Data management for large companies can be very complex and involves a number of specific issues that these types of products need to address. Software companies aiming to provide this type of software need to be able to add real value for the management of large data volumes.

This value can be reflected in terms of performance, ease of use, and integration with other corporate applications, among other important factors. Processing speed can be very important, but it is obviously not the only factor, since choosing a data warehouse or analytic database involves business-related criteria as well as technical ones.

Beyond this spending rush, many other developments in the data warehouse space are worth mentioning, such as the adoption of non-conventional database solutions like NoSQL and distributed databases, as well as special types of technologies like in-memory analysis and column-oriented databases. These are topics related to the big data world that we will have to cover in future articles. For large companies, the analysis of big data is and will remain an essential part of their organizational makeup.

As Sun Tzu might say, in the war for the data warehouse space, spending big money is justified simply because there is enormous business potential in this area, and the players will keep spending on “glue and paint” and “chariots and armor” to deliver what the industry badly needs for the analysis and management of massive data.
