In May 2010, The Economist featured a review of SAP’s announcement that it would acquire Sybase and introduce in-memory technology in the next version of Business ByDesign. While the principal focus of The Economist remains world news and politics, its readership includes many of the most influential corporate decision makers. Over the years, the publication has analyzed some of the most significant trends in business and technology, and it is telling that in-memory technology is now one of them. In-memory technology is permeating several categories of enterprise software, spanning data warehouses, analytic applications, database management systems, middle-tier caching systems (applications that cache frequently used data in distributed systems), and event-stream processing systems (applications that deal with multiple streams of event data, such as automated trading in financial services). In-memory technology is making it possible to minimize, or even eliminate, the data warehouse. In fact, every software application that works with large volumes of data or performs memory-intensive processes stands to benefit from this technology.
Analytics is the branch of logic that relates to or uses analysis. Consequently, in-memory analytics constitutes all analysis that is performed using in-memory technology. This can include multidimensional analysis or online analytical processing (OLAP), statistical analysis, data mining, and text mining. Analytic applications are most often built on large volumes of data, and this places significant demands on memory. Such applications are also anchored on complex data structures that emulate business models. As business models change, the corresponding data structures also have to change, and every structural change requires costly time and effort to redesign and reoptimize. This article examines how in-memory technology is transforming OLAP through the convergence of low-cost, high-performance hardware and resourceful ideas in software.
The memory part of in-memory refers to the silicon-based semiconductors that constitute a computer system’s random access memory (RAM). The entire in-memory revolution is built on the simple fact that RAM is significantly faster than disks made from magnetic material. Access to data in semiconductor-based memory is direct, while disk storage suffers the unavoidable latency of mechanically rotating magnetic platters.
Low-cost, high-performance hardware is making it possible for enterprise software to move toward in-memory architecture. A 64-bit architecture is central to realizing the benefits of an in-memory analytics solution. In theory, a 64-bit address space supports up to 16 exabytes (EB), or 16 million terabytes (TB); in reality, however, the actual amount of available memory is much less. This is determined by balancing application requirements against those of the system components responsible for memory management. The following table illustrates some key measures of Microsoft 64-bit operating systems:
| Measure | Description | Limit |
| --- | --- | --- |
| Total virtual memory | Total memory that can be accessed by the operating system. | 16 TB (x64-based systems) |
| Per-process virtual memory | Memory available to each application or process. This is the limit that would apply to an analytic/OLAP application. | 8 TB (x64-based systems) |
| Physical memory (RAM) | Semiconductor-based memory. This, in addition to virtual memory (on disk), constitutes the addressable space of a system. Physical RAM is limited by space on the motherboard. | 2 TB (Itanium-based systems); 1 TB (x64-based systems); 128 GB and less (most editions of Vista, Server 2003, some editions of Server 2008) |
Contrast this with 32-bit systems, which have a total limit of 4 GB and a per-process limit of 3 GB. A 64-bit architecture can potentially allow a 300 GB OLAP database or data mart to be completely resident in physical RAM, and an 8 TB data mart to be addressable in its entirety. It is important to note that the memory available to an analytic application depends on the per-process space; although this includes space from slower disk-based storage, all of the per-process space is addressable and therefore available. Most in-memory analytics vendors include sophisticated compression techniques to bridge the gap between the volume of data and the physical memory available. This addresses scalability (the ability to adapt to increasing data volume) and makes the in-memory option viable in environments where large amounts of physical memory are not yet practical.
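Compression techniques vary by vendor, but one common building block in in-memory columnar engines is dictionary encoding: repeated values in a column are replaced by small integer codes, so only the distinct values are stored in full. A minimal sketch, with invented sample data:

```python
def dictionary_encode(column):
    """Replace repeated column values with small integer codes.

    Returns the lookup table (code -> value) and the encoded column.
    """
    codes = {}       # value -> code, assigned in order of first appearance
    encoded = []
    for item in column:
        code = codes.setdefault(item, len(codes))
        encoded.append(code)
    lookup = {code: value for value, code in codes.items()}
    return lookup, encoded

def dictionary_decode(lookup, encoded):
    """Recover the original column from the integer codes."""
    return [lookup[code] for code in encoded]

# A region column with heavy repetition compresses well: thousands of
# rows, but only three distinct strings need to be stored in full.
region = ["EMEA", "APAC", "EMEA", "EMEA", "AMER", "APAC"] * 1000
lookup, encoded = dictionary_encode(region)
assert dictionary_decode(lookup, encoded) == region
assert len(lookup) == 3
```

Real products layer further techniques (run-length encoding, bit packing) on top of this, but the principle is the same: low-cardinality business data shrinks dramatically in memory.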
In Multidimensional Memory
Applications that run against data resident entirely in memory are much simpler to write than those that need to retrieve data from disk. For instance, caching algorithms (programs that allow frequently used data to be stored in physical memory to improve performance) are some of the most complex pieces of code to write, maintain, or even use effectively. By storing all relevant data in memory, in-memory architecture eliminates the need for such algorithms.
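To make the contrast concrete, here is a rough sketch of the kind of least-recently-used (LRU) caching logic a disk-bound engine must maintain, eviction policy and all. When all data is resident in RAM, a plain dictionary lookup replaces this entire mechanism. The `load_from_disk` callback is a stand-in for real I/O:

```python
from collections import OrderedDict

class LRUCache:
    """Keep at most `capacity` data blocks in memory, evicting the least
    recently used block when full -- machinery an in-memory engine avoids."""

    def __init__(self, capacity, load_from_disk):
        self.capacity = capacity
        self.load_from_disk = load_from_disk   # fallback for cache misses
        self.blocks = OrderedDict()            # key -> block, in LRU order

    def get(self, key):
        if key in self.blocks:
            self.blocks.move_to_end(key)       # mark as recently used
            return self.blocks[key]
        block = self.load_from_disk(key)       # expensive I/O on a miss
        self.blocks[key] = block
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict least recently used
        return block

# With data fully in RAM, the same access is simply: data[key]
cache = LRUCache(2, load_from_disk=lambda k: k.upper())
cache.get("a"); cache.get("b"); cache.get("c")   # "a" gets evicted
assert "a" not in cache.blocks
assert cache.get("b") == "B"
```

Even this toy version has to reason about capacity, recency, and eviction; production caching code adds concurrency and invalidation on top.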
The in-memory approach addresses two important issues with conventional OLAP: disk space requirements and input/output (I/O). Traditionally, multidimensional data for analysis is stored on disk in special, often proprietary, structures commonly referred to as cubes. Cubes present a multidimensional perspective of a business and are typically built from several data sources. As the volume of business data grows, storage requirements for cubes grow with it. The cube puzzle has two pieces: the update process (a periodic process that keeps cubes up to date daily, hourly, or at even finer time granularity, depending on latency requirements) and querying (the operations that let users analyze the data in cubes). Periodic processing of cubes involves expensive I/O operations that read data from sources and write data to disk. Query engines of traditional OLAP systems also incur the cost of I/O operations to read and cache data. An in-memory solution, on the other hand, completely eliminates both the disk space requirements and the I/O bottlenecks. All source data required to create multidimensional analytical data is brought into main memory (RAM), and complex queries, both relational and multidimensional, run significantly faster when data is in RAM.
In-memory analytics addresses another important aspect of OLAP—pre-aggregations. Multidimensional analysis requires data to be available across multiple business dimensions. For instance, it should be possible to analyze sales by region, time and salesperson. In order to reduce the overhead involved in aggregating data each time an analysis is performed, an OLAP database includes pre-aggregations. A significant percentage of processing time—particularly with multidimensional OLAP (MOLAP)—goes towards the computation of pre-aggregates. With in-memory analytics, aggregation is done on the fly and requires no pre-processing.
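The difference is easy to picture: instead of precomputing and storing every combination of region, period, and salesperson, an in-memory engine can scan the fact rows and aggregate at query time. A toy sketch, with invented sales rows:

```python
from collections import defaultdict

# In-memory fact table: (region, month, salesperson, amount)
sales = [
    ("West", "2010-01", "Ana", 120.0),
    ("West", "2010-01", "Bo",   80.0),
    ("East", "2010-02", "Ana", 200.0),
    ("East", "2010-01", "Cy",   50.0),
]

def aggregate(rows, *dims):
    """Sum amounts grouped by any combination of dimensions, on the fly."""
    totals = defaultdict(float)
    for region, month, person, amount in rows:
        record = {"region": region, "month": month, "salesperson": person}
        key = tuple(record[d] for d in dims)
        totals[key] += amount
    return dict(totals)

# No pre-aggregation step: any roll-up is computed when it is asked for.
assert aggregate(sales, "region") == {("West",): 200.0, ("East",): 250.0}
assert aggregate(sales, "region", "month")[("West", "2010-01")] == 200.0
```

A MOLAP server would typically precompute these roll-ups during cube processing; in memory, a full scan of the fact rows is fast enough to do it per query.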
There are two primary flavors of OLAP, and it is important to understand how each is affected by in-memory architecture.
1. Multidimensional OLAP (MOLAP): In traditional MOLAP systems, data for analysis is stored on hard disk and is based on a dimensional model. Every change to the structure involves completely rebuilding the objects (cubes and dimensions) affected by the change. While it is still based on a data warehouse or data mart built using dimensional modeling, in-memory MOLAP does not require any cube processing: cubes are built on the fly and loaded entirely into memory. Data loading is done using a data integration/extract-transform-load (ETL) tool, and aggregations are computed based on the queries submitted to the in-memory MOLAP server. Queries run against data in memory; consequently, response times are very low and the user experience is enhanced.
2. Relational OLAP (ROLAP): Traditional ROLAP involves storing dimension, fact and aggregate data in relational databases. A ROLAP database is designed to serve the needs of multidimensional analysis. In-memory ROLAP implementations actually combine the power of in-memory MOLAP and ROLAP. Not all data is loaded into memory. Rather, datasets are cached based on frequency of usage. Data not available in memory is retrieved from relational stores and may be cached by the in-memory analytics system for future use.
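The hybrid ROLAP behavior described above can be sketched roughly as follows: result sets that prove popular are kept in memory, while everything else is fetched from the relational store on demand. The `query_relational_store` function below is a stand-in for a real database call, and the promotion threshold is an invented policy for illustration:

```python
from collections import Counter

class HybridRolapCache:
    """Cache query results in memory once they prove popular; fetch
    everything else from the relational store on demand."""

    def __init__(self, backend, promote_after=2):
        self.backend = backend              # e.g. a relational query function
        self.promote_after = promote_after  # requests needed before caching
        self.hits = Counter()
        self.in_memory = {}

    def run(self, query):
        if query in self.in_memory:
            return self.in_memory[query]    # served straight from RAM
        self.hits[query] += 1
        result = self.backend(query)        # round trip to the database
        if self.hits[query] >= self.promote_after:
            self.in_memory[query] = result  # popular enough: keep in memory
        return result

# Stand-in for real relational access.
def query_relational_store(query):
    return f"rows for {query}"

cache = HybridRolapCache(query_relational_store)
cache.run("sales by region")
cache.run("sales by region")    # second request promotes it to memory
assert "sales by region" in cache.in_memory
```

Vendors differ in how they decide what stays resident (frequency, recency, size), but the trade-off is the same: RAM speed for hot datasets, relational fallback for the long tail.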
In-memory architecture paves the way for inventive methods in analytics. In addition to traditional approaches to OLAP, the in-memory alternative has produced several new paradigms. In-memory analytics was in fact spearheaded by QlikTech’s associative technology, used in its product QlikView. ETL functionality enables users to combine data from various sources into memory. QlikView discovers associations between data entities as it prepares data for analysis, creating a decentralized platform where each user can take complete control of the data required for analysis. This contrasts with the traditional BI approach of a central data warehouse that serves the entire organization and forces a single definition of measures and dimensions across an enterprise. However, the proliferation of data sources across most organizations and the demand for rapid decisions have made maintaining a data warehouse and specialized OLAP databases expensive. A pure in-memory analytics solution such as QlikView provides a highly available and responsive alternative to traditional OLAP.
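The associative idea can be sketched simply: once tables from different sources are loaded into memory, fields that share a name can be linked automatically, so a selection in one table constrains the others. This is only a rough illustration of the concept with made-up table definitions, not QlikView’s actual algorithm:

```python
def discover_associations(tables):
    """Link in-memory tables by fields that share a name.

    `tables` maps a table name to its list of field names; the result
    maps each shared field to the tables it associates.
    """
    field_owners = {}
    for table, fields in tables.items():
        for field in fields:
            field_owners.setdefault(field, []).append(table)
    return {field: owners
            for field, owners in field_owners.items()
            if len(owners) > 1}

# Three tables loaded from different sources (illustrative schemas):
tables = {
    "orders":    ["order_id", "customer_id", "amount"],
    "customers": ["customer_id", "region"],
    "targets":   ["region", "quota"],
}
links = discover_associations(tables)
assert links == {"customer_id": ["orders", "customers"],
                 "region": ["customers", "targets"]}
```

Selecting a region would then implicitly restrict customers, which in turn restricts orders, without anyone modeling those joins up front.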
Microsoft’s PowerPivot also offers a decentralized OLAP solution. Referred to as a personal BI system, it is complementary to Analysis Services, which represents an organizational BI system. PowerPivot uses an in-memory instance of Analysis Services in order to support several OLAP features on a user’s desktop. A PowerPivot database is contained within an Excel (.xlsx) file. The PowerPivot in-process engine imports data from various sources and can automatically discover relationships for certain types of data sources.
Still in development, SAP’s in-memory Business Analytic Engine is aimed at providing a powerful combination of tools (a calculation engine; data modeling tools; and data management tools) to create a single in-memory platform where real-time analysis of data from operational sources, data warehouses and web-based systems will become possible. SAP intends to deliver this product on optimized hardware that will connect to IT systems to discover data and prepare it for real-time analysis without disrupting any existing systems’ operations.
SAS also provides a solution that maximizes the use of appropriate server technology. Using HP’s BladeSystem technology, SAS offers an in-memory analytics solution in which individual servers are dedicated to specific analytic processes or applications. The solution is targeted at complex computational environments where tasks can be run in parallel for maximum efficiency.
Organizations have spent significant time and resources to build BI platforms. How do they augment or replace their existing solutions with in-memory alternatives? The answer to this depends entirely on the nature of an organization’s existing analytics solution, options offered by current vendors, and flexibility in terms of combining different vendor solutions. It also depends on other technologies within the BI spectrum, many of which are also turning in-memory. The value of a traditional data warehouse with a data quality strategy and a centralized organizational platform is still very high. However, there may be parts of a platform that can be migrated into in-memory architecture. Reports that need to be near real-time and computations that are memory intensive are bound to benefit from a high performance in-memory solution. Most vendors offer migration paths from their current solutions to newer in-memory solutions and organizations must examine these paths in terms of their overall impact. Self-service or personal BI can be a powerful addition to an organization’s platform, enabling users to accomplish high-performance analysis on data that is near real-time or conduct what-if analyses at the speed of thought.