Role of In-memory Analytics in Big Data Analysis

In a previous article, I discussed big data and the need for a big data solution to handle, manage, and derive valuable insight from large volumes of data in various formats coming from disparate sources. Here I discuss the role that in-memory technologies play in big data analysis and their potential to change the business intelligence (BI) landscape and the industry technology space. But first we need to understand the basics.


A Basic Framework
What does it mean to have an in-memory technology in place? It means that data is stored in a computer's random access memory (RAM) rather than on physical (hard) disks. Storing data in memory improves its management in the following ways:

  • By using semiconductor storage media, as opposed to physical disk storage, data is read and processed much faster.
  • By minimizing or avoiding mechanical reads and writes, the latency of various operations is reduced.
  • By using different and innovative schemas to store the data (e.g., columnar, indexed), the processing of large volumes of data is improved.
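To make the columnar point above concrete, here is a minimal sketch (with hypothetical data) of the same small table laid out row-wise and column-wise. Real in-memory engines add compression, indexing, and vectorized execution on top of this idea.

```python
# Row store: one record per row; convenient for transactional reads/writes.
rows = [
    {"region": "EMEA", "sales": 120},
    {"region": "APAC", "sales": 200},
    {"region": "EMEA", "sales": 80},
]

# Column store: one in-memory array per column; convenient for analytics,
# because an aggregate only has to touch the columns it needs.
columns = {
    "region": ["EMEA", "APAC", "EMEA"],
    "sales": [120, 200, 80],
}

# Aggregating a single column scans one contiguous array
# instead of visiting every record.
total_sales = sum(columns["sales"])
```

The columnar layout is what lets analytical queries over RAM-resident data avoid reading fields they never use.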

A number of advances in hardware technology have been instrumental in supporting the development and use of in-memory technologies. For example, 64-bit processors enable servers to address larger amounts of RAM, and the scalability and parallelization of processors enable in-memory technologies to take full advantage of these larger memory pools. For a comprehensive analysis of some of the important principles and concepts behind in-memory technologies, I urge you to read the article In-Memory Analytics: A Multi-Dimensional Study, written by my colleague Anna Mallikarjunan.

Products with in-memory capabilities are not new to the software industry. For example, QlikTech started working on its in-memory-based products in the 1990s, and other BI vendors, such as IBM Cognos, have been using them for more than a decade. Many software providers offer in-memory capabilities in one form or another, particularly in applications for data analysis, e.g., BI providers with applications for online analytical processing (OLAP). With data stored in RAM, OLAP applications can speed up querying and analysis as well as ease data modeling by applying innovative ways of organizing and storing data. The table below lists some software products that use in-memory technologies for OLAP services.


Product                                     Vendor
PowerPivot                                  Microsoft
Cognos TM1                                  IBM
Jedox OLAP Accelerator (GPU)                Jedox
WebFOCUS Visual Discovery                   Information Builders
BIRT Data Objects and BIRT Data Analyzer    Actuate
Tableau                                     Tableau Software

Many products already incorporate in-memory technologies to provide fast data analysis and discovery. Some vendors even bundle an in-memory database engine with their products (e.g., QlikView, PowerPivot, Kognitio, Spotfire, and Tableau with its new in-memory engine). Such an engine enables data to be held in memory, e.g., using a columnar store schema, for BI purposes, thereby enhancing data processing performance.

While analytics and BI make good use of in-memory database systems (IMDSs), IMDSs were not developed for this purpose. Database systems such as eXtremeDB, VoltDB, solidDB by IBM, TimesTen by Oracle, and HANA by SAP are multipurpose in-memory databases created specifically to allow applications to achieve fast response times. As such, these systems have the potential to change the way organizations process and distinguish between transactional (operational) and nontransactional (analytical) data.

From a design perspective, IMDSs exhibit key features that ensure the highest possible performance in critical environments:

  • Reduced data transfer overhead. While traditional database management systems (DBMSs) need to read data from disk storage files, IMDSs need little or no data transfer, as they point directly to the actual data in memory.
  • Reduced or eliminated caching. In-memory databases remove much of the need for caching by ensuring that most or all of the data resides in RAM.
  • Optimized memory use (compression). Compression allows IMDSs to optimize how data is stored and processed in RAM.
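On the compression point, a minimal sketch of dictionary encoding, one common columnar compression technique, may help. The data and function names are illustrative only; production engines combine several encodings and operate directly on the compressed form.

```python
def dictionary_encode(column):
    """Replace repeated values with small integer codes plus a lookup table."""
    dictionary = []   # distinct values, in first-seen order
    codes = []        # one small integer code per original value
    index = {}        # value -> code
    for value in column:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes

def dictionary_decode(dictionary, codes):
    """Recover the original column from the dictionary and codes."""
    return [dictionary[c] for c in codes]

# A low-cardinality column compresses well: five strings become
# five small integers plus a two-entry dictionary.
column = ["EMEA", "APAC", "EMEA", "EMEA", "APAC"]
dictionary, codes = dictionary_encode(column)
```

Because repeated values shrink to small integers, more of the working set fits in RAM, and aggregates can often run on the codes without decompressing.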

So, in-memory databases have already shown advantages in enhancing data processing performance. Now let’s look at how they can help address the challenges associated with big data.


In-memory Meets Big Data
So, how do in-memory technologies fit within the big data spectrum? As the volume, variety, and processing speed of data increase, organizations will need to collect and analyze increasing amounts of data as part of the decision-making process. This information will also need to be analyzed in a timely manner to confer a competitive advantage. For some organizations, the latency period, i.e., the time it takes for data to be collected, analyzed, and made available for decision making, needs to be very short. One way to process vast amounts of structured and nonstructured data is to deploy a big data solution, such as a Hadoop-based solution, to manage big data scenarios, along with an in-memory database technology, which allows advanced analysis of extremely large sets of complex data at high speed (in real time). So, an analysis that might otherwise take hours or days can be completed in minutes or even seconds with IMDSs.
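One common way to pair the two, sketched below with a purely hypothetical class, is to keep recent "hot" data in RAM for low-latency queries while aging older data out to a batch store such as Hadoop. The names, the window policy, and the in-process lists standing in for real storage are all assumptions for illustration.

```python
class HybridStore:
    """Toy model: hot data in RAM, cold data in a batch (disk/Hadoop) store."""

    def __init__(self, hot_window):
        self.hot_window = hot_window  # how many recent events to keep in RAM
        self.memory = []              # in-memory (hot) partition
        self.batch = []               # stand-in for a disk/Hadoop partition

    def append(self, event):
        self.memory.append(event)
        if len(self.memory) > self.hot_window:
            # Age the oldest event out of RAM into the batch store.
            self.batch.append(self.memory.pop(0))

    def query_recent(self):
        # Low-latency path: touches only the RAM-resident partition.
        return sum(self.memory)

    def query_all(self):
        # Full scan: combines the batch and in-memory partitions.
        return sum(self.batch) + sum(self.memory)

store = HybridStore(hot_window=3)
for value in [10, 20, 30, 40, 50]:
    store.append(value)
# RAM now holds only the three most recent values; older ones were aged out.
```

The design choice this illustrates is the latency split: decision-making queries hit only memory, while less time-sensitive full-history analysis pays the cost of the batch layer.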

The use of in-memory technologies also facilitates ad hoc and informal data analysis, which may lead to data discovery and process improvement. Extremely fast and/or in-memory databases therefore seem to be the logical companion when deploying a big data strategy. Like big data solutions, in-memory databases can provide the following:

  • Storage. By being able to handle large amounts of data in-memory.
  • Simplicity. By potentially handling both structured and nonstructured data in a simple way.
  • Speed of processing. By providing high-speed processing capabilities.

From a technical perspective, organizations need to consider essential features of and address critical questions related to in-memory database technology to reap the benefits:

  1. Caching and memory swapping. How does the application handle data sets larger than the available memory (RAM)?
  2. Compression. What is the compression ratio available, and how does the application deal with processing data in a compressed format?
  3. Loads (initial and incremental). How does the application deal with the initial data load and/or the incremental loading of data?
  4. Integration. How well will the in-memory database integrate with your third-party systems, both operational and nonoperational?

One Technology, Multiple Vendors
Vendors currently offer in-memory databases in the form of appliances and/or cloud-based applications. Appliances have the advantage of packaging the software (the in-memory application and database) together with the hardware (the server), while cloud-based in-memory solutions are associated with a low total cost of ownership (TCO) and low technical requirements for users.

The table below lists some of the vendors engaged with in-memory databases providing big data analysis capabilities.


HANA (SAP)
HANA (High Performance Analytics Platform) is the in-memory database technology developed by SAP. It is distributed as an appliance using hardware certified by SAP. It has row and column data stores with high compression and partitioning features.

Exalytics In-memory Machine (Oracle)
An in-memory appliance developed by Oracle. It combines a set of technologies, such as Oracle's BI foundation and the Oracle TimesTen in-memory database, to provide in-memory big data analytics capabilities.

VoltDB (VoltDB)
VoltDB is a fast in-memory relational database system. It is specifically designed to run across servers connected via high-speed data networks. While it is not intended specifically for big data analytics, its high-performance features make it suitable for such tasks.

The Kognitio Analytical Platform, WX2 (Kognitio)
The analytical platform WX2 is Kognitio's in-memory offering for big data in-memory analytics. It comes both on premise as an analytical appliance and as a cloud offering. Its scalability and high performance make this appliance a contender in the in-memory analysis space.

QlikView (QlikTech)
QlikView is a platform powered by in-memory associative search technology (column-based) and a series of application programming interfaces (APIs) that interface with the APIs of Hadoop-based data providers. It is also possible to use QlikView over distributed and clustered environments.


Final Thoughts
Of course, not all solutions for big data analysis must be based on in-memory technologies. Some software providers already handle big data analysis using hybrid (disk- and memory-based) solutions, as well as alternative techniques to enhance the performance of big data analysis. Nevertheless, we should expect more software vendors to come up with specific in-memory solutions for real-time big data analysis. Teradata, with the expansion of its Integrated Analytics solution; Microsoft, with its new SQL Server 2012 and its improved in-memory and big data capabilities; and Software AG, with its new in-memory big data management strategy, are just a few examples of the increased interest displayed by vendors providing in-memory solutions to address the issue of real-time analysis of big data. So, I think it is just a matter of time until the application of in-memory database technologies for big data analytics becomes mainstream, considering the potential benefits of having an in-memory solution as part of an overall big data strategy.
