The third week of December 2011 brought me revelations of sorts in the realm of enterprise applications. Amid the seemingly endless information technology (IT) talk throughout the year about cloud computing, analytics, mobile and social apps, and in-memory buzzwords, two concrete announcements at vendor events in mid-December made me pause for thought. Though both vendor announcements are still some time away from commercial application, their visionary nature impressed the usually skeptical analyst in me (to the point of my being accused of “drinking the Kool-Aid”).
In addition to the announcement of UNIT4’s upcoming analytics app store (see related blog post), another revelation came about at SAP’s Influencers Summit 2011 in Boston, where I learned that the much-publicized SAP HANA offering is not merely an in-memory blade server appliance for quick analytics and data crunching. In fact, I sensed some frustration from SAP’s spokespersons at the event because most attendees (other than the privileged and informed “SAP Mentors” and SAP insiders elite) still thought of HANA as only an in-memory and extremely quick data-crunching appliance. Raw processing speed is fine, but the point that SAP is trying to make is about creating smart apps that “think” on the users’ behalf, with help from tools such as enterprise search, text analytics, predictive analytics, event stream processing, etc. In other words, a HANA-underpinned enterprise resource planning (ERP) system should present only pertinent (filtered) information to each user.
In a nutshell, initially, SAP HANA will enable companies to analyze large volumes of detailed operational and transactional information in real time, from virtually any data source. In the long run, though, with this platform, SAP will renew its existing applications and deliver an entirely new class of applications that will change the way people think, work, plan, and operate.
HANA: HAsso’s New Architecture for SAP (and Beyond)
Well, I am neither a “techie” (always have been and will be a so-called functional analyst) nor a marketing expert. Still, if a product’s name stands for “High-Performance ANalytic Application,” is it any surprise that regular folks associate it with high-speed analytic appliances by default? In fact, most of the early HANA-based products are so-called “accelerators” or “fast analytics” for this or that. Again, unless you are an SAP insider, why would you think of HANA as a full-fledged “system of record and master data repository,” i.e., a general-purpose database of the future?
For that reason, I initially confused HANA with SAP’s offerings that were touted by Hasso Plattner (SAP’s renowned co-founder and the current chairman of the supervisory board of SAP AG) a few years back. I am talking here about SAP BusinessObjects (BO) Explorer and SAP Business Warehouse (BW) Accelerator (see the related blog post series from 2009).
Well, now I know that SAP BO Explorer is a tool for creating InfoCubes, and for further modeling and manipulation (in SAP’s lingo, an InfoCube is an OLAP cube, or a set of relational tables arranged according to a star schema, whereby a large fact table in the middle is surrounded by several dimension tables). SAP BW Accelerator is a blade server that is optimized for fast indexing and searching. Both products can nowadays run on top of HANA. Confused and overwhelmed enough?
HANA: A Universal (Future) Database
The major point here is that the HANA database will be able to handle both transactional and analytical (decision-making) workloads. The logical question, then, is why have the transactional and analytical worlds been separated for all these decades, and why are Hasso and SAP the first to (almost) converge them in one database?
The difference between online analytical processing (OLAP) and online transactional processing (OLTP) is not only in their use, but also in the way the data is organized. OLAP is suitable for non-volatile data types, i.e., reporting and historical analysis of data that is stored in data warehouses (and periodically refreshed), whereas OLTP is for highly volatile (ever-changing) transactional systems. Although both OLAP and OLTP can be used universally, in theory at least, there are apparent differences in their processing speeds and closeness to real time.
The key difference is in data organization, i.e., database tables and their relations. In relational databases, normalization means that a single entity describes only one real-world event, and vice versa. In contrast, OLAP data organization disregards the theories of normalization; data is instead kept in the so-called star or snowflake schema. In this way, one central (a.k.a., inner or fact) table contains the primary keys for all the other outer or dimension tables, whereby one entity can describe multiple real-world events and/or notions.
Technically, an Analytic View or OLAP cube is a set of physical tables, interconnected in a star schema. Only the fact table provides the information the user is actually interested in. The facts, also referred to as metrics, measures, or key figures, are generally numbers (or numerical values) such as “Sales Net Revenue.” The dimension tables provide the categories, classes, or attributes by which the facts can be grouped. In other words, the fact table provides the content of the OLAP cube, i.e., the “what” to group data, whereas dimension tables specify the cube’s dimensions, i.e., the “by what” to group data.
OLAP cubes in business applications usually organize economic numbers along business categories. For example, a cube can produce an individual factual statement that sales organization “X” shipped material “Y” to ship-to party “Z” at value “400 EUR.” More often than not, the statements (reports) are along the lines of some summarized (grouped by, aggregated, averaged, etc.) categories (e.g., sales of dairy products in the Northeast region for the last quarter).
Classical data warehouses copy data into OLAP cubes in a three-step process called extract, transform, and load (ETL). The data copies are materialized, i.e., saved to physical tables in their transformed state. OLAP cubes come with a set of efficient transformations that simplify analytics. In contrast to a physical table, an Analytic View is not “flat”: data must be “flattened out” during extraction by aggregating along the dimensions of the data cube. In structured query language (SQL) terms, “flattening out” means reading with a GROUP BY clause and aggregation functions. Each measure is assigned an aggregation type (SUM, AVG, MIN, or MAX), which specifies how two lines of the fact table will be merged.
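To make the star schema and the GROUP BY “flattening” above concrete, here is a minimal sketch in Python using SQLite. The table and column names (fact_sales, dim_region, etc.) are my own illustrative inventions, not SAP’s, and a real warehouse would of course have far more dimensions and rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Dimension tables: the categories by which facts can be grouped.
    CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name     TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    -- Fact table: the measure plus foreign keys into each dimension.
    CREATE TABLE fact_sales (
        region_id   INTEGER REFERENCES dim_region,
        product_id  INTEGER REFERENCES dim_product,
        net_revenue REAL
    );
    INSERT INTO dim_region  VALUES (1, 'Northeast'), (2, 'Southwest');
    INSERT INTO dim_product VALUES (10, 'Dairy'), (20, 'Bakery');
    INSERT INTO fact_sales  VALUES (1, 10, 400.0), (1, 10, 250.0), (2, 20, 100.0);
""")

# "Flattening out": read the fact table with GROUP BY and an aggregation
# function (SUM here), grouping by attributes from the dimension tables.
rows = con.execute("""
    SELECT r.name, p.category, SUM(f.net_revenue)
    FROM fact_sales f
    JOIN dim_region  r ON r.region_id  = f.region_id
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY r.name, p.category
""").fetchall()

print(sorted(rows))  # [('Northeast', 'Dairy', 650.0), ('Southwest', 'Bakery', 100.0)]
```

Note that the two “Northeast/Dairy” fact lines are merged into one aggregated line, which is exactly what the aggregation type of a measure governs.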
Bringing OLAP and OLTP Closer
Now, SAP is not necessarily attempting to merge the OLAP and OLTP worlds into one, and HANA is not going to be only one particular type of database, per se. Microsoft SQL Server, for example, can be configured as either an OLAP or OLTP database, and by and large most traditional relational databases can be organized for one or the other purpose. It is quite difficult (and perhaps even pointless) to unify the OLAP and OLTP worlds via currently available technologies, as each has its own specific requirements. Facebook reportedly uses multiple databases for different purposes: a relational database for user logging and data storage, a NoSQL (Not only SQL) database for unstructured data handling and indexing (e.g., in the e-mail inbox), an OLAP data warehouse for analysis of users’ logs and preferences, etc. The differences between these databases are nuanced, and those nuances are what give one database (approach) an advantage over another.
To that end, HANA’s purpose is not necessarily to unify all of these disparate worlds and approaches, but rather to reduce the differences and chasm between them. What HANA brings to the table is the speed of combining predictive algorithms with cached data. Furthermore, HANA features column-oriented data storage, in addition to the standard row orientation. Dennis Moore (a former long-time SAP employee at SAP Labs) explains the importance of HANA in his Enterprise Irregulars blog post. He states that relational databases are well suited to handling structured data where
- the schema does not change,
- text processing is not an important requirement,
- data is measured in gigabytes rather than petabytes,
- geographical or time-series (e.g., stream) processing is not required, and
- the server does not need to support transactional and decision-support queries simultaneously.
A key trait of most commercial relational databases is their compliance with a principle called “ACID” (atomicity, consistency, isolation, durability), which essentially guarantees that database transactions occur in a reliable way. Some might refer to this feature as data persistence. Row locking is a common practice used to assure transactional integrity and database consistency during concurrent use by multiple users. Locking prevents users from reading data that is being changed by other users, and prevents multiple users from changing the same data at the same time.
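The atomicity part of ACID is easy to demonstrate in a few lines of Python with SQLite: either both legs of a transfer commit, or neither does. The accounts table and the negative-balance check are hypothetical illustrations of mine, standing in for the integrity constraints a real engine would enforce:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
con.execute("INSERT INTO accounts VALUES (1, 100.0), (2, 50.0)")
con.commit()

def transfer(con, src, dst, amount):
    try:
        with con:  # opens a transaction; commits on success, rolls back on error
            con.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                        (amount, src))
            con.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                        (amount, dst))
            # Stand-in for a consistency constraint: no negative balances.
            (bal,) = con.execute("SELECT balance FROM accounts WHERE id = ?",
                                 (src,)).fetchone()
            if bal < 0:
                raise ValueError("insufficient funds")
    except ValueError:
        pass  # the rollback has already restored both rows

transfer(con, 1, 2, 30.0)   # succeeds: balances become 70 / 80
transfer(con, 1, 2, 500.0)  # fails: rolled back, balances unchanged
print(con.execute("SELECT balance FROM accounts ORDER BY id").fetchall())
# → [(70.0,), (80.0,)]
```

The failed second transfer leaves no trace, which is precisely the reliability guarantee the paragraph above describes.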
Row and Columnar Databases
Traditional relational SQL-based databases store data as rows, as that is often the fastest way to look up a single value, such as employee salary, skills, or age, given a key value, such as the employee ID. But, as explained earlier, it is difficult to conduct analysis and decision making based on this ever-changing flat formatted data. In contrast, columnar databases group data by, well, column.
Within a column, generally speaking, all the data is of the same type. A columnar database stores data of a single type altogether, offering the potential for significant data compression. This compression can lead to reduced disk space requirements, memory requirements, and access times. Other advantages of columnar databases are better reporting performance, parallel data entry and processing, elimination of some aggregation types, etc.
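A tiny Python sketch shows why the columnar layout compresses well: once same-typed values sit contiguously, even a naive run-length encoding collapses repeats. The employee records and the rle helper are my own illustrative constructs, not anything HANA-specific:

```python
from itertools import groupby

# Row layout: each record's attributes are stored together.
rows = [
    ("Alice", "Sales", 2019),
    ("Bob",   "Sales", 2019),
    ("Cara",  "Sales", 2020),
    ("Dan",   "HR",    2020),
]

# Column layout: each attribute's values are stored contiguously.
columns = {name: [r[i] for r in rows]
           for i, name in enumerate(["employee", "department", "year"])}

def rle(values):
    """Run-length encode a column: (value, repeat_count) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

print(rle(columns["department"]))  # [('Sales', 3), ('HR', 1)]
```

Four department values shrink to two pairs; on a fact table with millions of rows and low-cardinality columns, this kind of compression is what reduces the disk, memory, and access-time footprint.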
In SAP HANA, users can configure each individual table (or, more likely, a range of tables) in either a row or a columnar organization. If users change their minds later, they can switch a table’s organization on the fly; this operation literally reads the table, copies it into the other format, and deletes the former copy. The HANA default will most likely be columnar.
Apparently, HANA’s origins date back to 2005, when SAP acquired the Korean company Transact In Memory, Inc. and its P*Time row-store engine (here is some very detailed product info for techies). Needless to say, SAP has since added many more features to this foundation. Today, HANA features SAP’s proprietary TREX search engine, the MDX (MultiDimensional eXpressions) language for analytic and multimedia inquiries and calculations, and the proprietary SQL Script language (a cross-platform SQL language for database interfacing). In addition, there is a planning engine for performing financial operations in the database layer, disk-based storage (for the data types that do not require in-memory speed), SAP LiveCache for object storage (to store non-text files and unstructured data), a data aging system, a persistence layer from MaxDB (SAP’s proprietary relational database used for SAP Business ByDesign), a metadata manager, a transaction manager, a calculation engine, an optimization engine, a request parser, an authorization manager, and a data repository (system of record, if you will). SAP purchased Sybase in 2010, and Sybase has been the chief pioneer and innovator in column-based databases with Sybase IQ (RDBMS) since 1996. Logically, HANA has implemented some of the indexing technology from Sybase IQ.
How Does In-memory Database Fit in Here?
HANA concurrently uses both in-memory and disk-based database technologies, and has a data persistence layer and data recovery system with page management and logger (i.e., not all data should necessarily be kept in memory). In-memory databases (IMDBs) take advantage of the following two hardware trends:
- A significant reduction in the cost of random-access memory (RAM)
- A significant increase in the amount of addressable memory in today’s computers
As Moore explains in his aforementioned blog post, it is thus possible, and economically feasible, to put an entire database’s contents in memory for fast data management and querying. By using columnar or other compression approaches, even larger datasets can be loaded entirely into main memory. With high-speed access to memory-resident data, more users can be supported on a single machine.
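The “entire database in memory” idea can be sketched with SQLite’s backup API: an on-disk database is copied wholesale into an in-memory instance, after which every query is served from RAM. The orders table and file path are hypothetical, and a real IMDB does far more (persistence, logging, recovery), but the resident-dataset principle is the same:

```python
import os
import sqlite3
import tempfile

# Build a small on-disk database standing in for the operational store.
path = os.path.join(tempfile.mkdtemp(), "store.db")
disk = sqlite3.connect(path)
disk.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
disk.executemany("INSERT INTO orders (amount) VALUES (?)",
                 [(i * 1.5,) for i in range(1000)])
disk.commit()

# Copy the entire database into an in-memory instance; subsequent
# queries touch no disk on the read path.
mem = sqlite3.connect(":memory:")
disk.backup(mem)

(count, total) = mem.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()
print(count, total)  # 1000 749250.0
```

Once the dataset fits in RAM like this, the OLTP/OLAP latency gap the next paragraph describes simply has no disk round-trip to hide behind.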
In addition, with an in-memory database, both transactional and decision-support queries can be supported on a single machine, meaning that there can be zero latency between transactional (OLTP) data appearing in the system and that data being available to decision-support (OLAP) applications. In a traditional setup, where data resides in the operational (transactional) store and is then extracted into a data warehouse for reporting and analysis, there is always a lag between data capture and its availability for data analysis (and thus the “driving with a rear view mirror” mantra).
In contrast, HANA can produce reports and analyses directly from physical tables, whereby data is transformed on the fly, i.e., there is no need for the so-called materialization (ETL). An Analytic View (OLAP cube) in HANA is not a transformed copy of data, but rather the plan of how to transform data on the fly. Indexing and searching are much faster with in-memory and TREX capabilities. HANA features so-called nearline storage (NLS, or intermediate storage) capabilities that are integrated with a data mart layer (for rapid creation and re-creation of data marts).
SAP HANA: Today and Tomorrow
Currently, HANA is handling OLAP stacks (based on generally available accelerator analytic solutions), but SAP is working fervently on enabling and testing HANA handling of OLTP stacks as well. Eventually, and we are talking about years here, the entire SAP ERP scope, and even SAP Business Suite will run on HANA (and other mainstream databases). Even more surprising to me was the announcement that the next release of the low-end SAP Business One ERP solution, which currently runs on Microsoft SQL Server, will be on HANA as well (with expected availability at the end of 2012).
Some might wonder why companies with 15–20 users (the typical SAP Business One users’ profile) would need HANA. As discussed earlier, SAP believes that these companies will need HANA not only for business intelligence (BI)/analytics speed, but also for smart and fast enterprise searches through unstructured data. Basically, SAP has a three-pronged approach for HANA:
- To power isolated data warehouses (DWs) within many companies and departments, and not necessarily SAP BW
- For accelerator appliance applications that handle a particular task, such as sales and operations planning (S&OP), trade promotions, fraud detection, supply chain visibility and optimization, real-time pricing, automated and optimized customer service, etc., where high-speed data crunching, analysis, search, calculations, etc., are critical
- As an enterprise software general purpose database platform, an "Oracle database killer" of sorts
Workday has already accomplished something along the lines of the third purpose (see related blog post). HANA is not really an object-oriented (OO) in-memory database the way Workday’s is, and the two differ in their approaches and design intents for utilizing their database technologies. But they share many similar tools and approaches for designing next-generation applications and overcoming the aforementioned limitations of relational databases.
SAP has lately been working on perfecting HANA to be open, i.e., without the need to work with SAP's applications, per se. Thus, the giant wants to power data warehouses and quick OLAP calculations from wherever and from whatever data sources. Besides the recent additions of the TREX enterprise search engine, text analytics from Business Objects, predictive analytics and complex event processing (CEP) from Sybase, etc., HANA will be cloud- and mobility-enabled. At the recent Influencer Summit, SAP announced the Project River platform-as-a-service (PaaS) offering blueprint powered by HANA (a long-term feat, of course). Here is Gartner’s in-depth evaluation of SAP’s PaaS strategy.
Conclusion and Recommendations
With HANA, SAP will have about five different databases, many of which are traditional relational databases, such as the aforementioned Sybase IQ and MaxDB. Some pundits might have laughed or sniggered dismissively when SAP stated its intention to be the No. 2 database provider by 2015, but SAP appears serious. The company even acknowledges that it might cannibalize its own relational offerings, but that is not a big deal, as most Sybase database customers (Wall Street financial services companies) do not use SAP software on top of Sybase.
It is likely that HANA will coexist with incumbent relational databases in SAP environments. In theory, companies can even install two HANA instances: one for OLTP and the other for OLAP purposes. It will be quite critical for SAP to espouse an attractive pricing and value proposition for existing and prospective customers to abandon the technology they’ve lived with for decades. SAP HANA may pay for itself in reduced database maintenance payments. Thus, its biggest competitive advantage might be in creating pricing pressure on relational databases for data warehouse instances.
SAP should also clarify the relationship between HANA and NetWeaver, i.e., will NetWeaver be part of the future HANA-based architecture (e.g., as an application server), or will there be a wholesale replacement and migration to a new platform? You can imagine the dismay of SAP customers that have finally moved from SAP R/3 to NetWeaver-based releases, only to now contemplate a future move to yet another platform.
Initially, SAP HANA will likely make its biggest inroads as an “accelerator” for SAP BW and for specialized appliance analytics applications. SAP is not necessarily the first mover into the in-memory database arena in light of the existence of Oracle Exalogic and Exalytics, Oracle TimesTen (used by salesforce.com), and Microsoft SQL Server code-named Denali (which will feature significant in-memory capabilities on top of PowerPivot). Kinaxis and QlikView have long had their in-memory capabilities for their rapid supply chain responses and BI products.
But SAP is betting on the integration, certification, and optimized performance of its products for HANA as the preferred database in the future. Thus, the indications are that every new research and development (R&D) request at SAP that is not HANA-based must be approved by the board. We may be still a long way from seeing SAP HANA replace Oracle, IBM DB2, or Microsoft SQL Server as the transactional data manager for SAP Business Suite (and Progress OpenEdge in non-SAP ERP environments, if you will), but these companies should consider themselves warned.
References and Further Reading
Enterprise Irregulars. SAP HANA Makes Progress and Threatens Oracle.
People, Process & Technology. SAP HANA vs Oracle Exalytics – the game is on.
TEC. Year in Review: Top Enterprise Software News and Trends for 2011.
TEC. SAP as a Retail Market Force: More Fact Than Fiction.