NoSQL for My BI Solution?

Every day business intelligence (BI) gets closer to chaos. Don’t get me wrong: I’m by no means saying that BI will one day become obsolete or that you’ll need to avoid it at all costs. On the contrary, what I mean is that BI will have to go outside the box, into a world where data is not nicely structured within well-defined databases and queried using Structured Query Language (SQL).

Because BI is no longer used only by a small number of users (information workers, data scientists, and data geeks) to provide results for an even smaller number of users (so-called decision makers), it is expanding beyond the use of SQL for querying relational data and exploring other methods of finding data.

In this chaotic world, BI will have to make sense of different types of information contained within a plethora of sources. And as we move from heavily structured data contained within relational databases to less structured or unstructured information contained within social media sources such as blog posts or messaging systems, the management and analysis of that data will become even more challenging.

In a blog post from The 451 Group, Matthew Aslett cites necessity as one of the key factors spurring the adoption of alternative data management technologies. This makes sense, as traditional relational database schemas fall short of addressing a whole slew of problems, especially when it comes to dealing with unstructured data, or information that does not necessarily fit a table, record, or field structure.

The advent of NoSQL databases allows organizations to work with different types of information in different ways. Companies are now able to manage large sets of data in distributed storage systems for subsequent analysis. Some prominent examples are Google’s Bigtable, Amazon’s Dynamo, and Apache’s Hadoop. So, what is NoSQL and how does it work? And can it be used, along with data warehousing and BI strategies, for analysis purposes?

What Is NoSQL?
Many people believe that the term NoSQL refers to the avoidance of SQL, but it is now generally read as “Not Only SQL,” meaning the use of new ways of managing information alongside traditional SQL. The name was first applied to a fast, lightweight database management system that ran on UNIX systems and stored data in plain ASCII files, so that it could be managed with standard UNIX commands and utilities. Today the term covers a much broader family of non-relational data stores.

This means that NoSQL databases don’t necessarily work with only unstructured data; rather, they are more versatile than traditional database schemas and can work with a variety of data types. Nor do they necessarily need to work with fixed sets of tables, records, and fields. A number of NoSQL databases are available in the market, and though each offers specific features and functions for managing information, most share the following general features:

  • Distributed processing

  • Scalability

  • High availability

  • No fixed schemas, with schema migration possible without downtime
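
To make the “no fixed schema” point concrete, here is a minimal Python sketch that uses plain dictionaries as stand-ins for documents in a document-style store. The records, field names, and the total_by helper are all invented for illustration; they do not come from any particular NoSQL product.

```python
from collections import defaultdict

# Hypothetical documents: each one carries its own set of fields,
# so a CRM record and a blog post can sit side by side.
documents = [
    {"id": 1, "source": "crm", "customer": "Acme", "revenue": 1200.0},
    {"id": 2, "source": "blog", "author": "jdoe",
     "text": "Loving the new release", "tags": ["release"]},
    {"id": 3, "source": "crm", "customer": "Globex", "revenue": 800.0,
     "region": "EMEA"},  # a new field appears without any schema change
]


def total_by(docs, group_field, value_field):
    """Sum value_field per group_field, skipping documents lacking either field."""
    totals = defaultdict(float)
    for doc in docs:
        if group_field in doc and value_field in doc:
            totals[doc[group_field]] += doc[value_field]
    return dict(totals)


revenue_by_source = total_by(documents, "source", "revenue")
print(revenue_by_source)
```

Running it prints {'crm': 2000.0}: the blog document has no revenue field and is simply skipped, which is the kind of tolerance a BI query over loosely structured data needs.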

NoSQL for Analysis and Data Warehousing

Though some organizations have been reluctant and cautious about using NoSQL databases for analysis purposes and/or data warehouse functions, the increasing complexity and volume of data available for analysis have forced many organizations to look at innovative alternatives for handling large volumes of complex data.

For example, Cloudera offers a data platform based on the Apache Hadoop open-source framework for storing, consolidating, and processing large amounts of complex data for data analysis and mining purposes. As another example, DataStax (formerly Riptano) offers Brisk, a Hadoop and Hive (Hadoop’s data warehouse infrastructure) distribution that uses Cassandra (a scalable distributed database by Apache) to provide support for real-time applications and analytics.
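
Hadoop’s processing model is MapReduce: a map step emits key/value pairs, the pairs are grouped by key, and a reduce step aggregates each group. The sketch below imitates those three steps in a single Python process to show the idea. It is not Hadoop’s API, and the function names and sample posts are made up for this example.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence over an iterable of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


posts = ["NoSQL for BI", "BI needs data", "data data data"]
counts = run_job(posts)
print(counts["data"], counts["bi"])
```

In a real cluster the map and reduce steps would run in parallel on many machines, each working on its own slice of the data; here everything runs in one process purely to illustrate the flow.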

Despite the fast growth of companies like Cloudera, most large organizations are not aware of the potential of NoSQL databases to help them address their information needs, and they still rely heavily on traditional tools and vendors, particularly in the BI space.

The Attractiveness of NoSQL for BI

Yet despite this lack of awareness, some vendors offer solutions with the ability to connect to NoSQL data stores such as Hadoop, Cassandra, and others, perhaps heralding a time when organizations will deploy more NoSQL databases for data analysis purposes. Below are some potential advantages of NoSQL databases for data warehousing:

  • Cost: Some NoSQL solutions are open source and thus can be downloaded for free. Others, while not open source, may be priced competitively against traditional data warehousing options. Also, because of their distributed capabilities, most NoSQL solutions can run on inexpensive hardware.

  • Scalability: Most NoSQL databases are distributed, which means that they can be scaled out by adding servers that work in parallel.

  • Versatility: Several types of databases are available for processing large amounts of data from different sources (structured and unstructured), giving users the ability to work with very specific types of data.
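
As a rough illustration of the scalability point, the following Python sketch spreads keys across servers by hashing each key. The node names and the node_for and distribute helpers are hypothetical, and the simple modulo placement shown here is an assumption chosen for brevity, not the scheme any specific product uses.

```python
import hashlib


def node_for(key, nodes):
    """Pick a node for a key by hashing the key (simple modulo partitioning)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def distribute(keys, nodes):
    """Return a mapping of node name to the list of keys placed on it."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


keys = [f"customer:{i}" for i in range(1000)]
three_nodes = ["node-a", "node-b", "node-c"]
four_nodes = three_nodes + ["node-d"]

before = distribute(keys, three_nodes)
after = distribute(keys, four_nodes)

# Count how many keys land on a different node once a fourth server is added.
moved = sum(1 for k in keys if node_for(k, three_nodes) != node_for(k, four_nodes))

for node, placed in after.items():
    print(node, len(placed))
print("keys moved:", moved)
```

Adding a server spreads the same keys over more machines, so each one holds less. With plain modulo placement, many keys change nodes when the cluster grows; systems such as Dynamo and Cassandra use consistent hashing to limit that movement.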

Some Challenges

However, some challenges exist with NoSQL databases, which may preclude their use for data analysis purposes:

  • Complexity: The deployment and use of some NoSQL databases can be rather challenging, as these systems don’t rely on SQL statements. This may make them a poor starting point in the BI space, where solutions are otherwise becoming easier to use.

  • Specificity: As with other software applications, NoSQL databases may not be able to address all BI and analysis processes. Thus, additional efforts may be required to select the right type of database and/or customize it if needed.

  • Shortage of technically savvy personnel: As these tools are rather new to the market, few people are aware of their capabilities or experienced in using them, which can make a deployment difficult to staff.

NoSQL databases, like other applications, have to go through a maturity process. In time, NoSQL databases might gain popularity and maturity, and provide features that streamline their adoption by organizations as well as enhance user experience and administration.

This blog post is but a short introduction to NoSQL. There is a lot more to say on the matter, which I will be doing in the near future. But at this time, it’s safe to say that NoSQL databases—along with other current trends (in-memory technologies, new types of data warehouse schemas, etc.)—will undoubtedly change the way organizations approach their BI and data warehouse activities. Companies like Cloudera, IBM, and EMC are looking to provide the means for organizations to embrace Hadoop, which might lead many companies to adopt a NoSQL strategy for their BI tasks.

Please let me know your thoughts, and I’ll respond as soon as I can.