Business Intelligence Status Report

History and Current Status

Economic and regulatory pressures have made a broad set of technologies called business intelligence (BI), more important than ever for all enterprise application users. Users rarely feel satisfied with the amount of information they can extract, if they can extract any at all, from their enterprise applications. Enterprise resource planning (ERP) and BI (sometimes called analytics, which is specialized analytical software) are inseparable concepts, but they have taken different trajectories and evolved differently.

ERP systems have positively transformed many enterprises' business processes, yet many users feel oversold because ERP appears to inhibit access and lock up vital information. I In most traditional ERP systems operational activities are grouped together to form artificially created processes, which bear little resemblance to the actual business activities. For example, the focus of ERP has often been to get the correct figures into the general ledger (GL) and create a transactional glut (for more information on the genesis of enterprise applications, see Enterprise Applications—The Genesis and Future, Revisited.).

Conversely, BI, which has for few decades been called executive information systems (EIS), offers a new breed of similar, but more insightful and functional tools to help enterprises operate more efficiently and profitably. Many manufacturing and distribution enterprises of all sizes are amenable to leveraging software that would not only sense the daily pulse of the operations, but would also spot incongruities, analyze the performances of multiple areas, and initiate corrective adjustments. BI tools promise to help "rank and file" employees harness data too complicated for manual manipulation. For instance, few departments are as hard pressed for new tools as purchasing and sourcing, where rapid increases in materials costs, greater deviations in lead times, and supplier base growth and instability require ever increasing buyer dexterity. BI can provide this dexterity.

This is Part One of a seven-part note.

Part Two will detail contemporary BI tools.

Part Three will look at what is available.

Part Four will describe the BI/CPM market landscape.

Part Five will discuss Geac and Point Solution vendors.

Part Six will compare direct access to a data warehouse for the mid-market.

Part Seven will make recommendations.

The Transactional World

Enterprise software systems are designed as transaction processing tools and, nowadays the main job is to optimize an informed decision-making process for users at all levels of the organizational hierarchy. Recent trends seem indicate that access to key operational data is no longer under the purview of executives alone. Many manufacturing executives today are allowing (if not pushing and encouraging) access to operational performance data to the shop floor and in distribution centers to enable better and more timely decision-making by operators.

Most operational data in ERP systems—and in its younger siblings like supply chain management (SCM) or customer relationship management (CRM) is stored in what is referred to as an online transaction processing (OLTP) system, which is a type of computer processing where the computer responds immediately to user requests. Each request is considered to be a transaction, which is a computerized record of a discrete event, such as the receipt of inventory or a customer order. In other words, a transaction requires a set of two or more database updates that must be completed in an all-or-nothing fashion. The opposite of transaction processing is batch processing, in which a batch of requests is stored and then executed all at one time. In other words, transaction processing requires interaction with a user, whereas batch processing can take place without a user being present. Still, both approaches result with an immense number of records in the database.

To further refresh our memory, a database is a collection of structured data that is application-independent. This data processing file-management approach was designed to establish the independence of computer programs from data files, whereby redundancy is minimized, and data elements can be added to, changed, or deleted from the file structure without changing existing application programs.

Relational database are most commonly used in enterprise applications nowadays. A relational database is a software program that allows users to obtain information drawn from two or more databases that are made up of arrays of two-dimensional data (tables). Contrary to this, a hierarchical database is a method of constructing a database that requires that related record types be linked in tree-like structures. In this instance, no child record can have more than one physical parent record.

Relational databases are more powerful than the others because they require few assumptions about how data is related or how it will be extracted from the database. As a result, the same database can be viewed in many different ways. Another important feature is that a single database can be spread across several tables, which differs from, for example,, flat-file databases, where each database is self-contained in a single table. Accordingly, relational databases are prevalently deployed within enterprise applications.

Bundled with this are database management systems (DBMS) that access data stored in a database and present multiple data views to end users and application programmers. They are a collection of software programs designed for organizing data and providing the mechanism for storing; maintaining or modifying; and retrieving or extracting data on the database. A DBMS separates data from the application programs and people who use the data, and permits many different views of the data.

From a technical standpoint, DBMSs can differ widely, since terms such as relational, network, flat, and hierarchical all refer to the way a DBMS organizes information internally, which can affect how quickly and flexibly users can extract information. For example, a relational database management system (RDBMS) is a type of DBMS that stores data in the form of related tables, whose architecture is based on a formal method of constructing a database in rows and columns using rules that have formal mathematical proofs. In these systems, which originated in the work of EF Codd, relationships between files are created by comparing data, such as account numbers and names. In addition, an RDBMS has the flexibility to take any two or more files and generate a new file from the records that meet the matching criteria.

Requests for information from a database are made in the form of a query, which is a stylized question. For example, one query would request all records in which the "NAME" field is "JONES" and the "AGE" field is "greater than 32". The set of rules for constructing queries is known as a query language, and different DBMS's support different query languages, although there is a semi-standardized query language called structured query language or SQL (). SQL is a is a relational model database standardized query language for requesting information from a database.

Such language has an English vocabulary, is nonprocedural, and provides the ability to define tables, screen layouts, and indices, which, in database design, is a list of keys or keywords, each of which identifies a unique record. Indices make it faster to find specific records and to sort records by the index field, which is the field used to identify each record. Historically, SQL has been the favorite query language for DBMS's running on minicomputers and mainframes.

Increasingly, however, SQL is being supported by personal computer-based (PC) database systems because it supports distributed databases (databases that are spread out over several computer systems), which enables several users on a local-area network (LAN) to access the same database simultaneously. Such sophisticated languages for managing database systems are called fourth-generation languages (4GL), which are programming languages that are closer to human languages than typical high-level procedural programming languages like COBOL or C++.

The information from a database can be presented in a variety of formats, since most DBMS's include a report writer program that enables users to output data in the form of a report, while some DBMS's also include a graphics component that enables users to output information in the form of graphs and charts. However, these report writing programs have not typically been user-friendly nor efficient enough for regular enterprise software users to leverage en mass. Further, for most enterprise software systems, reporting has been a secondary objective and not usually a driving force when the system was originally designed. Instead, these systems were specifically designed for high levels of transaction volume with many concurrent users.

If the database is relational, it has probably been "normalized" (the process of organizing data in accordance with the rules of a relational database). Although less likely, if the database is non-relational, then even worse, custom programs have to be written to store and retrieve data from the database (which is often accomplished with the COBOL programming language).

Whether relational or non-relational, the very design that makes an OLTP system efficient for transaction processing makes it inefficient for end user queries. In the 1980s, many business users referred to their mainframes as "the black hole", because all the information went into it, but none ever came back out. Namely, all requests for reports had to be programmed by the IT staff, whereas only "pre-canned" reports could be generated on a scheduled basis, and ad-hoc real-time querying was virtually impossible. While the client/server-based ERP systems of the 1990s have been somewhat more report-friendly, it has still been a far cry from a desired usability by regular, non-technical, end users for, things such as operational reporting, interactive analysis, etc.

Enter Data Warehousing and Analytics

To resolve these issues, the notion of data warehousing was created, which is a collection of data designed to support management decision-making, whereby data warehouses (DW) contain a wide variety of data that present a coherent picture of business conditions at a single point in time. The theory was to create a database infrastructure that was always on-line, contained all the information from the OLTP systems, including historical data, but structured in such a way that it was fast and efficient for querying.

Generally speaking, a schema is the structure of a database system, described in a formal language (although the term is also often used to refer to a graphical depiction of the database structure) and supported by the DBMS. In a relational database, the schema, for example, defines the tables, the fields in each table, and the relationships between fields and tables. Schemas are generally stored in a data dictionary, which is a file in a DBMS that defines the basic organization of a database. It contains a list of all files in the database, the number of records in each file, and the names and types of each field. Data dictionaries do not contain any actual data from the database, only bookkeeping information for managing it. Without a data dictionary, however, a DBMS cannot access data from the database. On the other hand, most DBMS's keep the data dictionary hidden from users to prevent them from accidentally destroying its contents, since the data dictionary affects all data items throughout an organization, and explains the things like where the data comes from, how it has come into being, how it operates, where it resides, who uses it, and so on.

The most common of these schemas (logical and physical database designs) that is leveraged by data warehousing is known as the star schema, which consists of facts (actual business facts) and dimensions (ways of looking at the facts). One simple way to look at a star schema is that it is designed such that the maximum amount of information can be derived from the fewest number of table reads. Another way to reduce the amount of data being read is to pre-define aggregations (summaries of detail data, such as monthly total sales) within the star, since most queries ask questions like "how many widgets were sold last month?"

A DW would thus be a special database or repository of data that has been specially prepared to support decision-making applications for query and analysis, as opposed to a database for processing transactions (for more information, see A Definition of Data Warehousing). Separating these two functions has improved flexibility and performance, whereby the development of a DW includes the development of systems to extract data from underlying transactional operating systems. The DW also installs a warehouse database system that provides managers flexible access to the data. The term data warehousing generally refers to the combination of many different databases across an entire enterprise. This contrasts with a data mart, which is a database, or collection of databases, designed to help managers make strategic decisions about their business. Whereas a DW combines databases across an entire enterprise, data marts are usually smaller and focus on a particular subject or department, although some data marts, called dependent data marts, can be subsets of larger DW's.

This brings us to online analytical processing (OLAP), which is a category of software tools that provides analysis of data stored in a database, and which enable users to analyze different dimensions of multidimensional data, such as time series and trend analysis views. Hence, business users can on quickly and easily identify performance trends by uaing time-phased information analysis and graphing capabilities of products that support more sophisticated data analysis and have full calculated field capabilities integrated into reports. For instance, users quickly isolate and identify products, customers, regions, or other areas that are trending significantly up or down. Some solutions will also include a fully-integrated, powerful data graphing function that enables users to create detailed data visualizations. The graphing capability should, ideally, be entirely dynamic. In other words, users should be able to rapidly click through various report parameters and see graphical representations for each combination.

OLAP is often used in data mining, which is a class of database applications that look for hidden patterns in a group of data that can be used to predict future behavior. For example. ,it can help retail companies find customers with common interests. Yet, the term is commonly misused to describe software that presents data in new ways, since true data mining software does not just change the presentation, but actually discovers previously unknown relationships among the data; this knowledge is then applied to achieving specific business goals. These tools are used to replace or enhance human intelligence by scanning through massive storehouses of data to discover meaningful new correlations, pattern, and trends by using pattern recognition technologies and statistics. Hence, it is popular in the science and mathematical fields, but also is also increasingly being used by marketers trying to glean useful consumer data from their web sites.

Going one step further, predictive analytics is data mining uses pattern recognition, statistical, and mathematical techniques on large amounts of data to support decision-making by forecasting the outcomes of different scenarios. These programs search databases using techniques such as neural networks and decision trees, look for correlations and patterns that are virtually impossible for mortal humans to detect, and present the information to help management make the right corporate decisions. SAS Institute, the largest pure-play BI vendor (and the largest privately held software company, with $1.5 billion, USD in revenues), has been the leader in this area. It has been able to learn from the past, manage the present, and predict the future, and makes that intelligence available to all constituencies within the organization. Even a relatively small ERP provider, Relevant Business System, has recently developed a BI software tool, the Business Wizard with Predictive Analytics, which can sift through large amounts of data from an ERP system and aid decision makers in choosing between difficult to compare alternatives in a near real-time mode.

The main component of OLAP is the OLAP server, which sits between a client and user interface (UI) and a DBMS. It understands how data is organized in the database and has special functions for analyzing the data. There are OLAP servers available for nearly all the major commercial database systems. In the 1990s, OLAP systems were able to perform multidimensional analysis, with pre-built calculations, typically called cubes, providing much of the available, analytic functionality.

However, these early OLAP systems tended to be pricy, while the cubes were resource-intensive, impairing system performance. Consequently, lower-end analytical tools were reasonably priced but had limited flexibility, whereas the higher-end BI products from the likes of Cognos, Business Objects, or Hyperion Solutions were both expensive and intricate. Many solutions nowadays include more reasonably priced, less resource-intensive, intuitive, point-and-click features that enable business users to dynamically analyze business information from multiple perspectives. Users are able to perform multi-dimensional analysis, "slice-and-dice" data sets, and drill-down to additional detail, all using integrated OLAP technology, whose analytical functionality should ideally be available whether users are connected to the Internet, through a mobile, or are working off-line.

Data warehousing also led to the development of the concept of metadata management, whereby metadata is data about data, such as table and column names, and data-types. Managing metadata makes it possible to understand relationships between data elements and assists in the mapping of source to target fields. Metadata describes how and when and by whom a particular set of data was collected, and how the data is formatted. This is essential for understanding information stored in DWs and has become increasingly important in building extensible markup language-based (XML)-Web applications and Web services (for more information, see Metadata Standards in the Marketplace).

Next with the advent of DW came the creation of extract/transform/load (ETL) tools, which used the metadata to get information from the source systems into the DW. Just as the name suggests, the three database functions that combine into one ETL tool to pull data out of one database and place it into another database are logically called

  1. Extract—the process of reading data from a database.

  2. Transform—the process of converting the extracted data from its previous form into another form that can be placed into another database. Transformation occurs by using rules or lookup tables or by combining the data with other data.

  3. Load—the process of writing the data into the target database or DW.

Thus, ETL tools are logically used to migrate data from one database to another, to form data marts and DW's or to convert databases from one format or type to another. Additional tools, which also make use of SQL, were also developed to give end users direct access to the data in the DW. As time went by, these query tools became more user-friendly, and many now have a parser (a program that dissects source code so that it can be translated into object code) that can turn plain English questions into valid SQL commands.

This concludes Part One of a seven-part note.

Part Two will detail contemporary BI tools.

Part Three will look at what is available.

Part Four will describe the BI/CPM market landscape.

Part Five will discuss Geac and Point Solution vendors.

Part Six will compare direct access to a data warehouse for the mid-market.

Part Seven will make recommendations.

About the Authors

Olin Thompson is a principal of Process ERP Partners. He has over twenty-five years experience as an executive in the software industry. Thompson has been called "the Father of Process ERP." He is a frequent author and an award-winning speaker on topics of gaining value from ERP, SCP, e-commerce, and the impact of technology on industry.

He can be reached at

Predrag Jakovljevic is a research director with (TEC), with a focus on the enterprise applications market. He has nearly twenty years of manufacturing industry experience, including several years as a power user of IT/ERP, as well as being a consultant/implementer and market analyst. He holds a bachelor's degree in mechanical engineering from the University of Belgrade, Yugoslavia, and he has also been certified in production and inventory management (CPIM) and in integrated resources management (CIRM) by APICS.

comments powered by Disqus