In-Memory Business Intelligence
Ranwood Paper
April 2009
1 CONTENTS

1 Contents
2 In-memory BI
3 In-Memory BI solutions and architecture
4 Advantages of in-memory BI
5 Disadvantages of in-memory BI
6 Do in-memory solutions need star schemas or not?
7 Can we operate directly on the OLTP system or do we need an ODS?
8 Online transaction processing (OLTP)
9 Operational data store (ODS)
10 OLTP and ODS
11 The problem with historical data
12 References
2 IN-MEMORY BI

Business intelligence (BI) refers to the skills, technologies, applications and practices used to help a business acquire a better understanding of its commercial context. As companies strive to maximize the value of their enterprise information assets, new technologies and techniques continue to emerge at a rapid pace. The latest BI methodology to attract both enthusiastic users and considerable industry attention is in-memory analytics.

The primary goal of in-memory analytics is to eliminate standard disk-based BI deployments, which are typically relational or OLAP-based. These traditional implementations come with numerous drawbacks such as poor flexibility, limited scope of analysis, and slow response times. With in-memory analytics, the reporting software performs all needed analytical functions at runtime, including data retrieval and storage, manipulation, calculation, and formatting, within the memory of a 64-bit server.

To compare the disk-based and in-memory approaches to BI, it is wise first to become familiar with each technology and its architecture; we can then recognize the advantages and disadvantages of each. So we begin by reviewing the development from the traditional approach to the new one.

OLAP and MOLAP are two technologies that in-memory BI replaces. In other words, the business intelligence (BI) market is moving from OLAP solutions toward in-memory BI solutions. MOLAP is multidimensional OLAP; its distinctive feature is that it stores the results of a cube in a multidimensional store. To speed query performance, OLAP tools anticipate query requirements and then provide aggregated, precalculated views of the data, which leads to data explosion. The resulting structure is called a cube, since the data is populated in tables with many dimensional views.
Designers must spend significant time and care to find a suitable architecture for the cube and then build it, and once it is built, it takes significant effort to change the design to add new dimensions or views of data. As long as users run reports or queries against the cube's prepopulated views, query performance is fast and efficient; on the other hand, cubes are less flexible in responding to ad hoc queries that seek data which is not prepopulated. In addition, since data in a cube is most often aggregated, the ability to drill down to detailed data is limited.
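The pre-aggregation strategy behind cubes can be sketched in a few lines of Python. This is a toy illustration, not any vendor's implementation: it precomputes a sum for every combination of dimension values, which shows both why preplanned queries are instant and why the number of stored aggregates explodes. The fact table and dimension names are hypothetical.

```python
from itertools import combinations
from collections import defaultdict

# Toy fact table: (region, product, quarter, sales).
facts = [
    ("West", "Widget", "Q1", 100),
    ("West", "Gadget", "Q1", 150),
    ("East", "Widget", "Q2", 200),
]
DIMS = ("region", "product", "quarter")

def build_cube(rows):
    """Precompute SUM(sales) for every subset of dimensions (data explosion)."""
    cube = defaultdict(float)
    for region, product, quarter, sales in rows:
        values = dict(zip(DIMS, (region, product, quarter)))
        for r in range(len(DIMS) + 1):
            for dims in combinations(DIMS, r):
                key = tuple(sorted((d, values[d]) for d in dims))
                cube[key] += sales
    return cube

cube = build_cube(facts)
# Pre-aggregated answers are instant...
print(cube[(("region", "West"),)])   # 250.0
# ...but the cube stores 2^len(DIMS) aggregates per distinct key combination,
# and a question about a dimension not modeled in the cube cannot be answered.
```

Note how every answer the cube can ever give must be decided before load time: that is exactly the inflexibility toward ad hoc queries described above.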
At that time, business intelligence planners had to decide between a traditional RDBMS with native SQL, for flexibility in query authoring, and a cube, for speed in processing preplanned queries. A native SQL query can be very flexible: it can ask many kinds of questions and perform calculations, but its response time can be slow. A cube, on the other hand, with careful planning and prepopulating of dimensional views, can answer queries quickly, but may be unable to answer unanticipated questions.

As shown in figure 1, the first shift occurred with the arrival of column-centric database systems. The column-centric database resolved this trade-off between flexibility and query performance: these systems make it possible to choose both fast performance and flexibility. In 2005, a new company, Vertica, emerged with a column-centric database offering; MonetDB is an open source option. The main difference is that traditional database systems organize data by row to optimize for record inserts (transactions), whereas column-centric database systems organize data by columns (fields). In column-centric systems data is stored vertically (by column). If the data is stored vertically, it is possible to access only a specific column of data and retrieve only the data in
the query. This retrieves data with a minimum amount of disk I/O, and thus improves query time by a factor of 10 or more. These systems also compress data, because not all columns are query candidates, some data is repetitive within a column (such as state or gender), and other columns are blank or zero, so they compress easily. Even so, this alone cannot achieve query performance improvements measured in factors of 100 compared to SQL queries on traditional relational systems.

Following the arrival of column-centric systems, a second paradigm shift is occurring now: in-memory technology, which uses the IMOLAP mechanism. IMOLAP stands for in-memory OLAP. It differs from OLAP and MOLAP in that the primary storage mechanism for the data to be analyzed is memory. Typically, this kind of technology does not need to precalculate measures; users can rely on the speed of memory to allow values to be calculated as they are needed. It is also worth mentioning that different vendors take different approaches to this new application: some offer only fast queries and no calculation or user interface (UI), while others are simply implementations of cubes held in memory.
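The row-versus-column distinction, and the compression opportunity it creates, can be sketched as follows. This is an illustrative toy on plain Python lists, not how Vertica or MonetDB actually store data; run-length encoding stands in here for the vendors' real compression schemes, and the sample records are hypothetical.

```python
# Row layout: each record stored contiguously, good for inserts.
rows = [
    ("Alice", "CA", "F"),
    ("Bob",   "CA", "M"),
    ("Carol", "CA", "F"),
    ("Dave",  "NY", "M"),
]

# Column layout: each field stored contiguously, so a query touching only
# "state" never reads names or genders at all.
columns = {
    "name":   [r[0] for r in rows],
    "state":  [r[1] for r in rows],
    "gender": [r[2] for r in rows],
}

def rle_encode(col):
    """Run-length encode a column; repetitive values (state, gender) shrink well."""
    out = []
    for v in col:
        if out and out[-1][0] == v:
            out[-1] = (v, out[-1][1] + 1)
        else:
            out.append((v, 1))
    return out

print(rle_encode(columns["state"]))   # [('CA', 3), ('NY', 1)]
```

Four stored values collapse to two runs; on a real table with millions of rows and a handful of distinct values per column, this is where the large compression ratios come from.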
3 IN-MEMORY BI SOLUTIONS AND ARCHITECTURE

SALESLOGIX VISUAL ANALYZER

One application that uses in-memory association technology is SalesLogix Visual Analyzer. It is built on the industry-leading BI solution QlikView, whose provider QlikTech is known as the fastest-growing BI software company according to IDC, and in press reports as the leader in in-memory analysis. It has a simple architecture: all data and operations (queries and aggregates) are held in memory, and all calculations are performed when requested, not in advance. Association technology refers to associative mapping between data elements. It shows the information that is included and excluded at the same time (not linearly), and allows the user a unique path through the data. This approach results in very fast query times. QlikView shares its memory-centric aspect with SAP and Applix; in particular, it is technically very similar to SAP's BI Accelerator.

SalesLogix Visual Analyzer offers three components in an integrated solution:

Fast Query Engine: loads data into memory and gives users instant access to their own queries just when they request them;

On Demand Calculation Engine: SalesLogix Visual Analyzer shows one or more measures (metrics, key performance indicators, expressions, etc.) across one or more dimensions through charts, graphs, and tables of all types, which constitute multidimensional analysis. All of this is performed on user demand, not in advance;

Visually Interactive UI: it provides many pre-built UI elements and tools, such as various kinds of dashboards (forecast, pipeline, win/loss, marketing, and customer service), list boxes for navigating dimensions, statistic boxes, and many other UI elements. Furthermore, it shows all the data a query includes, as well as the data that was excluded from the query (the associative model).
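The associative idea above — a selection in one field simultaneously determines which values in every other field are included and which are excluded — can be sketched in a few lines. This is only an illustration of the concept, not QlikTech's actual implementation; the record layout and field names are hypothetical.

```python
# Toy dataset: each record links a region to a product.
records = [
    {"region": "West", "product": "Widget"},
    {"region": "West", "product": "Gadget"},
    {"region": "East", "product": "Widget"},
]

def associate(records, field, value, other):
    """Given a selection field=value, split the values of `other` into
    associated (co-occur with the selection) and excluded (do not)."""
    all_vals = {r[other] for r in records}
    included = {r[other] for r in records if r[field] == value}
    return included, all_vals - included

# Selecting region = East shows Widget as associated and Gadget as excluded.
inc, exc = associate(records, "region", "East", "product")
print(inc, exc)   # {'Widget'} {'Gadget'}
```

Showing the excluded set alongside the included one is what gives the user the "unique path through the data": every click reveals not only what matches, but what can no longer match.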
BPS

Another example is the Sterna Business Positioning System (BPS), an in-memory business intelligence (BI) platform. It empowers managers to continuously optimize operational and financial performance, driving them to take advantage of financial opportunities and reduce risks as they occur. At the core of BPS is the Sterna BPS Server, which is highly scalable and performs complex mathematical calculations on very large data sets, on demand, in memory and in real time. This server sorts the organization's data into an in-memory data store, so there is no need to build time-consuming, costly and inflexible OLAP cubes. Data is read from the organization's data stores via Data Collectors and transformed into Sterna in-memory virtual data stores called "Business Matrices". Business Matrices are maintained completely separately from the mathematical computations, providing an extremely high level of flexibility.

ALTOSOFT

The third example is Altosoft, an innovative provider of business intelligence (BI) solutions. One of its products is Altosoft Insight 3.0, which includes major new capabilities such as a web-based reporting module, new process analysis features, and the industry's only second-generation in-memory BI architecture.

As mentioned above, this product uses a unique second-generation in-memory BI architecture, in which the data warehouse development normally associated with enterprise BI implementations is no longer needed. In environments where an existing warehouse is available as a data source, Altosoft can join and analyze warehouse data with information from outside the warehouse environment, while supporting dashboard monitoring, analytics, and reporting on warehoused data.
The main difference between first-generation and second-generation in-memory BI is that first-generation systems offer high-performance analytics but are disconnected from source data, whereas second-generation systems are connected to the data source and, for example, enable dynamic updates directly from operational data sources. Insight 3.0 also adds a high-speed, 64-bit in-memory calculation engine for in-memory data aggregation, transformation, and optimization. Another technology used in Altosoft Insight is cache accelerators, which speed up complex operations through intelligent calculation and compression algorithms that optimize memory management. In addition, analytics-oriented optimization and smart aggregation technology enable Altosoft Insight to optimize in-memory data structures based on the requirements needed for delivering KPIs. Moreover, Altosoft combines this efficiency with 100% codeless development, so implementation and deployment can be done in less than a tenth of the time and cost of other BI products.

"Altosoft Insight 3.0 is a complete, enterprise BI platform that can be easily deployed without the high implementation and infrastructure costs normally associated with other business intelligence solutions," said Scott Opitz, CEO of Altosoft Corporation. "Because Altosoft enables direct access to source data without any loss of performance, the ETL and data warehouse components of traditional BI architecture are not required in an Altosoft solution. Not only does this represent a significant cost savings for our customers, but it also eliminates the major architectural bottlenecks to low-latency, high-performance BI."

SAP

SAP offers an in-memory technology called BI Accelerator, one of the most popular and fastest of SAP's NetWeaver BI appliances. It comes pre-installed on 64-bit Intel Xeon processors.
It has four blades with four server instances from HP and IBM, the TREX (text retrieval and classification) in-memory search engine installed on each blade to support parallel search, and a file system for storage (for indexes/data) but no database engine. It runs on a Linux operating system.

SAP is broadening the scope of its in-memory technology to meet the needs of customers who need fast query and transaction capabilities; in the process, it is potentially disrupting Oracle's core relational database business, since with this technology there is no need to store data externally (about 55 percent of SAP customers' applications use an Oracle database).
SAP's position has remained relatively unchanged: good vision but not-so-great execution, compared with MicroStrategy and QlikTech.

COGNOS

Business intelligence software developer Cognos acquired Celequest, a developer of an operational dashboard appliance. Celequest develops a BI appliance that uses in-memory technology, storing data in memory rather than at the database level. The result is much faster query times on BI requests, with the capability to support thousands of concurrent users with little delay.

APPLIX

Applix offers TM1 as a financial application and has several large financial institutions as enterprise customers, providing financial consolidations and reporting with an impressive set of tools for analysis in areas such as profitability and price-volume variance.

MICROSTRATEGY BUSINESS INTELLIGENCE SOLUTIONS

MicroStrategy is one of the leaders in business intelligence and performance management technology. One of the products it has announced is MicroStrategy 9, which provides integrated reporting, analysis, and monitoring software that helps users make better business decisions. One of the significant technical advances by which MicroStrategy extends its BI platform is in-memory capability. In-memory technology takes advantage of hardware-based memory and multi-core processors to speed queries and ease data exploration while also eliminating cube building. By extending its existing ROLAP architecture, MicroStrategy's new in-memory ROLAP takes advantage of the addressable memory available with the newest 64-bit operating systems, including Microsoft Windows 64, IBM AIX, HP-UX, Sun Solaris, Red Hat Linux, and SUSE Linux. In-memory ROLAP technology uses the extensive memory space available in 64-bit servers as multidimensional memory, in which both data and calculations can reside as multidimensional datasets called ROLAP cubes.
The product's reports and dashboards automatically direct queries to in-memory ROLAP cubes, increasing query performance (compared to database-resident storage), especially when executing complex, process-intensive queries.
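The routing behavior just described can be sketched roughly as follows. The cube structure, function names, and fallback logic here are hypothetical, not MicroStrategy's API; the sketch only illustrates the check-cube-first, fall-back-to-database pattern.

```python
# Pre-loaded in-memory "ROLAP cube": (metric, year) -> aggregate value.
cube = {("sales", "2008"): 1_000_000}

def run_report(metric, year, query_database):
    """Serve a report from the in-memory cube if possible, else hit the database."""
    key = (metric, year)
    if key in cube:                            # in-memory hit: no database round-trip
        return cube[key]
    value = query_database(metric, year)       # fall back to the warehouse
    cube[key] = value                          # populate the cube for reuse
    return value

# Stand-in for the real (expensive) database call:
def query_database(metric, year):
    return 2_500_000

print(run_report("sales", "2008", query_database))  # cube hit
print(run_report("sales", "2009", query_database))  # database, then cached
```

The second request for 2009 data would now be served from memory as well, which is the "process the expensive query only once" behavior the surrounding text describes.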
When each report is run, the MicroStrategy engine determines whether the report's data can first be obtained from a ROLAP cube. If it cannot, the engine fetches the data from the databases. ROLAP cubes improve performance for reports, dashboards, and OLAP analyses. In addition, in-memory ROLAP cubes can be populated quickly and easily from any database source or other data sources such as spreadsheets. Once the data resides inside an in-memory ROLAP cube, it can serve as a source of reusable definitions and data for reports, dashboards, and OLAP investigations by workgroup users.

In-memory ROLAP also eliminates repetitive and expensive processing. By creating in-memory ROLAP cubes for the most expensive and time-consuming queries, the database engine processes these queries only once, to form the various ROLAP cubes, instead of at every report execution. By serving queries from in-memory ROLAP cubes instead of database servers, companies can enjoy faster performance, produce more reports per hour, and free database servers for additional users and BI applications.

SUMMARY OVERVIEW

Vendor             | 64-bit In-Memory Platform | In-Memory Aggregate Query Response | Notes
QlikTech           | Y                         | Y                                  | Granted patents on in-memory associative technology; visualization UI
SAP BI Accelerator | Y                         | Y                                  |
Siebel Analytics   | Y                         | _                                  | Acquired Siebel Analytics
Cognos             | Y                         | Y                                  | Acquired Applix (TM1) in 2007
4 ADVANTAGES OF IN-MEMORY BI

In-memory BI solutions have been in use for more than a decade, but these formerly niche technologies are now emerging into mainstream use because of the following factors, which give their users a number of advantages:

1. The key advantage of in-memory analytics is speed. Memory is significantly faster than disk, which results in fast queries and calculations. In the in-memory approach, queries and related data reside in the server's memory, so generating reports requires no network or disk I/O. This improves the performance and reliability of the data warehouses and databases that hold the required report data, particularly when the report in question has a large answer set. Moreover, users get faster answers regardless of the size and complexity of their queries. Another reason for the high speed of in-memory analytics is the elimination of cube building, which speeds both deployment and analysis. Fast access to queries and aggregates also allows new ways to visualize and manipulate data (such as QlikView's association technology). With traditional OLAP, by contrast, constructing cubes is time-consuming and requires expert skills; the process can take months, and sometimes more than a year. In addition, the cube must be constructed before it can be calculated, a process that may take hours, and all of this must occur before any analysis or reporting can be performed and before users see their answers.

2. The second significant factor in the current and future success of in-memory technology is its affordability. In the past, memory was expensive, and 32-bit systems imposed limits on processing power and storage compared with 64-bit ones: with 32-bit systems, most operating systems were limited to 3 or 4 GB of usable RAM, whereas a 64-bit system can address as much as 100 GB or more.
Today the price of memory has declined, and 64-bit memory has become easier to obtain: it is possible to put as much as 16 GB on a single board and add up to 16 boards in a single box.

3. Another benefit of in-memory analytics is that IT personnel are no longer needed to build, deploy and maintain OLAP cubes or to manage data for reporting and analysis. Personnel don't have to be highly skilled IT professionals. They don't have to understand what a star schema is, what a snowflake schema is,
or what a parent-child relationship is. For example, one in-memory analysis solution is the Palo server, which focuses on both the analysis and planning sides of BI; working with it is as easy as working with Excel, while retaining the power of a centralized multidimensional database.

4. This technology makes it easy to define and to plug system architecture components in and out, so there is less demand for very detailed requirements analysis and architecture design. Disk-based BI, by comparison, is far less flexible than in-memory BI. Once the OLAP cube structure is built and populated with data, it is hard and time-consuming to adapt the structure to business changes. New dimensions and measures must be defined in the cube through coding (an IT task), so every change requires a code change. The cube must then be refreshed, a process that can take a very long time and may be costly, so flexibility in the face of change is very low.

5. Finally, the scope of analysis in disk-based BI is limited: only a small set of predefined dimensions can be analyzed.
5 DISADVANTAGES OF IN-MEMORY BI

There is also evidence that in-memory BI solutions have some disadvantages. "An in-memory database is limited by the available RAM," said Steven Graves, president and co-founder of McObject, which develops the eXtremeDB in-memory database system, in Issaquah, Wash. "With 64-bit memory it is possible to have a terabyte size, but the time it takes to provision it is rather large. And also there is the question of the survivability of the database. If someone trips over a cord, that in-memory goes away. So in-memory is not going to replace conventional databases."

Some of the disadvantages of in-memory BI include:

The refresh process is usually time-consuming, because data must be loaded into memory (unless the product supports incremental reload);

Without 64-bit technology, there is a significant limit to the amount of data that can be held in memory;

Data is analyzed in memory, not in the data store, so the data in memory is always somewhat out of date. This eliminates the possibility of true real-time analysis.
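The incremental-reload mitigation mentioned in the first point can be sketched as follows. The row format and timestamp field are hypothetical, and real products track change data far more carefully; the point is only that a full reload rereads every source row, while an incremental reload fetches just the rows past a high-water mark.

```python
def full_reload(source):
    """Reread every source row into memory (slow for large tables)."""
    return list(source)

def incremental_reload(store, source, high_water):
    """Fetch only rows whose timestamp is past the last load's high-water mark."""
    new_rows = [r for r in source if r["ts"] > high_water]
    store.extend(new_rows)
    new_mark = max([high_water] + [r["ts"] for r in new_rows])
    return store, new_mark

source = [{"id": 1, "ts": 100}, {"id": 2, "ts": 200}]
store, mark = incremental_reload([], source, 0)        # first load: both rows
source.append({"id": 3, "ts": 300})                    # new row arrives at source
store, mark = incremental_reload(store, source, mark)  # second load: only id 3
print(len(store), mark)   # 3 300
```

A product without this capability must run the `full_reload` path on every refresh, which is why refresh times become a real operational cost as the dataset grows.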
6 DO IN-MEMORY SOLUTIONS NEED STAR SCHEMAS OR NOT?

As mentioned, RDBMS systems run on hard drives, so their speed is limited and they require the creation and maintenance of many intermediate aggregation tables. The process of loading data into star/snowflake schemas is also difficult and requires the management of complex ETL code. But changes become easier if all the data is resident in memory. No indexes are needed. No recalculation or aggregation is necessary. Cubes and star schemas do not have to be designed. Disk I/O is eliminated, as the data is already in RAM. There is no need to build cubes and/or indexes. Query response times are fast, and drill-down to the detail level is possible. The theoretical improvement in data access from silicon is 10,000 to 1,000,000 times faster than from disk.
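The claim that no aggregation tables, indexes, or cube designs are needed once the data is in RAM can be illustrated with a plain in-memory group-by: the aggregate is computed at query time with a single scan. The fact table here is hypothetical.

```python
from collections import defaultdict

# Toy fact table held entirely in memory: (region, quarter, sales).
facts = [
    ("West", "Q1", 100),
    ("West", "Q2", 150),
    ("East", "Q1", 200),
]

def group_sum(rows, key_index, value_index):
    """Aggregate on demand — no pre-built aggregation table required."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[value_index]
    return dict(totals)

# Sales by region, computed at query time:
print(group_sum(facts, 0, 2))   # {'West': 250, 'East': 200}
# A different grouping needs no new schema or cube — just a different key:
print(group_sum(facts, 1, 2))   # {'Q1': 300, 'Q2': 150}
```

Because the scan runs at memory speed, this on-demand aggregation is what replaces the star schema's precomputed summary tables.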
7 CAN WE OPERATE DIRECTLY ON THE OLTP SYSTEM OR DO WE NEED AN ODS?

8 ONLINE TRANSACTION PROCESSING (OLTP)

OLTP (online transaction processing) is a class of program that facilitates and manages transaction-oriented applications, typically for data entry and retrieval transactions, in a number of industries including banking, airlines, mail order, supermarkets, and manufacturing.

Online transaction processing has two key benefits:

Simplicity: reduced paper trails and faster, more accurate forecasts for revenues and expenses are both examples of how OLTP makes things simpler for businesses.

Efficiency: it vastly broadens the consumer base for an organization, the individual processes are faster, and it is available 24/7.

On the other hand, online transaction processing has some disadvantages. One problem is security: because OLTP works over a local network or the internet, it is more susceptible to intruders and hackers. Another problem is economic cost: server failures can cause delays or even wipe out an immeasurable amount of data. Today's online transaction processing increasingly requires support for transactions that span a network and may include more than one company. For this reason, new OLTP software uses client/server processing and brokering software that allows transactions to run on different computer platforms in a network.
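A minimal OLTP-style transaction — the all-or-nothing unit of work this section describes — can be sketched with Python's built-in sqlite3 module. The order/stock schema is hypothetical; the point is that the order row and the stock decrement either commit together or roll back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("CREATE TABLE orders (item TEXT, qty INTEGER)")
conn.execute("INSERT INTO stock VALUES ('widget', 5)")
conn.commit()

def place_order(item, qty):
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("INSERT INTO orders VALUES (?, ?)", (item, qty))
            cur = conn.execute(
                "UPDATE stock SET qty = qty - ? WHERE item = ? AND qty >= ?",
                (qty, item, qty))
            if cur.rowcount == 0:
                raise ValueError("insufficient stock")  # triggers rollback
        return True
    except ValueError:
        return False

print(place_order("widget", 3))   # True
print(place_order("widget", 9))   # False — the order row is rolled back too
print(conn.execute("SELECT qty FROM stock").fetchone()[0])   # 2
```

This atomicity is exactly what makes OLTP systems trustworthy for data entry, and it is also why analytical workloads are kept off them: long scans would contend with these short transactions.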
9 OPERATIONAL DATA STORE (ODS)

An operational data store is a database designed to integrate data from multiple sources to make analysis and reporting easier. Because the data originates from multiple sources, the integration often involves cleaning, resolving redundancy and checking against business rules for integrity (see figure 2.1). An ODS is usually designed to contain low-level or atomic data (such as transactions and prices) with limited history, captured in "real time" or "near real time", as opposed to the much greater volumes of data stored in the data warehouse, generally on a less frequent basis.

Figure 2.1: data integrated by an ODS
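The integration work an ODS performs — cleaning values, resolving redundancy across sources, and checking business rules — can be sketched as follows. The field names and the rules themselves are hypothetical; a real ODS load applies many such rules per feed.

```python
def integrate(sources):
    """Merge records from several source systems into one consistent store."""
    seen, rejected = {}, []
    for record in (r for src in sources for r in src):
        # Cleaning: normalize string fields.
        rec = {k: v.strip() if isinstance(v, str) else v
               for k, v in record.items()}
        # Business-rule check: reject records that violate integrity rules.
        if rec["price"] < 0:
            rejected.append(rec)
            continue
        # Redundancy resolution: one record per id (here, last write wins).
        seen[rec["id"]] = rec
    return list(seen.values()), rejected

crm  = [{"id": 1, "name": " Alice ", "price": 10}]
shop = [{"id": 1, "name": "Alice", "price": 12},
        {"id": 2, "name": "Bob", "price": -5}]
ods, rejected = integrate([crm, shop])
print(len(ods), len(rejected))   # 1 1
```

The same customer arriving from two systems collapses to one clean record, while the rule-violating row is set aside — which is why downstream reporting against the ODS is easier than reporting against the raw sources.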
Figure 2.2

In figure 2.2 the ODS is seen to be an architectural structure that is fed by integration and transformation (i/t) programs. These i/t programs can be the same programs that feed the data warehouse, or they can be separate programs. The ODS, in turn, feeds data to the data warehouse. Some operational data traverses directly into the data warehouse through the i/t layer, while other operational data passes from the operational foundation into the i/t layer, then into the ODS and on into the data warehouse. An ODS is an integrated, subject-oriented, volatile (including update), current-valued structure designed to serve operational users as they do high-performance integrated processing.

The essence of an ODS is the enablement of integrated, collective on-line processing. An ODS delivers consistent, high transaction performance (two to three seconds). An ODS supports on-line update. An ODS is integrated across many applications. An ODS provides a foundation for collective, up-to-the-second views of the enterprise. And, at the same time, the ODS supports decision support processing. Because of the many roles that an ODS fulfills, it is a complex structure: its underlying technology, its design, and its monitoring and maintenance are all complex. The ODS takes a long time to implement, and it requires changing or replacing old legacy systems that are not integrated.
10 OLTP AND ODS

OLTP is a class of application that manages transactions in a system through a command sequence: gathering data input, processing the data, and updating old data as new data is entered and processed. An OLTP system must be available whenever employees are working, and today more and more companies demand working schedules of 24 hours a day, 7 days per week. Along with the ability to correct errors, OLTP systems should also reduce the impact of unusual activities such as hardware upgrades, software changes, conversion work, data storage, and re-organization.

An ODS (operational data store) is a directly accessible data store object. ODS objects are flat database tables without dimensional structures, used for reporting and analysis purposes. ODS objects can contain any type of data, structured or unstructured, while cubes hold only numeric data (measures) in their fact tables.

As figure 2.1 shows, data can be loaded from the OLTP database into the data warehouse either through an ODS or without one. So in a BI system we always need the OLTP source, but whether we need an ODS depends on our demands and the design of the system. For example, in SAP NetWeaver you cannot use the BI Accelerator for ODS tables; it works only for cubes defined in SAP BI. Perhaps future versions of the BI Accelerator will support other SAP NetWeaver data stores as well. Events are cached in memory and do not have to be made persistent in a data store (such as an ODS) before they can be analyzed. This approach extends data consolidation to what can be thought of as an event-driven data architecture, which is particularly useful for applications that require close to zero data latency.

Another example is QlikView. Some people believe it is a suitable tool for end-users who are looking for a BI tool that can easily be used to analyse their own data, and who are not prepared to invest in a data warehouse because they believe creating cubes would be too time-consuming.
It matches their needs as it connects directly to the OLTP system. On the other hand, some claim that it eliminates the need to pre-build cubes by instead lazy-loading data directly from OLTP data structures into internal memory-based structures (a bit like Business Objects operated).
11 THE PROBLEM WITH HISTORICAL DATA

Over the past few years, companies have started to present their data warehouses as web services for use by other applications and processes connected by SOA or middleware such as an enterprise service bus (ESB). One limitation of this approach is that the data warehouse is the wrong place to look for intelligence about the performance of a current process. Real-time process state data, so relevant to this in-process intelligence, is unlikely to be in the data warehouse anyway. Even BI dashboards are inadequate for many operational tasks, because they rely on a user noticing a problem based on out-of-date data. Dashboards aggregate and average; they remove details and context and present only a view of the past, while decisions require detail and need to be made in the present.

It is clear that data warehouses will remain, but as the system of record rather than the only place BI is done: they will be designed so that reporting and presentation of historical data are done in them. There are challenges in trying to move to a real-time data warehouse, however; the information required to support, and indeed drive, daily operational decisions must come from a different approach in order to avoid the latency introduced by the extract, transform, load and query cycle.

To solve this problem we can introduce BI 2.0, whose goal is to reduce latency, cutting the time between when an event occurs and when an action is taken, in order to improve business performance. Achieving this goal will change existing BI architectures. With BI 2.0, data isn't stored in a database or extracted for analysis; BI 2.0 uses event-stream processing. As the name implies, this approach processes streams of events in memory, either in parallel with actual business processes or as a process step itself.
This means looking for scenarios of events, such as patterns and combinations of events, that are significant for the business problem at hand. The outputs of these systems are usually real-time metrics and alerts, and the initiation of immediate actions in other applications. The effect is that analysis processes are automated and don't rely on human action, but can call for human action where it is required. BI 2.0 gets data directly from middleware, the natural place to turn for real-time data. Standard middleware can easily create streams of events for analysis, which is performed in memory. When these real-time events are compared to past performance, problems and opportunities can be readily and automatically identified.
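The event-stream processing described above can be sketched as a sliding-window monitor held entirely in memory: events are examined as they arrive, and an alert fires when a pattern is seen. The event types, window size, and threshold here are hypothetical, and real stream processors handle far more sophisticated pattern languages.

```python
from collections import deque

WINDOW_SECONDS, THRESHOLD = 60, 3   # hypothetical rule: 3 failures in 60 s

def make_monitor():
    failures = deque()   # timestamps of recent failure events, kept in memory

    def on_event(timestamp, kind):
        if kind != "payment_failed":
            return None
        failures.append(timestamp)
        while failures and failures[0] <= timestamp - WINDOW_SECONDS:
            failures.popleft()   # drop events that fell out of the sliding window
        if len(failures) >= THRESHOLD:
            return f"ALERT: {len(failures)} failures in {WINDOW_SECONDS}s"
        return None

    return on_event

monitor = make_monitor()
events = [(0, "payment_failed"), (10, "payment_ok"),
          (20, "payment_failed"), (45, "payment_failed")]
alerts = [a for t, k in events if (a := monitor(t, k))]
print(alerts)   # ['ALERT: 3 failures in 60s']
```

No database write occurs anywhere in this path: the pattern is detected, and the alert emitted, before anything is persisted — which is the latency advantage the section attributes to BI 2.0.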
12 REFERENCES

http://www.tdwi.org/publications/bijournal/display.aspx
http://www.sagenorthamerica.com/imageserver/portal_managed_assets/sagenorthamerica/
http://www.sternatech.com/in-memory_bi/
http://www.reuters.com/article/pressrelease
http://www.sqlmag.com/
http://blog.croyten.com/2008/05/in-memory-analy.html