INTERACTIVE MANIPULATION, VISUALIZATION AND ANALYSIS OF LARGE SETS OF MULTIDIMENSIONAL TIME SERIES IN HEALTH INFORMATICS

Proceedings of the 3rd INFORMS Workshop on Data Mining and Health Informatics (DM-HI 2008), J. Li, D. Aleman, R. Sikora, eds.

Artur Dubrawski, Maheshkumar Sabhnani, Saswati Ray, Michael Baysek, Lujie Chen, John Ostlund and Michael Knight
The Auton Lab, Carnegie Mellon University, Pittsburgh, Pennsylvania
awd@cs.cmu.edu

Abstract

We present a scalable data representation structure which supports interactive access to large sets of time series data, the type of data frequently encountered in health informatics. This structure, called T-Cube, is a cached sufficient statistic equivalent to the data cube concept known in OLAP applications. It differs from a regular data cube in that it can be used to very efficiently answer all conceivable (not just the most common) queries against multidimensional time series databases, including disjunctive queries. Rapid access to complex data not only makes advanced analytics feasible, but also enables user-level data navigation (drill-downs, roll-ups, visualization) at interactive speeds. T-Cube also allows for rapid execution of massive screening through data for statistically significant patterns at different levels of aggregation. The exhaustive search strategy guarantees that no event of interest will ever be missed. Successful applications in the domains of public health and food safety indicate that the combined benefits lead to improved situational awareness of the analysts working with information systems powered by T-Cube. We are on the lookout for other applications where it may be of help.

Keywords: multidimensional time series, OLAP, scalable analytics, rapid retrieval.

Introduction

Time series data is abundant in many domains, including finance, weather forecasting, epidemiology and food safety.
For instance, large-scale bio-surveillance programs monitor the status of public health for adverse events such as outbreaks of infectious diseases and emerging patterns of factors affecting public health. They rely on data collected throughout a health management system (hospital records, health insurance company records, lab test requests and results, issued and filled prescriptions, ambulance and emergency phone service calls, etc.) as well as outside of it (school and workplace absenteeism, sales of non-prescription medicines, etc.). The key objective is to detect, as early and as reliably as possible, changes in the statistics of data sources which may be indicative of a developing public health problem.

One of the challenges the users of such systems face is data overload. The actual number of, e.g., daily transactions of drug sales in pharmacies across a country may be very large. The users need tools to enable timely analysis of those massive data sources. The analyses can be performed automatically (with data mining software), but typically automatically discovered patterns are subject to careful follow-ups through manual drill-downs. In both scenarios, massive screening of very large collections of data must be executed very fast in order to make the bio-surveillance systems useful in practice. A saving of just a few hours of detection latency of an outbreak of a lethal infectious disease can yield enormous monetary and social benefits.

Most of the kinds of data mentioned above can be interpreted as time series of interval (e.g. daily) counts of events, such as the number of units of a certain type of drug (e.g. anti-diarrheals) sold, the number of patients reporting to an emergency department with specific symptoms, or the number of positive results of microbial tests of food samples taken in a production facility. These time series can be sliced and diced across multiple categorical dimensions such as location, gender and age group of patients, and so on. The computational efficiency of data mining operations which can be applied to such data, as well as the efficiency of interactive manual drill-downs, depends heavily on the efficiency of extracting series of counts aggregated for specific values of the categorical dimensions.

We use a data structure called T-Cube to rapidly retrieve such aggregates for any complex query. It achieves its efficiency by pre-computing and caching responses to all possible queries against the underlying temporal database of counts annotated with sets of categorical labels, while keeping the storage size in check. It has already been successfully used in support of health informatics applications in the food safety and disease surveillance domains, but its potential applicability reaches further.

Related Work

The standard approach to handling ad-hoc queries in commercial databases is On-Line Analytical Processing (OLAP).
The idea relies on data cubes: cached data structures extracted from (usually only parts of) the original data and organized to allow fast ad-hoc querying of pre-selected subsets of aggregated data. For the sake of brevity we do not review the details of OLAP technology here, but these methods are known to often suffer from long build times (typically hours for databases of the sizes and complexities typical of the applications considered in this paper) and huge memory requirements (causing the need to rely on high-end database servers). Additionally, as we observed empirically, data cubes still typically need about a second to respond to a complex query on the datasets which we tested. Such latency is an inconvenience to users who want to perform multiple ad-hoc queries on the fly. It also hampers statistical analyses which may require execution of millions of complex queries, and which could take days of processing time using industry-standard OLAP data cubes.

Data cubes are closely related to another technology originating from computer science research: Cached Sufficient Statistics. Like data cubes, cached statistics structures pre-compute answers to queries; however, they cover all possible future queries, aiming at efficiency of not only data retrieval but also memory representation. The All-Dimensional Tree (AD-Tree, [1]) is a very good example of such a data structure. AD-Trees are designed to efficiently represent counts of all possible co-occurrences among values of multiple dimensions of categorical data. This is very important in many scenarios involving statistical modeling of such data, where most operations require computing aggregate counts, ratios of counts or their products. Quick access to counts of arbitrary subsets of demographic properties is essential for the overall performance of analytic tools relying on them.
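To make the idea of cached counts concrete, the following minimal Python sketch pre-computes the count of every conjunction of attribute-value pairs in a tiny dataset, so that any later count query is a single lookup. It is a naive dense cache written for illustration only, not the AD-Tree itself (which avoids storing most of these entries); the function names, attribute names and records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_count_cache(records, attributes):
    """Count every conjunction of attribute=value pairs seen in the records.

    Naive dense cache for illustration; an AD-Tree stores far fewer entries.
    """
    cache = Counter()
    for rec in records:
        pairs = [(a, rec[a]) for a in attributes]
        # Every subset of this record's pairs, including the empty conjunction.
        for k in range(len(pairs) + 1):
            for subset in combinations(pairs, k):
                cache[frozenset(subset)] += 1
    return cache


def count(cache, **conditions):
    """Look up the cached count for a conjunction such as gender='M', age='adult'."""
    return cache[frozenset(conditions.items())]


# Hypothetical records; attribute names and values are assumptions.
records = [
    {"gender": "M", "age": "adult", "syndrome": "GI"},
    {"gender": "M", "age": "child", "syndrome": "GI"},
    {"gender": "F", "age": "adult", "syndrome": "Respiratory"},
    {"gender": "M", "age": "adult", "syndrome": "Respiratory"},
]
cache = build_count_cache(records, ["gender", "age", "syndrome"])

# A ratio of counts, the kind of quantity statistical models ask for repeatedly.
share_gi_among_males = count(cache, gender="M", syndrome="GI") / count(cache, gender="M")
```

The cache grows exponentially with the number of attributes, which is the cost that tree-based structures with pruning are meant to control.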
AD-Trees have been shown to dramatically speed up notoriously expensive machine learning algorithms, including Bayesian Network structure learning [1], Decision Tree learning and Association Rule learning [2]. The attainable speedups range from one to four orders of magnitude with respect to previously known efficient implementations. These efficiencies are available at moderate memory requirements, which are easy to control. Dynamic AD-Trees [3] can grow on demand, allowing for even more memory efficiency. AD-Trees are the best of the existing solutions to categorical data representation when it comes to very quickly responding to ad-hoc queries against large datasets.

T-Cube

T-Cube is an extension of the idea of AD-Trees designed for very fast retrieval and analysis of additive data streams such as time series of counts. A technical description of the fundamental concepts of T-Cube can be found in [4], which also details techniques leading to further performance improvements, such as specific arrangements of demographic attributes, most-common-value-based pruning, and controlling the depth of the tree. They help to balance building time, query response time and physical memory requirements of the tool. Here we provide only a brief introduction to the main ideas underlying T-Cube.

T-Cube addresses the algorithmic question of storing and searching the combinatorially large set of possible time series that can be derived from queries on attributes of data. Figure 1 conveys the basic idea and illustrates its extension to time series data. For brevity we do not discuss here the details of the data structure, nor the algorithms for construction and querying, but the essential property of the T-Cube is that, once it is built, the time series for any query (in a general class including conjunctions of disjunctions) can be obtained in constant time (independent of the number of records in the raw dataset). One example of such a query is "get me the time series by day for all males in zip codes 15213 and 15206, excluding children, and specific to GI or Respiratory syndromes".
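The semantics of such a query can be illustrated with a short Python sketch. It aggregates raw records into one count series per distinct attribute combination and answers a conjunction of disjunctions by summing the series of the matching combinations, which is valid because counts are additive. This naive version scans all stored combinations, so it does not reproduce the T-Cube tree or its constant-time retrieval; the record layout, the age groups and the function names are assumptions made for the example.

```python
from collections import defaultdict


def build_cell_series(records, attributes, n_steps):
    """Aggregate raw records into one count series per attribute combination."""
    cells = defaultdict(lambda: [0] * n_steps)
    for rec in records:
        key = tuple(rec[a] for a in attributes)
        cells[key][rec["t"]] += rec.get("count", 1)
    return dict(cells)


def query(cells, attributes, n_steps, allowed):
    """Sum the series of all cells matching a conjunction of disjunctions.

    `allowed` maps an attribute to the set of accepted values (a disjunction);
    attributes missing from `allowed` are left unconstrained.
    """
    total = [0] * n_steps
    for key, series in cells.items():
        if all(v in allowed[a] for a, v in zip(attributes, key) if a in allowed):
            total = [x + y for x, y in zip(total, series)]
    return total


attributes = ["gender", "zip", "age", "syndrome"]
# Hypothetical daily records; "t" is the day index.
records = [
    {"t": 0, "gender": "M", "zip": "15213", "age": "adult", "syndrome": "GI"},
    {"t": 0, "gender": "M", "zip": "15206", "age": "adult", "syndrome": "Respiratory"},
    {"t": 1, "gender": "M", "zip": "15213", "age": "child", "syndrome": "GI"},
    {"t": 1, "gender": "F", "zip": "15213", "age": "adult", "syndrome": "GI"},
    {"t": 2, "gender": "M", "zip": "15206", "age": "senior", "syndrome": "GI"},
    {"t": 2, "gender": "M", "zip": "15217", "age": "adult", "syndrome": "GI"},
    {"t": 2, "gender": "M", "zip": "15213", "age": "adult", "syndrome": "Rash"},
]
cells = build_cell_series(records, attributes, n_steps=3)

# Males, in zip 15213 or 15206, excluding children, with GI or Respiratory syndromes.
allowed = {
    "gender": {"M"},
    "zip": {"15213", "15206"},
    "age": {"adult", "senior"},  # "excluding children" as a disjunction of the rest
    "syndrome": {"GI", "Respiratory"},
}
result = query(cells, attributes, 3, allowed)
```

Here exclusion is expressed as a disjunction over the remaining values of the attribute, which keeps the query inside the conjunction-of-disjunctions class named above.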
The drawback of the simplistic approach above is that the size of the T-Cube structure grows impractically large if there are more than a few attributes. There are, however, a few simple innovations which can make this kind of approach practical. Firstly, we do not need to store any node which corresponds to a time series of all zeros. Other series can be stored in space proportional to the number of non-zero values, and with additional compression we can achieve an average of less than four bits per non-zero time step per time series on real health informatics data. Secondly, even when there are no or few nodes with zero counts, an additional trick can make a large difference. For every Vary node in the diagram in Figure 1, we consider the most common value (MCV) of the variable that is to be instantiated. It is relatively easy to prove that all nodes corresponding to such MCVs can be removed, together with any of their descendants, and the T-Cube will still contain sufficient information to retrieve any requested time series with no loss of accuracy: since the data is additive, the series for an MCV can be recovered by subtracting the series of the remaining values from that of the parent node. In our previous work the use of this innovation has taken the memory requirements for a 50-dimensional AD-Tree from more than 100 terabytes down to 200 megabytes. The third trick involves the use of leaf lists, in which nodes which occur infrequently in the raw data are replaced with a set of pointers to the raw data. This can reduce the memory requirements by another 1-3 orders of magnitude, with a tradeoff in access time.

Those relatively straightforward efficiencies of data caching allow T-Cube to perform time series queries 2-3 orders of magnitude faster than standard state-of-the-art data cube technologies. This speedup has already been found highly beneficial in the practice of bio-surveillance and food safety [5-6], where the need for rapid analysis of massive collections of time series data is very common. T-Cube can be very useful in such applications for two main reasons: (1) it enables fast anomaly detection by simultaneous statistical analysis of many thousands of time series, and (2) it allows the users to perform many complex, ad hoc time series queries on the fly without inconvenient delays. The potential benefits are manifold and include scalable contextual labeling of queries and retrieval of patterns by content. The users can perform inverted queries in which they ask the server to search through thousands or even millions of previous time series to find series with given properties, and answer the question: "Which demographic features of the series from historical data best explain a situation like this?"

Figure 1. Simple example of a T-Cube built for data typical of the public health domain.

T-Cube has been tested on synthetic and real-world datasets containing millions of records and hundreds of dimensions. Results show that its response time can be 1,000 times shorter than that of the state-of-the-art commercial database tools. The utility of the T-Cube structure has already been extensively demonstrated in practice, in applications ranging from bio-surveillance, to monitoring food safety, to detection of emerging patterns of failures in maintenance and supply management systems.

In one of those applications, the data under consideration included several relatively small sets of about 80 thousand records of transactions with 33 categorical variables of arities varying from 2 to over 100. The application called for massive screening through all combinations of attribute-value pairs of sizes 1 and 2, the total number of such combinations approaching 1 million. The analytic task used an expectation-based temporal scan algorithm to retrospectively detect unusual short-term increases in counts of specific aggregate time series. The total number of individual temporal scan tests for one such data set exceeded 2 billion.
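A single step of such a screening can be sketched in Python as follows. For one aggregate series and one window width, the sketch slides a window over time, forms a 2-by-2 table of counts (series of interest versus baseline, inside versus outside the window) and scores it with the Pearson Chi-square statistic, keeping only windows in which the series is elevated. This is a simplified illustration under stated assumptions, not the authors' implementation: the choice of baseline, the absence of a continuity correction, the function names and the example counts are all hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: an empty row or column
    return n * (a * d - b * c) ** 2 / denom


def temporal_scan(target, baseline, width):
    """Score every window of `width` time steps; return (best_score, window_start)."""
    t_total, b_total = sum(target), sum(baseline)
    best = (0.0, None)
    for start in range(len(target) - width + 1):
        t_in = sum(target[start:start + width])
        b_in = sum(baseline[start:start + width])
        t_out, b_out = t_total - t_in, b_total - b_in
        # Keep increases only: target-to-baseline ratio is higher inside the window.
        if t_in * b_out <= t_out * b_in:
            continue
        score = chi_square_2x2(t_in, t_out, b_in, b_out)
        if score > best[0]:
            best = (score, start)
    return best


# Hypothetical daily counts: the series of interest spikes on day 3.
target = [2, 3, 2, 9, 2, 3]
baseline = [20, 20, 20, 20, 20, 20]
best_score, best_start = temporal_scan(target, baseline, width=1)

# 3.841 is the chi-square critical value for 1 degree of freedom at the 0.05 level.
is_significant = best_score > 3.841
```

Repeating this for every candidate series and every window is what drives the number of tests into the billions, and with that many tests a per-test threshold such as the one above would need adjustment for multiple comparisons.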
Each such test involved a Chi-square test of independence performed on a 2-by-2 contingency table formed by the counts corresponding to the time series of interest (one of the 1 million series) and the baseline counts, within the current temporal window of interest (one of 2,000) and outside of it. The complete sequence, including the time necessary to retrieve and aggregate all the involved time series, compute and store the test results, load source data and build the T-Cube structure, etc., took about 1 hour of machine time. Using one of the commercial data cube tools, the time needed to retrieve the data corresponding to one of the involved queries was in the range of 280 milliseconds. Therefore, without the T-Cube, it would take about 3 days (roughly 1 million queries at about 0.28 seconds each, or about 78 hours) just to pull all the required data, not including any processing of it or execution of statistical tests.

Table 1 presents results of a controlled experiment involving a dataset with 12 million records of transactions and 3 categorical fields of arities 1000, 10 and 5 respectively, covering a period of 5 years at daily resolution [4]. It compares complex query response times for 3 commercial data cube tools (their names have been anonymized upon requests from their vendors) and two configurations of T-Cube (one favoring rapid responses, the other memory-savvy). Each of the commercial tools required a different amount of memory to represent the test data, and the response time improved with the increase of the amount of used memory. However, they needed seconds to respond to a complex query on average. T-Cube is able to respond in milliseconds, even in the memory-conscious mode.

Tool: A, B, C, T-Cube 1, T-Cube 2
Memory [MB]: over 1,
Response time [s]:

Table 1. Performance of T-Cube compared against 3 commercial data cube tools.

Figure 2. Screen shot of the T-Cube Web Interface displaying a time series chart, and a screen shot of it showing the spatial distribution of multivariate data of temporal counts.

T-Cube Web Interface

The original uses of the T-Cube data structure were focused on speeding up complex data mining operations rather than on supporting the human users' experience of direct interaction with data.
The T-Cube web interface attempts to fill that gap. It is a publicly accessible tool for interactive visualization and manipulation of large-scale multivariate data of time series of counts [7]. It allows the user to execute complex queries and to run various types of statistical analyses on an uploaded dataset. It can be accessed using any Java-enabled browser. The interface, still under incremental development and testing, includes a suite of visualization and statistical analysis tools allowing intuitive navigation through the data. After uploading a data file, the user can perform complex queries and statistical tests. The interface also enables running massive searches for statistically significant patterns rapidly and at different levels of data aggregation. The left part of Figure 2 shows a time series chart and a menu of selectable categorical attributes for an example bio-event dataset; the right part presents a spatial representation of the same data.

Conclusion

T-Cube is an efficient tool for representing additive time series data labeled with a set of categorical attributes. It is especially useful for retrieving responses to ad-hoc complex queries against large datasets of that kind, where it significantly outperforms the existing commercial data cubes. T-Cubes are simple to set up and easy to use. Typically, it takes only minutes to build one from data. Database users do not need to define any stored procedures or materialized views in order to make that happen. Once it is built, it is ready to rapidly respond to any simple or complex query. It can be used as a general tool for any application requesting access to time series data from a database. From the application's perspective it is transparent: it acts just like the database itself, only one that responds much more quickly.

The T-Cube web interface is intended to become a user-level platform for a variety of analytic endeavors which can benefit from T-Cube efficiency. The key areas include sub-domains of health informatics and tasks in which rapid analyses of large sets of time series data or interactive drill-downs are of interest. Its ease of use and availability should increase the popularity and tangible success of data-driven methods of rapid detection of adverse events.
We hope to see T-Cubes widely used.

Acknowledgements

This material is based upon work that was partially supported by the National Science Foundation under grant number IIS. This work was partially supported by the Centers for Disease Control (award number R01-PH000028).

References

1. A. Moore, M. Lee. Cached sufficient statistics for efficient machine learning with large datasets. Journal of Artificial Intelligence Research, 8, 67-91.
2. B. Anderson, A. Moore. AD-trees for fast counting and for fast learning of association rules. 4th International Conference on Knowledge Discovery and Data Mining.
3. P. Komarek, A. Moore. A dynamic adaptation of AD-trees for efficient machine learning on large data sets. Proceedings of the 17th International Conference on Machine Learning.
4. M. Sabhnani, A. Moore, A. Dubrawski. T-Cube: Fast extraction of time series from large datasets. Technical Report, Carnegie Mellon University, CMU-ML.
5. M. Sabhnani, A. Dubrawski, J. Schneider. Multivariate time series analyses using primitive univariate algorithms. Advances in Disease Surveillance 3.
6. A. Dubrawski, M. Sabhnani, S. Ray, J. Roure, M. Baysek. T-Cube as an enabling technology in surveillance applications. Advances in Disease Surveillance 3.
7. T-Cube Web Interface:


Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are

More information

Big Data Classification: Problems and Challenges in Network Intrusion Prediction with Machine Learning

Big Data Classification: Problems and Challenges in Network Intrusion Prediction with Machine Learning Big Data Classification: Problems and Challenges in Network Intrusion Prediction with Machine Learning By: Shan Suthaharan Suthaharan, S. (2014). Big data classification: Problems and challenges in network

More information

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

In this presentation, you will be introduced to data mining and the relationship with meaningful use. In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine

More information

Business Intelligence Systems

Business Intelligence Systems 12 Business Intelligence Systems Business Intelligence Systems Bogdan NEDELCU University of Economic Studies, Bucharest, Romania bogdannedelcu@hotmail.com The aim of this article is to show the importance

More information

Gerard Mc Nulty Systems Optimisation Ltd gmcnulty@iol.ie/0876697867 BA.,B.A.I.,C.Eng.,F.I.E.I

Gerard Mc Nulty Systems Optimisation Ltd gmcnulty@iol.ie/0876697867 BA.,B.A.I.,C.Eng.,F.I.E.I Gerard Mc Nulty Systems Optimisation Ltd gmcnulty@iol.ie/0876697867 BA.,B.A.I.,C.Eng.,F.I.E.I Data is Important because it: Helps in Corporate Aims Basis of Business Decisions Engineering Decisions Energy

More information

Web Data Mining: A Case Study. Abstract. Introduction

Web Data Mining: A Case Study. Abstract. Introduction Web Data Mining: A Case Study Samia Jones Galveston College, Galveston, TX 77550 Omprakash K. Gupta Prairie View A&M, Prairie View, TX 77446 okgupta@pvamu.edu Abstract With an enormous amount of data stored

More information

DATA WAREHOUSING - OLAP

DATA WAREHOUSING - OLAP http://www.tutorialspoint.com/dwh/dwh_olap.htm DATA WAREHOUSING - OLAP Copyright tutorialspoint.com Online Analytical Processing Server OLAP is based on the multidimensional data model. It allows managers,

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

Supply chain intelligence: benefits, techniques and future trends

Supply chain intelligence: benefits, techniques and future trends MEB 2010 8 th International Conference on Management, Enterprise and Benchmarking June 4 5, 2010 Budapest, Hungary Supply chain intelligence: benefits, techniques and future trends Zoltán Bátori Óbuda

More information

COURSE SYLLABUS COURSE TITLE:

COURSE SYLLABUS COURSE TITLE: 1 COURSE SYLLABUS COURSE TITLE: FORMAT: CERTIFICATION EXAMS: 55043AC Microsoft End to End Business Intelligence Boot Camp Instructor-led None This course syllabus should be used to determine whether the

More information

INSIGHT NAV. White Paper

INSIGHT NAV. White Paper INSIGHT Microsoft DynamicsTM NAV Business Intelligence Driving business performance for companies with changing needs White Paper January 2008 www.microsoft.com/dynamics/nav/ Table of Contents 1. Introduction...

More information

InfiniteGraph: The Distributed Graph Database

InfiniteGraph: The Distributed Graph Database A Performance and Distributed Performance Benchmark of InfiniteGraph and a Leading Open Source Graph Database Using Synthetic Data Objectivity, Inc. 640 West California Ave. Suite 240 Sunnyvale, CA 94086

More information

Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole

Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole Paper BB-01 Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole ABSTRACT Stephen Overton, Overton Technologies, LLC, Raleigh, NC Business information can be consumed many

More information

HYPERION MASTER DATA MANAGEMENT SOLUTIONS FOR IT

HYPERION MASTER DATA MANAGEMENT SOLUTIONS FOR IT HYPERION MASTER DATA MANAGEMENT SOLUTIONS FOR IT POINT-AND-SYNC MASTER DATA MANAGEMENT 04.2005 Hyperion s new master data management solution provides a centralized, transparent process for managing critical

More information

How to Enhance Traditional BI Architecture to Leverage Big Data

How to Enhance Traditional BI Architecture to Leverage Big Data B I G D ATA How to Enhance Traditional BI Architecture to Leverage Big Data Contents Executive Summary... 1 Traditional BI - DataStack 2.0 Architecture... 2 Benefits of Traditional BI - DataStack 2.0...

More information

NHPSS An Automated OTC Pharmaceutical Sales Surveillance System

NHPSS An Automated OTC Pharmaceutical Sales Surveillance System NHPSS An Automated OTC Pharmaceutical Sales Surveillance System Xiaohui Zhang, Ph.D., Reno Fiedler, and Michael Popovich Introduction Development of public health surveillance systems requires multiple

More information

Course 6234A: Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services

Course 6234A: Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services Course 6234A: Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services Length: Delivery Method: 3 Days Instructor-led (classroom) About this Course Elements of this syllabus are subject

More information

A COGNITIVE APPROACH IN PATTERN ANALYSIS TOOLS AND TECHNIQUES USING WEB USAGE MINING

A COGNITIVE APPROACH IN PATTERN ANALYSIS TOOLS AND TECHNIQUES USING WEB USAGE MINING A COGNITIVE APPROACH IN PATTERN ANALYSIS TOOLS AND TECHNIQUES USING WEB USAGE MINING M.Gnanavel 1 & Dr.E.R.Naganathan 2 1. Research Scholar, SCSVMV University, Kanchipuram,Tamil Nadu,India. 2. Professor

More information

Business Intelligence Solutions. Cognos BI 8. by Adis Terzić

Business Intelligence Solutions. Cognos BI 8. by Adis Terzić Business Intelligence Solutions Cognos BI 8 by Adis Terzić Fairfax, Virginia August, 2008 Table of Content Table of Content... 2 Introduction... 3 Cognos BI 8 Solutions... 3 Cognos 8 Components... 3 Cognos

More information

Analyzing the Customer Experience. With Q-Flow and SSAS

Analyzing the Customer Experience. With Q-Flow and SSAS Q.nomy Analyzing the Customer Experience With Q-Flow and SSAS Using Microsoft SQL Server Analysis Service to analyze Q-Flow data, and to gain an insight of customer experience. July, 2012 Analyzing the

More information

Clustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012

Clustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012 Clustering Big Data Anil K. Jain (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012 Outline Big Data How to extract information? Data clustering

More information

Report Model (SMDL) Alternatives in SQL Server 2012. A Guided Tour of Microsoft Business Intelligence

Report Model (SMDL) Alternatives in SQL Server 2012. A Guided Tour of Microsoft Business Intelligence Report Model (SMDL) Alternatives in SQL Server 2012 A Guided Tour of Microsoft Business Intelligence Technical Article Author: Mark Vaillancourt Published: August 2013 Table of Contents Report Model (SMDL)

More information

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first

More information

Nursing Diagnosis and Multidimensional Design

Nursing Diagnosis and Multidimensional Design Proceedings of the 3 rd INFORMS Workshop on Data Mining and Health Informatics (DM-HI 2008) J. Li, D. Aleman, R. Sikora, eds. NursingCareWare: Warehousing for Nursing Care Research and Knowledge Discovery

More information

Customer Analytics. Turn Big Data into Big Value

Customer Analytics. Turn Big Data into Big Value Turn Big Data into Big Value All Your Data Integrated in Just One Place BIRT Analytics lets you capture the value of Big Data that speeds right by most enterprises. It analyzes massive volumes of data

More information

QLIKVIEW INTEGRATION TION WITH AMAZON REDSHIFT John Park Partner Engineering

QLIKVIEW INTEGRATION TION WITH AMAZON REDSHIFT John Park Partner Engineering QLIKVIEW INTEGRATION TION WITH AMAZON REDSHIFT John Park Partner Engineering June 2014 Page 1 Contents Introduction... 3 About Amazon Web Services (AWS)... 3 About Amazon Redshift... 3 QlikView on AWS...

More information

ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process

ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process ORACLE OLAP KEY FEATURES AND BENEFITS FAST ANSWERS TO TOUGH QUESTIONS EASILY KEY FEATURES & BENEFITS World class analytic engine Superior query performance Simple SQL access to advanced analytics Enhanced

More information

OLAP Theory-English version

OLAP Theory-English version OLAP Theory-English version On-Line Analytical processing (Business Intelligence) [Ing.J.Skorkovský,CSc.] Department of corporate economy Agenda The Market Why OLAP (On-Line-Analytic-Processing Introduction

More information

Best Practices for Deploying Managed Self-Service Analytics and Why Tableau and QlikView Fall Short

Best Practices for Deploying Managed Self-Service Analytics and Why Tableau and QlikView Fall Short Best Practices for Deploying Managed Self-Service Analytics and Why Tableau and QlikView Fall Short Vijay Anand, Director, Product Marketing Agenda 1. Managed self-service» The need of managed self-service»

More information

Vendor briefing Business Intelligence and Analytics Platforms Gartner 15 capabilities

Vendor briefing Business Intelligence and Analytics Platforms Gartner 15 capabilities Vendor briefing Business Intelligence and Analytics Platforms Gartner 15 capabilities April, 2013 gaddsoftware.com Table of content 1. Introduction... 3 2. Vendor briefings questions and answers... 3 2.1.

More information

Microsoft Services Exceed your business with Microsoft SharePoint Server 2010

Microsoft Services Exceed your business with Microsoft SharePoint Server 2010 Microsoft Services Exceed your business with Microsoft SharePoint Server 2010 Business Intelligence Suite Alexandre Mendeiros, SQL Server Premier Field Engineer January 2012 Agenda Microsoft Business Intelligence

More information

Mining Signatures in Healthcare Data Based on Event Sequences and its Applications

Mining Signatures in Healthcare Data Based on Event Sequences and its Applications Mining Signatures in Healthcare Data Based on Event Sequences and its Applications Siddhanth Gokarapu 1, J. Laxmi Narayana 2 1 Student, Computer Science & Engineering-Department, JNTU Hyderabad India 1

More information

Reporting Services. White Paper. Published: August 2007 Updated: July 2008

Reporting Services. White Paper. Published: August 2007 Updated: July 2008 Reporting Services White Paper Published: August 2007 Updated: July 2008 Summary: Microsoft SQL Server 2008 Reporting Services provides a complete server-based platform that is designed to support a wide

More information

Business Intelligence for Excel

Business Intelligence for Excel Business Intelligence for Excel White Paper Business Intelligence Technologies, Inc. Copyright 2002 All Rights Reserved Business Intelligence for Excel This white paper concerns business intelligence for

More information

Principles of Data Mining by Hand&Mannila&Smyth

Principles of Data Mining by Hand&Mannila&Smyth Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences

More information

Data Mining in the Swamp

Data Mining in the Swamp WHITE PAPER Page 1 of 8 Data Mining in the Swamp Taming Unruly Data with Cloud Computing By John Brothers Business Intelligence is all about making better decisions from the data you have. However, all

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Decoding DNS data. Using DNS traffic analysis to identify cyber security threats, server misconfigurations and software bugs

Decoding DNS data. Using DNS traffic analysis to identify cyber security threats, server misconfigurations and software bugs Decoding DNS data Using DNS traffic analysis to identify cyber security threats, server misconfigurations and software bugs The Domain Name System (DNS) is a core component of the Internet infrastructure,

More information

SAP HANA. SAP HANA Performance Efficient Speed and Scale-Out for Real-Time Business Intelligence

SAP HANA. SAP HANA Performance Efficient Speed and Scale-Out for Real-Time Business Intelligence SAP HANA SAP HANA Performance Efficient Speed and Scale-Out for Real-Time Business Intelligence SAP HANA Performance Table of Contents 3 Introduction 4 The Test Environment Database Schema Test Data System

More information

University of Gaziantep, Department of Business Administration

University of Gaziantep, Department of Business Administration University of Gaziantep, Department of Business Administration The extensive use of information technology enables organizations to collect huge amounts of data about almost every aspect of their businesses.

More information

IBM Software Information Management. Scaling strategies for mission-critical discovery and navigation applications

IBM Software Information Management. Scaling strategies for mission-critical discovery and navigation applications IBM Software Information Management Scaling strategies for mission-critical discovery and navigation applications Scaling strategies for mission-critical discovery and navigation applications Contents

More information

Business Intelligence & Product Analytics

Business Intelligence & Product Analytics 2010 International Conference Business Intelligence & Product Analytics Rob McAveney www. 300 Brickstone Square Suite 904 Andover, MA 01810 [978] 691 8900 www. Copyright 2010 Aras All Rights Reserved.

More information

A Brief Tutorial on Database Queries, Data Mining, and OLAP

A Brief Tutorial on Database Queries, Data Mining, and OLAP A Brief Tutorial on Database Queries, Data Mining, and OLAP Lutz Hamel Department of Computer Science and Statistics University of Rhode Island Tyler Hall Kingston, RI 02881 Tel: (401) 480-9499 Fax: (401)

More information

The Microsoft Business Intelligence 2010 Stack Course 50511A; 5 Days, Instructor-led

The Microsoft Business Intelligence 2010 Stack Course 50511A; 5 Days, Instructor-led The Microsoft Business Intelligence 2010 Stack Course 50511A; 5 Days, Instructor-led Course Description This instructor-led course provides students with the knowledge and skills to develop Microsoft End-to-

More information

Basics of Dimensional Modeling

Basics of Dimensional Modeling Basics of Dimensional Modeling Data warehouse and OLAP tools are based on a dimensional data model. A dimensional model is based on dimensions, facts, cubes, and schemas such as star and snowflake. Dimensional

More information

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes Final Exam Overview Open books and open notes No laptops and no other mobile devices

More information

Week 3 lecture slides

Week 3 lecture slides Week 3 lecture slides Topics Data Warehouses Online Analytical Processing Introduction to Data Cubes Textbook reference: Chapter 3 Data Warehouses A data warehouse is a collection of data specifically

More information

Delivering Real-Time Business Value for Healthcare Providers SAP Business Suite Powered by SAP HANA

Delivering Real-Time Business Value for Healthcare Providers SAP Business Suite Powered by SAP HANA Delivering Real-Time Business Value for Healthcare Providers SAP Business Suite Powered by SAP HANA July 2013 Public The real-time opportunity Best-run healthcare facilities improve patient outcomes by

More information

Multi-dimensional index structures Part I: motivation

Multi-dimensional index structures Part I: motivation Multi-dimensional index structures Part I: motivation 144 Motivation: Data Warehouse A definition A data warehouse is a repository of integrated enterprise data. A data warehouse is used specifically for

More information

BUSINESS INTELLIGENCE

BUSINESS INTELLIGENCE BUSINESS INTELLIGENCE Microsoft Dynamics NAV BUSINESS INTELLIGENCE Driving better business performance for companies with changing needs White Paper Date: January 2007 www.microsoft.com/dynamics/nav Table

More information

IBM Cognos Express Essential BI and planning for midsize companies

IBM Cognos Express Essential BI and planning for midsize companies Data Sheet IBM Cognos Express Essential BI and planning for midsize companies Overview IBM Cognos Express is the first and only integrated business intelligence (BI) and planning solution purposebuilt

More information

AdTheorent s. The Intelligent Solution for Real-time Predictive Technology in Mobile Advertising. The Intelligent Impression TM

AdTheorent s. The Intelligent Solution for Real-time Predictive Technology in Mobile Advertising. The Intelligent Impression TM AdTheorent s Real-Time Learning Machine (RTLM) The Intelligent Solution for Real-time Predictive Technology in Mobile Advertising Worldwide mobile advertising revenue is forecast to reach $11.4 billion

More information

<no narration for this slide>

<no narration for this slide> 1 2 The standard narration text is : After completing this lesson, you will be able to: < > SAP Visual Intelligence is our latest innovation

More information

Enterprise and Standard Feature Compare

Enterprise and Standard Feature Compare www.blytheco.com Enterprise and Standard Feature Compare SQL Server 2008 Enterprise SQL Server 2008 Enterprise is a comprehensive data platform for running mission critical online transaction processing

More information

Practical Considerations for Real-Time Business Intelligence. Donovan Schneider Yahoo! September 11, 2006

Practical Considerations for Real-Time Business Intelligence. Donovan Schneider Yahoo! September 11, 2006 Practical Considerations for Real-Time Business Intelligence Donovan Schneider Yahoo! September 11, 2006 Outline Business Intelligence (BI) Background Real-Time Business Intelligence Examples Two Requirements

More information

LEARNING SOLUTIONS website milner.com/learning email training@milner.com phone 800 875 5042

LEARNING SOLUTIONS website milner.com/learning email training@milner.com phone 800 875 5042 Course 20467A: Designing Business Intelligence Solutions with Microsoft SQL Server 2012 Length: 5 Days Published: December 21, 2012 Language(s): English Audience(s): IT Professionals Overview Level: 300

More information

INTELLIGENT DEFECT ANALYSIS, FRAMEWORK FOR INTEGRATED DATA MANAGEMENT

INTELLIGENT DEFECT ANALYSIS, FRAMEWORK FOR INTEGRATED DATA MANAGEMENT INTELLIGENT DEFECT ANALYSIS, FRAMEWORK FOR INTEGRATED DATA MANAGEMENT Website: http://www.siglaz.com Abstract Spatial signature analysis (SSA) is one of the key technologies that semiconductor manufacturers

More information