ANALYTICS IN BIG DATA ERA


ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNTS OF DATA
MAURIZIO SALUSTI, SAS
Copyright 2012, SAS Institute Inc. All rights reserved.

NEW QUESTIONS WITH BIG DATA
- Data do not always fit a structured data model.
- We often need to join data that do not share the same keys.
- Data often arrive in a periodic flow, in real time.
- We often need to recognize patterns in data that change frequently.
New ways are needed to manage data that are distributed and not structured in the classical way: we need a different paradigm to organize data and, above all, to query them.
Collecting several sources and managing them together opens several new problems:
- Relational data (GRAPH DATA) can be useful for understanding how events spread through a population.
- Data in motion, coming from several tools in the field (sensor devices), provide dynamic patterns, often without any history of their form.

ANALYSIS
- Sampling cannot always be applied to extract data.
- Data cannot always be joined to define an ABT (analytical base table).
- We often need to know how the environment can influence changes in events.
- We often need to merge information collected in different time windows.
SQL queries are often useless for reaching these data:
- The information is not organized into database structures.
- Data convey information in very different ways: text, for example, is not easy to query with traditional query languages.
- Merging is driven by fuzzy keys, where group information is assigned according to statistical relationships.
- An event can be driven by its relationships with other data rather than by a specific behavior.
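As an illustration of merging on fuzzy keys, here is a minimal sketch that pairs records whose keys are similar but not identical. It swaps in plain string similarity (Python's `difflib.SequenceMatcher`) for the statistical relationship mentioned on the slide. The names `fuzzy_join`, `sensors` and `readings`, the sample records, and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two normalized keys."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def fuzzy_join(left, right, threshold=0.8):
    """Pair each left key with its most similar right key, if similar enough.

    left, right: dicts mapping a key string to a record.
    threshold: hypothetical cut-off; it would need tuning on real data.
    """
    matches = []
    for lkey, lrec in left.items():
        best_key, best_score = None, 0.0
        for rkey in right:
            score = similarity(lkey, rkey)
            if score > best_score:
                best_key, best_score = rkey, score
        if best_key is not None and best_score >= threshold:
            matches.append((lkey, best_key, round(best_score, 2), lrec, right[best_key]))
    return matches


# Invented sample data: the same devices are named slightly differently
# in the two sources, so an exact-key SQL join would miss them.
sensors = {"Pump Station 12": {"site": "north"}, "Boiler-A": {"site": "south"}}
readings = {
    "pump station 12 ": {"temp": 71.5},
    "Boiler A": {"temp": 88.0},
    "Chiller 3": {"temp": 4.2},
}

for row in fuzzy_join(sensors, readings):
    print(row)
```

This compares every pair of keys, so it only suits small inputs; larger sources would need some form of blocking or indexing first.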

SAS PROCEDURES
BIG DATA ALSO REQUIRES SEVERAL METHODOLOGICAL STRATEGIES:
- Methods for pattern recognition from statistical inference, using the SEMMA paradigm for supervised and unsupervised data patterns.
- Methods from stochastic process analysis, for both continuous time and discrete events, such as diffusion processes or Markov chain processes.
- Time series forecasting: stochastic processes in continuous time with a continuous state space.
- Multivariate analysis applied to semantic rules to discover text patterns.
- Graph analysis.
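To make the Markov chain idea concrete, here is a minimal sketch that estimates transition probabilities from an observed sequence of discrete states by counting transitions. It is plain Python, not a SAS procedure; the function name `transition_matrix` and the state labels are hypothetical.

```python
from collections import Counter, defaultdict


def transition_matrix(sequence):
    """Estimate P(next state | current state) by counting observed transitions."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


# Invented sequence of machine states observed at regular intervals.
states = ["ok", "ok", "warn", "ok", "ok", "warn", "fail", "ok"]
P = transition_matrix(states)

for state, row in P.items():
    print(state, row)
```

With so few observations the estimates are very rough; the point is only the shape of the computation.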

ANALYTICAL CATEGORIES AND TARGET USAGE
- Statistics: binary-target and continuous-number predictions; linear, non-linear and mixed linear modeling.
- Data Mining: complex relationships; tree-based classification; variable selection.
- Text Mining: parsing large-scale text collections; extracting entities; automatic stemming and synonym detection.
- Forecasting: large-scale, multiple-hierarchy problems.
- Econometrics: probability of events; severity of random events.
- Optimization: local search optimization; large-scale linear and mixed integer problems; graph theory.

- Data coming from different sources can be tied together using methods such as linear or nonlinear canonical decomposition.
- Pattern variability in data in motion, such as data coming from devices, can be sampled, or the pattern distribution can be simulated.
- Sparse vector data with missing values can be simulated using particular regression methods.
- Discrete choices among different events can be modeled using multinomial discrete models.
- Automatic time series forecasting can consider many series at the same time.

GRAPH ANALYSIS
Network graph analysis (nodes and links) can be used to:
- Measure the importance of nodes and the relationships among them.
- Measure changes in a network over time.
- Identify how events spread through the network, using a particular diffusion process.
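The two ideas above, node importance and event spreading, can be sketched on a toy network. The sketch assumes degree centrality as the importance measure and a deliberately simple deterministic rule (every reached node reaches all of its neighbours at each step) as a stand-in for a diffusion process. The slides name neither choice, and the edge list is invented.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Share of the other nodes that each node is directly linked to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def spread(graph, seeds, steps):
    """Return the set of reached nodes after each step of the toy spreading rule."""
    reached = set(seeds)
    history = [set(reached)]
    for _ in range(steps):
        frontier = {nbr for node in reached for nbr in graph[node]} - reached
        if not frontier:
            break
        reached |= frontier
        history.append(set(reached))
    return history


# Invented network: A is a hub, and D-E-F form a chain hanging off it.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E"), ("E", "F")]
g = build_graph(edges)

print(degree_centrality(g))
print(spread(g, {"B"}, steps=2))
```

A probabilistic rule, where each link transmits the event only with some probability, would be closer to a real diffusion process but needs repeated simulation to summarize.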

Scenario REAL TIME MONITORING SYSTEM:
- Build and manage the behavioral patterns of the measures for each type of sensor, to detect abnormal processes through alarm rules (offline process).
- Build scenarios of how events spread and influence different parts of the system.
- Monitor measures to detect anomalies and to check the validity of the rules over time (online process).
- Produce models to predict abnormalities in the medium term.

Scenario INTEGRATED PROCESS CONTROL:
- Shewhart-type control charts, with identification of the role of the history of the measures and of trend-cycle components according to the Box-Jenkins methodology.
- Multivariate analysis of processes: this is the main tool for statistical process control of measures in relation to each other, considering Markov chain processes or diffusion processes.
- Classification of system components: machines can be classified according to their behavior and to information about their specific characteristics.
- Identifying alarm patterns: diagnostic threshold rules identified from the control charts to minimize false alarms, depending on the history of the event to be monitored in real time.
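A Shewhart-type chart for individual measurements can be sketched in a few lines. The sketch assumes limits at the center line plus or minus three sigma, with sigma estimated from the average moving range divided by the constant 1.128. It ignores the trend-cycle (Box-Jenkins) adjustment mentioned above, and the baseline and new readings are invented.

```python
from statistics import mean

D2 = 1.128  # standard control-chart constant for moving ranges of two points


def control_limits(baseline):
    """Return (LCL, center line, UCL) estimated from in-control baseline data."""
    center = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma = mean(moving_ranges) / D2
    return center - 3 * sigma, center, center + 3 * sigma


def out_of_control(values, lcl, ucl):
    """Return (index, value) pairs that fall outside the control limits."""
    return [(i, v) for i, v in enumerate(values) if v < lcl or v > ucl]


# Invented readings: a stable baseline period, then new measurements to check.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
new_values = [10.0, 10.3, 10.9, 9.7, 9.2]

lcl, center, ucl = control_limits(baseline)
flagged = out_of_control(new_values, lcl, ucl)
print(round(lcl, 3), round(center, 3), round(ucl, 3))
print(flagged)
```

Wider limits reduce false alarms at the cost of slower detection, which is the trade-off behind the diagnostic threshold rules in the last bullet.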

ADMINISTRATION SYSTEM: EXAMPLE
Diagram components: System interface; Extraction rules; DABT; Pattern recognition and event handling module; Event process threshold management for the alert process; Measures; Metadata and classification; Historical process data storage.

REAL TIME MONITORING SYSTEM: EXAMPLE
- Alert rules and pattern thresholds
- Real-time check module
- Real-time modelling: data streaming analysis and update of historical data
- Real-time feedback
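One way to picture the loop of real-time check, history update and feedback is a small streaming monitor. This sketch assumes a simple alert rule (a reading more than k standard deviations from the running mean) and uses Welford's online algorithm to update the history one reading at a time. The class name `StreamingMonitor`, its parameters, and the choice to keep alerted readings out of the history are all hypothetical.

```python
import math


class StreamingMonitor:
    """Check each reading against running statistics, then update them."""

    def __init__(self, k=3.0, warmup=5):
        self.k = k            # alert threshold, in standard deviations
        self.warmup = warmup  # readings to collect before alerting starts
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0         # running sum of squared deviations from the mean

    def std(self):
        """Sample standard deviation of the readings accepted so far."""
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def update(self, x):
        """Return True if x triggers an alert; otherwise fold x into the history."""
        alert = False
        if self.n >= self.warmup:
            sd = self.std()
            alert = sd > 0 and abs(x - self.mean) > self.k * sd
        if not alert:
            # Welford's online update of mean and squared deviations.
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return alert


# Invented stream of sensor readings with one obvious spike.
stream = [20.0, 20.5, 19.5, 20.2, 19.8, 20.1, 35.0, 20.3]
monitor = StreamingMonitor()
alerts = [monitor.update(x) for x in stream]
print(alerts)
```

Keeping alerted readings out of the history stops a spike from widening the limits, but it also means a genuine shift in level would keep alerting until the rule is revised.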