Lecture 9: Data Mining, Data Analytics and Big Data



Similar documents
Big Data Analytics. for the Exploitation of the CERN Accelerator Complex. Antonio Romero Marín

Oracle: Database and Data Management Innovations with CERN Public Day

Testing the In-Memory Column Store for in-database physics analysis. Dr. Maaike Limper

Oracle Advanced Analytics Oracle R Enterprise & Oracle Data Mining

Big Data Are You Ready? Jorge Plascencia Solution Architect Manager

Big Data Analytics. Big Data is

Tax Fraud in Increasing

Getting Started with Oracle Data Miner 11g R2. Brendan Tierney

Spotfire and Tableau Positioning. Summary

Microsoft Services Exceed your business with Microsoft SharePoint Server 2010

Predictive Analytics. Noam Zeigerson, CTO

XpoLog Center Suite Log Management & Analysis platform

Oracle Advanced Analytics 12c & SQLDEV/Oracle Data Miner 4.0 New Features

2015 Workshops for Professors

DATA EXPERTS MINE ANALYZE VISUALIZE. We accelerate research and transform data to help you create actionable insights

Using OBIEE for Location-Aware Predictive Analytics

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

Extend your analytic capabilities with SAP Predictive Analysis

Data Science and Business Analytics Certificate Data Science and Business Intelligence Certificate

XpoLog Center Suite Data Sheet

Augmented Search for Web Applications. New frontier in big log data analysis and application intelligence

Management Consulting Systems Integration Managed Services WHITE PAPER DATA DISCOVERY VS ENTERPRISE BUSINESS INTELLIGENCE

Implementation of Big Data and Analytics Projects with Big Data Discovery and BICS March 2015

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

General Session Oracle Business Analytics Executive Briefing

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

Big Data Are You Ready? Thomas Kyte

Starting Smart with Oracle Advanced Analytics

XpoLog Competitive Comparison Sheet

ANALYTICS IN BIG DATA ERA

AV-24 Advanced Analytics for Predictive Maintenance

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc All Rights Reserved

Web based monitoring in the CMS experiment at CERN

IBM SPSS Modeler Professional

Advanced analytics at your hands

Survey of Big Data Architecture and Framework from the Industry

How To Create A Business Intelligence (Bi)

Harnessing the power of advanced analytics with IBM Netezza

Oracle Big Data Handbook

Big Data: What You Should Know. Mark Child Research Manager - Software IDC CEMA

MOC 20467B: Designing Business Intelligence Solutions with Microsoft SQL Server 2012

"FRAMEWORKING": A COLLABORATIVE APPROACH TO CONTROL SYSTEMS DEVELOPMENT

RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE. Luigi Grimaudo Database And Data Mining Research Group

Cloud Computing and Advanced Relationship Analytics

IBM BigInsights for Apache Hadoop

How to make BIG DATA work for you. Faster results with Microsoft SQL Server PDW

Proven Testing Techniques in Large Data Warehousing Projects

Oracle Big Data Building A Big Data Management System

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

Data Analysis with Various Oracle Business Intelligence and Analytic Tools

Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System

Fusion Applications Overview of Business Intelligence and Reporting components

Understanding the Value of In-Memory in the IT Landscape

Anomaly and Fraud Detection with Oracle Data Mining 11g Release 2

Introducing Oracle Exalytics In-Memory Machine

Executive Summary... 2 Introduction Defining Big Data The Importance of Big Data... 4 Building a Big Data Platform...

Safe Harbor Statement

IBM AND NEXT GENERATION ARCHITECTURE FOR BIG DATA & ANALYTICS!

Unified Data Integration Across Big Data Platforms

White Paper. Unified Data Integration Across Big Data Platforms

Development of Monitoring and Analysis Tools for the Huawei Cloud Storage

Test Data Management Concepts

Reference Architecture, Requirements, Gaps, Roles

IT Infrastructure Management

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013

KnowledgeSEEKER POWERFUL SEGMENTATION, STRATEGY DESIGN AND VISUALIZATION SOFTWARE

Hadoop and Map-Reduce. Swati Gore

Oracle Real Time Decisions

EMC Greenplum Driving the Future of Data Warehousing and Analytics. Tools and Technologies for Big Data

ORACLE UTILITIES ANALYTICS

Lofan Abrams Data Services for Big Data Session # 2987

Oracle Big Data Strategy Simplified Infrastrcuture

Big Data Analytics with Oracle Advanced Analytics

Majed Al-Ghandour, PhD, PE, CPM Division of Planning and Programming NCDOT 2016 NCAMPO Conference- Greensboro, NC May 12, 2016

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

BEYOND BI: Big Data Analytic Use Cases

White Paper. Thirsting for Insight? Quench It With 5 Data Management for Analytics Best Practices.

<no narration for this slide>

OWB Users, Enter The New ODI World

Agile BI With SQL Server 2012

How To Make Data Streaming A Real Time Intelligence

CRITEO INTERNSHIP PROGRAM 2015/2016

Big Data at Cloud Scale

SELLING PROJECTS ON THE MICROSOFT BUSINESS ANALYTICS PLATFORM

TUT NoSQL Seminar (Oracle) Big Data

Big Data Use Cases Update

Global outlook on the perspectives of technologies like Power Hub

SQL Server 2012 Gives You More Advanced Features (Out-Of-The-Box)

Oracle Database 11g Comparison Chart

Transcription:

Lecture 9: Data Mining, Data Analytics and Big Data Maaike Limper, Antonio Romero, Manuel Martin 1

Introduction Two openlab Projects in IT-DB Data Analytics In-Database Physics Analysis Both using data analytics Projects have different scopes 2

Summary CERN environment Data Analytic Project R and Oracle R Data Discovery 3

CERN is a extreme data environment Control and operations Million of sensors, signals Large number of control devices Equipment Monitoring and logging Supporting IT infrastructure Databases Network Services 4

CERN Data Investment CERN generates huge amount of data everyday Accelerator Logging Service around 275 GB/day CERN has great monitoring and logging systems Large amount of data has been stored over years DIAMON 5

Faults Some faults cannot be avoid Decrease the availability for running physics 6

A look into the Future Future LHC upgrades will increase luminosity Computing resources needs will be higher Data generated will increase drastically Next accelerators Future Circular Collider (80-100 km) 7

The objective Improve our systems Monitoring and Diagnostics Systems Predictive and Proactive systems 8

Summary CERN environment Data Analytic Project R and Oracle R Data Discovery 9

openlab Data Analytics Project Optimize our systems Reducing and predicting faults and corrective interventions Increase the availability and operations efficiency Profit from CERN data investment by using data analytics Extract knowledge Discover useful information Suggest conclusions Support decision making Control and Monitoring Systems Proactive Predictive Intelligent 10

Data analytic challenges at CERN Very dynamic and heterogeneous environment Projects (number of projects) Requirements (data analytic needs) Technologies (problem-driven) Data sources (raw, structured and unstructured) Large amount of data Education and Training Users know the data and the questions Help them how to connect both 11

Overview 12

Summary CERN environment Data Analytic Project R and Oracle R Data Discovery 13

What is R Language for statistical computing and graphics Standard and advanced statistical techniques Integrated suite of software facilities Data manipulation Calculation Graphical display Free and Open Source 14

Why R is good for data analysis Powerful Advanced statistics Plotting Extensible Over 5800 CRAN packages extending base functionality Standard de facto for data analytics Great user community Over 2 million R users worldwide Active development, frequently updated 15

IDE for R Multiple IDEs for R RStudio Open source version Windows, Mac, Linux Web (RStudio server) 16

R example New York Air Quality Measurements 17

R example 18

Some R Resources CRAN (http://cran.r-project.org) R Tutorial (http://www.r-tutor.com/) R-bloggers (http://www.r-bloggers.com/) Free courses Coursera - Data Science (Johns Hopkins University) Lynda - Up and Running with R O Really - Try R (http://tryr.codeschool.com) Learn R, in R Swirl (http://swirlstats.com) 19

Traditional R and Database Interaction Data has to be moved from database to client Client need to store it in HDD R user needs to know SQL Mixing R and SQL Not multithreaded or parallel Use special packages increasing complexity R client has to load everything in memory 20

Oracle R Enterprise A database-centric environment for analytical processes in R Allows to use the database server to run R scripts R working on data directly in the database Integration with the SQL language Run R scripts from SQL Transparency Layer Transparently interact with the database in R 21

Benefits of using ORE Allows in-database data analysis Provide data parallelism and resource management Execute R scripts in database server machine Scalability and performance Eliminate memory constraint Oracle databases are widely used at CERN 22

Cryogenics Use Case: Faulty Valves Detection What is the objective? Predict faulty valves before they actually fail How? Valve receive an aperture order value (aperture order) Effective aperture realized by the valve (aperture measured) Analyzing the difference between both (S = aperture order - aperture measured) 23

Cryogenics Use Case: Faulty Valves Detection Signals used S = aperture order - aperture measured Features extractions based on S Variance Percentile 99.9 Rope distance Noise Band Automatic Faulty Valves Detection System SVM - Support Vector Machine The learning set 44 valves Three different status Faulty, Not faulty Unknown Data comes from Accelerator Logging Service 24

Summary CERN environment Data Analytic Project R and Oracle R Data Discovery 25

Data Discovery Interactive and visual analytics Intended to be used by the end users Enabling them to use their intuition and knowledge of the data Powerful customization of dashboards and visualizations Without intervention of IT Structure and unstructured data Mainstream field in Data Analytics 26

Endeca Information Discovery Data discovery platform Analyze information of any type and any source Flexible and user friendly Professional ETL tool Powerful text analysis Sentiment Text tagging, entities extraction Multi language features 27

Endeca Use Case Data Discovery for improving the Accelerator Complex Operations Electronic Logbook Log of events in the accelerator complex 28

Endeca Use Case Electronic Logbook Endeca PoC DEMO 29