Big Data and Analytics: Challenges and Opportunities




Dr. Amin Beheshti
Lecturer and Senior Research Associate, University of New South Wales, Australia (Service Oriented Computing Group, CSE)
Talk: Sharif University of Technology, 4 July 2015

We are Generating Vast Amounts of Data!!
Sources include healthcare (remote patient monitoring), manufacturing (product sensors), social media, retail (books, music, videos, etc.), location-based services (real-time location data) and the digitalization of artefacts.

- Airbus A380: generates 10 TB every 30 minutes.
- Twitter: generates approximately 12 TB of data per day.
- Facebook: data grows by over 500 TB daily.
- New York Stock Exchange: 1 TB of data every day.
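To put the quoted figures on a common scale, the short Python sketch below converts each one to terabytes per day. It assumes each source produces data at a constant rate around the clock, which overstates the Airbus A380 figure (an aircraft is not in the air 24 hours a day); the numbers themselves are the slide's.

```python
# Figures quoted on the slide, expressed as (terabytes, hours).
# Assumption: each source produces data at a constant rate around the clock.
QUOTED_RATES = {
    "Airbus A380": (10, 0.5),            # 10 TB every 30 minutes
    "Twitter": (12, 24),                 # ~12 TB per day
    "Facebook": (500, 24),               # over 500 TB per day (lower bound)
    "New York Stock Exchange": (1, 24),  # 1 TB per day
}


def tb_per_day(terabytes: float, hours: float) -> float:
    """Scale a (volume, time window) pair to terabytes per 24 hours."""
    return terabytes * 24 / hours


# Print the sources from largest to smallest daily volume.
for source, (tb, hours) in sorted(
    QUOTED_RATES.items(), key=lambda kv: -tb_per_day(*kv[1])
):
    print(f"{source:<25} {tb_per_day(tb, hours):8.1f} TB/day")
```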

We are Generating Vast Amounts of Meta-data!!
Provenance, data versioning, privacy, security. We are tracing everything: who did what, when and where?
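As an illustration of tracing who did what, when and where, here is a minimal sketch of a provenance log in Python. The `ProvenanceRecord` schema, its field names and the sample entries are hypothetical; filtering the log by artefact and sorting by time yields a simple version history.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    """One traced action: who did what, when and where (hypothetical schema)."""
    who: str        # actor, e.g. a user or service id
    what: str       # action performed
    target: str     # the data artefact acted on
    when: datetime  # timestamp of the action
    where: str      # location, device or system


def history(records, target):
    """Return the version history of one artefact, oldest action first."""
    return sorted((r for r in records if r.target == target), key=lambda r: r.when)


log = [
    ProvenanceRecord("bob", "edit", "report.docx",
                     datetime(2015, 7, 2, 9, 0, tzinfo=timezone.utc), "laptop"),
    ProvenanceRecord("alice", "create", "report.docx",
                     datetime(2015, 7, 1, 10, 0, tzinfo=timezone.utc), "office"),
    ProvenanceRecord("alice", "read", "notes.txt",
                     datetime(2015, 7, 1, 11, 0, tzinfo=timezone.utc), "phone"),
]

for r in history(log, "report.docx"):
    print(r.when.isoformat(), r.who, r.what, r.where)
```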


- Reading a book: e.g. a Kindle tracks what you are reading, when you are reading it, how often you read it, etc.
- Listening to music: e.g. an MP3 player tracks what you are listening to, when and how often, in what order, etc.
- Smartphones: e.g. an iPhone tracks our location, our speed, which apps we are using, whom we are calling, etc.


Big Data and Big Meta-Data
Share, comment, review, crowdsource, etc.

So, What is Big Data?
Big data refers to our ability to collect and analyse the ever-expanding amounts of data and meta-data that we are generating every second!
Challenges: capture, storage, search, sharing, transfer, analysis, visualization, etc.


What Makes it Big Data?
- Volume: the vast amounts of data generated every second.
- Velocity: the speed at which new data is generated and moves around.
- Variety: the increasingly different types of data.
- Veracity: the quality of the data, e.g. its messiness; this calls for detecting and correcting noisy and inconsistent data.
- Value: statistical, events, correlation, hypothetical.
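Veracity work often starts with steps like the ones sketched below: normalising values, dropping missing ones, removing duplicates and flagging keys whose values disagree. This is a toy Python illustration under assumed inputs (`(key, value)` pairs); the `clean` function and the sample data are hypothetical, and real pipelines would apply domain-specific rules.

```python
from collections import defaultdict


def clean(records):
    """Normalise, de-duplicate and flag conflicting records.

    records: iterable of (key, value) pairs, e.g. (customer_id, city).
    Returns (cleaned, conflicts): cleaned maps each key to its single
    agreed value; conflicts maps keys to the set of disagreeing values.
    """
    seen = defaultdict(set)
    for key, value in records:
        if value is None:
            continue  # drop missing values (noise)
        # Collapse stray whitespace and normalise letter case.
        norm = " ".join(str(value).split()).title()
        seen[key].add(norm)  # the set removes exact duplicates
    cleaned = {k: next(iter(v)) for k, v in seen.items() if len(v) == 1}
    conflicts = {k: v for k, v in seen.items() if len(v) > 1}
    return cleaned, conflicts


raw = [(1, "sydney"), (1, " Sydney "), (2, "Tehran"), (2, "Tabriz"), (3, None)]
cleaned, conflicts = clean(raw)
print(cleaned)                # {1: 'Sydney'}
print(sorted(conflicts[2]))   # ['Tabriz', 'Tehran']
```

Conflicting records are flagged rather than silently resolved, since choosing the correct value usually needs domain knowledge.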

Challenges: How to Store and Process?
Big data is high-volume, high-velocity and/or high-variety information assets that require new forms of storage and processing. Can on-hand database management tools and traditional data processing applications cope?

Challenges: Big Data Storage
NoSQL databases:
- Employ less constrained consistency models.
- Support simple retrieval and appending operations.
- Offer significant performance benefits.
- Examples: key-value stores, document stores, graph databases.
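The "simple retrieval and appending" style of a key-value store can be sketched in a few lines of Python. The `AppendOnlyKVStore` class is hypothetical and single-process, so it does not show the relaxed consistency models of real distributed NoSQL systems; it only illustrates the narrow, schema-free interface.

```python
from collections import defaultdict


class AppendOnlyKVStore:
    """Toy in-memory key-value store (hypothetical, not a real product).

    Supports only retrieval by key and appending: no schema, no joins,
    no multi-key transactions.
    """

    def __init__(self):
        self._data = defaultdict(list)

    def append(self, key, value):
        """Append a value under key; earlier values are kept."""
        self._data[key].append(value)

    def get(self, key, default=None):
        """Return the most recent value for key, or default if absent."""
        values = self._data.get(key)
        return values[-1] if values else default

    def history(self, key):
        """Return every value appended under key, oldest first."""
        return list(self._data.get(key, []))


store = AppendOnlyKVStore()
store.append("user:42:status", "online")
store.append("user:42:status", "away")
print(store.get("user:42:status"))      # away
print(store.history("user:42:status"))  # ['online', 'away']
```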

Challenges: Big Data Storage and Analytics (Graphs are Everywhere)
Examples: social networks (users); collaborative filtering (Netflix users and movies); probabilistic analysis; text analysis (Wiki documents and words).
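Collaborative filtering is one of the graph examples above. Below is a minimal neighbourhood-based sketch over a user-movie bipartite graph with hypothetical viewing data: users who share movies with the target user vote for the movies the target has not seen, weighted by how many movies they share. It illustrates the idea only and is not Netflix's method.

```python
from collections import Counter

# Hypothetical user -> movies-watched edges of a bipartite graph.
watched = {
    "ann": {"Alien", "Heat", "Up"},
    "ben": {"Alien", "Heat", "Jaws"},
    "cat": {"Up", "Coco"},
}


def recommend(user, graph, k=2):
    """Score unseen movies by votes from users with overlapping tastes.

    A user's neighbours are users sharing at least one movie; each
    neighbour votes for its unseen movies, weighted by the overlap size.
    """
    mine = graph[user]
    scores = Counter()
    for other, theirs in graph.items():
        if other == user:
            continue
        overlap = len(mine & theirs)
        if overlap == 0:
            continue
        for movie in theirs - mine:
            scores[movie] += overlap
    return [movie for movie, _ in scores.most_common(k)]


print(recommend("ann", watched))  # ['Jaws', 'Coco']
```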

Challenges: Big Data Processing
Apache Hadoop: an open-source framework that uses a simple programming model to enable distributed processing of large data sets on clusters of computers.
- Apache Hadoop solution: Hadoop Distributed File System (HDFS), MapReduce, Pig, HCatalog.
- Who uses Hadoop? Amazon, Facebook, Google, IBM, New York Times, Yahoo!
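The simple programming model Hadoop exposes is MapReduce. The sketch below simulates its three stages (map, shuffle, reduce) for a word count in plain, single-process Python. It is not Hadoop's API: in Hadoop, mappers and reducers run in parallel across the cluster, and the framework handles splitting the input, shuffling intermediate pairs between nodes and recovering from failures.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Mapper: emit (word, 1) for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)


splits = ["big data big value", "data moves fast"]
intermediate = chain.from_iterable(map_phase(doc) for doc in splits)
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts)  # {'big': 2, 'data': 2, 'value': 1, 'moves': 1, 'fast': 1}
```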


Apache Spark: a fast and expressive cluster computing engine, compatible with Apache Hadoop.
- Efficient: in-memory storage.
- Usable: rich APIs in Java, Scala and Python.
- Resilient Distributed Dataset (RDD): Spark's core data model.
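To illustrate what makes an RDD resilient, here is a toy, single-process Python imitation: transformations are lazy and only record lineage (the parent dataset plus the function applied), so a lost partition can be recomputed from the source. `ToyRDD` is hypothetical and is not PySpark; in PySpark the same pipeline would be written with `SparkContext.parallelize`, `filter`, `map` and `collect`. Unlike Spark, where caching is opt-in via `cache()`/`persist()`, this toy caches every partition it computes.

```python
class ToyRDD:
    """Single-process imitation of an RDD (hypothetical, not PySpark).

    Each ToyRDD stores its parent and the per-partition function that
    derives it (its lineage) rather than eagerly computed results.
    """

    def __init__(self, partitions=None, parent=None, fn=None):
        self._source = partitions  # only set on the base dataset
        self._parent = parent
        self._fn = fn              # per-partition transformation
        self._cache = {}

    @classmethod
    def parallelize(cls, data, num_partitions=2):
        parts = [data[i::num_partitions] for i in range(num_partitions)]
        return cls(partitions=parts)

    def num_partitions(self):
        if self._parent is None:
            return len(self._source)
        return self._parent.num_partitions()

    def map(self, f):
        # Lazy: nothing is computed here, only the lineage is extended.
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred):
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def compute(self, i):
        """Return partition i from the cache, else rebuild it via lineage."""
        if i not in self._cache:
            if self._parent is None:
                self._cache[i] = list(self._source[i])
            else:
                self._cache[i] = self._fn(self._parent.compute(i))
        return self._cache[i]

    def lose_partition(self, i):
        """Simulate a failure that drops one computed partition."""
        self._cache.pop(i, None)

    def collect(self):
        return [x for i in range(self.num_partitions()) for x in self.compute(i)]


nums = ToyRDD.parallelize(list(range(10)))
evens_squared = nums.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(evens_squared.collect())  # [0, 4, 16, 36, 64]
evens_squared.lose_partition(0)
print(evens_squared.collect())  # [0, 4, 16, 36, 64], rebuilt from lineage
```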

Challenges: Big Data Integration
Data to integrate comes from workflows, IT systems, web services and people.
Example scenario: business processes (BPs) and their execution logs.
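For the business-process scenario, a first integration step is correlating events from different sources into process instances. The Python sketch below assumes every event carries a shared correlation key (a hypothetical `order` id) and an ISO timestamp; the events are made up, and real logs often lack such a key, which is part of what makes this integration hard.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events from heterogeneous sources, each tagged with a
# shared correlation key (an order id) and an ISO timestamp.
events = [
    {"source": "web-service", "order": "A1", "activity": "order received", "ts": "2015-07-01T09:00"},
    {"source": "workflow", "order": "A1", "activity": "credit check", "ts": "2015-07-01T09:20"},
    {"source": "person", "order": "B7", "activity": "order received", "ts": "2015-07-01T09:05"},
    {"source": "it-system", "order": "A1", "activity": "invoice sent", "ts": "2015-07-01T11:00"},
]


def build_instances(events, key="order"):
    """Group events into process instances (traces) ordered by time."""
    traces = defaultdict(list)
    for e in events:
        traces[e[key]].append(e)
    for trace in traces.values():
        trace.sort(key=lambda e: datetime.fromisoformat(e["ts"]))
    return dict(traces)


traces = build_instances(events)
for order, trace in traces.items():
    start = datetime.fromisoformat(trace[0]["ts"])
    end = datetime.fromisoformat(trace[-1]["ts"])
    steps = " -> ".join(e["activity"] for e in trace)
    print(order, steps, f"({end - start})")
```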


The Big Data world is messy, schema-less and complex (e.g. Linked Data): less than 10% of it is genuinely relational.
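Linked Data sidesteps fixed schemas by storing facts as subject-predicate-object triples. A minimal Python sketch of the idea with a wildcard pattern match follows; the `ex:` identifiers and facts are made up, and a real system would use an RDF store queried with SPARQL.

```python
# A tiny, hypothetical set of subject-predicate-object triples, the data
# model behind Linked Data (RDF). No fixed schema is needed: each entity
# can carry different predicates.
triples = {
    ("ex:Alex", "ex:authorOf", "ex:BigDataPaper"),
    ("ex:BigDataPaper", "ex:publishedIn", "ex:VLDB"),
    ("ex:Alex", "ex:memberOf", "ex:LabX"),
    ("ex:Sara", "ex:authorOf", "ex:GraphPaper"),
}


def match(pattern, store):
    """Return triples matching a (s, p, o) pattern; None is a wildcard."""
    return sorted(
        t for t in store
        if all(p is None or p == v for p, v in zip(pattern, t))
    )


print(match(("ex:Alex", None, None), triples))
print(match((None, "ex:authorOf", None), triples))
```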

Big Data-as-a-Service:
- Effective processing of big data within acceptable processing time.
- Easy access to big data and to the results of big data analysis.
- API engineering.
See also: ProgrammableWeb - APIs, Mashups and the Web as Platform (www.programmableweb.com/); DataSift; CSDL.

Challenges: Big data requires a broad set of skills
- Math and operations research expertise: develop analytic algorithms.
- Data experts: data architecture, management, governance, policy.
- Decision making (executives and management): apply information to solve business issues.
- Tool developers: mask complexity and analytics to lower the skills barrier.
- Visualization expertise: interpret data sets, determine correlations and present them in meaningful ways.
- Industry vertical domain expertise: develop hypotheses, identify relevant business issues, ask the right questions.

Challenges: Big Data Analytics
Analytics can be defined in many ways, but what matters is its purpose. Most definitions agree on the following: analytics is used to gain insights from data in order to make better decisions, using mathematical or scientific methods.
Data → (analyse) → Insight → (decide) → Action: manage the data, understand the data, act on the data.



OLAP (On-Line Analytical Processing) is an approach to answering multi-dimensional analytical queries swiftly.
Problem: extending existing OLAP techniques to the analysis of graphs is not straightforward, and key business insights remain hidden in the interactions among objects.
Solution: On-Line Analytical Processing on Graphs.
Example: Beheshti et al., Scalable Graph-based OLAP Analytics over Process Execution Data, DAPD Journal (2015); Beheshti et al., A Framework and a Language for On-Line Analytical Processing on Graphs, WISE Conference (2012).
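One basic OLAP operation on graphs is a roll-up: grouping nodes by a dimension and aggregating the edges between groups. The Python sketch below does this for a hypothetical co-authorship graph grouped by country. It is a generic illustration of graph aggregation, not the framework or query language from the cited papers.

```python
from collections import Counter

# Hypothetical co-authorship graph: a node attribute and undirected edges.
country = {"ann": "AU", "ben": "AU", "cat": "IR", "dan": "IR", "eve": "US"}
edges = [("ann", "ben"), ("ann", "cat"), ("ben", "dan"), ("cat", "dan"), ("dan", "eve")]


def roll_up(edges, dimension):
    """Aggregate a graph along one node dimension (a graph roll-up).

    Every node is replaced by its dimension value; edges between the
    resulting groups are merged and counted.
    """
    summary = Counter()
    for u, v in edges:
        a, b = sorted((dimension[u], dimension[v]))  # undirected: canonical order
        summary[(a, b)] += 1
    return dict(summary)


print(roll_up(edges, country))
# {('AU', 'AU'): 1, ('AU', 'IR'): 2, ('IR', 'IR'): 1, ('IR', 'US'): 1}
```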


Challenges: Big Data Analytics (Graph Data Model)
Nodes (entities, folders and paths):
- Entities: structured or unstructured, typed or untyped data objects, e.g. Paper, Author, Venue.
- Folder nodes: contain a set of inter-related entities (e.g. events) and can be a placeholder for the result of a given query, e.g. a set of related authors or a set of related papers.
- Path nodes: contain a set of paths, where a path is a transitive relationship between two entities, e.g. Alex author-of Paper "Big Data" published-in VLDB.
Relationships: a directed link between a pair of entities. A relationship can be explicit (e.g. triggered-by) or implicit (e.g. member-of, linking an entity to a folder).
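The node and relationship kinds described above map naturally onto a few Python dataclasses. The sketch below is an illustration under stated assumptions, not the authors' implementation: class and field names are hypothetical, a folder is modelled as a node that entities are implicitly `member-of`, and a path is a chain of relationships whose ends connect. It reproduces the Alex → Big Data → VLDB path.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Entity:
    """A typed or untyped data object, e.g. a paper, author or venue."""
    id: str
    type: Optional[str] = None


@dataclass(frozen=True)
class Relationship:
    """A directed link between two entities, explicit or implicit."""
    source: Entity
    label: str
    target: Entity
    explicit: bool = True


@dataclass
class Folder:
    """A set of inter-related entities, e.g. the result of a query."""
    name: str
    members: List[Entity] = field(default_factory=list)

    def member_of(self) -> List[Relationship]:
        """Implicit member-of links from each member to the folder node."""
        folder_node = Entity(self.name, "folder")
        return [Relationship(e, "member-of", folder_node, explicit=False)
                for e in self.members]


@dataclass
class Path:
    """A transitive relationship: relationships chained end to end."""
    edges: List[Relationship]

    def __post_init__(self):
        for a, b in zip(self.edges, self.edges[1:]):
            if a.target != b.source:
                raise ValueError("relationships do not form a path")

    @property
    def start(self) -> Entity:
        return self.edges[0].source

    @property
    def end(self) -> Entity:
        return self.edges[-1].target


alex = Entity("Alex", "Author")
paper = Entity("Big Data", "Paper")
vldb = Entity("VLDB", "Venue")
p = Path([Relationship(alex, "author-of", paper),
          Relationship(paper, "published-in", vldb)])
print(p.start.id, "->", p.end.id)  # Alex -> VLDB

authors = Folder("related-authors", [alex])
print(authors.member_of()[0].label)  # member-of
```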


Challenges: Big Data Analytics
Big Data analytics benefits from NLP, machine learning (pattern recognition, learning) and knowledge graphs (KG).
NLP example: Beheshti et al., A Systematic Review and Comparative Analysis of Cross-Document Coreference Resolution Methods and Tools, Computing Journal (2015), submitted.
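Cross-document coreference resolution decides which mentions in different documents refer to the same entity. The deliberately naive Python baseline below groups mentions whose normalised names match exactly; the `normalise` rules and the honorific list are assumptions. It misses variants such as initials and merges different people who share a name, which is why practical methods also use the context around each mention.

```python
import re
from collections import defaultdict

HONORIFICS = {"dr", "mr", "ms", "prof"}  # illustrative, not exhaustive


def normalise(mention):
    """Lower-case, keep alphabetic tokens and drop honorifics."""
    tokens = re.findall(r"[a-z]+", mention.lower())
    return " ".join(t for t in tokens if t not in HONORIFICS)


def cluster_mentions(docs):
    """Naive cross-document coreference: group mentions by normalised name.

    docs maps a document id to the entity mentions found in it; mentions
    with identical normalised strings are treated as coreferent.
    """
    clusters = defaultdict(set)
    for doc_id, mentions in docs.items():
        for m in mentions:
            clusters[normalise(m)].add((doc_id, m))
    return dict(clusters)


docs = {
    "d1": ["Dr. Amin Beheshti", "UNSW"],
    "d2": ["amin beheshti", "Sharif University"],
}
for name, mentions in sorted(cluster_mentions(docs).items()):
    print(name, sorted(mentions))
```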


Big Data analytics benefits from:
- NLP.
- Machine learning: pattern recognition, learning, extraction, classification, enrichment, linking, etc.
- Knowledge graphs (KG): KG construction from open-source data.


Big Data Leadership!!
- Industry has been in the lead: Google, Amazon, Yahoo!, etc.
- University researchers have been left behind due to lack of access to large-scale cluster computing facilities.
- Government agencies are making heavy investments.
- Investments in big-data computing will have extraordinary near-term and long-term benefits.
- Cloud computing must be considered a strategic resource.

Big Data: Opportunities
- Varieties of data: text, social media, networks, multimedia, machine data, sensors.
- Analytics: organizing big data, navigating through data, summarizing big data, process analytics, supporting decision-making.
- Integration: integrating enterprise and public data, linking data/context, entity extraction and integration, knowledge graphs.
- Big Data performance: in-memory processing, new benchmarks and architecture.
- User experience: automation and intelligent guidance, visualizing with analytics, interacting with analytics, storytelling.


Conclusion
- Why is Big Data different from past very large datasets? Meta-data!
- Having the ability to analyse Big Data is of limited value if users cannot understand the analysis.
- How can industry and academia collaborate towards solving Big Data challenges?
- What is big today may not be big tomorrow!

Questions / Suggestions