Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics


Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics Dr. Liangxiu Han Future Networks and Distributed Systems Group (FUNDS) School of Computing, Mathematics and Digital Technology, Manchester Metropolitan University http://www2.docm.mmu.ac.uk/staff/l.han/ June, 2014

Outline: Data tsunami; What is big data?; Value of big data; Challenges of big data; Technologies for big data; Data exploration for future roadmap @Funds.MMU

Data tsunami Increased capability of generating and capturing data (e.g. petascale simulations, experimental devices, the Internet, sensors, etc.): 300m photos and 2.5m content items shared per day; caBIG: 4.7+ million for cancers

Data tsunami Gene expression data in GEO and ArrayExpress: over 1 million. Climate data from NASA: 32 petabytes (1 petabyte = 2^50 bytes). SKA (the Square Kilometre Array): the data collected by the SKA in a single day would take nearly two million years to play back on an iPod.

Slide Credit: Intel

Data tsunami Data-intensive era: the big data / data-rich / data-centric / data-driven era [Figure: bar chart of data volumes, 2010, 2011 and 2020]

What is big data? Data size representation: binary digit (bit); byte (B): 8 bits; kilobyte (KB): 2^10 bytes; megabyte (MB): 2^20 bytes; gigabyte (GB): 2^30 bytes; terabyte (TB): 2^40 bytes; petabyte (PB): 2^50 bytes; exabyte (EB): 2^60 bytes; zettabyte (ZB): 2^70 bytes; yottabyte (YB): 2^80 bytes
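The unit table above follows one rule: each unit is 2^10 times the previous one. A minimal Python sketch of that arithmetic follows; the helper names `unit_size` and `human_readable` are illustrative and not part of the slides.

```python
# Binary size units as powers of two: each unit is 2**10 times the previous one.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def unit_size(unit: str) -> int:
    """Return the number of bytes in one binary unit, e.g. 'MB' -> 2**20."""
    return 2 ** (10 * (UNITS.index(unit) + 1))


def human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    label = "B"
    for unit in UNITS:
        if value < 1024:
            break
        value /= 1024
        label = unit
    return f"{value:.1f} {label}"


if __name__ == "__main__":
    print(unit_size("PB"))                  # bytes in one petabyte
    print(human_readable(32 * 2 ** 50))     # a 32-petabyte archive
```

For instance, `unit_size("PB")` equals `2 ** 50`, matching the petabyte entry in the list.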

What is big data? Timeline: 1998: the origins of big data ("Big Data... and the Next Wave of InfraStress", John R. Mashey, Chief Scientist, SGI, 4/25/98: "Technology waves: not technology for technology's sake. It's what you do with it. But if you don't understand the trends, it's what it will do to you."); 2001: "3D Data Management: Controlling Data Volume, Velocity, and Variety" by Doug Laney; 2010: widespread in the Economist ("Data, data everywhere: A special report on managing information"); 2012: Gartner, IBM, Cisco, Microsoft, etc.

What is big data? A relative term (don't define it in terms of size being larger than a certain number of terabytes or petabytes). Data that is larger, more complex and harder to access, organise and analyse than existing tools can handle (varying by sector). The data volume, velocity or variety/complexity (the 3 Vs) limits the ability to perform effective analysis using traditional approaches

What is big data? Big data is about pushing limits!

What is big data? Volume (data at rest): the size and scale of the data. By 2015, it will reach 8 zettabytes (IDC)

What is big data? Velocity (data in motion): real-time capture and analytics, stream processing and analytics. Examples: stock exchange, fraud analysis, customer churn prediction

What is big data? Variety/complexity (data in many forms): various formats, types and structures. Structured data, e.g. data defined by a schema (relational databases), or semi-structured data (XML). Unstructured data, e.g. free-form text, emails, logs, images, audio, video, social media data (e.g. graphs)

What is big data? Two more Vs. Value: the business value to be derived. Veracity (data in doubt): the quality and understandability of the data

Value of big data Next frontier for innovation, competition and productivity: commerce and the economy; scientific discovery in almost every science and engineering discipline, addressing societal challenges (health, food, energy, environment, etc.)

Value of big data Source: wikibon

Value of big data New paradigm Big Data leads science discovery

Challenges of big data Bottleneck in technology: IT infrastructures (Source: Samsung)

Challenges of big data Bottleneck in technical skills: professionals to handle big data (Source: eskills)

Technologies for big data What kinds of big data technologies do you have in mind? Cloud computing? ...

Technologies for big data Big data processing and analytics: architectures for efficiently processing big data; data analytics for filtering, analysing and generating actionable insights. [Figure: data processing and analysis, with axes for data volume, variety and complexity (high to low) and data value (low to high)]

Architectures Traditional approaches, for example OLTP (online transaction processing) and OLAP (online analytical processing) on a data warehouse. [Figure labels: operations, OLTP, business strategy, business process, OLAP, information, data analytics, decision making, business data warehouse]

Architectures Issues: relational databases (RDBMS) deal with structured data only, don't support complex analytics, and lack scalability and performance

Architectures Current solutions. Apache Hadoop: an open-source framework for storage and large-scale processing of datasets (both structured and unstructured data (NoSQL)); major components: HDFS, MapReduce, Hadoop Common, Hadoop YARN. Google File System and MapReduce. Apache Spark: combines SQL, streaming and complex analytics with in-memory computing

Architectures Parallel and distributed computing for data processing [Figure panels: Distributed, Parallel]

Architectures Parallelisation: a sought-after solution for speeding up an application, particularly for data-intensive applications. Three considerations: (1) how to distribute workloads or decompose an algorithm into parts; (2) how to map the tasks onto various computing nodes and execute subtasks in parallel; (3) how to coordinate and communicate between subtasks on those computing nodes.
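The three considerations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: threads from the standard library's `concurrent.futures` stand in for computing nodes, the workload (a sum of squares) is hypothetical, and the function names are not from the slides. In CPython, threads will not speed up CPU-bound work; the point is the structure, not the timing.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Decompose the workload into roughly equal contiguous chunks."""
    size = max(1, -(-len(data) // parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def subtask(chunk):
    """The work done on one 'node': a partial sum of squares."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    chunks = split(data, workers)                    # 1. decompose the workload
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(subtask, chunks))   # 2. map subtasks onto workers
    return sum(partials)                             # 3. coordinate: combine partial results


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10))))
```

Swapping the thread pool for processes or remote nodes changes step 2 only; the decomposition and the final combination keep the same shape.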

Architectures Data parallelism: the workload is distributed across different computing nodes, and the same task is executed on different subsets of the data simultaneously. Task parallelism: tasks are independent and can be executed purely in parallel. Pipelining: each iteration of a task consists of many stages, chained and executed in order, where the output of one stage is the input of the next. Pipelining can be implemented with or without streaming
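Pipelining with streaming, as described above, maps naturally onto Python generators: each stage consumes items from the previous stage one at a time, so no stage waits for the whole dataset. The three stages below (parse, filter, transform) are hypothetical and chosen only to show the chaining.

```python
def read_records(lines):
    """Stage 1: parse each raw line into an integer."""
    for line in lines:
        yield int(line.strip())


def keep_even(numbers):
    """Stage 2: filter, passing on only even values."""
    for n in numbers:
        if n % 2 == 0:
            yield n


def square(numbers):
    """Stage 3: transform each value."""
    for n in numbers:
        yield n * n


def run_pipeline(lines):
    """Chain the stages: the output of each stage is the input of the next."""
    return list(square(keep_even(read_records(lines))))


if __name__ == "__main__":
    print(run_pipeline(["1\n", "2\n", "3\n", "4\n"]))
```

A non-streaming variant would have each stage return a complete list before the next stage starts; the chaining is the same, but intermediate results are held in memory.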

Architectures Programming models for parallel and distributed computing (e.g. MPI, MapReduce, POSIX Threads, OpenMP, etc) Bridging the gap between the underlying hardware and the supporting layers of software available to applications Independent of programming languages and API

Architectures MapReduce: a programming model for processing large-scale datasets. Implementations: e.g. Google MapReduce (on the Google File System), Apache Hadoop

Architectures Map and Reduce functions. Map: perform a function on each individual value of a dataset and return a new list of values. Given a dataset A = {1, 2, 3} and the Map function Square = X*X, the Map process returns {1, 4, 9}

Architectures Reduce: perform a function that combines the values in a dataset. Given a dataset B = {2, 4, 9} and the Reduce function sum = X1 + X2 + X3, the Reduce process returns 15
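The Map and Reduce functions described above correspond directly to Python's built-in `map` and `functools.reduce`. The sketch below uses the two datasets from the slides; squaring each element of A = {1, 2, 3} yields 1, 4 and 9, and summing B = {2, 4, 9} yields 15. The names `square` and `add` are illustrative.

```python
from functools import reduce


def square(x):
    """Map function: applied independently to each value."""
    return x * x


def add(x1, x2):
    """Reduce function: combines two values into one."""
    return x1 + x2


A = [1, 2, 3]
mapped = list(map(square, A))   # one output value per input value

B = [2, 4, 9]
total = reduce(add, B)          # folds the whole dataset into a single value

if __name__ == "__main__":
    print(mapped)
    print(total)
```

Because `square` touches one value at a time, the Map step can run on many nodes at once; `add` is associative, so partial sums can be combined in any grouping.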

Architectures MapReduce example: count the occurrences of "Hello" and "big data" in a large dataset. Map: <Hello, 4> <big data, 5>; <Hello, 2> <big data, 3>. Reduce: <Hello, 6> <big data, 8>
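The counting example above can be reproduced with a small single-machine sketch. It is not Hadoop code: the two input splits are hypothetical strings built so that the per-split counts match the slide, and the `shuffle` step stands in for the grouping a MapReduce framework performs between the two phases.

```python
from collections import defaultdict

TERMS = ["Hello", "big data"]


def map_phase(split_text):
    """Map: emit a <term, count> pair for each term in one input split."""
    return [(term, split_text.count(term)) for term in TERMS]


def shuffle(pairs):
    """Group intermediate pairs by key, as a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all counts emitted for one term."""
    return key, sum(values)


def word_count(splits):
    intermediate = [pair for s in splits for pair in map_phase(s)]
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


# Hypothetical input splits, built so the per-split counts match the slide.
split_1 = "Hello " * 4 + "big data " * 5
split_2 = "Hello " * 2 + "big data " * 3

if __name__ == "__main__":
    print(map_phase(split_1))
    print(map_phase(split_2))
    print(word_count([split_1, split_2]))
```

Each call to `map_phase` sees only its own split, which is what lets the Map step run on separate nodes before the counts are merged.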

Architectures Comparison of RDBMS-based approaches, Spark and MapReduce

Data analytics Data analytics: discovery of useful, possibly unexpected, patterns in data; automation of data exploration and analysis. Statistical analysis. Machine learning/data mining, for example classification, clustering, association rules, regression, graph mining, ...

Data analytics Classification: AMD, diabetic retinopathy

Data analytics Clustering: clustering of the fish industry in the UK. [Figure 1: two clusters of Scotland using the K-means clustering algorithm] [Figure 2: fishing industry]
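Clustering groups data points so that members of a group are closer to each other than to points in other groups. One common algorithm is k-means; the sketch below is a minimal pure-Python version (Lloyd's algorithm) on hypothetical 2-D points, not the fish-industry data from the slide. Taking the first k points as initial centroids is a simplification chosen to keep the example deterministic.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm on 2-D points; initial centroids are the first k points."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids


# Hypothetical data: two well-separated groups of points.
POINTS = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]

if __name__ == "__main__":
    labels, centroids = kmeans(POINTS, k=2)
    print(labels)
    print(centroids)
```

In practice the initial centroids are usually chosen at random or with a seeding heuristic, and the loop stops once the labels no longer change.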

Data analytics Graph mining... and so on Community detection in Facebook friends Source: http://wisonets.wordpress.com/

Big data technology = Architectures for data processing and management + Data analytics

Technologies for big data Future development: programming abstractions need to be developed to support and facilitate big data processing and analytics. [Figure: layered stack: Apps; programming abstractions to support big data processing and analytics; RDBMS, NoSQL, DFS, ...]

Data exploration for future roadmap@funds.mmu We focus on: (1) both fundamental and applied research in large-scale data processing and analytics; (2) intelligent management and optimisation of large-scale networked distributed systems (challenges: reliability, scalability, security, resilience, autonomy and self-adaptation) http://www.scmdt.mmu.ac.uk/research/funds/

Data exploration for future roadmap@funds.mmu Food Health Future Energy People sustainability society Planning Manufacturing

Acknowledgement & Collaboration BBSRC, EPSRC-DHPA, Sustainability Society Network+, Amazon, MMU, Optos, Fera, MRC HGU

Acknowledgement & Collaboration Outside: MRC HGU, University of Edinburgh; University of Manchester; University of Strathclyde; Heriot-Watt University; Loughborough University; University of Glasgow; ... Optos; Fera; ... University of Melbourne

Acknowledgement & Collaboration Inside: School of Science and Environment; Business School; School of Engineering; Department of Sociology; ...

Thank You!