Big Data and Industrial Internet



Similar documents
Hadoop-BAM and SeqPig

Processing NGS Data with Hadoop-BAM and SeqPig

CSE-E5430 Scalable Cloud Computing. Lecture 4

Scalable Cloud Computing

BIG DATA TRENDS AND TECHNOLOGIES

CSE-E5430 Scalable Cloud Computing Lecture 2

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014

HDP Enabling the Modern Data Architecture

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

The Inside Scoop on Hadoop

Dominik Wagenknecht Accenture

Large scale processing using Hadoop. Ján Vaňo

CSE-E5430 Scalable Cloud Computing Lecture 11

Modernizing Your Data Warehouse for Hadoop

Information Builders Mission & Value Proposition

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

Hadoop & Spark Using Amazon EMR

BIG DATA USING HADOOP

DATA MINING WITH HADOOP AND HIVE Introduction to Architecture

Bringing Big Data to People

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Scalable Cloud Computing Solutions for Next Generation Sequencing Data

Step by Step: Big Data Technology. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 25 August 2015

Comprehensive Analytics on the Hortonworks Data Platform

Hadoop implementation of MapReduce computational model. Ján Vaňo

HDP Hadoop From concept to deployment.

The Future of Data Management

Accelerating Enterprise Big Data Success. Tim Stevens, VP of Business and Corporate Development Cloudera

Big Data Analytics for Cyber

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN

Talend Big Data. Delivering instant value from all your data. Talend

Big Data and Hadoop for the Executive A Reference Guide

Hadoop Ecosystem B Y R A H I M A.

Big Data and Data Science: Behind the Buzz Words

The little elephant driving Big Data

Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015

Big Data on Microsoft Platform

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

<Insert Picture Here> Big Data

Real Time Big Data Processing

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia

Open source Google-style large scale data analysis with Hadoop

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect

How Companies are! Using Spark

The Top 10 7 Hadoop Patterns and Anti-patterns. Alex

Hadoop IST 734 SS CHUNG

Big Data Analytics - Accelerated. stream-horizon.com

Big Data: Tools and Technologies in Big Data

Hadoop: Distributed Data Processing. Amr Awadallah Founder/CTO, Cloudera, Inc. ACM Data Mining SIG Thursday, January 25 th, 2010

Virtual Machine (VM) These VMs are to be used for teaching: they are not workstations for calculation.

Big Data Explained. An introduction to Big Data Science.

TRAINING PROGRAM ON BIGDATA/HADOOP

Big Data Course Highlights

SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES

The Internet of Things and Big Data: Intro

Hadoop Introduction coreservlets.com and Dima May coreservlets.com and Dima May

Hadoop and Map-Reduce. Swati Gore

Big Data. Lyle Ungar, University of Pennsylvania

Big Analytics in the Cloud. Matt Winkler PM, Big

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014

Big Data Technologies Compared June 2014

Unified Big Data Processing with Apache Spark. Matei

How To Scale Out Of A Nosql Database

HADOOP VENDOR DISTRIBUTIONS THE WHY, THE WHO AND THE HOW? Guruprasad K.N. Enterprise Architect Wipro BOTWORKS

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Next-Gen Big Data Analytics using the Spark stack

Data processing goes big

Dell In-Memory Appliance for Cloudera Enterprise

The Future of Data Management with Hadoop and the Enterprise Data Hub

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Big Data Processing: Past, Present and Future

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

Upcoming Announcements

Deploying Hadoop with Manager

Virtualizing Apache Hadoop. June, 2012

Ali Ghodsi Head of PM and Engineering Databricks

#TalendSandbox for Big Data

Big Data Big Data/Data Analytics & Software Development

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Hadoop for MySQL DBAs. Copyright 2011 Cloudera. All rights reserved. Not to be reproduced without prior written consent.

The Future of Big Data SAS Automotive Roundtable Los Angeles, CA 5 March 2015 Mike Olson Chief Strategy Officer,

BIG DATA What it is and how to use?

Big Data Too Big To Ignore

BIG DATA TOOLS. Top 10 open source technologies for Big Data

Big Data Realities Hadoop in the Enterprise Architecture

Big Data Workshop. dattamsha.com

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Microsoft Big Data. Solution Brief

From Internet Data Centers to Data Centers in the Cloud

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

A Modern Data Architecture with Apache Hadoop

Data Analyst Program- 0 to 100

Transcription:

Big Data and Industrial Internet Keijo Heljanko Department of Computer Science and Helsinki Institute for Information Technology HIIT School of Science, Aalto University keijo.heljanko@aalto.fi 16.6-2015 1/13

Big Data EMC sponsored IDC study estimates the Digital Universe to be 4.4 Zettabytes (4 400 000 000 TB) in 2013 The amount of data is estimated to grow to 44 Zettabytes by 2020 Data comes from: Video, digital images, sensor data, biological data, Internet sites, social media, Internet of Things,... Some examples: Netflix is collecting 1 PB of data per month from its video service user behaviour Rovio is collecting in the order of 1 TB of data per day of games logs The problem of such large data massed, termed Big Data calls for new approaches to analyzing these data masses 2/13

No Single Threaded Performance Increases Herb Sutter: The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software. Dr. Dobb s Journal, 30(3), March 2005 (updated graph in August 2009). 3/13

Implications of the End of Free Lunch The clock speeds of microprocessors are not going to improve much in the foreseeable future The efficiency gains in single threaded performance are going to be only moderate The number of transistors in a microprocessor is still growing at a high rate One of the main uses of transistors has been to increase the number of computing cores the processor has The number of cores in a low end workstation (as those employed in large scale datacenters) is going to keep on steadily growing Programming models need to change to efficiently exploit all the available concurrency - scalability to high number of cores/processors will need to be a major focus 4/13

Hadoop - Linux of Big Data Hadoop = Open Source Distributed Operating System Distribution for Big Data Based on cloning the Google architecture design Fault tolerant distributed filesystem: HDFS Batch processing for unstructured data: Hadoop MapReduce and Apache Pig (HDD), Apache Spark (RAM) Distributed SQL database queries for analytics: Apache Hive, Spark SQL, Cloudera Impala, Facebook Presto Fault tolerant real-time distributed database: HBase Distributed machine learning libraries, text indexing & search, etc. Data import and export with traditional databases: Sqoop 5/13

Commercial Hadoop Support Cloudera: Probably the largest Hadoop distributor, partially owned by Intel (740 million USD investment for 18% share). Available from: http://www.cloudera.com/ Hortonworks: Yahoo! spin-off from their large Hadoop development team: http://www.hortonworks.com/ MapR: A rewrite of much of Apache Hadoop in C++, including a new filesystem. API-compatible with Apache Hadoop. http://www.mapr.com/ 6/13

Example: Cloudera s Hadoop Stack 7/13

Example: Hortonworks Hadoop Stack 8/13

Lambda Architecture Batch computing systems like Hadoop have a large latency in order of minutes to hours but have extremely good fault tolerance Stream processing systems such as Apache Storm combined with Apache Kafka allow for large scale real time processing with reduce fault tolerance These stream processing frameworks can provide latencies in the order of tens of milliseconds Lambda architecture is the combination of consistent / exact but high latency batch processing and available / approximate low latency stream processing 9/13

Lambda Architecture (cnt.) For more details, see e.g.,: http://lambda-architecture.net/ 10/13

Lambda Architecture for IoT Use 3 11/13

Big Data at Aalto We have been working with Hadoop since 2010 Teaching Hadoop and Big data technologies since 2011 - Course CSE-E5430 Scalable Cloud Computing, materials freely available on the Web: https://noppa.aalto.fi/noppa/kurssi/cse-e5430/luennot Analytics and Data Science Minor for Aalto students since 2014 Collaborating in Data to Intelligence project on Hadoop and Apache Spark related topics with CSC, Tieto, and PacketVideo Collaborating with QPR Software on business process mining 12/13

Conclusions Hadoop is becoming the Linux distribution for Big Data It consists of a number of interoperable open source components Commercial support is available from commercial vendors such as: Cloudera, HortonWorks, MapR, IBM, and Intel; as well as their partners Amazon is providing Elastic MapReduce (EMR), which is basically Hadoop as a service offering Microsoft HDInsight is a similar service for Hadoop on Microsoft Azure We are happy to discuss your Big Data issues, email: keijo.heljanko@aalto.fi 13/13