Big Data Management and Analytics




Lecture Notes, Winter semester 2015/2016
Ludwig-Maximilians-University Munich
Prof. Dr. Matthias Renz, 2015
Based on lectures by Donald Kossmann (ETH Zürich), as well as Jure Leskovec, Anand Rajaraman, and Jeff Ullman (Stanford University)

Course Logistics
Course website: http://www.dbs.ifi.lmu.de/cms/big_data_management_and_analytics
Registration for this lecture is now open via Uniworx. Registration is required to attend the exams!
Organization:
  Load: 3+2 hours weekly
  Required: Lecture "Database Systems I" or equivalent
  Beneficial: Lecture "Knowledge Discovery in Databases I" or equivalent
  Lecture: Prof. Dr. Matthias Renz
  Assisting: Klaus Arthur Schmid, Felix Borutta, Evgeniy Faermann, Christian Frey
  Tutors: TBA

Why this course?
  Big Data is big $ and science: choose your poison

We are drowning in data but starving for information
  Exponential growth in data (figure labels: GPS, DB, RF-ID)
  Data contains value and knowledge
Sources: http://www.popsci.com/announcements/article/2011-10/november-2011-data-power; J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org


Why this course?
  Big Data is big $ and science: choose your poison
  Big Data approaches are required for Data Science: move data from raw to relevant

Data Science (~eScience / Industry 4.0)
The Fourth Paradigm: Age of data-driven exploration [computer science pioneer Jim Gray]
Science Paradigms over time [Hey, Tansley, Tolle: Fourth Paradigm, 2009]:
  Empirical (thousand years ago): describing natural phenomena
  Theoretical (last few hundred years): using models, generalizations
  Computational (last few decades): simulating complex phenomena
  Data-driven / Data Science (today): unify theory, experiment, and simulation

Data Science (~eScience / Industry 4.0)
The Fourth Paradigm: Age of data-driven exploration [computer science pioneer Jim Gray]
Data Science [Hey, Tansley, Tolle: Fourth Paradigm, 2009]:
  Data captured by instruments or generated by simulator
  Processed by software
  Information/knowledge stored in computer
  Scientist/analyst analyzes database / files using data management and statistics

Data Science (~eScience / Industry 4.0)
"Modern science increasingly relies on integrated information technologies and computation to collect, process, and analyze complex data." [Hey, Tansley, Tolle: Fourth Paradigm, 2009]


Why this course?
  Big Data is big $ and science: choose your poison
  Big Data approaches are required for Data Science: move data from raw to relevant
  Big Data is exciting: gives a new twist to almost everything, allows you to reinvent the wheel
  Big Data is old: opportunity to teach you some fundamental technology

Outline of this course
  Introduction (Motivation and Overview)
  Introduction to Big Data: the four V's
  NoSQL
  Hadoop / HDFS / MapReduce & Applications
  Spark
  Data Stream Processing & Applications & Algorithms
  Text Processing
  High-Dimensional Data
  Graph Data Processing (Link Analysis, PageRank, Community Detection)
  Uncertain Data Processing (Concepts of probabilistic query processing and mining)

Literature
This course is mainly based on a mixture of existing external lectures, surveys, papers, and reports on Big Data.
There is no single book or script, or at least none I'm aware of, that is equivalent to this course (and addresses all issues discussed in this course).
Since Big Data is a quite new and hot topic, standards and basic concepts are quite dynamic => the Web is a very appropriate source of relevant information.
External lectures mainly used for this course:
  Big Data: Donald Kossmann & Nesime Tatbul, Systems Group ETH Zurich - http://www.systems.ethz.ch/node/217
  Mining of Massive Datasets: Jure Leskovec, Anand Rajaraman, Jeff Ullman, Stanford University - http://www.mmds.org
Further material will appear on our web page (check for updates during the course; open to further suggestions!)