Distributed Data Management Summer Semester 2013 TU Kaiserslautern

Size: px
Start display at page:

Download "Distributed Data Management Summer Semester 2013 TU Kaiserslautern"

Transcription

1 Distributed Data Management Summer Semester 2013 TU Kaiserslautern Dr.- Ing. Sebas4an Michel saarland.de 1

2 Lecture 8+ (DISTRIBUTED) DATA STREAM PROCESSING (INTRODUCTION) 2

3 So Far: Databases/NoSQL Datastores Data is changing, yes, but this is more due to inserts and update to stored data items Historic data is kept Queries operate on full data (tables) MapReduce is extreme, Write- once & Read- many 4mes Data warehousing, too: periodically loading data in store for deep(er) analy4cs Data mining 3

4 Data Stream Management vs. Tradi4onal Data Management Query & Results DATA Base/ Store Insert Update Delete At query 4me, data is accessed as a whole Data is persistently stored Queries are ad- hoc (mainly) 4

5 Data Stream Management vs. Tradi4onal Data Management (Cont d) results Set of queries DATA STREAM Data is moving! Con4nuously generated (assumed infinite!) At high pace Queries are (mainly) con4nuous (aka. standing). Registered once, observed forever. Answer to queries in (near) real- 4me required (o`en) Probabilis4c methods for efficiency or considering only part of the stream (sliding window) 5

6 Sensor Networks (Distributed) Sensor Networks; Smart Dust Mainly numeric measurements of (natural) phenomena Computa4on of queries like max, average, min, quan4les, value > τ 6

7 Mobile Ad- Hoc Networks Connec4on between sensors are ad- hoc Efficient and reliable rou4ng Understanding (changing) topology Power consump4on Also: Vehicular Ad- Hoc Netw. 7

8 Social Sensors Explicitly: Snow Tweets (hkp://snowcore.uwaterloo.ca/snowtweets/) #snowtweets 50.0 cm. at K1A 0A2 #snowtweets 10.0 in. at #snowtweets 4 cm at Palmerston North 4414 Implicitly: By men4oning topics, people, in social communica4on 4me 8

9 Earthquake News on Twiker source: hkp://blog.socialflow.com/ 9

10 Stock Market Real- 4me analysis of stock marked changes Compu4ng sta4s4cs over streams, e.g., for decision support Opportuni4es for reac4ng in real- 4me Even with fully automated means: algorithmic trading 10

11 DBMS vs. DSMS Database management system (DBMS) Persistent data (rela4ons) Random access One- 4me queries Data stream management system (DSMS) vola4le data streams Sequen4al access Con4nuous queries (theore4cally) unlimited secondary storage limited main memory Only the current state is relevant rela4vely low update rate Likle or no 4me requirements Assumes exact data Plannable query processing Considera4on of the order of the input poten4ally extremely high update rate Real- 4me requirements Assumes outdated/inaccurate data Variable data arrival and data characteris4cs h"p://en.wikipedia.org/wiki/data- stream_management_system 11

12 Data Stream Model Stream of data items is unbounded (available memory is not) No way to store en4re stream (how could we, its (probably) not ending) To compute query results, need to devise algorithm with likle memory consump4on 12

13 Overview of Data Stream Topics Synopses: concise representa4ons of stream content tailored to tasks, e.g., coun4ng dis4nct elements usually not exact, but approxima4ons (es4mators) of true values. Windows: focus of certain recent subset of data computa4on of func4ons/joins over window(s) content 13

14 Data Stream Mining: Teasers I tell you integer numbers between 1 and N I will tell all but one number A`er N- 1 numbers I ask: which number was missing. 14

15 Data Stream Mining: Teasers (Cont d) Keep Boolean array of length N: Mark posi4on for observed number Size required: N Computa4on at end: N to find missing number Much beker: keep sum of numbers: S Missing number is N*(N+1)/2 - S 15

16 Coun4ng Occurrences Consider a stream of elements a i, a 2, a 84, a 41, a 2, a 77, a 231, a 2, a 4, a 54, How o`en does a 2 occur? How to implement? Keep counter for each id Required space #ids (=N) Not feasible of N is very large 16

17 Probabilis4c Coun4ng: Count- Min Sketch Keep 2- dim array (h, r) h hash func4ons* that map to range 0 (r- 1) h h 2 h 3 h 4 Arriving item a For each j: array[j, h j (a)]++ Cormode, Muthukrishnan (2004). An Improved Data Stream Summary: The Count- Min Sketch and its Applica4ons. J. Algorithms 55:

18 Count- Min Sketch: Coun4ng h 1 h 2 h 3 h How o`en did we see item a? h 1 (a) = 4, h 2 (a)=5, h 3 (a)=0, h 4 (a)=2 Take minimum of the corresponding values in the 2- d array. Here: 4 Es4mate is never underes4ma4ng Overes4ma4on probabilis4cally bounded 18

19 Outlook Some more basics of stream processing Few more fundamentals of data stream mining Then, window based data stream management Then going distributed: Distributed DSMS Distributed (ad- hoc) sensor networks Scalable computa4on of massive data streams (think: Hadoop for data streams) 19

Distributed Data Management Summer Semester 2013 TU Kaiserslautern

Distributed Data Management Summer Semester 2013 TU Kaiserslautern Distributed Data Management Summer Semester 2013 TU Kaiserslautern Dr.- Ing. Sebas4an Michel smichel@mmci.uni- saarland.de 1 Lecture 1 MOTIVATION AND OVERVIEW 2 Distributed Data Management What does distributed

More information

Ins+tuto Superior Técnico Technical University of Lisbon. Big Data. Bruno Lopes Catarina Moreira João Pinho

Ins+tuto Superior Técnico Technical University of Lisbon. Big Data. Bruno Lopes Catarina Moreira João Pinho Ins+tuto Superior Técnico Technical University of Lisbon Big Data Bruno Lopes Catarina Moreira João Pinho Mo#va#on 2 220 PetaBytes Of data that people create every day! 2 Mo#va#on 90 % of Data UNSTRUCTURED

More information

Data Stream Algorithms in Storm and R. Radek Maciaszek

Data Stream Algorithms in Storm and R. Radek Maciaszek Data Stream Algorithms in Storm and R Radek Maciaszek Who Am I? l Radek Maciaszek l l l l l l Consul9ng at DataMine Lab (www.dataminelab.com) - Data mining, business intelligence and data warehouse consultancy.

More information

Data Mining on Streams

Data Mining on Streams Data Mining on Streams Using Decision Trees CS 536: Machine Learning Instructor: Michael Littman TA: Yihua Wu Outline Introduction to data streams Overview of traditional DT learning ALG DT learning ALGs

More information

Keeping Pace with Big Data

Keeping Pace with Big Data - A Data Mining Perspec>ve Huan Liu, Tempe, AZ hep://www.public.asu.edu/~huanliu NSF Workshop on Big Data Analy6cs for Infrastructure and Building Resilience and Sustainability, Beijing, China Sept 19-20,

More information

Mining Social Network Graphs

Mining Social Network Graphs Mining Social Network Graphs Debapriyo Majumdar Data Mining Fall 2014 Indian Statistical Institute Kolkata November 13, 17, 2014 Social Network No introduc+on required Really? We s7ll need to understand

More information

Ibis: Scaling Python Analy=cs on Hadoop and Impala

Ibis: Scaling Python Analy=cs on Hadoop and Impala Ibis: Scaling Python Analy=cs on Hadoop and Impala Wes McKinney, Budapest BI Forum 2015-10- 14 @wesmckinn 1 Me R&D at Cloudera Serial creator of structured data tools / user interfaces Mathema=cian MIT

More information

Scalable Machine Learning - or what to do with all that Big Data infrastructure

Scalable Machine Learning - or what to do with all that Big Data infrastructure - or what to do with all that Big Data infrastructure TU Berlin blog.mikiobraun.de Strata+Hadoop World London, 2015 1 Complex Data Analysis at Scale Click-through prediction Personalized Spam Detection

More information

Some Security Challenges of Cloud Compu6ng. Kui Ren Associate Professor Department of Computer Science and Engineering SUNY at Buffalo

Some Security Challenges of Cloud Compu6ng. Kui Ren Associate Professor Department of Computer Science and Engineering SUNY at Buffalo Some Security Challenges of Cloud Compu6ng Kui Ren Associate Professor Department of Computer Science and Engineering SUNY at Buffalo Cloud Compu6ng: the Next Big Thing Tremendous momentum ahead: Prediction

More information

Cloud and Big Data Summer School, Stockholm, Aug., 2015 Jeffrey D. Ullman

Cloud and Big Data Summer School, Stockholm, Aug., 2015 Jeffrey D. Ullman Cloud and Big Data Summer School, Stockholm, Aug., 2015 Jeffrey D. Ullman To motivate the Bloom-filter idea, consider a web crawler. It keeps, centrally, a list of all the URL s it has found so far. It

More information

Big Data & Scripting Part II Streaming Algorithms

Big Data & Scripting Part II Streaming Algorithms Big Data & Scripting Part II Streaming Algorithms 1, 2, a note on sampling and filtering sampling: (randomly) choose a representative subset filtering: given some criterion (e.g. membership in a set),

More information

The Library (Big) Data scien4st

The Library (Big) Data scien4st The Library (Big) Data scien4st IFLA/ALA webinar: Big Data: new roles and opportuni4es for new librarians June 15 th 2016 IFLA Big Data Special Interest Group (SIG) Wouter Klapwijk, Stellenbosch University,

More information

Big Data : The New Oil

Big Data : The New Oil Big Data : The New Oil Data Driven Innova7on: $ 15 billion by 2015 Data Driven Innova,on for Growth and Well Being: OECD Report: Interim Synthesis Report 2014 The new data- driven era of scien7fic discovery

More information

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University http://www.mmds.org

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University http://www.mmds.org Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit

More information

résumé de flux de données

résumé de flux de données résumé de flux de données CLEROT Fabrice fabrice.clerot@orange-ftgroup.com Orange Labs data streams, why bother? massive data is the talk of the town... data streams, why bother? service platform production

More information

An Open Dynamic Big Data Driven Applica3on System Toolkit

An Open Dynamic Big Data Driven Applica3on System Toolkit An Open Dynamic Big Data Driven Applica3on System Toolkit Craig C. Douglas University of Wyoming and KAUST This research is supported in part by the Na3onal Science Founda3on and King Abdullah University

More information

SQream Technologies Ltd - Confiden7al

SQream Technologies Ltd - Confiden7al SQream Technologies Ltd - Confiden7al 1 Ge#ng Big Data Done On a GPU- Based Database Ori Netzer VP Product 26- Mar- 14 Analy7cs Performance - 3 TB, 18 Billion records SQream Database 400x More Cost Efficient!

More information

Summary of Cloud Compu.ng (CC) from the paper Abovce the Clouds: A Berkeley View of Cloud Compu.ng (Feb. 2009)

Summary of Cloud Compu.ng (CC) from the paper Abovce the Clouds: A Berkeley View of Cloud Compu.ng (Feb. 2009) Summary of Cloud Compu.ng (CC) from the paper Abovce the Clouds: A Berkeley View of Cloud Compu.ng (Feb. 2009) Defini.ons (I) Cloud Compu)ng refers to both the applica)ons delivered as services over the

More information

Data Warehousing. Yeow Wei Choong Anne Laurent

Data Warehousing. Yeow Wei Choong Anne Laurent Data Warehousing Yeow Wei Choong Anne Laurent Databases Databases are developed on the IDEA that DATA is one of the cri>cal materials of the Informa>on Age Informa>on, which is created by data, becomes

More information

INCREMENTAL, APPROXIMATE DATABASE QUERIES AND UNCERTAINTY FOR EXPLORATORY VISUALIZATION. Danyel Fisher Microso0 Research

INCREMENTAL, APPROXIMATE DATABASE QUERIES AND UNCERTAINTY FOR EXPLORATORY VISUALIZATION. Danyel Fisher Microso0 Research INCREMENTAL, APPROXIMATE DATABASE QUERIES AND UNCERTAINTY FOR EXPLORATORY VISUALIZATION Danyel Fisher Microso0 Research Exploratory Visualiza9on Ini9al Query Process query Get a response Change parameters

More information

Big Data Visualiza9on

Big Data Visualiza9on Big Data Visualiza9on Dr. Steve Cutchin Associate Professor Computer Science 2012 Boise State University 1 Computer Science Department 10 Faculty + 3 Lectures + 2 New hires. 400 Undergraduates Enrolled

More information

NextGen Infrastructure for Big DATA Analytics.

NextGen Infrastructure for Big DATA Analytics. NextGen Infrastructure for Big DATA Analytics. So What is Big Data? Data that exceeds the processing capacity of conven4onal database systems. The data is too big, moves too fast, or doesn t fit the structures

More information

Big Data and Clouds: Challenges and Opportuni5es

Big Data and Clouds: Challenges and Opportuni5es Big Data and Clouds: Challenges and Opportuni5es NIST January 15 2013 Geoffrey Fox gcf@indiana.edu h"p://www.infomall.org h"p://www.futuregrid.org School of Informa;cs and Compu;ng Digital Science Center

More information

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment

More information

Infrastructures for big data

Infrastructures for big data Infrastructures for big data Rasmus Pagh 1 Today s lecture Three technologies for handling big data: MapReduce (Hadoop) BigTable (and descendants) Data stream algorithms Alternatives to (some uses of)

More information

Data Mining for Knowledge Management. Mining Data Streams

Data Mining for Knowledge Management. Mining Data Streams Data Mining for Knowledge Management Mining Data Streams Themis Palpanas University of Trento http://dit.unitn.it/~themis Spring 2007 Data Mining for Knowledge Management 1 Motivating Examples: Production

More information

Cloud Compu?ng & Big Data in Higher Educa?on and Research: African Academic Experience

Cloud Compu?ng & Big Data in Higher Educa?on and Research: African Academic Experience 3 rd SG13 Regional Workshop for Africa on ITU- T Standardiza?on Challenges for Developing Countries Working for a Connected Africa (Livingstone, Zambia, 23-24 February 2015) Cloud Compu?ng & Big Data in

More information

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #5: En-ty/Rela-onal Models- - - Part 1

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #5: En-ty/Rela-onal Models- - - Part 1 CS 4604: Introduc0on to Database Management Systems B. Aditya Prakash Lecture #5: En-ty/Rela-onal Models- - - Part 1 Announcements- - - Project Goal: design a database system applica-on with a web front-

More information

CS 378 Big Data Programming. Lecture 5 Summariza9on Pa:erns

CS 378 Big Data Programming. Lecture 5 Summariza9on Pa:erns CS 378 Big Data Programming Lecture 5 Summariza9on Pa:erns Review Assignment 2 Ques9ons? If you d like to use guava (Google collec9ons classes) pom.xml available for assignment 2 Includes dependency for

More information

Real Time Analytics for Big Data. NtiSh Nati Shalom @natishalom

Real Time Analytics for Big Data. NtiSh Nati Shalom @natishalom Real Time Analytics for Big Data A Twitter Inspired Case Study NtiSh Nati Shalom @natishalom Big Data Predictions Overthe next few years we'll see the adoption of scalable frameworks and platforms for

More information

Lake Tuggeranong College Unit Outline

Lake Tuggeranong College Unit Outline Lake Tuggeranong College Unit Outline COURSE: Informa*on Technology Unit: Computer Games Programming and Design Year: 2014 Session: Semester 2 Standard Units: 1.0 Class Code(s): 13639 (T) & 13684 (A) Teachers(s):

More information

High Performance Big Data Analy5cs powered by Unique Web Accelera5on and NoSQL. The Big Data Engine

High Performance Big Data Analy5cs powered by Unique Web Accelera5on and NoSQL. The Big Data Engine High Performance Big Data Analy5cs powered by Unique Web Accelera5on and NoSQL Foster City, CA July 31, 2012 Big Data requires new thinking The challenges and opportuni5es of Big Data Big Data requires

More information

Research at the Department of Computer Science and Software Engineering. Professor Yong Yue BEng, PhD, CEng, FIET, FIMechE 17 October 2014

Research at the Department of Computer Science and Software Engineering. Professor Yong Yue BEng, PhD, CEng, FIET, FIMechE 17 October 2014 Research at the Department of Computer Science and Software Engineering Professor Yong Yue BEng, PhD, CEng, FIET, FIMechE 17 October 2014 Research Areas Ar%ficial intelligence Robo%cs Data mining Image

More information

Data Management in the Cloud: Limitations and Opportunities. Annies Ductan

Data Management in the Cloud: Limitations and Opportunities. Annies Ductan Data Management in the Cloud: Limitations and Opportunities Annies Ductan Discussion Outline: Introduc)on Overview Vision of Cloud Compu8ng Managing Data in The Cloud Cloud Characteris8cs Data Management

More information

Phone Systems Buyer s Guide

Phone Systems Buyer s Guide Phone Systems Buyer s Guide Contents How Cri(cal is Communica(on to Your Business? 3 Fundamental Issues 4 Phone Systems Basic Features 6 Features for Users with Advanced Needs 10 Key Ques(ons for All Buyers

More information

Understanding Cloud Compu2ng Services. Rain in business success with amazing solu2ons in Cloud technology

Understanding Cloud Compu2ng Services. Rain in business success with amazing solu2ons in Cloud technology Understanding Cloud Compu2ng Services Rain in business success with amazing solu2ons in Cloud technology What is Cloud Compu2ng? Cloud compu2ng encompasses various services and ac2vi2es carried out over

More information

ChE-1800 H-2: Flowchart Diagrams (last updated January 13, 2013)

ChE-1800 H-2: Flowchart Diagrams (last updated January 13, 2013) ChE-1800 H-2: Flowchart Diagrams (last updated January 13, 2013) This handout contains important information for the development of flowchart diagrams Common Symbols for Algorithms The first step before

More information

CONTENTS. Introduc on 2. Undergraduate Program 4. BSC in Informa on Systems 4. Graduate Program 7. MSC in Informa on Science 7

CONTENTS. Introduc on 2. Undergraduate Program 4. BSC in Informa on Systems 4. Graduate Program 7. MSC in Informa on Science 7 1 1 2 CONTENTS Introducon 2 Undergraduate Program 4 BSC in Informaon Systems 4 Graduate Program 7 MSC in Informaon Science 7 MSC in Health Informacs 13 2 3 Introducon The School of Informaon Science at

More information

Map- reduce, Hadoop and The communica3on bo5leneck. Yoav Freund UCSD / Computer Science and Engineering

Map- reduce, Hadoop and The communica3on bo5leneck. Yoav Freund UCSD / Computer Science and Engineering Map- reduce, Hadoop and The communica3on bo5leneck Yoav Freund UCSD / Computer Science and Engineering Plan of the talk Why is Hadoop so popular? HDFS Map Reduce Word Count example using Hadoop streaming

More information

B669 Sublinear Algorithms for Big Data

B669 Sublinear Algorithms for Big Data B669 Sublinear Algorithms for Big Data Qin Zhang 1-1 Now about the Big Data Big data is everywhere : over 2.5 petabytes of sales transactions : an index of over 19 billion web pages : over 40 billion of

More information

Data Mining. Supervised Methods. Ciro Donalek donalek@astro.caltech.edu. Ay/Bi 199ab: Methods of Computa@onal Sciences hcp://esci101.blogspot.

Data Mining. Supervised Methods. Ciro Donalek donalek@astro.caltech.edu. Ay/Bi 199ab: Methods of Computa@onal Sciences hcp://esci101.blogspot. Data Mining Supervised Methods Ciro Donalek donalek@astro.caltech.edu Supervised Methods Summary Ar@ficial Neural Networks Mul@layer Perceptron Support Vector Machines SoLwares Supervised Models: Supervised

More information

Content Distribu-on Networks (CDNs)

Content Distribu-on Networks (CDNs) Content Distribu-on Networks (CDNs) Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:0am in Architecture N101 hjp://www.cs.princeton.edu/courses/archive/spr12/cos461/ Second Half of the Course

More information

Improving Query Performance of Holistic Aggregate Queries for Real-Time Data Exploration

Improving Query Performance of Holistic Aggregate Queries for Real-Time Data Exploration July, 2014 Master Thesis Improving Query Performance of Holistic Aggregate Queries for Real-Time Data Exploration Dennis Pallett (s0167304) Faculty of Electrical Engineering, Mathematics and Computer Science

More information

978-1-4799-0913-1/14/$31.00 2014 IEEE

978-1-4799-0913-1/14/$31.00 2014 IEEE This paper introduces CMDB pa4erns as an approach to help address conceptual issues in CMDB implementa7ons and provide prac77oners with a common set of terms for useful designs. Configura7on Management

More information

Fixed Scope Offering (FSO) for Oracle SRM

Fixed Scope Offering (FSO) for Oracle SRM Fixed Scope Offering (FSO) for Oracle SRM Agenda iapps Introduc.on Execu.ve Summary Business Objec.ves Solu.on Proposal Scope - Business Process Scope Applica.on Implementa.on Methodology Time Frames Team,

More information

BIG DATA AND INVESTIGATIVE ANALYTICS

BIG DATA AND INVESTIGATIVE ANALYTICS The New Fron+er BIG DATA AND INVESTIGATIVE ANALYTICS A Publication of Infobright Table of Contents Introduc+on 3 Chapter 1: What Is Inves+ga+ve Analy+cs?. 4 Chapter 2: Top Five Requirements for Inves+ga+ve

More information

Algorithmic Aspects of Big Data. Nikhil Bansal (TU Eindhoven)

Algorithmic Aspects of Big Data. Nikhil Bansal (TU Eindhoven) Algorithmic Aspects of Big Data Nikhil Bansal (TU Eindhoven) Algorithm design Algorithm: Set of steps to solve a problem (by a computer) Studied since 1950 s. Given a problem: Find (i) best solution (ii)

More information

Internet of Things (IoT) CSE237A Introduc1on to Embedded Compu1ng

Internet of Things (IoT) CSE237A Introduc1on to Embedded Compu1ng Internet of Things (IoT) CSE237A Introduc1on to Embedded Compu1ng Outline Introduc1on to IoT Enabling technologies Open problems and future challenges Applica1ons 2 What is IoT? A phenomenon which connects

More information

Introduc)on to Map- Reduce. Vincent Leroy

Introduc)on to Map- Reduce. Vincent Leroy Introduc)on to Map- Reduce Vincent Leroy Sources Apache Hadoop Yahoo! Developer Network Hortonworks Cloudera Prac)cal Problem Solving with Hadoop and Pig Slides will be available at hgp://lig- membres.imag.fr/leroyv/

More information

B2B Offerings. Helping businesses op2mize. Infolob s amazing b2b offerings helps your company achieve maximum produc2vity

B2B Offerings. Helping businesses op2mize. Infolob s amazing b2b offerings helps your company achieve maximum produc2vity B2B Offerings Helping businesses op2mize Infolob s amazing b2b offerings helps your company achieve maximum produc2vity What is B2B? B2B is shorthand for the sales prac4ce called business- to- business

More information

HBase Schema Design. NoSQL Ma4ers, Cologne, April 2013. Lars George Director EMEA Services

HBase Schema Design. NoSQL Ma4ers, Cologne, April 2013. Lars George Director EMEA Services HBase Schema Design NoSQL Ma4ers, Cologne, April 2013 Lars George Director EMEA Services About Me Director EMEA Services @ Cloudera ConsulFng on Hadoop projects (everywhere) Apache Commi4er HBase and Whirr

More information

Linux Clusters Ins.tute: Turning HPC cluster into a Big Data Cluster. A Partnership for an Advanced Compu@ng Environment (PACE) OIT/ART, Georgia Tech

Linux Clusters Ins.tute: Turning HPC cluster into a Big Data Cluster. A Partnership for an Advanced Compu@ng Environment (PACE) OIT/ART, Georgia Tech Linux Clusters Ins.tute: Turning HPC cluster into a Big Data Cluster Fang (Cherry) Liu, PhD fang.liu@oit.gatech.edu A Partnership for an Advanced Compu@ng Environment (PACE) OIT/ART, Georgia Tech Targets

More information

14.1 Rent-or-buy problem

14.1 Rent-or-buy problem CS787: Advanced Algorithms Lecture 14: Online algorithms We now shift focus to a different kind of algorithmic problem where we need to perform some optimization without knowing the input in advance. Algorithms

More information

Hadoop Internals for Oracle Developers and DBAs Exploring the HDFS and MapReduce Data Flow

Hadoop Internals for Oracle Developers and DBAs Exploring the HDFS and MapReduce Data Flow Hadoop Internals for Oracle Developers and DBAs Exploring the HDFS and MapReduce Data Flow Tanel Põder Enkitec h=p:// h=p://blog.tanelpoder.com @tanelpoder 1 Intro: About me Tanel Põder Former Oracle Database

More information

Cloud and Big Data Summer School, Stockholm, Aug. 2015 Jeffrey D. Ullman

Cloud and Big Data Summer School, Stockholm, Aug. 2015 Jeffrey D. Ullman Cloud and Big Data Summer School, Stockholm, Aug. 2015 Jeffrey D. Ullman 2 In a DBMS, input is under the control of the programming staff. SQL INSERT commands or bulk loaders. Stream management is important

More information

Theo JD Bothma Department of Informa1on Science theo.bothma@up.ac.za

Theo JD Bothma Department of Informa1on Science theo.bothma@up.ac.za Theo JD Bothma Department of Informa1on Science theo.bothma@up.ac.za Reflec1ons on the role of corpora and big data in e- lexicography in rela1on to end user informa1on needs CILC 2015 7th Interna1onal

More information

Wireless and Mobile Networks

Wireless and Mobile Networks Wireless and Mobile Networks Reading: Sec7ons 2.8 and 4.2.5 COS 461: Computer Networks Spring 2009 (MW 1:30 2:50 in COS 105) Mike Freedman Teaching Assistants: WyaO Lloyd and Jeff Terrace hop://www.cs.princeton.edu/courses/archive/spring09/cos461/

More information

Combining Sequence Databases and Data Stream Management Systems Technical Report Philipp Bichsel ETH Zurich, 2-12-2007

Combining Sequence Databases and Data Stream Management Systems Technical Report Philipp Bichsel ETH Zurich, 2-12-2007 Combining Sequence Databases and Data Stream Management Systems Technical Report Philipp Bichsel ETH Zurich, 2-12-2007 Abstract This technical report explains the differences and similarities between the

More information

LUNCH & LEARN ADWORDS (BASICS) Network808 & German Google Guy Daniel Hildebrandt

LUNCH & LEARN ADWORDS (BASICS) Network808 & German Google Guy Daniel Hildebrandt LUNCH & LEARN ADWORDS (BASICS) Network808 & German Google Guy Daniel Hildebrandt AGENDA What is the difference between SEO / SEM? What is Google AdWords? Important AdWords vocabulary Best structure for

More information

8. 網路流量管理 Network Traffic Management

8. 網路流量管理 Network Traffic Management 8. 網路流量管理 Network Traffic Management Measurement vs. Metrics end-to-end performance topology, configuration, routing, link properties state active measurements active routes active topology link bit error

More information

ANALYTICAL TECHNIQUES FOR DATA VISUALIZATION

ANALYTICAL TECHNIQUES FOR DATA VISUALIZATION ANALYTICAL TECHNIQUES FOR DATA VISUALIZATION CSE 537 Ar@ficial Intelligence Professor Anita Wasilewska GROUP 2 TEAM MEMBERS: SAEED BOOR BOOR - 110564337 SHIH- YU TSAI - 110385129 HAN LI 110168054 SOURCES

More information

The DATA Difference Targe.ng for Stronger ROI!

The DATA Difference Targe.ng for Stronger ROI! The DATA Difference Targe.ng for Stronger ROI! Presented by: Dr. John Leininger Department of Graphic Communica

More information

How To Understand Cloud Compueng

How To Understand Cloud Compueng Data Management in the Cloud Introduc)on (Lecture 1) Do one thing every day that scares you. Eleanor Roosevelt 1 Data Management in the Cloud LOGISTICS AND ORGANIZATION 2 Kris)n TuCe FAB 115-09 Personnel

More information

YOUR PROCESS MANAGEMENT AND CONTROLLING SUITE FOR MULTI-CHANNEL ONLINE MARKETING.!

YOUR PROCESS MANAGEMENT AND CONTROLLING SUITE FOR MULTI-CHANNEL ONLINE MARKETING.! YOUR PROCESS MANAGEMENT AND CONTROLLING SUITE FOR MULTI-CHANNEL ONLINE MARKETING.! AGENDA! 1. Challenges of Online Marke3ng 2. Applicata helps 3. Benefit and Pricing 4. About us! DIFFERENT STAKEHOLDER

More information

Load Balancing in Stream Processing Engines. Anis Nasir Coauthors: Gianmarco, Nicolas, David, Marco

Load Balancing in Stream Processing Engines. Anis Nasir Coauthors: Gianmarco, Nicolas, David, Marco Engines Anis Nasir Coauthors: Gianmarco, Nicolas, David, Marco Stream Processing Engines Online Machine Learning Real Time Query Processing ConCnuous ComputaCon Distributed RPC 2 Stream Processing Engines

More information

DDC Sequencing and Redundancy

DDC Sequencing and Redundancy DDC Sequencing and Redundancy Presenter Sequencing Importance of sequencing Essen%al piece to designing and delivering a successful project Defines how disparate components interact to make up a system

More information

The basic data mining algorithms introduced may be enhanced in a number of ways.

The basic data mining algorithms introduced may be enhanced in a number of ways. DATA MINING TECHNOLOGIES AND IMPLEMENTATIONS The basic data mining algorithms introduced may be enhanced in a number of ways. Data mining algorithms have traditionally assumed data is memory resident,

More information

Cloud Computing Summary and Preparation for Examination

Cloud Computing Summary and Preparation for Examination Basics of Cloud Computing Lecture 8 Cloud Computing Summary and Preparation for Examination Satish Srirama Outline Quick recap of what we have learnt as part of this course How to prepare for the examination

More information

Modeling and mining large scale biological seman0c networks using NEO4J

Modeling and mining large scale biological seman0c networks using NEO4J Modeling and mining large scale biological seman0c networks using NEO4J Junaid Gamieldien Principal Inves.gator Clinical Sequencing and Biomarker Discovery Neo4J Graph database Graph is composed of two

More information

D- t- Science, Big D- t- and Sta/s/cs: Can we all live together? Chalmers Ini-a-ve Seminar on Big Data Tuesday 25th March 2014

D- t- Science, Big D- t- and Sta/s/cs: Can we all live together? Chalmers Ini-a-ve Seminar on Big Data Tuesday 25th March 2014 D- t- Science, Big D- t- and Sta/s/cs: Can we all live together? Chalmers Ini-a-ve Seminar on Big Data Tuesday 25th March 2014 1 Big data is like teenage sex: everyone talks about it, nobody really knows

More information

Social Media Monitoring by Using Data Mining. Fuat Basık

Social Media Monitoring by Using Data Mining. Fuat Basık Social Media Monitoring by Using Data Mining Fuat Basık Presentation Plan Introduc0on Mo0va0on Stream Processing Data Set Turkish Language Pre Processing and Stemming Term Frequency and Inverse Document

More information

Universal hashing. In other words, the probability of a collision for two different keys x and y given a hash function randomly chosen from H is 1/m.

Universal hashing. In other words, the probability of a collision for two different keys x and y given a hash function randomly chosen from H is 1/m. Universal hashing No matter how we choose our hash function, it is always possible to devise a set of keys that will hash to the same slot, making the hash scheme perform poorly. To circumvent this, we

More information

Apache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah

Apache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah Apache Hadoop: The Pla/orm for Big Data Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah 1 The Problems with Current Data Systems BI Reports + Interac7ve Apps RDBMS (aggregated

More information

SDN Controller Requirement

SDN Controller Requirement SDN Controller Requirement draft-gu-sdnrg-sdn-controller-requirement-00 Rong Gu (Presenter) Chen Li China Mobile Background l Public Cloud && Private Cloud in China Mobile Public Cloud (ecloud.10086.cn)

More information

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel Big Data and Analytics: Getting Started with ArcGIS Mike Park Erik Hoel Agenda Overview of big data Distributed computation User experience Data management Big data What is it? Big Data is a loosely defined

More information

Finding Value and Being Valuable in the Trough of Disillusionment

Finding Value and Being Valuable in the Trough of Disillusionment Finding Value and Being Valuable in the Trough of Disillusionment Jordan McIver uk.linkedin.com/in/jmcdatascience February 2016 Agenda Why you should stay awake What will it take Give me a break Agenda

More information

Distributed Aggregation in Cloud Databases. By: Aparna Tiwari tiwaria@umail.iu.edu

Distributed Aggregation in Cloud Databases. By: Aparna Tiwari tiwaria@umail.iu.edu Distributed Aggregation in Cloud Databases By: Aparna Tiwari tiwaria@umail.iu.edu ABSTRACT Data intensive applications rely heavily on aggregation functions for extraction of data according to user requirements.

More information

CS 5150 So(ware Engineering So(ware Development in Prac9ce

CS 5150 So(ware Engineering So(ware Development in Prac9ce Cornell University Compu1ng and Informa1on Science CS 5150 So(ware Engineering So(ware Development in Prac9ce William Y. Arms Overall Aim of the Course We assume that you are technically proficient. You

More information

CS 378 Big Data Programming. Lecture 2 Map- Reduce

CS 378 Big Data Programming. Lecture 2 Map- Reduce CS 378 Big Data Programming Lecture 2 Map- Reduce MapReduce Large data sets are not new What characterizes a problem suitable for MR? Most or all of the data is processed But viewed in small increments

More information

Distributed Data Management Summer Semester 2015 TU Kaiserslautern

Distributed Data Management Summer Semester 2015 TU Kaiserslautern Distributed Data Management Summer Semester 2015 TU Kaiserslautern Prof. Dr.-Ing. Sebastian Michel Databases and Information Systems Group (AG DBIS) http://dbis.informatik.uni-kl.de/ Distributed Data Management,

More information

Discrete Optimization

Discrete Optimization Discrete Optimization [Chen, Batson, Dang: Applied integer Programming] Chapter 3 and 4.1-4.3 by Johan Högdahl and Victoria Svedberg Seminar 2, 2015-03-31 Todays presentation Chapter 3 Transforms using

More information

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Edward Bortnikov & Ronny Lempel Yahoo Labs, Haifa Indexing in Search Engines Information Retrieval s two main stages: Indexing process

More information

Decision Trees from large Databases: SLIQ

Decision Trees from large Databases: SLIQ Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values

More information

CIS 700: algorithms for Big Data

CIS 700: algorithms for Big Data CIS 700: algorithms for Big Data Lecture 6: Graph Sketching Slides at http://grigory.us/big-data-class.html Grigory Yaroslavtsev http://grigory.us Sketching Graphs? We know how to sketch vectors: v Mv

More information

Big data coming soon... to an NSI near you. John Dunne. Central Statistics Office (CSO), Ireland John.Dunne@cso.ie

Big data coming soon... to an NSI near you. John Dunne. Central Statistics Office (CSO), Ireland John.Dunne@cso.ie Big data coming soon... to an NSI near you John Dunne Central Statistics Office (CSO), Ireland John.Dunne@cso.ie Big data is beginning to be explored and exploited to inform policy making. However these

More information

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With

More information

SOAL-SOAL MICROSOFT EXCEL 1. The box on the chart that contains the name of each individual record is called the. A. cell B. title C. axis D.

SOAL-SOAL MICROSOFT EXCEL 1. The box on the chart that contains the name of each individual record is called the. A. cell B. title C. axis D. SOAL-SOAL MICROSOFT EXCEL 1. The box on the chart that contains the name of each individual record is called the. A. cell B. title C. axis D. legend 2. If you want all of the white cats grouped together

More information

CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions. Linda Shapiro Spring 2016

CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions. Linda Shapiro Spring 2016 CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions Linda Shapiro Spring 2016 Announcements Friday: Review List and go over answers to Practice Problems 2 Hash Tables: Review Aim for constant-time

More information

5 INTEGER LINEAR PROGRAMMING (ILP) E. Amaldi Fondamenti di R.O. Politecnico di Milano 1

5 INTEGER LINEAR PROGRAMMING (ILP) E. Amaldi Fondamenti di R.O. Politecnico di Milano 1 5 INTEGER LINEAR PROGRAMMING (ILP) E. Amaldi Fondamenti di R.O. Politecnico di Milano 1 General Integer Linear Program: (ILP) min c T x Ax b x 0 integer Assumption: A, b integer The integrality condition

More information

Lecture 14: Managing Mobility. Rik Sarkar

Lecture 14: Managing Mobility. Rik Sarkar Lecture 14: Managing Mobility Rik Sarkar Communica9on with mobile nodes How do cell phones work? How can we talk to mobile nodes without the cell infrastructure? How do mobile phones work? There are large

More information

Evaluation of NoSQL databases for large-scale decentralized microblogging

Evaluation of NoSQL databases for large-scale decentralized microblogging Evaluation of NoSQL databases for large-scale decentralized microblogging Cassandra & Couchbase Alexandre Fonseca, Anh Thu Vu, Peter Grman Decentralized Systems - 2nd semester 2012/2013 Universitat Politècnica

More information

Financial, Telco, Retail, & Manufacturing: Hadoop Business Services for Industries

Financial, Telco, Retail, & Manufacturing: Hadoop Business Services for Industries Financial, Telco, Retail, & Manufacturing: Hadoop Business Services for Industries Ho Wing Leong, ASEAN 1 Cloudera company snapshot Founded Company Employees Today World Class Support Mission CriQcal 2008,

More information

Promo%ng Your OCS Business through Digital & Social Media. Presented by: John Healy

Promo%ng Your OCS Business through Digital & Social Media. Presented by: John Healy Promo%ng Your OCS Business through Digital & Social Media Presented by: John Healy Who Is John Healy? jhealy@healyco.com 25+ years in marke;ng, communica;ons & PR consul;ng Specialty: Helping clients balance

More information

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com Data Warehousing and Analytics Infrastructure at Facebook Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com Overview Challenges in a Fast Growing & Dynamic Environment Data Flow Architecture,

More information

Real Time Analy:cs for Big Data Lessons Learned from Facebook

Real Time Analy:cs for Big Data Lessons Learned from Facebook SINGLE PLATFORM. COMPLETE SCALABILITY. Real Time Analy:cs for Big Data Lessons Learned from Facebook @uri1803 Head of Product GigaSpaces About Me MTBK Junky A Proud Dad Technology addict Head of Product

More information

Driving MySQL to Big Data Scale. Thomas Hazel Founder, Chief Scien@st thomas@deepis.com

Driving MySQL to Big Data Scale. Thomas Hazel Founder, Chief Scien@st thomas@deepis.com Driving MySQL to Big Data Scale Thomas Hazel Founder, Chief Scien@st thomas@deepis.com Millions to Billions to Trillions Agenda Driving MySQL to Big Data Scale Market Trends Hardware Trends Current Computer

More information

How To Make A Cloud Based Computer Power Available To A Computer (For Free)

How To Make A Cloud Based Computer Power Available To A Computer (For Free) Cloud Compu)ng Adam Belloum Ins)tute of Informa)cs University of Amsterdam a.s.z.belloum@uva.nl High Performance compu)ng Curriculum, Jan 2015 hgp://www.hpc.uva.nl/ UvA- SURFsara What is Cloud Compu)ng?

More information

Aurora: a new model and architecture for data stream management

Aurora: a new model and architecture for data stream management Aurora: a new model and architecture for data stream management Daniel J. Abadi 1, Don Carney 2, Ugur Cetintemel 2, Mitch Cherniack 1, Christian Convey 2, Sangdon Lee 2, Michael Stonebraker 3, Nesime Tatbul

More information

CS 378 Big Data Programming

CS 378 Big Data Programming CS 378 Big Data Programming Lecture 2 Map- Reduce CS 378 - Fall 2015 Big Data Programming 1 MapReduce Large data sets are not new What characterizes a problem suitable for MR? Most or all of the data is

More information