B490 Mining the Big Data. 0 Introduction

1 B490 Mining the Big Data. 0 Introduction. Qin Zhang

2 Data Mining
What is Data Mining?
A definition: discovery of useful, possibly unexpected, patterns in data.
I don't think this is practical until the day machines have intelligence. (You may have a different opinion.)
I think that, most of the time, people just mean one of the following:
- Compute some functions defined on the data (efficient algorithms).
- Fit the data into some concrete models (statistical modeling).

5 In this course, we will talk about...
In this course we will focus on efficient algorithms. In particular, we will discuss:
- Finding similar items
- Mining frequent items
- Clustering (aggregate similar items)
- Link analysis (explore structure in large graphs)

9 Big Data

10 Big Data
Big data is everywhere:
- over 2.5 petabytes of sales transactions
- an index of over 19 billion web pages
- over 40 billion pictures
- ...
Magazine covers: Nature 06, Nature 08, CACM 08, Economist

12 Source and Challenge
Source
- Retailer databases: Amazon, Walmart
- Logistics, financial & health data: stock prices
- Social networks: Facebook, Twitter
- Pictures taken by mobile devices: iPhone
- Internet traffic: IP addresses
- New forms of scientific data: Large Synoptic Survey Telescope
Challenge
- Volume
- Velocity
- Variety (documents, stock records, personal profiles, photographs, audio & video, 3D models, location data, ...)
Volume and velocity are the focus of algorithm design.

15 What does Big Data Really Mean?
We don't define Big Data in terms of TB, PB, EB, ...
The data is too big to fit in memory. What can we do?
- Process items one by one as they arrive, and throw some of them away on the fly.
- Store the data on multiple machines, which collaborate via communication.
The RAM model does not fit.
RAM model: a processor (CPU) and an infinite-size memory; probing each cell of the memory has unit cost.

19 Popular Models for Big Data

20 Data Streams
The data stream model (Alon, Matias & Szegedy 1996)
[Figure: a data stream passing a CPU that has only a small RAM]
Widely used: Stanford Stream, Aurora, Telegraph, NiagaraCQ
Applications
- Internet router: packets pass through a router with limited space. The router wants to maintain some statistics on the data, e.g., to detect anomalies for security.
- Stock data, ad auctions, flight logs on tapes, etc.
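As a rough illustration of "maintaining some statistics" in limited space, the sketch below keeps a constant-size summary of a stream of packet sizes. The class name `RunningStats` and the packet sizes are hypothetical and not taken from the slides; this is a minimal sketch, not what a real router does.

```python
class RunningStats:
    """Constant-space summary of a stream of numbers (e.g. packet sizes)."""

    def __init__(self):
        self.count = 0       # number of items seen so far
        self.total = 0       # sum of all items seen so far
        self.largest = None  # largest item seen so far

    def update(self, value):
        """Process one item; the item itself is not stored."""
        self.count += 1
        self.total += value
        if self.largest is None or value > self.largest:
            self.largest = value

    def mean(self):
        return self.total / self.count if self.count else 0.0


# Hypothetical packet sizes arriving one at a time.
stats = RunningStats()
for packet_size in [40, 1500, 576, 40, 1500]:
    stats.update(packet_size)

print(stats.count, stats.mean(), stats.largest)
```

Count, mean and maximum are easy because each needs only a fixed number of counters. Other statistics, such as the exact median, are much harder to maintain in small space.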

22 Difficulty: See and forget!
Game 1: A sequence of numbers, shown one at a time.
Q: What's the median?
A: 33
Game 2: Relationships between Alice, Bob, Carol, Dave, Eva and Paul, shown one event at a time:
1. Alice and Bob become friends
2. Carol and Eva become friends
3. Eva and Bob become friends
4. Dave and Paul become friends
5. Alice and Paul become friends
6. Eva and Bob unfriend
7. Alice and Dave become friends
8. Bob and Paul become friends
9. Dave and Paul unfriend
10. Dave and Carol become friends
Q: Are Eva and Bob connected by friends?
A: YES. Eva - Carol - Dave - Alice - Bob
With a small memory, we have to allow approximation/randomization.
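The answer to Game 2 can be checked by replaying the friend/unfriend events from the slides and searching the final graph. The sketch below stores the whole graph, which is exactly what a small-memory streaming algorithm cannot afford; the function names `replay` and `find_path` are illustrative only.

```python
from collections import defaultdict, deque

# Events in the order they appear on the slides: (action, person, person).
EVENTS = [
    ("add", "Alice", "Bob"),
    ("add", "Carol", "Eva"),
    ("add", "Eva", "Bob"),
    ("add", "Dave", "Paul"),
    ("add", "Alice", "Paul"),
    ("remove", "Eva", "Bob"),
    ("add", "Alice", "Dave"),
    ("add", "Bob", "Paul"),
    ("remove", "Dave", "Paul"),
    ("add", "Dave", "Carol"),
]


def replay(events):
    """Build the final friendship graph by applying every event in order."""
    graph = defaultdict(set)
    for action, u, v in events:
        if action == "add":
            graph[u].add(v)
            graph[v].add(u)
        else:
            graph[u].discard(v)
            graph[v].discard(u)
    return graph


def find_path(graph, source, target):
    """Breadth-first search; returns a shortest path as a list, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


path = find_path(replay(EVENTS), "Eva", "Bob")
print(path)  # ['Eva', 'Carol', 'Dave', 'Alice', 'Bob']
```

Memory here grows with the number of friendships, so this only works when the graph fits in memory.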

50 MapReduce
The MapReduce model (Dean & Ghemawat 2004)
Input -> Map -> Shuffle -> Reduce -> Output
- Map: for each value $x_i$, $x_i \mapsto \{(\mathrm{key}_1, v_1), (\mathrm{key}_2, v_2), \ldots\}$
- Shuffle: aggregate keys, producing $\{(\mathrm{key}_1, v_1), (\mathrm{key}_1, v_2), \ldots\}$
- Reduce: $\{(\mathrm{key}_1, v_1), (\mathrm{key}_1, v_2), \ldots\} \mapsto \{y_1, y_2, \ldots\}$
Goal: minimize (1) total communication, (2) number of rounds.
Standard model in industry for massive data computation, e.g., Hadoop.
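The three phases can be imitated on a single machine to see how data flows through them. The sketch below is a toy word count in plain Python, not Hadoop code; the function names and the input lines are made up for illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Map: turn every input record into a list of (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Reduce: collapse each key's list of values into one output."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    # Emit (word, 1) for every word in the line.
    return [(word, 1) for word in line.lower().split()]


def sum_reducer(key, values):
    # Total number of occurrences of the word `key`.
    return sum(values)


lines = ["big data is big", "data streams"]
counts = reduce_phase(shuffle_phase(map_phase(lines, word_mapper)), sum_reducer)
print(counts)  # {'big': 2, 'data': 2, 'is': 1, 'streams': 1}
```

In a real deployment the pairs produced by Map are sent over the network during Shuffle, which is why total communication and the number of rounds are the costs to minimize.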

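53 ActiveDHT
The ActiveDHT model (Bahmani, Chowdhury & Goel 2010)
Operations: Update(key, $a_t$) and Query(key)
Each machine is responsible for the keys with certain hash values (in the figure, one machine is responsible for keys with hash = 4, 5 and another for keys with hash = 6, ...).
Used in Yahoo! S4 & Twitter Storm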

54 Tentative course plan
Part 0: Introductions
Part 1: Finding Similar Items
- Jaccard Similarity and Min-Hashing
- Locality Sensitive Hashing (LSH) and Distances
- Implementing LSH in ActiveDHT
Part 2: Clustering
- Hierarchical Clustering
- Assignment-based Clustering (k-center, k-means, k-median)
- Spectral Clustering
Part 3: Mining Frequent Items
- Finding Frequent Itemsets
- Finding Frequent Items in Data Streams
Part 4: Link Analysis
- Markov Chain Basics
- Webpage Similarity and PageRank
- Implementing PageRank in MapReduce

55 Resources
There is no official textbook for the class.
Main reference book: Mining of Massive Datasets by Anand Rajaraman and Jeff Ullman
Background on randomized algorithms: Probability and Computing by Mitzenmacher and Upfal

56 Instructors
Instructor: Qin Zhang. Office hours: by appointment.
Assistant Instructor: Prasanth Velamala. Office hours: Thursdays, 2pm-3pm.

57 Grading
Assignments 50%: There will be several homework assignments. Solutions should be typeset in LaTeX (highly recommended) or Word.
Project 50%: The project consists of three components:
1. Write a proposal.
2. Write a report.
3. Make a presentation.
(Details will be posted online.)
Each item (assignment or project) is graded A, B, ... The final grade will be a weighted average (according to XX%).
Most important thing: learn something about models / algorithmic techniques / theoretical analysis for Mining the Big Data.

59 LaTeX
LaTeX: a highly recommended tool for assignments/reports
1. Read wiki articles:
2. Find a good LaTeX editor.
3. Learn how to use it, e.g., read "The Not So Short Introduction to LaTeX2e" (Google it).

60 Prerequisites
One is expected to know the basics of algorithm design and analysis, probability, and programming, e.g., to have taken
- (Math) M365 Introduction to Probability and Statistics,
- (Math) M301 Linear Algebra and Applications,
- (CS) C241 Discrete Structures for Computer Science,
- (CS) B403 Introduction to Algorithm Design and Analysis,
or equivalent courses.
I will NOT start with things like big-O notation or the definitions of random variables and expectation. But please ask at any time if you don't understand something.

61 Possible project topics
Part 1: Finding Similar Items
- Locality Sensitive Hashing: Given a dictionary of a large number of documents (or other objects) and a set of query docs, find, for each query doc, all docs in the dictionary that are similar. Compare the running time of LSH with other methods that you can think of (e.g., the trivial one: compare the query with each of the docs in the dictionary).
Part 2: Clustering
- Assignment-based Clustering (k-center, k-means, k-median): Select clustering algorithms taught in class, and run them on large data sets. One can also compare them with hierarchical clustering.
Part 3: Mining Frequent Items
- Finding Frequent Itemsets: Run the A-priori algorithm on large data sets to find frequent itemsets.
- Finding Frequent Items in Data Streams: Implement streaming algorithms taught in class, and run them on large data sets to find frequent items.
Compare the results with the true frequent items/itemsets.

62 Basics on probability

63 Approximation and Randomization
Approximation: Return $\hat{f}(A)$ instead of $f(A)$, where
$|f(A) - \hat{f}(A)| \le \epsilon f(A)$.
Such an $\hat{f}(A)$ is a $(1 + \epsilon)$-approximation of $f(A)$.
Randomization: Return $\hat{f}(A)$ instead of $f(A)$, where
$\Pr\left[\,|f(A) - \hat{f}(A)| \le \epsilon f(A)\,\right] \ge 1 - \delta$.
Such an $\hat{f}(A)$ is a $(1 + \epsilon, \delta)$-approximation of $f(A)$.

65 Markov and Chebyshev inequalities
Markov Inequality: Let $X \ge 0$ be a random variable. Then for all $a > 0$,
$\Pr[X \ge a] \le \frac{E[X]}{a}$.
Chebyshev's Inequality: Let $X$ be a random variable (it need not be non-negative). Then for all $a > 0$,
$\Pr[\,|X - E[X]| \ge a\,] \le \frac{\mathrm{Var}[X]}{a^2}$.
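Both inequalities can be sanity-checked on a small example. The sketch below uses a fair six-sided die, which is an illustrative choice and not from the slides, and exact rational arithmetic so that no rounding is involved.

```python
from fractions import Fraction

# X is the outcome of a fair six-sided die (an illustrative choice).
values = range(1, 7)
p = Fraction(1, 6)

mean = sum(p * x for x in values)               # E[X] = 7/2
var = sum(p * (x - mean) ** 2 for x in values)  # Var[X] = 35/12


def prob(event):
    """Exact probability that the die outcome satisfies `event`."""
    return sum(p for x in values if event(x))


# Markov: Pr[X >= a] <= E[X] / a, here with a = 5.
a = 5
markov_lhs = prob(lambda x: x >= a)
markov_rhs = mean / a

# Chebyshev: Pr[|X - E[X]| >= b] <= Var[X] / b^2, here with b = 5/2.
b = Fraction(5, 2)
cheb_lhs = prob(lambda x: abs(x - mean) >= b)
cheb_rhs = var / b ** 2

print(markov_lhs, "<=", markov_rhs)  # 1/3 <= 7/10
print(cheb_lhs, "<=", cheb_rhs)      # 1/3 <= 7/15
```

In this example Chebyshev's bound (7/15) is tighter than Markov's (7/10) because it uses the variance in addition to the mean.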

67 Application: Birthday Paradox
Birthday Paradox: In a set of $k$ randomly chosen people, what is the probability that at least one pair of them has the same birthday? Assume each person's birthday is chosen uniformly at random from Jan. 1 to Dec. 31, and let $n$ be the number of possible birthdays.
Take 1: For any pair of people, the probability that they have the same birthday is $1/n$. For $k$ people, we have $\binom{k}{2}$ pairs of people. The probability that none of them have the same birthday is $(1 - 1/n)^{\binom{k}{2}}$. Thus the answer is $1 - (1 - 1/n)^{\binom{k}{2}}$.
Wrong! (The events for different pairs are not independent.)
Take 2:
$\Pr[\text{exists collision}] = 1 - \left(\frac{n-0}{n}\right)\left(\frac{n-1}{n}\right)\left(\frac{n-2}{n}\right)\cdots\left(\frac{n-(k-1)}{n}\right) \approx \frac{k^2}{2n}$ (for $k \ll \sqrt{n}$)
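The difference between the two takes is easy to see numerically. The sketch below evaluates the exact product from Take 2, the formula from Take 1, and the estimate $k^2/(2n)$; the function names are illustrative, and $n = 365$ is assumed as the default.

```python
from math import comb


def exact_collision(k, n=365):
    """Take 2: 1 - prod_{i=0}^{k-1} (n - i) / n."""
    no_collision = 1.0
    for i in range(k):
        no_collision *= (n - i) / n
    return 1.0 - no_collision


def take1(k, n=365):
    """Take 1 (incorrect): treats the C(k, 2) pairs as independent."""
    return 1.0 - (1.0 - 1.0 / n) ** comb(k, 2)


def small_k_estimate(k, n=365):
    """k^2 / (2n); only meaningful when k is much smaller than sqrt(n)."""
    return k * k / (2 * n)


for k in (5, 10, 23, 40):
    print(k, round(exact_collision(k), 4), round(take1(k), 4),
          round(small_k_estimate(k), 4))

# With n = 2 possible birthdays and k = 3 people a collision is certain,
# but Take 1 still reports a probability below 1.
print(exact_collision(3, n=2), take1(3, n=2))  # 1.0 0.875
```

For $n = 365$, Take 1 is numerically close to the exact value, but the last line shows that it is not the correct formula: by the pigeonhole principle the true probability is 1 once $k > n$.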

71 Application: Coupon Collector
Coupon Collector: Suppose that each box of cereal contains one of $n$ different coupons. Once you obtain one of every type of coupon, you can send in for a prize. Assuming that the coupon in each box is chosen independently and uniformly at random from the $n$ possibilities, how many boxes of cereal must you buy before you obtain at least one of every type of coupon?
Analysis (on board)
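The board analysis is not part of this transcription. For reference, the standard textbook argument for the expected number of boxes, which may differ in detail from what was presented in class, goes as follows. The symbols $X$, $X_i$, $p_i$ and $H_n$ are introduced here and do not appear on the slides.

```latex
% X   = number of boxes bought until all n coupon types are collected.
% X_i = number of boxes bought while exactly i-1 distinct types are held.
% While i-1 types are held, each new box brings a new type with probability p_i.
\begin{align*}
X &= \sum_{i=1}^{n} X_i, \qquad
X_i \sim \mathrm{Geometric}(p_i), \qquad
p_i = \frac{n-(i-1)}{n}, \\
E[X] &= \sum_{i=1}^{n} E[X_i]
      = \sum_{i=1}^{n} \frac{1}{p_i}
      = \sum_{i=1}^{n} \frac{n}{n-i+1}
      = n \sum_{j=1}^{n} \frac{1}{j}
      = n H_n
      = n \ln n + O(n).
\end{align*}
```

So about $n \ln n$ boxes are needed in expectation.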

73 The Union Bound
The Union Bound: Consider $t$ possibly dependent random events $X_1, \ldots, X_t$. The probability that all events occur is at least
$1 - \sum_{i=1}^{t} \left(1 - \Pr[X_i \text{ occurs}]\right)$.
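The bound follows in one line from taking complements; $\overline{X_i}$ below denotes the event that $X_i$ does not occur (notation introduced here for the derivation).

```latex
\begin{align*}
\Pr[\text{all } X_i \text{ occur}]
  &= 1 - \Pr\Big[\bigcup_{i=1}^{t} \overline{X_i}\Big]
   \;\ge\; 1 - \sum_{i=1}^{t} \Pr\big[\overline{X_i}\big]
   \;=\; 1 - \sum_{i=1}^{t} \big(1 - \Pr[X_i \text{ occurs}]\big).
\end{align*}
```

For example, if $t = 10$ events each occur with probability $0.99$, then all of them occur with probability at least $1 - 10 \cdot 0.01 = 0.9$, whether or not the events are independent.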

74 Summary for the introduction
- We have discussed Big Data and Data Mining.
- We have introduced three popular models for modern computation.
- We have talked about the course plan and assessment.
- We have covered some basics on probability.

75 Thank you!

