Sublinear Algorithms for Big Data. Part 4: Random Topics


 Malcolm Kelly
 1 years ago
 Views:
Transcription
1 Sublinear Algorithms for Big Data Part : Random Topics Qin Zhang 11
2 Topic 3: Random sampling in distributed data streams (based on a paper with ormode, Muthukrishnan and Yi, PODS 10, JAM 12) 21
3 Distributed streaming Motivated by database/networking applications Adaptive filters [Olston, Jiang, Widom, SIGMOD 03] A generic geometric approach [Scharfman et al. SIGMOD 06] Prediction models [ormode, Garofalakis, Muthukrishnan, Rastogi, SIGMOD 05] sensor networks network monitoring 31 environment monitoring cloud computing
4 Reservoir sampling [Waterman??; Vitter 85] Maintain a (uniform) sample (w/o replacement) of size s from a stream of n items Every subset of size s has equal probability to be the sample 1
5 Reservoir sampling [Waterman??; Vitter 85] Maintain a (uniform) sample (w/o replacement) of size s from a stream of n items Every subset of size s has equal probability to be the sample When the ith item arrives With probability s/i, use it to replace an item in the current sample chosen uniformly at ranfom With probability 1 s/i, throw it away 2
6 Reservoir sampling from distributed streams When k = 1, reservoir sampling has cost Θ(s log n) When k 2, reservoir sampling has cost O(n) because it s costly to track i S k S 3 S S 1 time
7 Reservoir sampling from distributed streams When k = 1, reservoir sampling has cost Θ(s log n) When k 2, reservoir sampling has cost O(n) because it s costly to track i S k Tracking i approximately? Sampling won t be uniform S 3 S S 1 time
8 Reservoir sampling from distributed streams When k = 1, reservoir sampling has cost Θ(s log n) When k 2, reservoir sampling has cost O(n) because it s costly to track i S k S 3 S 2 Tracking i approximately? Sampling won t be uniform Key observation: We don t have to know the size of the population in order to sample! 53 S 1 time
9 Basic idea: binary Bernoulli sampling 61
10 Basic idea: binary Bernoulli sampling
11 Basic idea: binary Bernoulli sampling onditioned upon a row having s active items, we can draw a sample from the active items 63
12 Basic idea: binary Bernoulli sampling onditioned upon a row having s active items, we can draw a sample from the active items The could maintain a Bernoulli sample of size between s and O(s) 6
13 Random sampling Algorithm [with ormode, Muthu & Yi, PODS 10 JAM 11] Initialize i = 0 sites S 1 S 2 S 3 Sites send in every item w.pr. 2 i In epoch i: S k 171
14 Random sampling Algorithm [with ormode, Muthu & Yi, PODS 10 JAM 11] Initialize i = 0 In epoch i: upper lower Sites send in every item w.pr. 2 i oordinator maintains a lower sample and an upper sample: each received item goes to either with equal prob. sites S 1 S 2 S 3 S k (Each item is included in lower sample w.pr. 2 (i+1) ) 172
15 Random sampling Algorithm [with ormode, Muthu & Yi, PODS 10 JAM 11] Initialize i = 0 In epoch i: upper lower Sites send in every item w.pr. 2 i oordinator maintains a lower sample and an upper sample: each received item goes to either with equal prob. sites S 1 S 2 S 3 S k (Each item is included in lower sample w.pr. 2 (i+1) ) When the lower sample reaches size s, the broadcasts to k sites advance to epoch i i+1 Discards the upper sample Randomlysplitsthelowersampleintoanewlowerandanuppersample 173
16 Random sampling Algorithm [with ormode, Muthu & Yi, PODS 10 JAM 11] Initialize i = 0 In epoch i: upper lower Sites send in every item w.pr. 2 i oordinator maintains a lower sample and an upper sample: each received item goes to either with equal prob. sites S 1 S 2 S 3 S k (Each item is included in lower sample w.pr. 2 (i+1) ) When the lower sample reaches size s, the broadcasts to k sites advance to epoch i i+1 Discards the upper sample Randomlysplitsthelowersampleintoanewlowerandanuppersample orrectness: (1): In epoch i, each item is maintained in w. pr. 2 i 17
17 Random sampling Algorithm [with ormode, Muthu & Yi, PODS 10 JAM 11] Initialize i = 0 In epoch i: upper lower Sites send in every item w.pr. 2 i oordinator maintains a lower sample and an upper sample: each received item goes to either with equal prob. sites S 1 S 2 S 3 S k (Each item is included in lower sample w.pr. 2 (i+1) ) When the lower sample reaches size s, the broadcasts to k sites advance to epoch i i+1 Discards the upper sample Randomlysplitsthelowersampleintoanewlowerandanuppersample orrectness: (1): In epoch i, each item is maintained in w. pr. 2 i (2): Always s items are maintained in 175
18 A running example upper lower Maintain s = 3 samples Epoch 0 (p = 1) sites S 1 S 2 S 3 S 181
19 A running example upper lower Maintain s = 3 samples Epoch 0 (p = 1) sites S 1 S 2 S 3 S
20 A running example upper lower 1 Maintain s = 3 samples Epoch 0 (p = 1) sites S 1 S 2 S 3 S
21 A running example upper lower 1 Maintain s = 3 samples Epoch 0 (p = 1) sites S 1 S 2 S 3 S
22 A running example upper lower 1 2 Maintain s = 3 samples Epoch 0 (p = 1) sites S 1 S 2 S 3 S
23 A running example upper lower Maintain s = 3 samples Epoch 0 (p = 1) sites S 1 S 2 S 3 S
24 A running example upper lower Maintain s = 3 samples Epoch 0 (p = 1) sites S 1 S 2 S 3 S
25 A running example upper lower Maintain s = 3 samples Epoch 0 (p = 1) sites S 1 S 2 S 3 S
26 A running example upper lower Maintain s = 3 samples Epoch 0 (p = 1) Now lower sample = 3 discard upper sample split lower sample advance to Epoch 1 sites S 1 S 2 S 3 S
27 A running example upper lower 1 5 Maintain s = 3 samples Epoch 0 (p = 1) Now lower sample = 3 discard upper sample split lower sample advance to Epoch 1 sites S 1 S 2 S 3 S
28 A running example (cont.) upper lower 1 5 Maintain s = 3 samples Epoch 1 (p = 1/2) sites S 1 S 2 S 3 S
29 A running example (cont.) upper lower 1 5 Maintain s = 3 samples Epoch 1 (p = 1/2) sites S 1 S 2 S 3 S (discard) 192
30 A running example (cont.) upper lower Maintain s = 3 samples Epoch 1 (p = 1/2) sites S 1 S 2 S 3 S (discard)
31 A running example (cont.) upper lower Maintain s = 3 samples Epoch 1 (p = 1/2) sites S 1 S 2 S 3 S (discard)
32 A running example (cont.) upper lower Maintain s = 3 samples Epoch 1 (p = 1/2) sites S 1 S 2 S 3 S (discard) (discard) 195
33 A running example (cont.) upper lower Maintain s = 3 samples Epoch 1 (p = 1/2) sites S 1 S 2 S 3 S (discard) (discard)
34 A running example (cont.) upper lower Maintain s = 3 samples Epoch 1 (p = 1/2) sites S 1 S 2 S 3 S (discard) (discard)
35 A running example (cont.) upper lower Maintain s = 3 samples Epoch 1 (p = 1/2) Again lower sample = 3 discard upper sample split lower sample advance to Epoch 2 sites S 1 S 2 S 3 S (discard) (discard)
36 A running example (cont.) upper lower Maintain s = 3 samples Epoch 1 (p = 1/2) Again lower sample = 3 discard upper sample split lower sample advance to Epoch 2 sites S 1 S 2 S 3 S (discard) (discard)
37 A running example (cont.) upper lower Maintain s = 3 samples Epoch 2 (p = 1/) More items will be discarded locally sites S 1 S 2 S 3 S (discard) (discard)
38 A running example (cont.) upper lower Maintain s = 3 samples Epoch 2 (p = 1/) More items will be discarded locally Intuition: maintain a sample prob. at each site p s/n (n: total # items) without knowing n. sites S 1 S 2 S 3 S (discard) (discard)
39 Random sampling Analysis Initialize i = 0 In epoch i: upper lower Sites send in every item w.pr. 2 i oordinator maintains a lower sample and an upper sample: each received item goes to either with equal prob. sites S 1 S 2 S 3 S k (Each item is included in lower sample w.pr. 2 (i+1) ) When the lower sample reaches size s, the broadcasts to k sites advance to epoch i i+1 Discards the upper sample Splitsthelowersampleintoanewlowersampleandanuppersample Analysis: (Msgs sent) Messages sent per epoch O(k +s) # epochs O(logn) = O((k +s)logn) 211
40 Random sampling Analysis and experiments an be 1. improved to Θ(klog k/s n+slogn) and 2. extended to sliding window cases. 221
41 Random sampling Analysis and experiments an be 1. improved to Θ(klog k/s n+slogn) and 2. extended to sliding window cases. Experiments on the real data set from 1998 world cup logs. log 2 (cost) theory 15 log 2 (cost) cost 10 3 practice log 2 s Basic case cost VS sample size n = ,k = log 2 s Timebased sliding cost VS sample size n = ,k = ( log2 Timebased sliding cost VS window size s = 128,k = 128 ) w
42 Random sampling Analysis and experiments an be 1. improved to Θ(klog k/s n+slogn) and 2. extended to sliding window cases. log 2 (cost) theory 15 practice Experiments on the real data set from 1998 world cup logs log 2 s Basic case cost VS sample size n = ,k = 128 total # items n = 7,000,000 # items sent,000 size of sample s = 128 # sites k =
43 Sampling from a (timebased) sliding window sliding window expired windows frozen window current window t 101
44 Sampling from a (timebased) sliding window sliding window t expired windows frozen window current window Sample for sliding window = need new ideas (1) a subsample of the (unexpired) sample of frozen window + (2) a subsample of the sample of current window by ISWoR 102
45 Sampling from a (timebased) sliding window sliding window expired windows frozen window current window Sample for sliding window = need new ideas (1) a subsample of the (unexpired) sample of frozen window + (2) a subsample of the sample of current window by ISWoR (1), (2) may be sampled by different rates. But as long as both have sizes min{s,# live items}, fine. t 103
46 Sampling from a (timebased) sliding window sliding window expired windows frozen window current window Sample for sliding window = need new ideas (1) a subsample of the (unexpired) sample of frozen window + (2) a subsample of the sample of current window by ISWoR (1), (2) may be sampled by different rates. But as long as both have sizes min{s,# live items}, fine. The key issue: how to guarantee both have sizes s? as items in the frozen window are expiring... t 10
47 Sampling from a (timebased) sliding window sliding window expired windows frozen window current window Sample for sliding window = need new ideas (1) a subsample of the (unexpired) sample of frozen window + (2) a subsample of the sample of current window by ISWoR (1), (2) may be sampled by different rates. But as long as both have sizes min{s,# live items}, fine. The key issue: how to guarantee both have sizes s? as items in the frozen window are expiring... Solution: In the frozen window, find a good sample rate such that the sample size s. t 105
48 Dealing with the frozen window sliding window expired windows frozen window current window t Keep all the levels? Need O(w) communication 111
49 Dealing with the frozen window sliding window expired windows frozen window current window t Keep all the levels? Need O(w) communication (s = 2) Keep most recent sampled items in a level until s of them are also sampled at the next level. Total size: O(slogw) 112
50 Dealing with the frozen window sliding window expired windows frozen window current window t Keep all the levels? Need O(w) communication (s = 2) Keep most recent sampled items in a level until s of them are also sampled at the next level. Total size: O(slogw) Guaranteed: There is a blue window with s sampled items that covers the unexpired portion of the frozen window 113
51 Dealing with the frozen window: The algorithm (s = 2) Each site builds its own levelsampling structure for the current window until it freezes Needs O(slogw) space and O(1) time per item 121
52 Dealing with the frozen window: The algorithm (s = 2) Each site builds its own levelsampling structure for the current window until it freezes Needs O(slogw) space and O(1) time per item When the current window freezes For each level, do a kway merge to build the level of the global structure at the. Total communication O((k + s) log w) 122
Optimal Tracking of Distributed Heavy Hitters and Quantiles
Optimal Tracking of Distributed Heavy Hitters and Quantiles Ke Yi HKUST yike@cse.ust.hk Qin Zhang MADALGO Aarhus University qinzhang@cs.au.dk Abstract We consider the the problem of tracking heavy hitters
More informationLECTURE 16. Readings: Section 5.1. Lecture outline. Random processes Definition of the Bernoulli process Basic properties of the Bernoulli process
LECTURE 16 Readings: Section 5.1 Lecture outline Random processes Definition of the Bernoulli process Basic properties of the Bernoulli process Number of successes Distribution of interarrival times The
More informationSublinear Algorithms for Big Data. Part 4: Random Topics
Sublinear Algorithms for Big Data Part 4: Random Topics Qin Zhang 11 21 Topic 1: Compressive sensing Compressive sensing The model (CandesRombergTao 04; Donoho 04) Applicaitons Medical imaging reconstruction
More informationNimble Algorithms for Cloud Computing. Ravi Kannan, Santosh Vempala and David Woodruff
Nimble Algorithms for Cloud Computing Ravi Kannan, Santosh Vempala and David Woodruff Cloud computing Data is distributed arbitrarily on many servers Parallel algorithms: time Streaming algorithms: sublinear
More informationBig Data & Scripting Part II Streaming Algorithms
Big Data & Scripting Part II Streaming Algorithms 1, Counting Distinct Elements 2, 3, counting distinct elements problem formalization input: stream of elements o from some universe U e.g. ids from a set
More informationZabin Visram Room CS115 CS126 Searching. Binary Search
Zabin Visram Room CS115 CS126 Searching Binary Search Binary Search Sequential search is not efficient for large lists as it searches half the list, on average Another search algorithm Binary search Very
More informationrésumé de flux de données
résumé de flux de données CLEROT Fabrice fabrice.clerot@orangeftgroup.com Orange Labs data streams, why bother? massive data is the talk of the town... data streams, why bother? service platform production
More informationB669 Sublinear Algorithms for Big Data
B669 Sublinear Algorithms for Big Data Qin Zhang 11 Now about the Big Data Big data is everywhere : over 2.5 petabytes of sales transactions : an index of over 19 billion web pages : over 40 billion of
More informationEfficient FaultTolerant Infrastructure for Cloud Computing
Efficient FaultTolerant Infrastructure for Cloud Computing Xueyuan Su Candidate for Ph.D. in Computer Science, Yale University December 2013 Committee Michael J. Fischer (advisor) Dana Angluin James Aspnes
More informationFundamentals of Analyzing and Mining Data Streams
Fundamentals of Analyzing and Mining Data Streams Graham Cormode graham@research.att.com Outline 1. Streaming summaries, sketches and samples Motivating examples, applications and models Random sampling:
More information9th MaxPlanck Advanced Course on the Foundations of Computer Science (ADFOCS) PrimalDual Algorithms for Online Optimization: Lecture 1
9th MaxPlanck Advanced Course on the Foundations of Computer Science (ADFOCS) PrimalDual Algorithms for Online Optimization: Lecture 1 Seffi Naor Computer Science Dept. Technion Haifa, Israel Introduction
More informationCloud and Big Data Summer School, Stockholm, Aug., 2015 Jeffrey D. Ullman
Cloud and Big Data Summer School, Stockholm, Aug., 2015 Jeffrey D. Ullman To motivate the Bloomfilter idea, consider a web crawler. It keeps, centrally, a list of all the URL s it has found so far. It
More informationPoint Biserial Correlation Tests
Chapter 807 Point Biserial Correlation Tests Introduction The point biserial correlation coefficient (ρ in this chapter) is the productmoment correlation calculated between a continuous random variable
More information4. MAC protocols and LANs
4. MAC protocols and LANs 1 Outline MAC protocols and sublayers, LANs: Ethernet, Token ring and Token bus Logic Link Control (LLC) sublayer protocol Bridges: transparent (spanning tree), source routing
More informationSampling Central Limit Theorem Proportions. Outline. 1 Sampling. 2 Central Limit Theorem. 3 Proportions
Outline 1 Sampling 2 Central Limit Theorem 3 Proportions Outline 1 Sampling 2 Central Limit Theorem 3 Proportions Populations and samples When we use statistics, we are trying to find out information about
More informationData Mining on Streams
Data Mining on Streams Using Decision Trees CS 536: Machine Learning Instructor: Michael Littman TA: Yihua Wu Outline Introduction to data streams Overview of traditional DT learning ALG DT learning ALGs
More informationCross Validation. Dr. Thomas Jensen Expedia.com
Cross Validation Dr. Thomas Jensen Expedia.com About Me PhD from ETH Used to be a statistician at Link, now Senior Business Analyst at Expedia Manage a database with 720,000 Hotels that are not on contract
More informationBig Data & Scripting Part II Streaming Algorithms
Big Data & Scripting Part II Streaming Algorithms 1, 2, a note on sampling and filtering sampling: (randomly) choose a representative subset filtering: given some criterion (e.g. membership in a set),
More informationCompact Summaries for Large Datasets
Compact Summaries for Large Datasets Big Data Graham Cormode University of Warwick G.Cormode@Warwick.ac.uk The case for Big Data in one slide Big data arises in many forms: Medical data: genetic sequences,
More informationMining Data Streams. Chapter 4. 4.1 The Stream Data Model
Chapter 4 Mining Data Streams Most of the algorithms described in this book assume that we are mining a database. That is, all our data is available when and if we want it. In this chapter, we shall make
More informationInformation Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay
Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture  17 ShannonFanoElias Coding and Introduction to Arithmetic Coding
More informationData Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
More informationSummary Data Structures for Massive Data
Summary Data Structures for Massive Data Graham Cormode Department of Computer Science, University of Warwick graham@cormode.org Abstract. Prompted by the need to compute holistic properties of increasingly
More informationLecture 13: Validation
Lecture 3: Validation g Motivation g The Holdout g Resampling techniques g Threeway data splits Motivation g Validation techniques are motivated by two fundamental problems in pattern recognition: model
More informationVisual Data Mining. Motivation. Why Visual Data Mining. Integration of visualization and data mining : Chidroop Madhavarapu CSE 591:Visual Analytics
Motivation Visual Data Mining Visualization for Data Mining Huge amounts of information Limited display capacity of output devices Chidroop Madhavarapu CSE 591:Visual Analytics Visual Data Mining (VDM)
More informationData Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
More informationMODELING RANDOMNESS IN NETWORK TRAFFIC
MODELING RANDOMNESS IN NETWORK TRAFFIC  LAVANYA JOSE, INDEPENDENT WORK FALL 11 ADVISED BY PROF. MOSES CHARIKAR ABSTRACT. Sketches are randomized data structures that allow one to record properties of
More informationSOLVING DATABASE QUERIES USING ORTHOGONAL RANGE SEARCHING
SOLVING DATABASE QUERIES USING ORTHOGONAL RANGE SEARCHING CPS234: Computational Geometry Prof. Pankaj Agarwal Prepared by: Khalid Al Issa (khalid@cs.duke.edu) Fall 2005 BACKGROUND : RANGE SEARCHING, a
More informationCSC148 Lecture 8. Algorithm Analysis Binary Search Sorting
CSC148 Lecture 8 Algorithm Analysis Binary Search Sorting Algorithm Analysis Recall definition of Big Oh: We say a function f(n) is O(g(n)) if there exists positive constants c,b such that f(n)
More informationSampling Based Range Partition Methods for Big Data Analytics
Milan Vojnović Microsoft Research milanv@microsoft.com Sampling Based Range Partition Methods for Big Data Analytics Fei Xu Microsoft Corporation feixu@microsoft.com March 202 Technical Report MSRTR2028
More informationCSC/ECE 774 Advanced Network Security. Overview. BiBa Signature Scheme. BiBa stands for Bins and Balls
CSC/ECE 774 Advanced Network Security Topic 3.2 BiBa Dr. Peng Ning CSC/ECE 774  Adv. Net. Security 1 Overview BiBa stands for Bins and Balls Use oneway functions without trapdoors (e.g., hash functions)
More informationB490 Mining the Big Data. 2 Clustering
B490 Mining the Big Data 2 Clustering Qin Zhang 11 Motivations Group together similar documents/webpages/images/people/proteins/products One of the most important problems in machine learning, pattern
More informationContinuous Matrix Approximation on Distributed Data
Continuous Matrix Approximation on Distributed Data Mina Ghashami School of Computing University of Utah ghashami@cs.uah.edu Jeff M. Phillips School of Computing University of Utah jeffp@cs.uah.edu Feifei
More informationLog Wealth In each period, algorithm A performs 85% as well as algorithm B
1 Online Investing Portfolio Selection Consider n stocks Our distribution of wealth is some vector b e.g. (1/3, 1/3, 1/3) At end of one period, we get a vector of price relatives x e.g. (0.98, 1.02, 1.00)
More informationMaster s Theory Exam Spring 2006
Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem
More informationB561 Advanced Database Concepts. 0 Introduction. Qin Zhang 11
B561 Advanced Database Concepts 0 Introduction Qin Zhang 11 Self introduction: my research interests Algorithms for Big Data: streaming/sketching algorithms; algorithms on distributed data; I/Oefficient
More informationOnline Maintenance of Very Large Random Samples
Online Maintenance of Very Large Random Samples Christopher Jermaine, Abhijit Pol, Subramanian Arumugam Department of Computer and Information Sciences and Engineering University of Florida Gainesville,
More informationA binary heap is a complete binary tree, where each node has a higher priority than its children. This is called heaporder property
CmSc 250 Intro to Algorithms Chapter 6. Transform and Conquer Binary Heaps 1. Definition A binary heap is a complete binary tree, where each node has a higher priority than its children. This is called
More informationKCover of Binary sequence
KCover of Binary sequence Prof Amit Kumar Prof Smruti Sarangi Sahil Aggarwal Swapnil Jain Given a binary sequence, represent all the 1 s in it using at most k covers, minimizing the total length of all
More informationA. OPENING POINT CLOUDS. (Notepad++ Text editor) (Cloud Compare Point cloud and mesh editor) (MeshLab Point cloud and mesh editor)
MeshLAB tutorial 1 A. OPENING POINT CLOUDS (Notepad++ Text editor) (Cloud Compare Point cloud and mesh editor) (MeshLab Point cloud and mesh editor) 2 OPENING POINT CLOUDS IN NOTEPAD ++ Let us understand
More information1: Meditech Information Systems Online Help 2: Links to Micromedex 3: Print page view 4: Lock Session: temporarily minimize the session and lock it
MEDITECH 6.0 Training Manual Training Manual Outline Shortcuts Main Menu Scheduling Desktop Header /F Footer Bars Basic Scheduling Set Scheduling Medical Necessity / Prior Authorization Pending Appointments
More informationSYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis
SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu October 17, 2015 Outline
More informationMonte Carlo testing with Big Data
Monte Carlo testing with Big Data Patrick RubinDelanchy University of Bristol & Heilbronn Institute for Mathematical Research Joint work with: Axel Gandy (Imperial College London) with contributions from:
More informationADOBE ACROBAT CONNECT PRO MOBILE VISUAL QUICK START GUIDE
ADOBE ACROBAT CONNECT PRO MOBILE VISUAL QUICK START GUIDE GETTING STARTED WITH ADOBE ACROBAT CONNECT PRO MOBILE FOR IPHONE AND IPOD TOUCH Overview Attend Acrobat Connect Pro meetings using your iphone
More informationBinary Heaps. CSE 373 Data Structures
Binary Heaps CSE Data Structures Readings Chapter Section. Binary Heaps BST implementation of a Priority Queue Worst case (degenerate tree) FindMin, DeleteMin and Insert (k) are all O(n) Best case (completely
More information6.2 Normal distribution. Standard Normal Distribution:
6.2 Normal distribution Slide Heights of Adult Men and Women Slide 2 Area= Mean = µ Standard Deviation = σ Donation: X ~ N(µ,σ 2 ) Standard Normal Distribution: Slide 3 Slide 4 a normal probability distribution
More informationData Streams A Tutorial
Data Streams A Tutorial Nicole Schweikardt GoetheUniversität Frankfurt am Main DEIS 10: GIDagstuhl Seminar on Data Exchange, Integration, and Streams Schloss Dagstuhl, November 8, 2010 Data Streams Situation:
More informationDistributed Computing over Communication Networks: Maximal Independent Set
Distributed Computing over Communication Networks: Maximal Independent Set What is a MIS? MIS An independent set (IS) of an undirected graph is a subset U of nodes such that no two nodes in U are adjacent.
More informationActive Queue Management
Active Queue Management TELCOM2321 CS2520 Wide Area Networks Dr. Walter Cerroni University of Bologna Italy Visiting Assistant Professor at SIS, Telecom Program Slides partly based on Dr. Znati s material
More informationPulling a Random Sample from a MAXQDA Dataset
In this guide you will learn how to pull a random sample from a MAXQDA dataset, using the random cell function in Excel. In this process you will learn how to export and reimport variables from MAXQDA.
More informationSMALL INDEX LARGE INDEX (SILT)
Wayne State University ECE 7650: Scalable and Secure Internet Services and Architecture SMALL INDEX LARGE INDEX (SILT) A Memory Efficient High Performance Key Value Store QA REPORT Instructor: Dr. Song
More informationOffline sorting buffers on Line
Offline sorting buffers on Line Rohit Khandekar 1 and Vinayaka Pandit 2 1 University of Waterloo, ON, Canada. email: rkhandekar@gmail.com 2 IBM India Research Lab, New Delhi. email: pvinayak@in.ibm.com
More informationTEACHING STATISTICS IN MANAGEMENT COURSES IN INDIA. R. P. Suresh Indian Institute of Management Kozhikode India
TEACHING STATISTICS IN MANAGEMENT COURSES IN INDIA R. P. Suresh Indian Institute of Management Kozhikode India Statistics is taught as a core course in the Management programmes leading to the degree of
More informationContinuous Monitoring of Topk Queries over Sliding Windows
Continuous Monitoring of Topk Queries over Sliding Windows Kyriakos Mouratidis 1 Spiridon Bakiras 2 Dimitris Papadias 2 1 School of Information Systems Singapore Management University 8 Stamford Road,
More informationQuery Selectivity Estimation for Uncertain Data
Query Selectivity Estimation for Uncertain Data Sarvjeet Singh *, Chris Mayfield *, Rahul Shah #, Sunil Prabhakar * and Susanne Hambrusch * * Department of Computer Science, Purdue University # Department
More informationAn XML Framework for Integrating Continuous Queries, Composite Event Detection, and Database Condition Monitoring for Multiple Data Streams
An XML Framework for Integrating Continuous Queries, Composite Event Detection, and Database Condition Monitoring for Multiple Data Streams Susan D. Urban 1, Suzanne W. Dietrich 1, 2, and Yi Chen 1 Arizona
More informationBUSINESS ANALYTICS. Data Preprocessing. Lecture 3. Information Systems and Machine Learning Lab. University of Hildesheim.
Tomáš Horváth BUSINESS ANALYTICS Lecture 3 Data Preprocessing Information Systems and Machine Learning Lab University of Hildesheim Germany Overview The aim of this lecture is to describe some data preprocessing
More informationStreaming Algorithms
3 Streaming Algorithms Great Ideas in Theoretical Computer Science Saarland University, Summer 2014 Some Admin: Deadline of Problem Set 1 is 23:59, May 14 (today)! Students are divided into two groups
More informationSeed Distributions for the NCAA Men s Basketball Tournament: Why it May Not Matter Who Plays Whom*
Seed Distributions for the NCAA Men s Basketball Tournament: Why it May Not Matter Who Plays Whom* Sheldon H. Jacobson Department of Computer Science University of Illinois at UrbanaChampaign shj@illinois.edu
More informationBayes and Naïve Bayes. cs534machine Learning
Bayes and aïve Bayes cs534machine Learning Bayes Classifier Generative model learns Prediction is made by and where This is often referred to as the Bayes Classifier, because of the use of the Bayes rule
More informationNessus Cloud User Registration
Nessus Cloud User Registration Create Your Tenable Nessus Cloud Account 1. Click on the provided URL to create your account. If the link does not work, please cut and paste the entire URL into your browser.
More informationBINARY OPTIONS MENTOR USER MANUAL
BINARY OPTIONS MENTOR USER MANUAL Version 1.4 Introduction : Welcome to Binary Options Mentor!! We provide you herewith a system of alerts with arrows for binary options. This system works for all timeframes
More informationUsing DataOblivious Algorithms for Private Cloud Storage Access. Michael T. Goodrich Dept. of Computer Science
Using DataOblivious Algorithms for Private Cloud Storage Access Michael T. Goodrich Dept. of Computer Science Privacy in the Cloud Alice owns a large data set, which she outsources to an honestbutcurious
More informationClustering Very Large Data Sets with Principal Direction Divisive Partitioning
Clustering Very Large Data Sets with Principal Direction Divisive Partitioning David Littau 1 and Daniel Boley 2 1 University of Minnesota, Minneapolis MN 55455 littau@cs.umn.edu 2 University of Minnesota,
More informationAre You Wasting PPC Budget?
Are You Wasting PPC Budget? THOMAS HAYNES. APR 2015. The Problem The average small business wastes 25% of their AdWords PPC budget Wo r d S t r e a m r e s e a r c h 2 0 1 3 How does this happen? Many
More informationChapter 13: Query Processing. Basic Steps in Query Processing
Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing
More informationUSING OUTLOOK WEB ACCESS
USING OUTLOOK WEB ACCESS 17 March 2009, Version 1.0 WHAT IS OUTLOOK WEB ACCESS? Outlook Web Access (OWA) is a webmail service of Microsoft Exchange Server. The web interface of Outlook Web Access resembles
More informationDistributed Data Management Summer Semester 2013 TU Kaiserslautern
Distributed Data Management Summer Semester 2013 TU Kaiserslautern Dr. Ing. Sebas4an Michel smichel@mmci.uni saarland.de 1 Lecture 8+ (DISTRIBUTED) DATA STREAM PROCESSING (INTRODUCTION) 2 So Far: Databases/NoSQL
More informationThis presentation introduces you to the Decision Governance Framework that is new in IBM Operational Decision Manager version 8.5 Decision Center.
This presentation introduces you to the Decision Governance Framework that is new in IBM Operational Decision Manager version 8.5 Decision Center. ODM85_DecisionGovernanceFramework.ppt Page 1 of 32 The
More informationCovered California s 2014 Sliding Scale Plans Single Person
Covered California s 2014 Sliding Scale Plans Single Person Annual Income $11,490  $17,235 $17,235  $22,980 $22,980  $28,725 $28,725  $45,960 $19  $57 $57  $121 $121  $193 $193  $364 Covered California
More informationInference of Probability Distributions for Trust and Security applications
Inference of Probability Distributions for Trust and Security applications Vladimiro Sassone Based on joint work with Mogens Nielsen & Catuscia Palamidessi Outline 2 Outline Motivations 2 Outline Motivations
More informationCall and Put. Options. American and European Options. Option Terminology. Payoffs of European Options. Different Types of Options
Call and Put Options A call option gives its holder the right to purchase an asset for a specified price, called the strike price, on or before some specified expiration date. A put option gives its holder
More informationBright House Networks Home Security and Automation Mobile Application. Quick Start Guide
Bright House Networks Home Security and Automation Mobile Application Quick Start Guide Home Security and Automation Mobile App User Guide Table of Contents Installing the Mobile Application... 4 Configuring
More informationUsing the Complete W2 efile Service Center
Using the Complete W2 efile Service Center If you are interested, for a minimal fee per employee, saving considerable time, expense and much of the headache associated with the yearend W2 filing, CYMA
More informationApproximate Counts and Quantiles over Sliding Windows
Approximate Counts and Quantiles over Sliding Windows Arvind Arasu Stanford University arvinda@cs.stanford.edu Gurmeet Singh Manku Stanford University manku@cs.stanford.edu ABSTRACT We consider the problem
More informationBright House Networks Home Security and Control Mobile Application
Bright House Networks Home Security and Control Mobile Application Quick Start Guide LIC# EF20001092 Home Security and Control Mobile App User Guide Table of Contents Installing the Mobile Application...
More informationInNetwork Coding for Resilient Sensor Data Storage and Efficient Data Mule Collection
InNetwork Coding for Resilient Sensor Data Storage and Efficient Data Mule Collection Michele Albano Jie Gao Instituto de telecomunicacoes, Aveiro, Portugal Stony Brook University, Stony Brook, USA Data
More informationData Structures. Algorithm Performance and Big O Analysis
Data Structures Algorithm Performance and Big O Analysis What s an Algorithm? a clearly specified set of instructions to be followed to solve a problem. In essence: A computer program. In detail: Defined
More informationSecretary Problems. October 21, 2010. José Soto SPAMS
Secretary Problems October 21, 2010 José Soto SPAMS A little history 50 s: Problem appeared. 60 s: Simple solutions: Lindley, Dynkin. 7080: Generalizations. It became a field. ( Toy problem for Theory
More informationCoLocationResistant Clouds
CoLocationResistant Clouds Yossi Azar TelAviv University azar@tau.ac.il Mariana Raykova SRI International mariana.raykova@sri.com Seny Kamara Microsoft Research senyk@microsoft.com Ishai Menache Microsoft
More informationMATH 140 Lab 4: Probability and the Standard Normal Distribution
MATH 140 Lab 4: Probability and the Standard Normal Distribution Problem 1. Flipping a Coin Problem In this problem, we want to simualte the process of flipping a fair coin 1000 times. Note that the outcomes
More informationPolarization codes and the rate of polarization
Polarization codes and the rate of polarization Erdal Arıkan, Emre Telatar Bilkent U., EPFL Sept 10, 2008 Channel Polarization Given a binary input DMC W, i.i.d. uniformly distributed inputs (X 1,...,
More informationMerge Sort. 2004 Goodrich, Tamassia. Merge Sort 1
Merge Sort 7 2 9 4 2 4 7 9 7 2 2 7 9 4 4 9 7 7 2 2 9 9 4 4 Merge Sort 1 DivideandConquer Divideand conquer is a general algorithm design paradigm: Divide: divide the input data S in two disjoint subsets
More informationData Storage  II: Efficient Usage & Errors
Data Storage  II: Efficient Usage & Errors Week 10, Spring 2005 Updated by M. Naci Akkøk, 27.02.2004, 03.03.2005 based upon slides by Pål Halvorsen, 12.3.2002. Contains slides from: Hector GarciaMolina
More informationCombinatorics: The Fine Art of Counting
Combinatorics: The Fine Art of Counting Week 7 Lecture Notes Discrete Probability Continued Note Binomial coefficients are written horizontally. The symbol ~ is used to mean approximately equal. The Bernoulli
More informationFact Sheet InMemory Analysis
Fact Sheet InMemory Analysis 1 Copyright Yellowfin International 2010 Contents In Memory Overview...3 Benefits...3 Agile development & rapid delivery...3 Data types supported by the InMemory Database...4
More informationFrontier Tandem. Administrator User Guide. Version 2.4 January 28, 2013
Frontier Tandem Administrator User Guide Version 2.4 January 28, 2013 About This Document 1 Version 7.3 Jan 28, 2013 Frontier Tandem Administrator Guide CONFIDENTIAL About This Document The Frontier Small
More informationLectures 6&7: Image Enhancement
Lectures 6&7: Image Enhancement Leena Ikonen Pattern Recognition (MVPR) Lappeenranta University of Technology (LUT) leena.ikonen@lut.fi http://www.it.lut.fi/ip/research/mvpr/ 1 Content Background Spatial
More informationLocal Area Networks transmission system private speedy and secure kilometres shared transmission medium hardware & software
Local Area What s a LAN? A transmission system, usually private owned, very speedy and secure, covering a geographical area in the range of kilometres, comprising a shared transmission medium and a set
More informationBig Data. Lecture 6: Locality Sensitive Hashing (LSH)
Big Data Lecture 6: Locality Sensitive Hashing (LSH) Nearest Neighbor Given a set P of n oints in R d Nearest Neighbor Want to build a data structure to answer nearest neighbor queries Voronoi Diagram
More informationContinuously Maintaining Quantile Summaries of the Most Recent N Elements over a Data Stream
Continuously Maintaining Quantile Summaries of the Most Recent N Elements over a Data Stream Xuemin Lin University of New South Wales Sydney, Australia Jian Xu University of New South Wales Sydney, Australia
More informationChapter 20: Data Analysis
Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.dbbook.com for conditions on reuse Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification
More informationCPSC 491. Today: Source code control. Source Code (Version) Control. Exercise: g., no git, subversion, cvs, etc.)
Today: Source code control CPSC 491 Source Code (Version) Control Exercise: 1. Pretend like you don t have a version control system (e. g., no git, subversion, cvs, etc.) 2. How would you manage your source
More informationClassification: Basic Concepts, Decision Trees, and Model Evaluation. General Approach for Building Classification Model
10 10 Classification: Basic Concepts, Decision Trees, and Model Evaluation Dr. Hui Xiong Rutgers University Introduction to Data Mining 1//009 1 General Approach for Building Classification Model Tid Attrib1
More informationResource Provisioning of Web Applications in Heterogeneous Cloud. Jiang Dejun Supervisor: Guillaume Pierre 20110410
Resource Provisioning of Web Applications in Heterogeneous Cloud Jiang Dejun Supervisor: Guillaume Pierre 0410 Background Cloud is an attractive hosting platform for startup Web applications On demand
More informationAlgorithmic Techniques for Big Data Analysis. Barna Saha AT&T LabResearch
Algorithmic Techniques for Big Data Analysis Barna Saha AT&T LabResearch Challenges of Big Data VOLUME Large amount of data VELOCITY Needs to be analyzed quickly VARIETY Different types of structured
More informationCongestion Control Overview
Congestion Control Overview Problem: When too many packets are transmitted through a network, congestion occurs t very high traffic, performance collapses completely, and almost no packets are delivered
More informationCamCard Business User Guide
CamCard Business User Guide Contents Add Cards... 3 Take Photo... 3 Create Manually... 4 Import Excel... 5 Batch Scan Cards with Scanner... 6 Add from Contacts... 6 Exchang Cards... 7 Copy Cards from Private
More informationDecision Trees from large Databases: SLIQ
Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values
More informationEfficient Data Streams Processing in the Real Time Data Warehouse
Efficient Data Streams Processing in the Real Time Data Warehouse Fiaz Majeed fiazcomsats@gmail.com Muhammad Sohaib Mahmood m_sohaib@yahoo.com Mujahid Iqbal mujahid_rana@yahoo.com Abstract Today many business
More information