Big Data and Scripting. Plotting in R
- Joanna Powell
- 8 years ago
the art of plotting: first steps
- the basis of plotting in R: plot(x, y)
- plot some random values: plot(runif(10))
  - the values are interpreted as y-values; the x-values are filled in as 1:10
- plot an n x 2 matrix of points as a scatterplot: plot(x)
- plot() has a huge number of parameters, many with cryptic names:
  - pch: change the point type (e.g. pch=20 gives small filled dots)
  - cex: change the point size
  - col: change the point color, ...

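A minimal sketch of these first calls, using only base R graphics. It draws into a temporary PDF file instead of a window so that it also runs in a non-interactive session; the file and variable names are illustrative only.

```r
set.seed(42)                        # reproducible random values
out <- tempfile(fileext = ".pdf")
pdf(out)                            # file device (see "plotting to files")

y <- runif(10)
plot(y)                             # only y given: x is filled in as 1:10

m <- cbind(runif(50), runif(50))    # n x 2 matrix: column 1 = x, column 2 = y
plot(m, pch = 20, cex = 1.5, col = "blue")  # small filled dots, enlarged, blue

invisible(dev.off())                # close the device and write the file
```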
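a simple plotting example
- supply vectors for point-wise settings (e.g. one color per point)
- example:
    data(iris)    # load some flower data
    attach(iris)  # make the columns accessible by name
    plot(iris, col=Species)  # plot the whole thing (a scatterplot matrix)
    # plot specific axes
    plot(Sepal.Length, Sepal.Width, col=Species)
- the last call plots the points (Sepal.Length, Sepal.Width), colored by species (the factor codes 1, 2, 3 select colors from the current palette)
- use rainbow() to create a sequence of colors
- create individual colors with rgb() or gray()
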
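A sketch of a custom per-species palette built with rainbow(), plus single colors from rgb() and gray(). Indexing the palette with the factor codes has the same effect as col=Species with the default palette; it uses iris$... instead of attach() so no cleanup is needed. Variable names are illustrative.

```r
data(iris)
species_cols <- rainbow(3)                            # one color per species
point_cols <- species_cols[as.integer(iris$Species)]  # factor codes 1..3 index the palette

out <- tempfile(fileext = ".pdf")
pdf(out)
plot(iris$Sepal.Length, iris$Sepal.Width, col = point_cols, pch = 20,
     xlab = "Sepal.Length", ylab = "Sepal.Width")
legend("topright", legend = levels(iris$Species), col = species_cols, pch = 20)
invisible(dev.off())

red   <- rgb(1, 0, 0)                      # pure red as a hex string
blue_half <- rgb(0, 0, 1, alpha = 0.5)     # blue with 50% opacity
black <- gray(0)                           # gray level 0 = black
```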
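setting parameters for plotting
- plot() accepts a number of parameters
- even more can be set using par():
  - figure margins (in lines of text) with mar=c(bottom, left, top, right); outer margins are set with oma
  - overplotting with new=TRUE (the next high-level plot does not clear the figure)
  - plot into a certain area of the device with fig=c(left, right, lower, upper), given in normalized coordinates from 0 to 1
  - some, but not all, of these parameters can also be passed to plot()
  - the return value of par() is a list with the old values of the changed parameters; it can be used to reset the parameters to their previous state
- switch off the axes in plot() with axes=FALSE and draw your own with axis()
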
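A sketch of the save-and-restore pattern for par(), combined with hand-made axes; the margin values and the name op are arbitrary choices for this example.

```r
out <- tempfile(fileext = ".pdf")
pdf(out)

op <- par(mar = c(4, 4, 1, 1) + 0.1, cex = 0.9)  # returns the OLD values of mar and cex
x <- 1:10
plot(x, x^2, axes = FALSE, xlab = "x", ylab = "x^2")  # no default axes
axis(1, at = c(1, 5, 10))   # custom x axis with three ticks
axis(2, las = 1)            # y axis with horizontal labels
box()                       # frame around the plot region
new_mar <- par("mar")

par(op)                     # restore the previous settings
restored_mar <- par("mar")
invisible(dev.off())
```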
a more complicated plotting example
    data(iris)    # get data
    attach(iris)  # attach for easy access
    # plot petal length vs. petal width
    plot(Petal.Length, Petal.Width, col=Species, pch=20)
    # add a small inset plot with the sepal values
    op = par(fig=c(0.6, 0.9, 0.18, 0.48), new=TRUE, mar=c(1, 1, 1, 0)+0.1, cex=0.8)
    plot(Sepal.Length, Sepal.Width, col=Species, pch=20, axes=FALSE, main="sepal extensions")
    box()         # draw a box around the small plot
    par(op)       # reset parameters
    detach(iris)

the resulting plot
[figure: Petal.Width vs. Petal.Length, with an inset "sepal extensions" showing Sepal.Width vs. Sepal.Length]

specialized plot functions
- many packages provide specialized plot functions for their results
- example:
    library(igraph)
    g = make_star(15)  # star graph with 15 vertices (older igraph versions: graph.star(15))
    plot(g)
- this uses a mechanism called method dispatch: the generic plot() calls the method that matches the class of its argument (here an igraph object)
  - not covered here, see stat.ethz.ch/r-manual/r-devel/library/methods/html/Methods.html for detailed information

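To illustrate dispatch without igraph, here is a sketch with a hypothetical class "toy_star". Because plot() is an S3 generic in base R, defining a function named plot.toy_star() is enough for plot(x) to call it whenever x has that class. The class, its constructor and the drawing code are made up for this example; this is not how igraph implements its own method.

```r
# constructor for a hypothetical class "toy_star" (1 center + n - 1 leaves)
make_toy_star <- function(n) structure(list(n = n), class = "toy_star")

# method for the generic plot(); found by its name <generic>.<class>
plot.toy_star <- function(x, ...) {
  leaves <- x$n - 1
  angle <- 2 * pi * seq_len(leaves) / leaves
  # empty canvas: plot() on plain numbers dispatches to plot.default()
  plot(0, 0, type = "n", xlim = c(-1, 1), ylim = c(-1, 1), asp = 1,
       axes = FALSE, xlab = "", ylab = "")
  segments(0, 0, cos(angle), sin(angle))                 # edges center -> leaf
  points(c(0, cos(angle)), c(0, sin(angle)), pch = 19)   # vertices
  invisible(leaves)
}

s <- make_toy_star(15)
out <- tempfile(fileext = ".pdf")
pdf(out)
n_leaves <- plot(s)    # dispatches to plot.toy_star()
invisible(dev.off())
```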
plotting to files
- plotting to files is simple with file devices, example:
    pdf("plot.pdf")  # open plot.pdf in the current directory
    plot(1:5)        # plot something
    dev.off()        # close the device (and write the file)
- devices can also be opened explicitly, e.g. x11() opens a plotting window
- there is usually a currently active device
  - if not, plotting opens the default device (a plot window in interactive sessions)
- dev.off() closes the active device
  - writes the file to disk (for file devices)
  - if other devices are still open, the next one becomes the active device
- variants: x11(), pdf(), svg(), jpeg(), ...
- besides the file name, each format has individual parameters (e.g. width and height for pdf, resolution and quality for jpeg)

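A sketch of the active-device rules with two PDF devices open at the same time. The file names are temporary and illustrative, and it assumes no other graphics device is open, as in a fresh Rscript session.

```r
file_a <- tempfile(fileext = ".pdf")
file_b <- tempfile(fileext = ".pdf")

pdf(file_a, width = 6, height = 4)   # device A, size in inches
dev_a <- dev.cur()
pdf(file_b, width = 4, height = 4)   # device B is now the active device
dev_b <- dev.cur()

plot(1:5, main = "goes to B")        # drawn on the active device (B)
invisible(dev.off())                 # closes B and writes file_b
active_after_close <- dev.cur()      # A is the next open device, so it is active

plot(5:1, main = "goes to A")
invisible(dev.off())                 # closes A and writes file_a
```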
example: visualizing a distribution of networks
[figure]

example: visualizing spectral distributions
[figure: density plot]

some useful plotting functions
- barplot(): create a bar plot
- hist(): compute a histogram of the values and plot it
- points(): add points to the current plot
- lines(): add lines connecting the given points to the current plot
- grconvertX(), grconvertY(): convert between coordinate systems (e.g. user, figure, device coordinates)

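A sketch combining these functions on random data; the data and the choice of four bar-plot bins are arbitrary. hist() both draws the histogram and returns the bin information, which is reused to add points and lines on top.

```r
set.seed(1)
x <- rnorm(100)
out <- tempfile(fileext = ".pdf")
pdf(out)

h <- hist(x, main = "100 normal values")          # plots and returns the bins
points(h$mids, h$counts, pch = 20, col = "red")   # a point on top of each bar
lines(h$mids, h$counts)                           # connect these points
# left edge of the plot region (npc = 0) in data ("user") coordinates
left_user <- grconvertX(0, from = "npc", to = "user")
usr <- par("usr")                                 # plot region limits in user coordinates

mids <- barplot(table(cut(x, 4)))                 # new page: bar plot of 4 value ranges
invisible(dev.off())
```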
Parallel Programming on a Multi-CPU System

basic questions about the machine model
- execute algorithms on multiple CPUs (cores)
- CPUs load data from memory into their registers, compute something and write the results back to memory
1. do all cores have access to the same memory?
   - yes: (following) PRAM model (parallel random access machine)
   - no: (later) distributed computing
2. concurrent access (reading/writing in parallel)?
   - parallel reading: exclusive or concurrent
   - parallel writing: exclusive or concurrent; if concurrent, which value stays?
   - this gives four different variants (EREW, CREW, ERCW, CRCW)
- in the following, we allow concurrent reading and avoid concurrent writing (CREW)

an example algorithm: the summation problem
- given an array of numbers A[1], ..., A[n]
- determine the sum over all A[i]
- straightforward without parallel execution: O(n)
- is a speed-up possible with more cores?

parallel summation: idea
- partition into the smallest possible subproblems
- solve these in parallel
- combine the results, again in parallel
- continue until all values are combined

algorithm
    input: array A      // assume length(A) = n = 2^h
    B = A               // B holds the results of the current level
    while (length(B) > 1) {   // while intermediate results have to be combined
        T = array(length(B)/2)
        parallel for (i in 1:length(T)) {   // execute in parallel
            T[i] = B[2*i-1] + B[2*i]        // solve subproblem
        }
        B = T           // advance to next level
    }
    return(B[1])
- assumptions/preconditions:
  - the length of A is a power of two (if not, pad with zeros)
  - the + operation is associative, i.e. (a + b) + c = a + (b + c)
- the approach works for every associative operation (padding then uses the operation's neutral element instead of zero)

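The tree-shaped reduction can be simulated sequentially in R to check its logic. In this sketch the function tree_reduce and its parameter neutral (the padding element) are hypothetical, not part of the slides; the lapply() call stands in for the "parallel for", since its iterations are independent of each other and could run on different cores.

```r
tree_reduce <- function(A, op = `+`, neutral = 0) {
  B <- as.list(A)
  n <- 2^ceiling(log2(max(length(B), 1)))
  B <- c(B, rep(list(neutral), n - length(B)))  # pad to a power of two
  levels <- 0
  while (length(B) > 1) {
    # independent pairwise combinations: the "parallel for" of the slide;
    # the left operand comes first, so non-commutative ops keep their order
    B <- lapply(seq_len(length(B) / 2),
                function(i) op(B[[2 * i - 1]], B[[2 * i]]))
    levels <- levels + 1
  }
  list(value = B[[1]], levels = levels)
}

tree_reduce(1:64)                     # sum over 6 levels
tree_reduce(letters[1:5], paste0, "") # associative but not commutative
```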
analysis
- memory: an additional array is needed for the current level
- number of operations (with length(A) = n = 2^h): $\sum_{i=1}^{h} 2^{h-i} = 2^h - 1 = n - 1 \in O(n)$
  - no gain in comparison to the sequential approach
- execution time on n/2 cores:
  - let one + operation take $O(f(n))$ time and length(A) = n
  - assume that copying B=A and B=T is done in parallel, too
  - the inner for-loop is executed in parallel: one parallel step per level, i.e. time $O(f(n))$
  - the outer while-loop iterates over the levels of a binary tree: $\log_2 n$ levels
  - total time: $O(f(n) \log_2 n)$; for + (constant time): $O(\log_2 n)$
- note the difference between the number of operations and the execution time

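As a worked example of the two counts, take n = 8:

```latex
\begin{aligned}
n &= 8 = 2^3, \quad h = 3\\
\text{operations} &= 4 + 2 + 1 = \sum_{i=1}^{3} 2^{3-i} = 2^3 - 1 = 7 = n - 1\\
\text{parallel steps on } n/2 = 4 \text{ cores} &= \log_2 8 = 3 \quad (\text{sequential: } 7)
\end{aligned}
```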
execution of n parallel processes on c cores
- our analysis assumed that there are n/2 cores available
- that's usually an unrealistic assumption
- instead: distribute the parallel processes to as many cores as possible
- example for simple parallel execution on a limited number of cores
  input: array of tasks: jobs, number of cores: cores
    executeparallel = function(jobs, cores) {
        i = 1
        while (i <= length(jobs)) {
            parallel for (j in i:min(i+cores-1, length(jobs))) {
                start(jobs[j])
            }
            i = i + cores
        }
    }
- parallel for executes all iterations in parallel (and returns once all of them have finished)

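A small sketch of the cost of this batch-wise scheme: because each batch waits for its slowest job, the total time is the sum over batches of the longest execution time in the batch. The function batch_makespan and the job durations are hypothetical.

```r
# jobs are started in groups of `cores`; the next group starts only
# when the slowest job of the current group has finished
batch_makespan <- function(durations, cores) {
  durations <- as.numeric(durations)
  batch_id <- ceiling(seq_along(durations) / cores)  # 1,1,2,2,... for cores = 2
  batches <- split(durations, batch_id)
  sum(vapply(batches, max, numeric(1)))
}

durations <- c(3, 1, 1, 1, 2, 2)   # hypothetical job execution times
total <- batch_makespan(durations, cores = 2)
lower_bound <- sum(durations) / 2  # perfect load balance on 2 cores
```

Here the batches are (3, 1), (1, 1), (2, 2), so the total is 3 + 1 + 2 = 6 time units, while the work would fit into 5 units per core. The idle time inside batches is what a more flexible approach tries to avoid.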
a more flexible parallelization approach (idea)
- assume operations depend on intermediate results created by other operations
- no simple regular structure, but the more general case
- e.g. job 2 depends on input from jobs 3 and 1
- job 8 can be executed when job 7 is finished, while job 4 additionally has to wait for jobs 5 and 2
[figure: dependency graph of the jobs]

a more flexible parallelization approach (idea, continued)
- several execution orders are possible
- the optimal order depends on the execution times
- simple strategy:
  1. keep a list of unoccupied cores
  2. keep a list of unfinished jobs, each with its number of unfinished dependencies
  3. start waiting jobs with no unfinished dependencies until all cores are occupied
  4. when a job finishes: free its core and decrease the number of unfinished dependencies of all jobs depending on it
  5. if not all jobs are finished, repeat from 3

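The strategy can be simulated with an event-driven loop. This is a sketch under simplifying assumptions: it tracks only the number of free cores (not which core), starts ready jobs in index order, and jumps from one job completion to the next. The function simulate_schedule is hypothetical; the dependencies of jobs 2, 4 and 8 follow the example above, the remaining structure and all durations are invented.

```r
simulate_schedule <- function(duration, deps, cores) {
  n <- length(duration)
  remaining <- vapply(deps, length, integer(1))  # step 2: unfinished dependencies
  state <- rep("waiting", n)                     # "waiting", "running" or "done"
  start <- finish <- rep(NA_real_, n)
  free <- cores                                  # step 1: unoccupied cores
  now <- 0
  while (any(state != "done")) {
    # step 3: start waiting jobs without unfinished dependencies
    ready <- which(state == "waiting" & remaining == 0)
    for (j in head(ready, free)) {
      state[j] <- "running"
      start[j] <- now
      finish[j] <- now + duration[j]
      free <- free - 1
    }
    running <- which(state == "running")
    if (length(running) == 0) stop("cyclic dependencies: no job can be started")
    now <- min(finish[running])                  # next job completion
    for (j in running[finish[running] == now]) {
      state[j] <- "done"
      free <- free + 1
      # step 4: jobs depending on j have one unfinished dependency less
      for (k in seq_len(n)) {
        if (j %in% deps[[k]]) remaining[k] <- remaining[k] - 1L
      }
    }
  }
  data.frame(job = seq_len(n), start = start, finish = finish)
}

deps <- list(integer(0), c(1L, 3L), integer(0), c(2L, 5L, 7L),
             integer(0), integer(0), integer(0), 7L)
duration <- c(2, 1, 1, 1, 3, 1, 2, 1)
sched <- simulate_schedule(duration, deps, cores = 2)
print(sched)
```

With 2 cores this example finishes at time 7, while a perfect split of the 12 units of work would need 6; the dependency chain through job 7 causes the difference.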
intermission: mapply
- new apply variant¹: mapply(fun, ...)
  - the first argument is the function to apply
  - the following arguments are vectors or lists to apply fun to
  - calls fun with element i of all following lists
  - if arguments are named, fun is called with named arguments
    > fun = function(a,b){paste(a,b,sep="-")}
    > mapply(fun, b=1:6, a=3:1)
    [1] "3-1" "2-2" "1-3" "3-4" "2-5" "1-6"
  - naming the arguments makes their order irrelevant
  - shorter vectors are recycled
  - result: the return values of fun, simplified to a vector or matrix where possible (use SIMPLIFY=FALSE to always get a list)
¹ that's number 5

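A few more mapply() calls as a sketch: the slide's example, the list result with SIMPLIFY=FALSE, and MoreArgs for arguments that stay the same in every call.

```r
fun <- function(a, b) paste(a, b, sep = "-")

r1 <- mapply(fun, b = 1:6, a = 3:1)             # a is recycled: 3,2,1,3,2,1
r2 <- mapply(fun, 1:2, 3:4, SIMPLIFY = FALSE)   # always a list
r3 <- mapply(function(x, y, sep) paste(x, y, sep = sep),
             1:3, 4:6, MoreArgs = list(sep = "+"))  # sep is the same in every call
```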
parallelization in R
- the library parallel provides functions for parallel computations, in particular:
  - mcmapply(): parallel mapply()
  - mclapply(): parallel lapply()
  - both execute the function for the list elements in parallel (using forked processes, so mc.cores > 1 is not supported on Windows)
- important parameters:
  - mc.cores: the maximum number of CPU cores to use
  - mc.preschedule: decide the distribution of jobs to cores at the start (TRUE) or dynamically (FALSE)
    - TRUE for many small and/or equal-length jobs
    - FALSE if jobs vary strongly in execution time

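A sketch of both scheduling modes. The job slow_square is hypothetical and only sleeps to mimic work of varying length; the number of cores is set to 1 on Windows because forking is not available there.

```r
library(parallel)

n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
slow_square <- function(x) { Sys.sleep(0.01 * x); x^2 }  # longer for larger x

# jobs distributed to the cores once at the start
res_static  <- mclapply(1:8, slow_square, mc.cores = n_cores, mc.preschedule = TRUE)
# one job at a time handed to whichever core is free
res_dynamic <- mclapply(1:8, slow_square, mc.cores = n_cores, mc.preschedule = FALSE)
# parallel mapply(): element-wise products
res_m <- mcmapply(function(a, b) a * b, 1:4, 5:8, mc.cores = n_cores)
```

Both modes return the results in input order; they differ only in how the jobs are assigned to cores, which matters for the total run time when execution times vary.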
parallelize associative functions as R code
    parallelaccumulate = function(f, a) {
        library(parallel)
        b = a
        # assumes length(a) is a power of two
        while (length(b) > 1) {
            b = mclapply(1:(length(b)/2),
                         function(i) f(b[[2*i-1]], b[[2*i]]))
        }
        return(b[[1]])
    }
- execution:
    plus = function(a, b) {a + b}
    parallelaccumulate(plus, 1:64)
- simple, but not very generic

parallelization of a function in R
    parallelize = function(f) {
        pf = function(a) {
            library(parallel)
            b = a
            # assumes length(a) is a power of two
            while (length(b) > 1) {
                b = mclapply(1:(length(b)/2),
                             function(i) f(b[[2*i-1]], b[[2*i]]))
            }
            return(b[[1]])
        }
        return(pf)
    }
- execution:
    plus = function(a, b) {a + b}
    psum = parallelize(plus)
    psum(1:64)

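The version above still requires a power-of-two length. A sketch of one way to drop that requirement, assuming only that f is associative: on each level an element left over without a partner is carried unchanged to the next level. The name parallelize_any and its cores parameter are hypothetical.

```r
library(parallel)

parallelize_any <- function(f, cores = if (.Platform$OS.type == "windows") 1L else 2L) {
  function(a) {
    b <- as.list(a)
    if (length(b) == 0) stop("need at least one element")
    while (length(b) > 1) {
      pairs <- length(b) %/% 2
      combined <- mclapply(seq_len(pairs),
                           function(i) f(b[[2 * i - 1]], b[[2 * i]]),
                           mc.cores = cores)
      # an odd element out keeps its position at the end of the next level
      if (length(b) %% 2 == 1) combined <- c(combined, b[length(b)])
      b <- combined
    }
    b[[1]]
  }
}

psum <- parallelize_any(`+`)
psum(1:10)
pconcat <- parallelize_any(paste0)
pconcat(letters[1:5])
```

Carrying the last element instead of padding avoids having to know a neutral element for f.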
More informationCitrix EdgeSight for Load Testing User s Guide. Citrx EdgeSight for Load Testing 2.7
Citrix EdgeSight for Load Testing User s Guide Citrx EdgeSight for Load Testing 2.7 Copyright Use of the product documented in this guide is subject to your prior acceptance of the End User License Agreement.
More informationApplication Notes "EPCF 1%' 1SJOU &OHJOF "11&
Application Notes Adobe PDF Print Engine (APPE) ErgoSoft AG Moosgrabenstr. CH-8595 Altnau, Switzerland 0 ErgoSoft AG, All rights reserved. The information contained in this manual is based on information
More informationCUDAMat: a CUDA-based matrix class for Python
Department of Computer Science 6 King s College Rd, Toronto University of Toronto M5S 3G4, Canada http://learning.cs.toronto.edu fax: +1 416 978 1455 November 25, 2009 UTML TR 2009 004 CUDAMat: a CUDA-based
More informationTime Series Analysis AMS 316
Time Series Analysis AMS 316 Programming language and software environment for data manipulation, calculation and graphical display. Originally created by Ross Ihaka and Robert Gentleman at University
More informationPackage bigrf. February 19, 2015
Version 0.1-11 Date 2014-05-16 Package bigrf February 19, 2015 Title Big Random Forests: Classification and Regression Forests for Large Data Sets Maintainer Aloysius Lim OS_type
More informationEngineering Problem Solving and Excel. EGN 1006 Introduction to Engineering
Engineering Problem Solving and Excel EGN 1006 Introduction to Engineering Mathematical Solution Procedures Commonly Used in Engineering Analysis Data Analysis Techniques (Statistics) Curve Fitting techniques
More informationWorking with Excel in Origin
Working with Excel in Origin Limitations When Working with Excel in Origin To plot your workbook data in Origin, you must have Excel version 7 (Microsoft Office 95) or later installed on your computer
More information