Big Data and Scripting. Plotting in R



the art of plotting: first steps
- the foundation of plotting in R: plot(x, y)
- plot some random values: plot(runif(10))
  - values are interpreted as y-values, x-values are filled in as 1:10
- plot an n x 2 array of points as a scatterplot: plot(x)
- plot() has a very large number of parameters with terse names:
  - pch: point type (e.g. pch=20 gives small filled dots)
  - cex: point size
  - col: point color, ...
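A minimal sketch of the calls above. The names `y` and `pts` are illustrative and not from the slides; `pdf(NULL)` is only there so the example draws without opening a window or writing a file.

```r
pdf(NULL)                      # null device: draw without a window or file
set.seed(1)                    # reproducible random values

y <- runif(10)
plot(y)                        # only y-values given; x is filled in as 1:10

pts <- cbind(x = runif(10), y = runif(10))   # n x 2 matrix of points
plot(pts)                      # first column is x, second column is y

# pch = point type, cex = point size, col = point color
plot(pts, pch = 20, cex = 1.5, col = "red")
invisible(dev.off())
```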

a simple plotting example
- supply vectors for point-wise settings
- example:
    data(iris)                # load some flower data
    attach(iris)
    plot(iris, col=Species)   # plot the whole thing
    # plot specific axes
    plot(Sepal.Length, Sepal.Width, col=Species)
- plots the points, colored by species (R is case-sensitive: Species, Sepal.Length)
- use rainbow() to create colors
- create individual colors with rgb() or gray()
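One way the color helpers could be combined with point-wise settings. The names `palette3`, `point_col` and `custom` are assumptions for illustration; the sketch avoids attach() and addresses the columns through `iris$`.

```r
pdf(NULL)                      # null device, nothing is written to disk
data(iris)

# one color per species level, then one color per point
palette3  <- rainbow(nlevels(iris$Species))
point_col <- palette3[as.integer(iris$Species)]
plot(iris$Sepal.Length, iris$Sepal.Width, col = point_col, pch = 20)

# individual colors: rgb() takes red/green/blue in [0, 1], gray() a gray level
custom <- c(rgb(1, 0, 0), rgb(0, 0.5, 0), gray(0.4))
plot(iris$Petal.Length, iris$Petal.Width,
     col = custom[as.integer(iris$Species)], pch = 20)
invisible(dev.off())
```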

setting parameters for plotting
- plot() accepts a number of parameters
- even more can be set using par():
  - margins with mar=c(bottom, left, top, right)
  - overplotting with new=TRUE
  - plot to a certain area with fig=c(left, right, lower, upper)
- switch off axes with axes=FALSE (an argument of plot()) and make your own with axis()
- some, but not all, of the par() parameters can also be passed to plot()
- the return value of par() is a list with the old values of the changed parameters
  - it can be used to reset the parameters to their previous state
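A short sketch of the save-and-restore pattern for par(). The names `old` and `restored` are illustrative; the margin values are arbitrary.

```r
pdf(NULL)                                    # null device
old <- par(mar = c(2, 2, 1, 1), cex = 0.8)   # returns the previous values as a list

plot(1:5, axes = FALSE)    # draw without axes
axis(1, at = 1:5)          # custom x-axis with ticks at 1..5
axis(2)                    # default y-axis
box()                      # frame around the plot region

par(old)                   # reset the changed parameters
restored <- par("mar")
invisible(dev.off())
```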

a more complicated plotting example
    data(iris)    # get data
    attach(iris)  # attach for easy access
    # plot petal length against petal width
    plot(Petal.Length, Petal.Width, col=Species, pch=20)
    # make a small box on top with sepal values
    old=par(fig=c(0.6,0.9,0.18,0.48), new=TRUE,
            mar=c(1,1,1,0)+0.1, cex=0.8)
    plot(Sepal.Length, Sepal.Width, col=Species, pch=20,
         axes=FALSE, main="sepal extensions")
    box()         # make a box around the small plot
    par(old)      # reset parameters
    detach(iris)

the resulting plot
[Figure: scatterplot of Petal.Length against Petal.Width with a small inset titled "sepal extensions" showing Sepal.Length against Sepal.Width]

specialized plot functions
- many packages provide specialized plot functions for their results
- example:
    library(igraph)
    g=graph.star(15)
    plot(g)
- this uses R's mechanism for overriding functions, called method dispatch
- not covered here, see stat.ethz.ch/R-manual/R-devel/library/methods/html/Methods.html for detailed information
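Dispatch is not covered in the slides; the following is only a rough sketch of the S3 flavor of the idea, using a hypothetical class `walk` and a hypothetical method `plot.walk` (neither comes from the slides or from igraph).

```r
pdf(NULL)                      # null device

# constructor for a hypothetical class "walk"
walk <- function(steps) structure(list(pos = cumsum(steps)), class = "walk")

# plot() is generic: for an object of class "walk" it calls plot.walk()
plot.walk <- function(x, ...) {
  plot(seq_along(x$pos), x$pos, type = "l",
       xlab = "step", ylab = "position", ...)
  invisible("plot.walk")       # returned only to make the dispatch visible
}

w    <- walk(c(1, -1, 1, 1, -1))
used <- plot(w)                # dispatches on class(w)
invisible(dev.off())
```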

plotting to files
- plotting to files is simple with file devices, example:
    pdf("plot.pdf");  # open plot.pdf in current dir
    plot(1:5);        # plot something
    dev.off();        # close device (and write file)
- devices can be opened explicitly, e.g. x11() opens a plotting window
- there is usually a currently active device
  - if not, a plot window is created
- dev.off() closes the active device
  - writes files to disk (for file devices)
  - if possible, switches to the previously active device
- variants: x11(), pdf(), svg(), jpeg(), ...
- besides file, there are individual parameters for each format (e.g. size for pdf, resolution for jpeg)
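A small sketch of the device life cycle. It writes to a temporary directory instead of the current one; the names `out`, `during` and `written` are illustrative.

```r
out <- file.path(tempdir(), "plot.pdf")

pdf(out, width = 6, height = 4)   # open a file device; size is given in inches
during <- names(dev.cur())        # name of the currently active device
plot(1:5)                         # draw into the file device
invisible(dev.off())              # close the device, which writes the file

written <- file.exists(out)
```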

example: visualizing a distribution of networks
[Figure: grid of network plots]

example: visualizing a spectral distribution
[Figure: density plot]

some useful plotting functions
- barplot(): create a bar plot
- hist(): create a histogram of values and plot it
- points(): add additional points to an existing plot
- lines(): add lines connecting the given points
- grconvertX(), grconvertY(): convert between coordinate systems
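The functions above could be combined roughly as follows. The data and the names `h` and `ndc` are made up for illustration.

```r
pdf(NULL)                                  # null device
set.seed(42)
x <- rnorm(200)

h <- hist(x, breaks = 20)                  # compute and draw a histogram
lines(h$mids, h$counts, col = "blue")      # connect the bin centers
points(h$mids, h$counts, pch = 20, col = "red")   # mark the bin centers

# bar plot of counts per interval
barplot(table(cut(x, breaks = c(-Inf, -1, 0, 1, Inf))))

# convert the user coordinate x = 0 into normalized device coordinates
plot(x)
ndc <- grconvertX(0, from = "user", to = "ndc")
invisible(dev.off())
```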

Parallel Programming on a Multi-CPU System

basic questions about the machine model
- execute algorithms on multiple CPUs (cores)
- CPUs load data from memory into their registers, compute something and write the results back to memory
1. do all cores have access to the same memory?
   - yes: (following) PRAM model (parallel random access machine)
   - no: (later) distributed computing
2. concurrent access (reading/writing in parallel)?
   - parallel reading: exclusive or concurrent
   - parallel writing: if concurrent, which value stays?
   - four different variants
- in the following, we allow concurrent reading and avoid concurrent writing

an example algorithm: summation
- problem: given an array of numbers A[1],...,A[n], determine the sum over all A[i]
- straightforward without parallel execution: O(n)
- is a speed-up with more cores possible?

parallel summation: idea
- partition into the smallest possible subproblems
- solve these in parallel
- combine the results, again in parallel
- continue until all values are combined

algorithm
    input: array A                     // assume length(A) = n = 2^h
    B = A                              // B holds results on current level
    while(length(B) > 1){              // while intermediate results have to be combined
      T = array(length(B)/2)
      parallel for(i in 1:length(T)){  // execute in parallel
        T[i] = B[2*i] + B[2*i-1]       // solve subproblem
      }
      B = T                            // advance to next level
    }
    return(B[1])
assumptions/preconditions:
- the length of A is a power of two (if not, pad with zeros)
- the + operation is associative, i.e. (a + b) + c = a + (b + c)
- the approach works for every associative operation
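The pseudocode can be emulated sequentially in R to check the level-by-level logic. This is only a sketch: the name `tree_sum` is an assumption, and the vectorized addition stands in for the parallel for-loop (all pairs of one level are combined in one step, but not on separate cores).

```r
tree_sum <- function(A) {
  n <- length(A)
  h <- ceiling(log2(max(n, 1)))    # number of levels needed
  B <- c(A, rep(0, 2^h - n))       # pad with zeros to a power of two
  while (length(B) > 1) {
    i <- seq_len(length(B) / 2)
    B <- B[2 * i - 1] + B[2 * i]   # combine all pairs of the current level
  }
  B[1]
}

tree_sum(1:64)       # same value as sum(1:64)
tree_sum(c(1, 2, 3)) # padded to length 4 before combining
```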

analysis
- memory: need an additional array for the current level
- number of operations (length(A) = $n = 2^h$): $\sum_{i=1}^{h} 2^{h-i} = 2^h - 1 \in O(n)$
  - no gain in comparison to the sequential approach
- execution time on n/2 cores:
  - let one + operation take $O(f(n))$ time and length(A) = n
  - assume that copying B=A and B=T is done in parallel, too
  - the inner for-loop is executed in parallel: O(1) parallel steps per level
  - the outer while-loop iterates over the levels of a binary tree: $\log_2 n$ levels
  - total time consumption: $O(f(n) \log_2 n)$, for +: $O(\log_2 n)$
- note the difference between number of operations and execution time
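The two counts can be checked numerically. The helper `count_work` below is hypothetical and only mirrors the loop structure of the algorithm: it counts the additions (total work) and the levels (parallel steps) for n = 2^h.

```r
count_work <- function(h) {
  len    <- 2^h    # number of values on the current level
  ops    <- 0      # total number of + operations
  levels <- 0      # number of parallel steps
  while (len > 1) {
    len    <- len / 2
    ops    <- ops + len      # one addition per pair on this level
    levels <- levels + 1
  }
  c(ops = ops, levels = levels)
}

work <- count_work(6)   # n = 64: ops = n - 1, levels = log2(n)
```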

execution of n parallel processes on c cores
- our analysis assumed that there are n/2 cores available
- that's usually an unrealistic assumption
- instead: distribute the parallel processes to as many cores as possible
- example for simple parallel execution on a limited number of cores
    input: array of tasks: jobs, number of cores: cores
    executeparallel=function(jobs, cores){
      i=1;
      while(i<=length(jobs)){
        parallel for(j in i:min(i+cores-1, length(jobs))){
          start(jobs[j]);
        }
        i=i+cores;
      }
    }
- parallel for executes all iterations in parallel
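A runnable approximation of the batching idea, assuming each job is a zero-argument function and using mclapply() from the parallel package as a stand-in for parallel for. The name `execute_parallel` and the job representation are assumptions; on Windows mclapply() cannot fork, so the sketch falls back to one core there.

```r
library(parallel)

execute_parallel <- function(jobs, cores) {
  results <- vector("list", length(jobs))
  i <- 1
  while (i <= length(jobs)) {
    batch <- i:min(i + cores - 1, length(jobs))   # at most `cores` jobs at once
    results[batch] <- mclapply(jobs[batch], function(job) job(),
                               mc.cores = cores)
    i <- i + cores                                # next batch
  }
  results
}

cores <- if (.Platform$OS.type == "windows") 1L else 2L
jobs  <- lapply(1:5, function(k) { force(k); function() k^2 })
res   <- execute_parallel(jobs, cores)
```

Each batch waits for its slowest job before the next batch starts, which is the weakness the more flexible approach addresses.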

a more flexible parallelization approach (idea)
- assume operations depend on intermediate results created by other operations
- no simple systematic structure, but the more general case
- e.g. 2 depends on input from 3 and 1
- 8 can be executed when 7 is finished, while 4 additionally has to wait for 5 and 2

a more flexible parallelization approach (idea)
- several possible execution orders
- the optimal order depends on the execution times
- simple strategy:
  1. keep a list of unoccupied cores
  2. keep a list of unfinished jobs, each with its number of unfinished dependencies
  3. start unfinished jobs with no unfinished dependencies until all cores are occupied
  4. when a job finishes: decrease the number of unfinished dependencies of the jobs depending on it
  5. if not all jobs are finished, repeat from 3
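The strategy can be simulated without real parallelism. The sketch below is an assumption-laden illustration: `schedule`, `deps` and `duration` are hypothetical names, the four-job dependency graph is made up (it is not the graph from the slide), and the dependencies are assumed to be acyclic.

```r
# deps[[j]] lists the jobs that job j has to wait for
schedule <- function(deps, duration, cores) {
  n         <- length(deps)
  waiting   <- lengths(deps)      # unfinished dependencies per job
  started   <- rep(FALSE, n)
  done      <- rep(FALSE, n)
  finish_at <- rep(Inf, n)        # simulated finish time of started jobs
  order     <- integer(0)
  now       <- 0
  while (!all(done)) {
    ready <- which(!started & waiting == 0)
    free  <- cores - sum(started & !done)
    for (j in head(ready, free)) {            # step 3: fill the free cores
      started[j]   <- TRUE
      finish_at[j] <- now + duration[j]
      order        <- c(order, j)
    }
    running <- which(started & !done)
    if (length(running) == 0) stop("cyclic dependencies")
    j       <- running[which.min(finish_at[running])]  # next job to finish
    now     <- finish_at[j]
    done[j] <- TRUE
    for (k in seq_len(n)) {                   # step 4: release depending jobs
      if (j %in% deps[[k]]) waiting[k] <- waiting[k] - 1
    }
  }
  list(order = order, makespan = now)
}

# job 3 needs jobs 1 and 2, job 4 needs job 3
plan <- schedule(deps = list(integer(0), integer(0), c(1, 2), 3),
                 duration = c(2, 1, 1, 1), cores = 2)
```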

intermission: mapply
- a new apply variant (that's number 5): mapply(fun, ...)
- the first argument is the function to apply
- the following arguments are vectors or lists to apply fun to
- calls fun for element i of all following lists
- if the arguments are named, fun is called with named arguments
    > fun=function(a,b){paste(a,b,sep="-")}
    > mapply(fun,b=1:6,a=3:1);
    [1] "3-1" "2-2" "1-3" "3-4" "2-5" "1-6"
- naming of arguments makes their order irrelevant
- shorter vectors are reused (recycled)
- result: the return values of fun, simplified to a vector where possible, otherwise a list
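Two further mapply() arguments that often come up, shown as a small sketch: `SIMPLIFY = FALSE` keeps the result as a list, and `MoreArgs` passes arguments that are the same for every call. The names `as_list` and `scaled` are illustrative.

```r
fun <- function(a, b) paste(a, b, sep = "-")

recycled <- mapply(fun, b = 1:6, a = 3:1)           # a is reused: 3, 2, 1, 3, 2, 1
as_list  <- mapply(fun, 1:3, 4:6, SIMPLIFY = FALSE) # keep a list of results

# k is not iterated over; it is passed unchanged to every call
scaled <- mapply(function(x, y, k) k * (x + y), 1:3, 4:6,
                 MoreArgs = list(k = 10))
```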

parallelization in R
- library parallel provides functions for parallel computations
- in particular:
  - mcmapply(): parallel mapply()
  - mclapply(): parallel lapply()
- execute functions for list elements in parallel
- important parameters:
  - mc.cores: the maximum number of CPU cores to use
  - mc.preschedule: decide the job-to-core distribution at the start or dynamically
    - TRUE for many small and/or equal-length jobs
    - FALSE if jobs vary strongly in execution time
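A brief sketch of both functions with the parameters named above. `slow_square` is a made-up workload; because mclapply() relies on forking, which is not available on Windows, the sketch uses a single core there.

```r
library(parallel)

cores <- if (.Platform$OS.type == "windows") 1L else 2L

slow_square <- function(x) { Sys.sleep(0.01); x^2 }   # hypothetical job

# many equal-length jobs: distribute them to the cores up front
even <- mclapply(1:8, slow_square, mc.cores = cores, mc.preschedule = TRUE)

# jobs of different length: hand out the next job when a core becomes free
uneven <- mclapply(c(8, 1, 1, 1), slow_square,
                   mc.cores = cores, mc.preschedule = FALSE)

# parallel counterpart of mapply()
sums <- mcmapply(function(a, b) a + b, 1:4, 5:8, mc.cores = cores)
```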

parallelize associative functions as R code
    parallelaccumulate=function(f,a){
      require(parallel);
      b=a;
      while(length(b)>1){
        b=mclapply(1:(length(b)/2),
          function(i) return(f(b[[2*i]],b[[2*i-1]])));
      }
      return(b[[1]]);  # b is a list of length one, return its element
    }
execution:
    plus=function(a,b) {a+b};
    parallelaccumulate(plus,1:64);
- simple, but not very generic

parallelization of a function as a function in R
    parallelize=function(f){
      par=function(a){
        require(parallel);
        b=a;
        while(length(b)>1){
          b=mclapply(1:(length(b)/2),
            function(i) return(f(b[[2*i]],b[[2*i-1]])));
        }
        return(b[[1]]);
      }
      return(par);
    }
execution:
    plus=function(a,b) {a+b};
    psum=parallelize(plus);
    psum(1:64);
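The version above relies on the input length being a power of two; for other lengths, elements are silently dropped. One possible variant, sketched here under the assumptions that the input is non-empty and f is associative, carries an unpaired last element to the next level instead of padding. The name `parallelize_any` and the `cores` argument are not from the slides.

```r
library(parallel)

parallelize_any <- function(f, cores = 1L) {
  function(a) {
    b <- as.list(a)
    while (length(b) > 1) {
      half <- length(b) %/% 2
      nxt  <- mclapply(seq_len(half),
                       function(i) f(b[[2 * i - 1]], b[[2 * i]]),
                       mc.cores = cores)
      # odd length: keep the unpaired last element, preserving the order
      if (length(b) %% 2 == 1) nxt <- c(nxt, b[length(b)])
      b <- nxt
    }
    b[[1]]
  }
}

psum <- parallelize_any(function(a, b) a + b)
pcat <- parallelize_any(paste0)   # associative, but not commutative

total <- psum(1:10)
word  <- pcat(letters[1:5])
```

Pairs are combined left to right, so the result also holds for operations where the order of the operands matters.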


Chapter 8 Statistical Sonification for Exploratory Data Analysis The Sonification Handbook Edited by Thomas Hermann, Andy Hunt, John G. Neuhoff Logos Publishing House, Berlin, Germany ISBN 978-3-8325-2819-5 2011, 586 pages Online: http://sonification.de/handbook Order:

More information
