Applying Data Analysis to Big Data Benchmarks. Jazmine Olinger


Abstract

This paper describes the search for accurate and fast ways to simulate Big Data benchmarks. Specifically, it uses an existing simulator, Macsim, from the High Performance Architecture Lab at Georgia Tech, and reduces simulation time on a benchmark by first analyzing the application with SimPoint to identify its critical points, then modifying Macsim to simulate only those critical sections instead of the entire application. I used basic benchmarks for all of this research, but the same idea applies to, and would ideally work on, Big Data benchmarks and other computationally large applications.

Goals of Project

1. To understand SimPoint
2. To find out whether SimPoint is fast and accurate enough to simulate the desired applications
3. To create an environment for quickly testing applications with SimPoint results

Background Information

k-means clustering (MathWorks, n.d.)

k-means is a clustering algorithm that partitions a given data set into a given number of clusters. SimPoint generates many different clusterings with k-means and uses a set of criteria to select the best one for the purpose of simulation (a small number of well-defined clusters is desirable).

k-means algorithm

1. Place K points into the space represented by the objects that are being clustered. These points represent the initial group centroids.
2. Assign each object to the group that has the closest centroid.
3. When all objects have been assigned, recalculate the positions of the K centroids.
4. Repeat Steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated.

SimPoint (Calder, n.d.)

SimPoint is a simulation analysis tool that uses a statistical method to find ideal simulation points in an application. It uses a frequency vector profile of a program to perform k-means clustering and select the simulation points. After a frequency vector file has been generated for an application, running SimPoint on it produces three meaningful output files:

simpoint file: the vectors chosen as Simulation Points and their corresponding cluster numbers.
weight file: a weight for each Simulation Point and its corresponding cluster number. The weight is the proportion of the program's execution that the Simulation Point represents.
label file: the final cluster label and the distance from the cluster center for each vector.

Results

SimPoint Result Data

This result data compares the actual CPI over all points (from a full Macsim run) against the CPI at only the points selected by SimPoint, each multiplied by the respective weight given by SimPoint. The error is very low for all but one benchmark (bzip2), which indicates that that particular program has patterns that do not lend themselves well to k-means clustering.

Benchmark   Macsim CPI   SimPoint CPI   Error
bzip2       [missing]    [missing]      9.66%
gcc         [missing]    [missing]      1.02%
lbm         [missing]    [missing]      0.68%
mcf         [missing]    [missing]      1.70%

The following graphs show the four benchmarks used, comparing the CPI recorded at each point from a full run (top graph) to the cluster each point is placed in. There is a very clear pattern in all but one (gcc) matching the changes in CPI to changes in cluster, as expected.

(bzip2, error 9.66%)
(lbm, error 0.68%)
(gcc, error 1.02%)
(mcf, error 1.70%)

SimPoint Sampler

To utilize the results of SimPoint in a meaningful way, I developed the SimPoint Sampler, which uses Macsim and SimPoint results together to simulate applications quickly. Instead of running the entire program through Macsim, it uses the provided Simulation Points to switch between two modes, emulation mode and timing mode. By running in timing mode only on the blocks identified by SimPoint and in emulation mode on all other blocks, it can simulate the entire application significantly faster than a full run of Macsim.

The SimPoint Sampler currently loses a great deal of accuracy in the reported CPI when switching between modes. However, if this problem within Macsim were fixed, it would be a very fast and accurate way (ideally exactly as accurate as SimPoint) to simulate applications.

Future Work

Future work on this project includes fixing the results from Macsim when switching modes, applying this method to Big Data benchmarks, and trying other methods on Big Data benchmarks if this one does not work.
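As an illustration of the method described under Background Information and Results, the sketch below runs the four k-means steps on a handful of toy frequency vectors, picks one Simulation Point per cluster, weights it by the share of intervals in its cluster, and compares the weighted CPI with the full-run average. It is only a sketch under stated assumptions, not SimPoint's or Macsim's implementation: the vectors, the per-interval CPI values, and the helper names kmeans, pick_simulation_points and weighted_cpi are invented for this example. Taking the interval closest to each centroid as the representative, and measuring error as the relative difference from the full-run CPI, are likewise assumptions. The selection among many clusterings that SimPoint performs is omitted.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, seed=0, max_iter=100):
    """Plain k-means following the four steps listed in the text."""
    rng = random.Random(seed)
    # Step 1: place K initial centroids (here: K distinct data points).
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each object to the closest centroid.
        new_labels = [min(range(k), key=lambda c: dist2(v, centroids[c]))
                      for v in vectors]
        # Step 4: stop once assignments (and hence centroids) stop changing.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def pick_simulation_points(vectors, labels, centroids):
    """Per cluster: (index of interval closest to centroid, cluster weight)."""
    points = {}
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            rep = min(members, key=lambda i: dist2(vectors[i], centroid))
            points[c] = (rep, len(members) / len(vectors))
    return points


def weighted_cpi(points, cpi):
    """CPI estimate: sum of weight * CPI over the chosen Simulation Points."""
    return sum(weight * cpi[i] for i, weight in points.values())


# Hypothetical data: 8 execution intervals in two phases (toy values).
vectors = [[0.90, 0.10], [0.85, 0.15], [0.80, 0.20],
           [0.10, 0.90], [0.12, 0.88], [0.15, 0.85],
           [0.18, 0.82], [0.20, 0.80]]
cpi = [1.0, 1.1, 1.2, 3.0, 3.1, 2.9, 3.2, 3.0]  # made-up per-interval CPI

labels, centroids = kmeans(vectors, k=2)
points = pick_simulation_points(vectors, labels, centroids)
estimate = weighted_cpi(points, cpi)
full_cpi = sum(cpi) / len(cpi)
error = abs(estimate - full_cpi) / full_cpi

print("simulation points (interval, weight):", sorted(points.values()))
print(f"full CPI {full_cpi:.4f}, weighted CPI {estimate:.4f}, error {error:.2%}")
```

Only the two chosen intervals would need detailed (timing-mode) simulation in this toy case; the remaining six contribute through the weights alone, which is where the time saving comes from.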

Bibliography

Calder, B. (n.d.).
MathWorks. (n.d.).

Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining

Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Data Mining Clustering (2) Toon Calders Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Outline Partitional Clustering Distance-based K-means, K-medoids,

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical

More information

Final Project Report

Final Project Report CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes

More information

Why is Internal Audit so Hard?

Why is Internal Audit so Hard? Why is Internal Audit so Hard? 2 2014 Why is Internal Audit so Hard? 3 2014 Why is Internal Audit so Hard? Waste Abuse Fraud 4 2014 Waves of Change 1 st Wave Personal Computers Electronic Spreadsheets

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

K-Means Clustering Tutorial

K-Means Clustering Tutorial K-Means Clustering Tutorial By Kardi Teknomo,PhD Preferable reference for this tutorial is Teknomo, Kardi. K-Means Clustering Tutorials. http:\\people.revoledu.com\kardi\ tutorial\kmean\ Last Update: July

More information

How To Make Visual Analytics With Big Data Visual

How To Make Visual Analytics With Big Data Visual Big-Data Visualization Customizing Computational Methods for Visual Analytics with Big Data Jaegul Choo and Haesun Park Georgia Tech O wing to the complexities and obscurities in large-scale datasets (

More information

Machine Learning using MapReduce

Machine Learning using MapReduce Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

More information

The Integration of SNORT with K-Means Clustering Algorithm to Detect New Attack

The Integration of SNORT with K-Means Clustering Algorithm to Detect New Attack The Integration of SNORT with K-Means Clustering Algorithm to Detect New Attack Asnita Hashim, University of Technology MARA, Malaysia April 14-15, 2011 The Integration of SNORT with K-Means Clustering

More information

DISCOVERING AND EXPLOITING PROGRAM PHASES

DISCOVERING AND EXPLOITING PROGRAM PHASES DISCOVERING AND EXPLOITING PROGRAM PHASES IN A SINGLE SECOND, A MODERN PROCESSOR CAN EXECUTE BILLIONS OF INSTRUCTIONS AND A PROGRAM S BEHAVIOR CAN CHANGE MANY TIMES. SOME PROGRAMS CHANGE BEHAVIOR DRASTICALLY,

More information

Clustering Connectionist and Statistical Language Processing

Clustering Connectionist and Statistical Language Processing Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised

More information

Introduction to Machine Learning Using Python. Vikram Kamath

Introduction to Machine Learning Using Python. Vikram Kamath Introduction to Machine Learning Using Python Vikram Kamath Contents: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. Introduction/Definition Where and Why ML is used Types of Learning Supervised Learning Linear Regression

More information

W6.B.1. FAQs CS535 BIG DATA W6.B.3. 4. If the distance of the point is additionally less than the tight distance T 2, remove it from the original set

W6.B.1. FAQs CS535 BIG DATA W6.B.3. 4. If the distance of the point is additionally less than the tight distance T 2, remove it from the original set http://wwwcscolostateedu/~cs535 W6B W6B2 CS535 BIG DAA FAQs Please prepare for the last minute rush Store your output files safely Partial score will be given for the output from less than 50GB input Computer

More information

Runtime Hardware Reconfiguration using Machine Learning

Runtime Hardware Reconfiguration using Machine Learning Runtime Hardware Reconfiguration using Machine Learning Tanmay Gangwani University of Illinois, Urbana-Champaign gangwan2@illinois.edu Abstract Tailoring the machine hardware to varying needs of the software

More information

Introduction to Clustering

Introduction to Clustering Introduction to Clustering Yumi Kondo Student Seminar LSK301 Sep 25, 2010 Yumi Kondo (University of British Columbia) Introduction to Clustering Sep 25, 2010 1 / 36 Microarray Example N=65 P=1756 Yumi

More information

Cluster Analysis for Evaluating Trading Strategies 1

Cluster Analysis for Evaluating Trading Strategies 1 CONTRIBUTORS Jeff Bacidore Managing Director, Head of Algorithmic Trading, ITG, Inc. Jeff.Bacidore@itg.com +1.212.588.4327 Kathryn Berkow Quantitative Analyst, Algorithmic Trading, ITG, Inc. Kathryn.Berkow@itg.com

More information

Clustering. Data Mining. Abraham Otero. Data Mining. Agenda

Clustering. Data Mining. Abraham Otero. Data Mining. Agenda Clustering 1/46 Agenda Introduction Distance K-nearest neighbors Hierarchical clustering Quick reference 2/46 1 Introduction It seems logical that in a new situation we should act in a similar way as in

More information

Statistical Databases and Registers with some datamining

Statistical Databases and Registers with some datamining Unsupervised learning - Statistical Databases and Registers with some datamining a course in Survey Methodology and O cial Statistics Pages in the book: 501-528 Department of Statistics Stockholm University

More information

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar

More information

Data Mining Project Report. Document Clustering. Meryem Uzun-Per

Data Mining Project Report. Document Clustering. Meryem Uzun-Per Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...

More information

How To Cluster

How To Cluster Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

More information

They can be obtained in HQJHQH format directly from the home page at: http://www.engene.cnb.uam.es/downloads/kobayashi.dat

They can be obtained in HQJHQH format directly from the home page at: http://www.engene.cnb.uam.es/downloads/kobayashi.dat HQJHQH70 *XLGHG7RXU This document contains a Guided Tour through the HQJHQH platform and it was created for training purposes with respect to the system options and analysis possibilities. It is not intended

More information

South East of Process Main Building / 1F. North East of Process Main Building / 1F. At 14:05 April 16, 2011. Sample not collected

South East of Process Main Building / 1F. North East of Process Main Building / 1F. At 14:05 April 16, 2011. Sample not collected At 14:05 April 16, 2011 At 13:55 April 16, 2011 At 14:20 April 16, 2011 ND ND 3.6E-01 ND ND 3.6E-01 1.3E-01 9.1E-02 5.0E-01 ND 3.7E-02 4.5E-01 ND ND 2.2E-02 ND 3.3E-02 4.5E-01 At 11:37 April 17, 2011 At

More information

Advanced Ensemble Strategies for Polynomial Models

Advanced Ensemble Strategies for Polynomial Models Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer

More information

Contents. Dedication List of Figures List of Tables. Acknowledgments

Contents. Dedication List of Figures List of Tables. Acknowledgments Contents Dedication List of Figures List of Tables Foreword Preface Acknowledgments v xiii xvii xix xxi xxv Part I Concepts and Techniques 1. INTRODUCTION 3 1 The Quest for Knowledge 3 2 Problem Description

More information

Compiler-Assisted Binary Parsing

Compiler-Assisted Binary Parsing Compiler-Assisted Binary Parsing Tugrul Ince tugrul@cs.umd.edu PD Week 2012 26 27 March 2012 Parsing Binary Files Binary analysis is common for o Performance modeling o Computer security o Maintenance

More information

K-means Clustering Technique on Search Engine Dataset using Data Mining Tool

K-means Clustering Technique on Search Engine Dataset using Data Mining Tool International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 6 (2013), pp. 505-510 International Research Publications House http://www. irphouse.com /ijict.htm K-means

More information

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

More information

Java Modules for Time Series Analysis

Java Modules for Time Series Analysis Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series

More information

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j What is Kiva? An organization that allows people to lend small amounts of money via the Internet

More information

Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

More information

Clustering UE 141 Spring 2013

Clustering UE 141 Spring 2013 Clustering UE 141 Spring 013 Jing Gao SUNY Buffalo 1 Definition of Clustering Finding groups of obects such that the obects in a group will be similar (or related) to one another and different from (or

More information

A SYSTEM FOR CROWD ORIENTED EVENT DETECTION, TRACKING, AND SUMMARIZATION IN SOCIAL MEDIA

A SYSTEM FOR CROWD ORIENTED EVENT DETECTION, TRACKING, AND SUMMARIZATION IN SOCIAL MEDIA A SYSTEM FOR CROWD ORIENTED EVENT DETECTION, TRACKING, AND SUMMARIZATION IN SOCIAL MEDIA An Undergraduate Research Scholars Thesis by JASON M. BOLDEN Submitted to Honors and Undergraduate Research Texas

More information

Big Data Simulator version

Big Data Simulator version Big Data Simulator version User Manual Website: http://prof.ict.ac.cn/bigdatabench/simulatorversion/ Content 1 Motivation... 3 2 Methodology... 3 3 Architecture subset... 3 3.1 Microarchitectural Metric

More information

CLUSTERING FOR FORENSIC ANALYSIS

CLUSTERING FOR FORENSIC ANALYSIS IMPACT: International Journal of Research in Engineering & Technology (IMPACT: IJRET) ISSN(E): 2321-8843; ISSN(P): 2347-4599 Vol. 2, Issue 4, Apr 2014, 129-136 Impact Journals CLUSTERING FOR FORENSIC ANALYSIS

More information

Car Insurance. Jan Tomášek Štěpán Havránek Michal Pokorný

Car Insurance. Jan Tomášek Štěpán Havránek Michal Pokorný Car Insurance Jan Tomášek Štěpán Havránek Michal Pokorný Competition details Jan Tomášek Official text As a customer shops an insurance policy, he/she will receive a number of quotes with different coverage

More information

An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework

An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework Jakrarin Therdphapiyanak Dept. of Computer Engineering Chulalongkorn University

More information

Map-Reduce for Machine Learning on Multicore

Map-Reduce for Machine Learning on Multicore Map-Reduce for Machine Learning on Multicore Chu, et al. Problem The world is going multicore New computers - dual core to 12+-core Shift to more concurrent programming paradigms and languages Erlang,

More information

Research on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2

Research on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2 Advanced Engineering Forum Vols. 6-7 (2012) pp 82-87 Online: 2012-09-26 (2012) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/aef.6-7.82 Research on Clustering Analysis of Big Data

More information

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications

More information

Measuring Heart Rate

Measuring Heart Rate Name Date Hr Grade Level: 10 Points: 20 Measuring Heart Rate Course: Biology B, Chapter 27 & 28 Objective: Make a lab Benchmark: Biology DIRECTIONS: Use your Hypothetical Lab measuring heart rate of individuals

More information

Hadoop Operations Management for Big Data Clusters in Telecommunication Industry

Hadoop Operations Management for Big Data Clusters in Telecommunication Industry Hadoop Operations Management for Big Data Clusters in Telecommunication Industry N. Kamalraj Asst. Prof., Department of Computer Technology Dr. SNS Rajalakshmi College of Arts and Science Coimbatore-49

More information

Using multiple models: Bagging, Boosting, Ensembles, Forests

Using multiple models: Bagging, Boosting, Ensembles, Forests Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or

More information

The Methodology of Application Development for Hybrid Architectures

The Methodology of Application Development for Hybrid Architectures Computer Technology and Application 4 (2013) 543-547 D DAVID PUBLISHING The Methodology of Application Development for Hybrid Architectures Vladimir Orekhov, Alexander Bogdanov and Vladimir Gaiduchok Department

More information

Performance Metrics for Graph Mining Tasks

Performance Metrics for Graph Mining Tasks Performance Metrics for Graph Mining Tasks 1 Outline Introduction to Performance Metrics Supervised Learning Performance Metrics Unsupervised Learning Performance Metrics Optimizing Metrics Statistical

More information

WHITE PAPER AUTOMATED, REAL-TIME RISK ANALYSIS AND REMEDIATION

WHITE PAPER AUTOMATED, REAL-TIME RISK ANALYSIS AND REMEDIATION WHITE PAPER AUTOMATED, REAL-TIME RISK ANALYSIS AND REMEDIATION Table of Contents Executive Summary...3 Vulnerability Scanners Alone Are Not Enough...3 Real-Time Change Configuration Notification is the

More information

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for

More information

Energy Efficient MapReduce

Energy Efficient MapReduce Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing

More information

Analytics on Big Data

Analytics on Big Data Analytics on Big Data Riccardo Torlone Università Roma Tre Credits: Mohamed Eltabakh (WPI) Analytics The discovery and communication of meaningful patterns in data (Wikipedia) It relies on data analysis

More information

Distributed Framework for Data Mining As a Service on Private Cloud

Distributed Framework for Data Mining As a Service on Private Cloud RESEARCH ARTICLE OPEN ACCESS Distributed Framework for Data Mining As a Service on Private Cloud Shraddha Masih *, Sanjay Tanwani** *Research Scholar & Associate Professor, School of Computer Science &

More information

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

More information

PayLess: A Low Cost Network Monitoring Framework for Software Defined Networks

PayLess: A Low Cost Network Monitoring Framework for Software Defined Networks PayLess: A Low Cost Network Monitoring Framework for Software Defined Networks Shihabur R. Chowdhury, Md. Faizul Bari, Reaz Ahmed and Raouf Boutaba David R. Cheriton School of Computer Science, University

More information

Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing

Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 4, April 2015,

More information

Monday Morning Data Mining

Monday Morning Data Mining Monday Morning Data Mining Tim Ruhe Statistische Methoden der Datenanalyse Outline: - data mining - IceCube - Data mining in IceCube Computer Scientists are different... Fakultät Physik Fakultät Physik

More information

HARNESSING BIG DATA WITHIN THE FEDERAL GOVERNMENT FINDINGS AND RECOMMENDATIONS OF ATARC S BIG DATA INNOVATION LAB DECEMBER, 2015

HARNESSING BIG DATA WITHIN THE FEDERAL GOVERNMENT FINDINGS AND RECOMMENDATIONS OF ATARC S BIG DATA INNOVATION LAB DECEMBER, 2015 HARNESSING BIG DATA WITHIN THE FEDERAL GOVERNMENT FINDINGS AND RECOMMENDATIONS OF ATARC S BIG DATA INNOVATION LAB DECEMBER, 2015 ATARC Big Data Innovation Lab Sponsors ATARC Big Data Innovation Lab Objective

More information

Tutorial Segmentation and Classification

Tutorial Segmentation and Classification MARKETING ENGINEERING FOR EXCEL TUTORIAL VERSION 1.0.8 Tutorial Segmentation and Classification Marketing Engineering for Excel is a Microsoft Excel add-in. The software runs from within Microsoft Excel

More information

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM. DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,

More information

A Novel Approach for Network Traffic Summarization

A Novel Approach for Network Traffic Summarization A Novel Approach for Network Traffic Summarization Mohiuddin Ahmed, Abdun Naser Mahmood, Michael J. Maher School of Engineering and Information Technology, UNSW Canberra, ACT 2600, Australia, Mohiuddin.Ahmed@student.unsw.edu.au,A.Mahmood@unsw.edu.au,M.Maher@unsw.

More information

How To Test A Web Server

How To Test A Web Server Performance and Load Testing Part 1 Performance & Load Testing Basics Performance & Load Testing Basics Introduction to Performance Testing Difference between Performance, Load and Stress Testing Why Performance

More information

Glencoe. correlated to SOUTH CAROLINA MATH CURRICULUM STANDARDS GRADE 6 3-3, 5-8 8-4, 8-7 1-6, 4-9

Glencoe. correlated to SOUTH CAROLINA MATH CURRICULUM STANDARDS GRADE 6 3-3, 5-8 8-4, 8-7 1-6, 4-9 Glencoe correlated to SOUTH CAROLINA MATH CURRICULUM STANDARDS GRADE 6 STANDARDS 6-8 Number and Operations (NO) Standard I. Understand numbers, ways of representing numbers, relationships among numbers,

More information

Project Report BIG-DATA CONTENT RETRIEVAL, STORAGE AND ANALYSIS FOUNDATIONS OF DATA-INTENSIVE COMPUTING. Masters in Computer Science

Project Report BIG-DATA CONTENT RETRIEVAL, STORAGE AND ANALYSIS FOUNDATIONS OF DATA-INTENSIVE COMPUTING. Masters in Computer Science Data Intensive Computing CSE 486/586 Project Report BIG-DATA CONTENT RETRIEVAL, STORAGE AND ANALYSIS FOUNDATIONS OF DATA-INTENSIVE COMPUTING Masters in Computer Science University at Buffalo Website: http://www.acsu.buffalo.edu/~mjalimin/

More information

Clustering of Documents for Forensic Analysis

Clustering of Documents for Forensic Analysis Clustering of Documents for Forensic Analysis Asst. Prof. Mrs. Mugdha Kirkire #1, Stanley George #2,RanaYogeeta #3,Vivek Shukla #4, Kumari Pinky #5 #1 GHRCEM, Wagholi, Pune,9975101287. #2,GHRCEM, Wagholi,

More information

Motion. Complete Table 1. Record all data to three decimal places (e.g., 4.000 or 6.325 or 0.000). Do not include units in your answer.

Motion. Complete Table 1. Record all data to three decimal places (e.g., 4.000 or 6.325 or 0.000). Do not include units in your answer. Labs for College Physics: Mechanics Worksheet Experiment 2-1 Motion As you work through the steps in the lab procedure, record your experimental values and the results on this worksheet. Use the exact

More information

Large-Scale Data Sets Clustering Based on MapReduce and Hadoop

Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Journal of Computational Information Systems 7: 16 (2011) 5956-5963 Available at http://www.jofcis.com Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Ping ZHOU, Jingsheng LEI, Wenjun YE

More information

Fast Matching of Binary Features

Fast Matching of Binary Features Fast Matching of Binary Features Marius Muja and David G. Lowe Laboratory for Computational Intelligence University of British Columbia, Vancouver, Canada {mariusm,lowe}@cs.ubc.ca Abstract There has been

More information

Big Data Text Mining and Visualization. Anton Heijs

Big Data Text Mining and Visualization. Anton Heijs Copyright 2007 by Treparel Information Solutions BV. This report nor any part of it may be copied, circulated, quoted without prior written approval from Treparel7 Treparel Information Solutions BV Delftechpark

More information

Northumberland Knowledge

Northumberland Knowledge Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

More information

On the Placement of Management and Control Functionality in Software Defined Networks

On the Placement of Management and Control Functionality in Software Defined Networks On the Placement of Management and Control Functionality in Software Defined Networks D.Tuncer et al. Department of Electronic & Electrical Engineering University College London, UK ManSDN/NfV 13 November

More information

Environmental Remote Sensing GEOG 2021

Environmental Remote Sensing GEOG 2021 Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class

More information

GRAPH MATCHING EQUIPMENT/MATERIALS

GRAPH MATCHING EQUIPMENT/MATERIALS GRAPH MATCHING LAB MECH 6.COMP. From Physics with Computers, Vernier Software & Technology, 2000. Mathematics Teacher, September, 1994. INTRODUCTION One of the most effective methods of describing motion

More information

SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL

SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL Krishna Kiran Kattamuri 1 and Rupa Chiramdasu 2 Department of Computer Science Engineering, VVIT, Guntur, India

More information
