Server Load Prediction


Suthee Chaidaroon (unsuthee@stanford.edu), Joon Yeong Kim (kim64@stanford.edu), Jonghan Seo (jonghan@stanford.edu)

Abstract

Estimating the server load average is one way to reduce the cost of renting a cloud computing service. By making an intelligent guess about how busy a server will be in the near future, we can scale computing requirements up or down accordingly. In this experiment we collected website access logs and server uptime logs from a web application company's website and extracted features from these data. We built prediction models of the server load average using linear regression, locally weighted linear regression, multinomial logistic regression, and support vector regression. Our experiments show a trade-off between computing efficiency and prediction accuracy among the selected learning algorithms. Locally weighted linear regression gives the most accurate prediction of the server load average but takes a long time to converge. Support vector regression, on the other hand, converges very quickly but performs poorly on sparse training samples. At the end of this paper, we discuss improvements for future experiments.

Introduction

Our clients operate a website and rent cloud servers from Amazon EC2. EC2 allows them to scale computing capacity up and down as needed, so they pay only for the capacity their website actually uses. This task is tedious, however, and we believe it can be automated so that clients do not have to monitor their server activity all the time. Our solution is to estimate the server load average from the website users' activities. We assume that the number of activities and transactions determines the server load average, and we use this assumption to build a prediction model.
Methodology

We collected website access logs and server uptime logs over the past few months and processed the two files by merging the transactions and load averages that occur in the same time interval. We then used this processed file as our training set. Each line of the training file contains the time information, the number of unique IP addresses, the total content size, the protocol, and the load averages over the last 1, 5, and 15 minutes. Below are excerpts from the access log and uptime files:

99.63.73.46 - - [24/Oct/2009:00:00:05 +0000] "GET /cgi-bin/uptime.cgi HTTP/1.0" 200 71
99.63.73.46 - - [24/Oct/2009:00:01:21 +0000] "GET /logs/error_log HTTP/1.1" 200 3259
99.63.73.46 - - [24/Oct/2009:00:01:31 +0000] "GET /logs/error_log HTTP/1.1" 304 -
99.63.73.46 - - [24/Oct/2009:00:05:04 +0000] "GET /cgi-bin/uptime.cgi HTTP/1.0" 200 71
99.63.73.46 - - [24/Oct/2009:00:07:00 +0000] "GET /crossdomain.xml HTTP/1.1" 200 202
99.63.73.46 - - [24/Oct/2009:00:07:00 +0000] "GET /m/s/248/f/1280/f248-1280-6.bpk HTTP/1.1" 200 680
99.63.73.46 - - [24/Oct/2009:00:07:00 +0000] "GET /m/s/271/f/1301/f271-1301-6.bpk HTTP/1.1" 200 23000

Table 1. Part of the access log [1]

Mon Oct 12 15:50:02 PDT 2009:: 18:50:02 up 95 days, 27 min, 0 users, load average: 0.23, 0.06, 0.02
Mon Oct 12 15:55:01 PDT 2009:: 18:55:02 up 95 days, 32 min, 1 user, load average: 0.00, 0.02, 0.00
Mon Oct 12 16:00:02 PDT 2009:: 19:00:03 up 95 days, 37 min, 1 user, load average: 0.00, 0.00, 0.00
Mon Oct 12 16:05:06 PDT 2009:: 19:05:07 up 95 days, 42 min, 1 user, load average: 0.00, 0.00, 0.00
Mon Oct 12 16:10:01 PDT 2009:: 19:10:02 up 95 days, 47 min, 1 user, load average: 0.53, 0.41, 0.16
Mon Oct 12 16:15:01 PDT 2009:: 19:15:02 up 95 days, 52 min, 1 user, load average: 0.05, 0.20, 0.14

Table 2. Part of the uptime log [2]
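The merge step described above can be sketched in Python. This is a minimal sketch under our reading of the description: the function names, the regular expressions, and the 5-minute bucketing are our assumptions, not the authors' actual code.

```python
import re
from collections import defaultdict
from datetime import datetime

# Apache combined-log and uptime-line patterns (assumed from the excerpts above).
ACCESS_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) \S+ \S+" \S+ (?P<size>\d+|-)')
UPTIME_RE = re.compile(
    r'(?P<ts>\w{3} \w{3} \d{1,2} [\d:]+) \w+ (?P<year>\d{4})::'
    r'.*load average: (?P<l1>[\d.]+), (?P<l5>[\d.]+), (?P<l15>[\d.]+)')

def bucket(dt, minutes=5):
    # Round a timestamp down to its 5-minute interval.
    return dt.replace(minute=dt.minute - dt.minute % minutes, second=0)

def merge(access_lines, uptime_lines):
    # Aggregate access-log features per interval, then join with uptime samples
    # that fall into the same interval.
    feats = defaultdict(lambda: {"ips": set(), "get": 0, "post": 0, "bytes": 0})
    for line in access_lines:
        m = ACCESS_RE.match(line)
        if not m:
            continue
        dt = datetime.strptime(m["ts"].split()[0], "%d/%b/%Y:%H:%M:%S")
        f = feats[bucket(dt)]
        f["ips"].add(m["ip"])
        f["get" if m["method"] == "GET" else "post"] += 1  # non-GET counted as POST
        f["bytes"] += 0 if m["size"] == "-" else int(m["size"])
    rows = []
    for line in uptime_lines:
        m = UPTIME_RE.match(line)
        if not m:
            continue
        dt = datetime.strptime(m["ts"] + " " + m["year"], "%a %b %d %H:%M:%S %Y")
        f = feats.get(bucket(dt))
        if f:  # one training row: interval, features, 1/5/15-minute load averages
            rows.append((bucket(dt), len(f["ips"]), f["get"], f["post"], f["bytes"],
                         float(m["l1"]), float(m["l5"]), float(m["l15"])))
    return rows
```

Each returned row corresponds to one line of the processed training file described above.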

After parsing and processing the data, the features available to us were: day of the week (Monday, Tuesday, etc.), month, date, year, hour, minute, second, number of unique users, number of POST requests, number of GET requests, and content size of the interaction. The diagram of our prediction model (not reproduced in this transcription) has two stages. At the bottom level, we predict the server load based on time alone. Predicting the server load only from time would be extremely difficult, so we divided the problem into two parts: one hypothesis predicts the server load from the user features, and another, h1(t), predicts the user features from time. For the regression and classification tests in the following sections, we assume that the user features were predicted correctly. The output value we tested was the 5-minute average server load, because the uptime command was run every 5 minutes on the server.

Experimental Results

Linear Regression

We assume that user features such as the number of unique users and the number of requests were correctly predicted by h1(t). This assumption had to be made because the amount of data was not sufficient to predict the user features: the logs had only been collected since this August. The features month and year were dropped from the feature list to make the training matrix full rank. To create a larger set of features, we concatenated element-wise powers of the matrix X to the original matrix, so our training matrix is X = [Xorig Xorig^2 ... Xorig^n].
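This feature expansion and fit can be sketched with NumPy. The variable names and synthetic data below are ours, not the authors'; the paper's own code is not shown.

```python
import numpy as np

def expand(X, n):
    # Build [X, X**2, ..., X**n] by concatenating element-wise powers
    # column-wise, as in the paper's X = [Xorig Xorig^2 ... Xorig^n].
    return np.hstack([X ** k for k in range(1, n + 1)])

def fit_linear(X, y):
    # Least-squares fit with an intercept column; lstsq degrades gracefully
    # when the expanded matrix loses full rank (the problem hit at n = 5).
    A = np.c_[np.ones(len(X)), X]
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta

def predict(theta, X):
    return np.c_[np.ones(len(X)), X] @ theta

# Hypothetical demo data standing in for the parsed log features.
rng = np.random.default_rng(0)
X_orig = rng.random((200, 4))                        # e.g. users, GETs, POSTs, bytes
load = 0.5 * X_orig[:, 0] + 0.2 * X_orig[:, 1] ** 2  # synthetic load average
for n in (1, 2, 3, 4):
    Xn = expand(X_orig, n)
    theta = fit_linear(Xn, load)
    err = np.mean(np.abs(predict(theta, Xn) - load))  # avg training error per n
```

Comparing `err` across n mirrors the training/test-error tables in the next section.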

Since full rank of the matrix X could not be preserved from n = 5 onward, we could only test n = 1, 2, 3, 4. To find the optimal n, we computed the average training and test errors:

n   avg. training error   avg. test error
1   0.0784                0.0783
2   0.0780                0.0787
3   0.0787                0.0922
4   0.0784                0.3471

As the table shows, the overall error actually got worse as we increased the number of features. However, because we thought n = 3 gave a prediction curve whose shape was closer to the actual data, we picked n = 3. In the plots of the actual and predicted server load for the training and test data (not reproduced here), the predictions are not as accurate as they ought to be, but the trends in the server load are fit quite nicely. For the above regression, the training error was 0.0652 and the test error was 0.0683. Since the shape of the curve fits well, for our purpose of deciding whether to increase or decrease the number of servers, we could amplify the prediction curve by a constant factor to reduce the latency experienced by users, or do the opposite to reduce the number of operating servers.

Multinomial Logistic Regression

To transform our problem into a logistic regression problem, we classified the server load (CPU usage) into ten classes (0, 1, ..., 9): if the CPU usage was at most 0.10, the class is 0; else if at most 0.20, the class is 1; ...; else if at most 0.90, the class is 8; else the class is 9. We trained 10 separate hypotheses hi, one per class, and at test time the system chooses the class with the highest confidence. We used the same approach as in the linear regression case to find the maximum power n for the matrix X = [Xorig Xorig^2 ... Xorig^n]. The training and test errors were:

n   avg. training error   avg. test error
1   0.0236                0.0263
2   0.0244                0.0220
3   0.0243                0.0267
4   0.0371                0.0360

As in the linear regression case, we chose n = 3. In the resulting plot of our prediction against the actual data, the multinomial logistic regression always outputs class 0 no matter what the features are. This is partly because too many of our data points have Y-values of 0, as the histogram of the data shows: the company we obtained the data from has just launched, so the server load is usually very low. As the website becomes more popular, we would be able to obtain a better

result since the Y-values will be more evenly distributed.

Locally Weighted Linear Regression

As in linear regression, we first assumed that all input features (number of users, total number of requests, and so on) were correctly predicted by h1(t), for the same reason. To build the hypothesis we had to choose a weight function. Intuitively, the server load five or ten minutes from now should depend on the previous several hours of behavior, so we used the bell-shaped weight function

    w(i) = exp( -(x(i) - x)^2 / (2τ^2) )

in our hypothesis. To use this weight function we needed the optimal value of the bandwidth τ, so we used hold-out cross validation to pick it, as follows. We first randomly picked 70% of the original data as training data and the remaining 30% as test data. We then trained hypotheses h(t, u) with various values of τ on the training data, applied the test data, and recorded the test error for each τ. The table below shows the test errors of the server loads for each τ (three error columns, one per case):

τ        test errors (three cases)
0.0001   0.0287   0.0422   0.0472
0.005    0.0287   0.0422   0.0427
0.01     0.0287   0.0422   0.0472
0.02     0.0287   0.0422   0.0472
0.03     0.0284   0.0380   0.0423
0.04     0.0286   0.0382   0.0424
0.05     0.0288   0.0382   0.0427
0.07     0.0329   0.0582   0.0594
0.1      0.0330   0.0584   0.0594
0.5      0.0387   0.0700   0.0620
1        0.0441   0.0636   0.0572
5        0.0609   0.0666   0.0600
10       0.0650   0.0621   0.0563

As the table (and the corresponding plot of test error against τ) shows, the test error is smallest in all three cases at τ = 0.03, so we picked that value for our weight function. The plots of the actual server load and the predictions on the test and training data (not reproduced here) show the following: because making a prediction at every possible query point x is prohibitively time-consuming, predictions were made only at the query points in the training and test data. Although the prediction follows the pattern of the actual load, it sometimes misses spikes, and it takes a very long time to predict even at the test query points alone, so it may not be suitable for the real-time use a server load prediction system generally requires. It would therefore be desirable, given sufficient training data, to train the locally weighted regression only on the previous several hours rather than on the whole history.

Support Vector Regression

We used SVMlight [3] to perform the regression. We removed all-zero features (which indicate that the server is idle) and selected training samples randomly. The trade-off between training error and margin was set to 0.1 and the epsilon width of the tube to 1.0. We used a linear kernel and found that there are 148 support

vectors, and that the norm of the weight vector is 0.00083. The training error is 0.479 and the test error is 0.319. In the plots of the support vector regression results (red: actual data, blue: prediction; not reproduced here), although the support vector regression converges quickly on the 6,000-sample training set, its prediction has higher training and test errors than the rest of the learning algorithms. We tried adjusting the epsilon width and found that a higher epsilon width only offsets the estimate by a fixed constant.

Conclusion

The prediction from the locally weighted linear regression yields the smallest test error but takes an hour to process 6,000 training samples. Linear regression also yields an acceptable prediction when we concatenate powers of the feature vectors up to degree 3. Both multinomial logistic regression and support vector regression perform poorly on sparse training data; support vector regression, however, takes only a few minutes to converge, but yields the largest training error.

Issues

Bias from sparse data. Our learning algorithms cannot estimate the load average reliably because almost 90% of the training samples have a load value between 0.0 and 0.1. We believe the data we collected are not diverse because of the low activity on our test web server.

Not enough features. In our experiment the feature vectors have dimension 10, and 7 of the features come from the timestamp. We think we do not have enough data to capture the current physical state of the cloud servers, such as memory usage, bandwidth, and network latency.

Weak prediction model. We think that our decision to use the number of transactions and unique users as features does not accurately model the computing requirements of the cloud servers. The access log and uptime files are generated separately by different servers.
Our assumption that user activity is the main cause of a high server load average may also be inaccurate: a single transaction could create a high computing demand on its own.

Future Work

We have found a research paper that uses website access logs to predict users' activities by clustering users based on their recent web activity. We like this idea and think it is possible to categorize website users by their computing demands: when such users are active on the website, there is a high chance that the computing requirements will increase.

References

[1] Access log specification. http://httpd.apache.org/docs/2.0/logs.html
[2] Uptime specification. http://en.wikipedia.org/wiki/uptime
[3] Thorsten Joachims. SVMlight, version 6.02, Aug 14, 2008. Accessed Nov 15, 2009.