Data Analysis of Trends in iPhone 5 Sales on eBay
By Wenyu Zhang
Mentor: Professor David Aldous
Contents
1. Introduction
2. Data and Analysis
2.1 Description of Data
2.2 Retrieval of Data
2.3 Preliminary Analysis
3. Models
3.1 Regression Tree
3.2 Random Forest
4. Conclusion
5. Future Research
1. INTRODUCTION

eBay Inc. is an American multinational internet corporation founded in 1995. It provides a variety of online marketplaces for the sale of goods and services, online payment services, and online communication offerings to individuals and businesses in the United States and internationally. The corporation manages ebay.com, a popular online platform on which people buy and sell a wide range of goods and services, such as clothes, electronics, and even real property. The website supports two methods of sale: auction-style and Buy-It-Now standard shopping. ebay.com has an estimated 19 million users worldwide, with over 2 million visitors each day.

The buying and selling patterns on eBay have been an object of interest for the many people using the website. How do prices vary across time within a day? Are items priced higher on weekends? How differently are new and used products priced? This report attempts to answer some of these questions with respect to a particular product: the iPhone 5.

The iPhone 5 was released on September 21, 2012. Following the success of its predecessors, the iPhone 5 was received with overwhelming popularity. At the online Apple store, the product is priced as follows:

Carrier & Contract               Storage   Price
Unlocked, contract-free          16 GB     $649
                                 32 GB     $[...]
                                 64 GB     $[...]
Network carrier, with contract   16 GB     From $199
                                 32 GB     From $[...]
                                 64 GB     From $399

Table 1.1 iPhone 5 prices at the Apple store

As Table 1.1 shows, price differences for new iPhones at the store arise mainly from the type of carrier and contract, and from the storage capacity. While the store prices are fixed, prices on online marketplaces like eBay tend to spread over a much wider range. One difference to note is that the iPhones on eBay are sold without contract, though many are still carrier-specific. In addition, the condition of the item, the format of sale, and the time of the listing all introduce further fluctuation in the final price at which the item is sold.
The goal of the project is to build a model that predicts the price of an iPhone 5 with a particular condition and carrier, sold on eBay through a particular format of sale, on a specific day of the week and at a specific time of day. Due to the nature of the data collected, I will focus mainly on regression trees and random forests. Tree methods are nonlinear and nonparametric, and can handle numerical and categorical data simultaneously.
The project hopes to give prospective buyers more information and viable strategies they can employ to maximize utility. More generally, it offers insight into merchant and consumer behavior on online shopping sites.

2. DATA AND ANALYSIS

2.1 DESCRIPTION OF DATA

The dataset consists of approximately two months' worth of data, from December 25, 2012 to March 5, 2013. After the removal of faulty data, there are a total of [...] entries. Of these, 3000 entries are randomly selected as the test set, and the remaining entries make up the training set.

[Figure: Portion of complete dataset]

The response variable is the Price column, and the predictor variables are in the columns after it. The first two columns, Description and Date.Time, are used to extract the predictors; more details are given in the following subsection. The full set of predictors consists of 8 variables:

- Color (Black, White)
- Storage capacity (16GB, 32GB, 64GB)
- Format of sale (Auction, Buy-It-Now)
- Condition of item (New, New: other, Used)
- Day of the week (Mon, Tue, ..., Sun)
- Carrier (Unlocked, AT&T, Sprint, Verizon)
- Diff.Days, i.e. the number of days from the first day of available data (Dec 25, 2012)
- Time, i.e. the time at which the listing ended
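As a rough illustration of how such predictors might be derived from a listing's description and end time, here is a minimal Python sketch. It is not the author's code (the report's analysis was done in R). The keyword lists, the helper names `match_first`, `time_features` and `extract_predictors`, and the choice to express Time as a decimal hour are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical keyword tables; the category names follow the predictor list above.
CARRIERS = ["Unlocked", "AT&T", "Sprint", "Verizon"]
STORAGE = ["16GB", "32GB", "64GB"]
COLORS = ["Black", "White"]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

FIRST_DAY = datetime(2012, 12, 25)  # first day of available data, per the report


def match_first(description, options):
    """Return the first option found in the description, ignoring case and spaces."""
    text = description.lower().replace(" ", "")
    for option in options:
        if option.lower().replace(" ", "") in text:
            return option
    return None  # no keyword matched


def time_features(ended):
    """Derive Day of the week, Diff.Days and Time (decimal hour) from a datetime."""
    return {
        "Day": DAYS[ended.weekday()],
        "Diff.Days": (ended.date() - FIRST_DAY.date()).days,
        "Time": ended.hour + ended.minute / 60,
    }


def extract_predictors(description, ended):
    """Combine keyword-matched and time-derived predictors into one record."""
    features = {
        "Color": match_first(description, COLORS),
        "Storage": match_first(description, STORAGE),
        "Carrier": match_first(description, CARRIERS),
    }
    features.update(time_features(ended))
    return features


# Made-up listing, for illustration only.
record = extract_predictors("Apple iPhone 5 16 GB Black Sprint",
                            datetime(2013, 1, 1, 18, 30))
```

Plain substring matching is fragile (a description mentioning two carriers would match only the first in the list), so a real pipeline would likely need extra rules and a pass to discard ambiguous listings.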
2.2 RETRIEVAL OF DATA

Data is retrieved directly from ebay.com, under the Completed Listing tab. The figure below shows a portion of the raw data obtained from the page source of the website.

[Figure: Portion of raw data]

The predictors Color, Storage capacity, Condition of item, and Carrier are extracted from the Product Description column through word matching. Day of the week, Diff.Days, and Time are extracted from the Time.Listing.Ended column. Format of sale is determined by whether the item has a value under Auction.Price or under Buy.It.Now.Price.

2.3 PRELIMINARY ANALYSIS

Some preliminary analysis is conducted to explore the general relationship between price and each predictor variable. For each categorical variable, the entries are separated into the corresponding categories. The percentage volume (share of the number of sales) and percentage value (share of the total sale amount) for each category are calculated and plotted for comparison. For instance, if a category's percentage volume is higher than its percentage value, iPhones in that category likely tend to sell at a lower price than those in other categories. The mean price for each category is added as a point on the boxplot. T-tests or one-way ANOVA tests are used to check whether the samples come from populations with equal means. (Differences between points in plots may appear visually exaggerated due to the scale.)
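The volume-versus-value comparison can be sketched in a few lines of Python. This is an illustrative sketch, not the author's R code. The function name `volume_and_value_share` and the example prices are made up.

```python
from collections import defaultdict


def volume_and_value_share(sales):
    """sales: list of (category, price) pairs.

    Returns {category: (percentage volume, percentage value)}.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for category, price in sales:
        counts[category] += 1
        totals[category] += price
    n = len(sales)
    grand_total = sum(totals.values())
    return {
        c: (100 * counts[c] / n, 100 * totals[c] / grand_total)
        for c in counts
    }


# Made-up prices, for illustration only.
example = [("Black", 500.0), ("Black", 300.0), ("White", 600.0), ("White", 600.0)]
shares = volume_and_value_share(example)
# Black accounts for 50% of sales but only 40% of value,
# so black phones sell below the overall average price in this toy data.
```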
The mean prices of black and white iPhones differ by $7.15. The t-test returns a p-value of 8.038e-11, so the null hypothesis of equal means can be rejected.
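For reference, a two-sample t statistic can be computed as in the Python sketch below. The report does not say which variant of the t-test was run, so the unequal-variance (Welch) form is an assumption here, chosen because it is the default of R's `t.test`. The sample values are made up, and the function name `welch_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean.
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Made-up samples, for illustration only.
t_stat, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

Turning the statistic into a p-value requires the t distribution's CDF, which the Python standard library does not provide, so that step is left out of the sketch.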
The mean prices of 16GB and 64GB iPhones differ by $[...]. The one-way ANOVA test returns a p-value below 2.2e-16, so the null hypothesis of equal means can be rejected.
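The F statistic behind a one-way ANOVA compares the variation between category means with the variation within categories. The Python sketch below shows the classic equal-variance form. Whether the report used this form or a variance-corrected one is not stated, so treat it as an assumption. The function name `one_way_anova_f` and the data are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """groups: list of lists of observations, one list per category.

    Returns (F statistic, between-group df, within-group df).
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up groups, for illustration only.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large F relative to the F distribution with (k - 1, n - k) degrees of freedom leads to rejecting the hypothesis of equal means.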
The mean prices of iPhones sold in auction and Buy-It-Now style differ by $[...]. The t-test returns a p-value below 2.2e-16, so the null hypothesis of equal means can be rejected.

The mean prices of new and used iPhones differ by $[...]. The one-way ANOVA test returns a p-value below 2.2e-16, so the null hypothesis of equal means can be rejected.

The mean prices of unlocked and Sprint iPhones differ by $[...]. The one-way ANOVA test returns a p-value below 2.2e-16, so the null hypothesis of equal means can be rejected.

The maximum difference between mean prices of iPhones sold on each day of the week is $2.97. The one-way ANOVA test returns a p-value of [...], so the null hypothesis of equal means cannot be rejected.

The maximum difference between mean prices of iPhones sold within one-hour blocks is $[...]. The one-way ANOVA test returns a p-value below 2.2e-16, so the null hypothesis of equal means can be rejected.
3. MODELS

3.1 REGRESSION TREE

The rpart function in R grows the regression tree shown in the figure below. Since the first split is made on whether the carrier is Sprint, the type of carrier appears to play the most important role in the price of the iPhone 5, and Sprint phones appear to be generally priced lower than those of the other carriers. The other splits show that storage capacity and product condition are also important determinants of price.

[Figure: Regression tree]
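To make the idea of a first split concrete, the Python sketch below searches for the category whose separation from the rest most reduces the residual sum of squares. This is a simplified stand-in for illustration, not the rpart algorithm itself, which also considers groupings of several categories and splits on numerical variables. The function names and the prices are made up.

```python
from statistics import mean


def rss(values):
    """Residual sum of squares around the mean (0 for an empty node)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_one_vs_rest_split(rows, feature):
    """rows: list of dicts with a 'Price' key.

    Tries each category against all others and returns
    (category, total RSS after the split) for the best candidate.
    """
    best = None
    for category in sorted({r[feature] for r in rows}):
        left = [r["Price"] for r in rows if r[feature] == category]
        right = [r["Price"] for r in rows if r[feature] != category]
        if not right:
            continue  # the feature has a single level, so no split is possible
        total = rss(left) + rss(right)
        if best is None or total < best[1]:
            best = (category, total)
    return best


# Made-up prices, for illustration only.
rows = [
    {"Carrier": "Sprint", "Price": 400.0}, {"Carrier": "Sprint", "Price": 420.0},
    {"Carrier": "AT&T", "Price": 520.0}, {"Carrier": "AT&T", "Price": 540.0},
    {"Carrier": "Verizon", "Price": 530.0}, {"Carrier": "Verizon", "Price": 550.0},
]
split = best_one_vs_rest_split(rows, "Carrier")
```

In this toy data the Sprint-versus-rest split leaves the least unexplained variation, mirroring the first split described above.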
According to the plot of cross-validation results below, a tree size of 9 gives the minimum validation error of [...]. This is exactly the original tree grown, so pruning produces the same tree in this case.

[Figure: Cross-validation results]

Fitting the model to the test data and using mean squared error as the risk function, a test error of [...] is obtained. I attempt to improve the results using random forests in the following section.
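The test error used here, mean squared error, is the average squared gap between actual and predicted prices. A minimal Python sketch with made-up numbers (the function name is hypothetical):

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up prices, for illustration only.
mse = mean_squared_error([500.0, 600.0], [510.0, 580.0])
```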
3.2 RANDOM FOREST

Random forests improve predictive accuracy by growing a large number of trees on bootstrapped samples and averaging the predictions across all the trees to obtain the final prediction. Using the randomForest function in R, a forest of 200 trees is created with bootstrapped samples. The parameter ntree = 200 is chosen by cross-validation over the options 10, 100, 200, 300, 400, and 500, taking the value that minimizes the validation error. The test error is reduced slightly to [...].

The figure below shows the importance of each variable as measured by the random forest. For regression, importance is calculated as how much splits on a variable reduce node impurity, measured by residual sum of squares. Consistent with the regression tree above, the most important variable is the type of carrier, followed by storage capacity and product condition. Time- and date-related features come next. Sale format and color are the least important determinants of iPhone 5 prices.

[Figure: Plot of importance of variables]
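The bootstrap-and-average idea can be sketched in Python as follows. This shows bagging only: a real random forest also samples a random subset of variables at each split and grows full trees, whereas each "tree" here is just a table of per-category mean prices. All names and data are hypothetical, and this is not the author's R code.

```python
import random
from collections import defaultdict
from statistics import mean


def fit_group_means(rows, feature):
    """A deliberately tiny 'tree': predict the mean price of each category."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[feature]].append(r["Price"])
    overall = mean(r["Price"] for r in rows)
    means = {c: mean(v) for c, v in groups.items()}
    # Fall back to the overall mean for categories absent from the sample.
    return lambda row: means.get(row[feature], overall)


def bagged_predictor(rows, feature, n_trees=200, seed=0):
    """Fit one model per bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # sample with replacement
        models.append(fit_group_means(sample, feature))
    return lambda row: mean(m(row) for m in models)


# Made-up prices, for illustration only.
rows = [
    {"Carrier": "Sprint", "Price": 400.0}, {"Carrier": "Sprint", "Price": 410.0},
    {"Carrier": "Sprint", "Price": 420.0}, {"Carrier": "AT&T", "Price": 520.0},
    {"Carrier": "AT&T", "Price": 530.0}, {"Carrier": "AT&T", "Price": 540.0},
]
predict = bagged_predictor(rows, "Carrier", n_trees=50, seed=1)
sprint_price = predict({"Carrier": "Sprint"})
att_price = predict({"Carrier": "AT&T"})
```

Averaging over many resampled fits smooths out the noise of any single fit, which is the mechanism behind the modest error reduction reported above.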
4. CONCLUSION

Looking at the 3000 test set values predicted by the random forest model and the three most important variables identified (Carrier, Storage, and Condition), I took the average price in each category and obtained the following summary:

Carrier     ATT      Verizon      Sprint   Unlocked
Price       [...]    [...]        [...]    [...]

Storage     16GB     32GB         64GB
Price       [...]    [...]        [...]

Condition   New      New: other   Used
Price       [...]    [...]        [...]

Table 4.1 Average price by category

The variables Storage and Condition reflect physical differences between the products sold, and the price differences are as expected, with new, 64GB phones being more expensive than used, 16GB ones. While carrier-specific phones are, as expected, sold at lower prices than their unlocked counterparts, it is interesting that Sprint phones are on average more than $100 cheaper than ATT and Verizon phones. This may be due to differences in the kinds of service plans available at each carrier.

The average prices of new, unlocked iPhones of each storage capacity sold on eBay are calculated and tabulated in Table 4.2. Since phones with these attributes are sold at the official Apple stores as well, the prices of the same products sold through the two channels can be compared. As the table shows, 16GB phones are priced similarly on eBay and at Apple stores, while 32GB and 64GB phones are generally cheaper on eBay. On average, a 32GB iPhone 5 can be bought for $20 less on eBay than at Apple stores, and a 64GB for $75 less.

Storage          16GB     32GB     64GB
Price on eBay    [...]    [...]    [...]
Price on Apple   [...]    [...]    [...]

Table 4.2 Price comparison between eBay and Apple store
5. FUTURE RESEARCH

Further work can be done to optimize the regression tree and random forest models, and to fit other models to the dataset, so that more accurate price predictions can be attained.

More in-depth study can also be done with the data available through the Completed Listing tab on eBay. For instance, since the ending time of a Buy-It-Now listing marks exactly the time a product is purchased, the data directly reflect consumer behavior across time and can help identify peak buying periods on eBay. Time series analysis can be incorporated in this case. The information obtained would be particularly useful for sellers in determining the best times to list a product in order to maximize profits. A similar analysis can be conducted for Auction listings, although more coding work would be necessary to extract the relevant bidding history.
Simple Predictive Analytics Curtis Seare
Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use
How To Predict Call Option Prices Using Regression Models
Predicting Call Option Prices Using Regression Models Munira Shahir July 24, 2014 Abstract One method to consists of predict call option prices is the Black-Scholes equation. However, that is utilizing
Winning the Kaggle Algorithmic Trading Challenge with the Composition of Many Models and Feature Engineering
IEICE Transactions on Information and Systems, vol.e96-d, no.3, pp.742-745, 2013. 1 Winning the Kaggle Algorithmic Trading Challenge with the Composition of Many Models and Feature Engineering Ildefons
Tree Ensembles: The Power of Post- Processing. December 2012 Dan Steinberg Mikhail Golovnya Salford Systems
Tree Ensembles: The Power of Post- Processing December 2012 Dan Steinberg Mikhail Golovnya Salford Systems Course Outline Salford Systems quick overview Treenet an ensemble of boosted trees GPS modern
Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.
Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví Pavel Kříž Seminář z aktuárských věd MFF 4. dubna 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics
Statistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees
Statistical Data Mining Practical Assignment 3 Discriminant Analysis and Decision Trees In this practical we discuss linear and quadratic discriminant analysis and tree-based classification techniques.
JetBlue Airways Stock Price Analysis and Prediction
JetBlue Airways Stock Price Analysis and Prediction Team Member: Lulu Liu, Jiaojiao Liu DSO530 Final Project JETBLUE AIRWAYS STOCK PRICE ANALYSIS AND PREDICTION 1 Motivation Started in February 2000, JetBlue
Chapter 12 Bagging and Random Forests
Chapter 12 Bagging and Random Forests Xiaogang Su Department of Statistics and Actuarial Science University of Central Florida - 1 - Outline A brief introduction to the bootstrap Bagging: basic concepts
Multiple Linear Regression
Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is
Decision Trees from large Databases: SLIQ
Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values
Chapter 7. One-way ANOVA
Chapter 7 One-way ANOVA One-way ANOVA examines equality of population means for a quantitative outcome and a single categorical explanatory variable with any number of levels. The t-test of Chapter 6 looks
t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon
t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. [email protected] www.excelmasterseries.com
Commerce Monks OPTIMIZING YOUR EBAY MARKETPLACE PRESENCE
Commerce Monks OPTIMIZING YOUR EBAY MARKETPLACE PRESENCE IN THIS GUIDE Commerce Monks walk you through ebay store optimization best practices, ways to promote your ebay store, do s and don ts for an ideal
Final Exam Practice Problem Answers
Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal
Package missforest. February 20, 2015
Type Package Package missforest February 20, 2015 Title Nonparametric Missing Value Imputation using Random Forest Version 1.4 Date 2013-12-31 Author Daniel J. Stekhoven Maintainer
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
Data Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
One-Way Analysis of Variance (ANOVA) Example Problem
One-Way Analysis of Variance (ANOVA) Example Problem Introduction Analysis of Variance (ANOVA) is a hypothesis-testing technique used to test the equality of two or more population (or treatment) means
Recall this chart that showed how most of our course would be organized:
Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical
M1 in Economics and Economics and Statistics Applied multivariate Analysis - Big data analytics Worksheet 3 - Random Forest
Nathalie Villa-Vialaneix Année 2014/2015 M1 in Economics and Economics and Statistics Applied multivariate Analysis - Big data analytics Worksheet 3 - Random Forest This worksheet s aim is to learn how
Knowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes
Knowledge Discovery and Data Mining Lecture 19 - Bagging Tom Kelsey School of Computer Science University of St Andrews http://tom.host.cs.st-andrews.ac.uk [email protected] Tom Kelsey ID5059-19-B &
!"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"
!"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"!"#"$%&#'()*+',$$-.&#',/"-0%.12'32./4'5,5'6/%&)$).2&'7./&)8'5,5'9/2%.%3%&8':")08';:
Elements of statistics (MATH0487-1)
Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -
How To Make A Credit Risk Model For A Bank Account
TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző [email protected] 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification
NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
Analysis of Variance (ANOVA) Using Minitab
Analysis of Variance (ANOVA) Using Minitab By Keith M. Bower, M.S., Technical Training Specialist, Minitab Inc. Frequently, scientists are concerned with detecting differences in means (averages) between
This chapter provides information on the research methods of this thesis. The
Chapter 3 Research Methods This chapter provides information on the research methods of this thesis. The survey research method has been chosen to determine the factors influencing hedge fund investment
Data Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
Data quality in Accounting Information Systems
Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania
TRAINING & CONSULTANCY GUIDE
TRAINING & CONSULTANCY GUIDE All Trainings are organised Fortnightly on Mondays, Tuesdays, Wednesdays and Thursdays between 5:30pm - 9:00 pm and Saturdays All Trainings are available in house or 1-to-1
A Methodology for Predictive Failure Detection in Semiconductor Fabrication
A Methodology for Predictive Failure Detection in Semiconductor Fabrication Peter Scheibelhofer (TU Graz) Dietmar Gleispach, Günter Hayderer (austriamicrosystems AG) 09-09-2011 Peter Scheibelhofer (TU
Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
Introduction to Regression and Data Analysis
Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it
Module 5: Statistical Analysis
Module 5: Statistical Analysis To answer more complex questions using your data, or in statistical terms, to test your hypothesis, you need to use more advanced statistical tests. This module reviews the
5 Websites Where You Can Make Money Right Now
5 Websites Where You Can Make Money Right Now A Quick Guide to Making More Money by: Brian Lang (Small Business Ideas Blog) www.smallbusinessideasblog.com 5 Websites Where You Can Make Money Right Now
Predicting borrowers chance of defaulting on credit loans
Predicting borrowers chance of defaulting on credit loans Junjie Liang ([email protected]) Abstract Credit score prediction is of great interests to banks as the outcome of the prediction algorithm
Dealing with continuous variables and geographical information in non life insurance ratemaking. Maxime Clijsters
Dealing with continuous variables and geographical information in non life insurance ratemaking Maxime Clijsters Introduction Policyholder s Vehicle type (4x4 Y/N) Kilowatt of the vehicle Age Age of the
Lernmodul Functions of a stock exchange. Lernmodul Functions of a stock exchange
Lernmodul Functions of a stock exchange Lernmodul What is a stock exchange? All of us are familiar with the image of the stock exchange in movies and the news: Stockbrokers stand around on the trading
Using Excel for Data Manipulation and Statistical Analysis: How-to s and Cautions
2010 Using Excel for Data Manipulation and Statistical Analysis: How-to s and Cautions This document describes how to perform some basic statistical procedures in Microsoft Excel. Microsoft Excel is spreadsheet
Data driven approach in analyzing energy consumption data in buildings. Office of Environmental Sustainability Ian Tan
Data driven approach in analyzing energy consumption data in buildings Office of Environmental Sustainability Ian Tan Background Real time energy consumption data of buildings in terms of electricity (kwh)
Optimization of technical trading strategies and the profitability in security markets
Economics Letters 59 (1998) 249 254 Optimization of technical trading strategies and the profitability in security markets Ramazan Gençay 1, * University of Windsor, Department of Economics, 401 Sunset,
General Regression Formulae ) (N-2) (1 - r 2 YX
General Regression Formulae Single Predictor Standardized Parameter Model: Z Yi = β Z Xi + ε i Single Predictor Standardized Statistical Model: Z Yi = β Z Xi Estimate of Beta (Beta-hat: β = r YX (1 Standard
Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
Factors affecting online sales
Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4
12: Analysis of Variance. Introduction
1: Analysis of Variance Introduction EDA Hypothesis Test Introduction In Chapter 8 and again in Chapter 11 we compared means from two independent groups. In this chapter we extend the procedure to consider
Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com
SPSS-SA Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Training Brochure 2009 TABLE OF CONTENTS 1 SPSS TRAINING COURSES FOCUSING
E-Commerce Fundamentals Sukaina Al-Nasrawi Associate IT Officer ICT Division
Workshop on Delivery of E-Services in Civil Society E-Commerce Fundamentals Sukaina Al-Nasrawi Associate IT Officer ICT Division Outline Introduction: What is E-Commerce Categories of E-Commerce E-Commerce
Testing for Lack of Fit
Chapter 6 Testing for Lack of Fit How can we tell if a model fits the data? If the model is correct then ˆσ 2 should be an unbiased estimate of σ 2. If we have a model which is not complex enough to fit
Outline. What is Big data and where they come from? How we deal with Big data?
What is Big Data Outline What is Big data and where they come from? How we deal with Big data? Big Data Everywhere! As a human, we generate a lot of data during our everyday activity. When you buy something,
The Operational Value of Social Media Information. Social Media and Customer Interaction
The Operational Value of Social Media Information Dennis J. Zhang (Kellogg School of Management) Ruomeng Cui (Kelley School of Business) Santiago Gallino (Tuck School of Business) Antonio Moreno-Garcia
Tutorial for proteome data analysis using the Perseus software platform
Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information
Using Excel for Statistical Analysis
2010 Using Excel for Statistical Analysis Microsoft Excel is spreadsheet software that is used to store information in columns and rows, which can then be organized and/or processed. Excel is a powerful
Data Mining Methods: Applications for Institutional Research
Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014
Computer Forensics Application. ebay-uab Collaborative Research: Product Image Analysis for Authorship Identification
Computer Forensics Application ebay-uab Collaborative Research: Product Image Analysis for Authorship Identification Project Overview A new framework that provides additional clues extracted from images
Gerry Hobbs, Department of Statistics, West Virginia University
Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit
Analysing Questionnaires using Minitab (for SPSS queries contact -) [email protected]
Analysing Questionnaires using Minitab (for SPSS queries contact -) [email protected] Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:
Better decision making under uncertain conditions using Monte Carlo Simulation
IBM Software Business Analytics IBM SPSS Statistics Better decision making under uncertain conditions using Monte Carlo Simulation Monte Carlo simulation and risk analysis techniques in IBM SPSS Statistics
IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA
CALIFORNIA STATE UNIVERSITY, LOS ANGELES INFORMATION TECHNOLOGY SERVICES IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA Summer 2013, Version 2.0 Table of Contents Introduction...2 Downloading the
You can set up an ebay Store if you: Step 1 Register your store subscription: 1. Login to your ebay account and click on My ebay
SETTING UP AN EBAY STORE FOR BASIC E-COMMERCE You can set up an ebay Store if you: Have an ebay seller account with your credit card on file Be registered with ebay for at least 14 days There are two ebay
Predicting Market Value of Soccer Players Using Linear Modeling Techniques
1 Predicting Market Value of Soccer Players Using Linear Modeling Techniques by Yuan He Advisor: David Aldous Index Introduction ----------------------------------------------------------------------------
Easily Identify Your Best Customers
IBM SPSS Statistics Easily Identify Your Best Customers Use IBM SPSS predictive analytics software to gain insight from your customer database Contents: 1 Introduction 2 Exploring customer data Where do
STATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
Two-Group Hypothesis Tests: Excel 2013 T-TEST Command
Two group hypothesis tests using Excel 2013 T-TEST command 1 Two-Group Hypothesis Tests: Excel 2013 T-TEST Command by Milo Schield Member: International Statistical Institute US Rep: International Statistical
American and European. Put Option
American and European Put Option Analytical Finance I Kinda Sumlaji 1 Table of Contents: 1. Introduction... 3 2. Option Style... 4 3. Put Option 4 3.1 Definition 4 3.2 Payoff at Maturity... 4 3.3 Example
Fine Particulate Matter Concentration Level Prediction by using Tree-based Ensemble Classification Algorithms
Fine Particulate Matter Concentration Level Prediction by using Tree-based Ensemble Classification Algorithms Yin Zhao School of Mathematical Sciences Universiti Sains Malaysia (USM) Penang, Malaysia Yahya
Just 30 Minutes A Day = More Money Than You're Thinking
I've been making money with cell phones for nearly 3 years and will teach you everything I know. You aren't making money with your cell phone?! I've been making money with cell phones for nearly 3 years
Elementary Statistics Sample Exam #3
Elementary Statistics Sample Exam #3 Instructions. No books or telephones. Only the supplied calculators are allowed. The exam is worth 100 points. 1. A chi square goodness of fit test is considered to
ONLINE SHOPPING PRE-READING QUESTIONS
Online Shopping 1 ONLINE SHOPPING PRE-READING QUESTIONS 1. Do you shop online? Why or why not? 2. What are your favorite online shopping websites? 3. What items or services do you purchase online? 4. How
arxiv:1301.4944v1 [stat.ml] 21 Jan 2013
Evaluation of a Supervised Learning Approach for Stock Market Operations Marcelo S. Lauretto 1, Bárbara B. C. Silva 1 and Pablo M. Andrade 2 1 EACH USP, 2 IME USP. 1 Introduction arxiv:1301.4944v1 [stat.ml]
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
