Weekly Sales Forecasting




Weekly Sales Forecasting
San Diego Data Science and R Users Group, June 2014
Kevin Davenport
http://kldavenport.com kldavenportjr@gmail.com @KevinLDavenport

The competition
Use historical markdown data to predict store sales. Predict weekly sales for a set of store and department pairs in 2013, given the weekly sales from 2010-2012.

The Data
1. train/test: weekly sales for each combination of store, department, and date.
2. features: markdowns, CPI, unemployment rate, temperature, fuel price, and holiday (T/F).
3. stores: size and type of each store.

Joining the datasets http://pandas.pydata.org/pandas-docs/stable/comparison_with_sql.html#compare-with-sql-join
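The join itself is not shown on the slide. A minimal sketch with pandas, assuming the three tables share a Store key and that train and features also share a Date key; the column names and all values below are illustrative assumptions, not taken from the competition files:

```python
import pandas as pd

# Toy stand-ins for the train, features and stores tables (made-up values;
# column names are assumptions for illustration).
train = pd.DataFrame({
    "Store": [1, 1, 2],
    "Dept": [1, 2, 1],
    "Date": ["2010-02-05", "2010-02-05", "2010-02-05"],
    "Weekly_Sales": [24924.5, 50605.3, 35034.1],
})
features = pd.DataFrame({
    "Store": [1, 2],
    "Date": ["2010-02-05", "2010-02-05"],
    "Temperature": [42.3, 40.2],
    "Fuel_Price": [2.57, 2.57],
})
stores = pd.DataFrame({
    "Store": [1, 2],
    "Type": ["A", "A"],
    "Size": [151315, 202307],
})

# Left joins keep every sales row, as SQL's LEFT JOIN would.
joined = (
    train
    .merge(features, on=["Store", "Date"], how="left")
    .merge(stores, on="Store", how="left")
)
print(joined)
```

A left join is used so that a sales row with no matching feature row is kept (with missing values) rather than silently dropped.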

Time Series and Forecasting
Let's assume that the relationship between all observations or measurements can be expressed by a linear or non-linear function. Consider a measurement of 100 RPM at 11:00am: it is highly probable that the observations just before and after it (10:59am and 11:01am) are also close to 100 RPM.

Seasonality vs Trend
Average sales of all stores/departments per week. Two significant peaks in November and December.

Correlation and Distribution http://docs.ggplot2.org/0.9.3.1/plotmatrix.html#

Holidays: one randomly selected store/department

Holidays: all stores/departments

Feature Engineering
1. Naive: previous year's sales
2. Smoothed sales
3. Additional features (features.csv)
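The first two features can be sketched for a single store/department series as follows. The slide does not say how the sales were smoothed, so the centered 3-week rolling mean here is an assumption, as are the column names and the synthetic data:

```python
import numpy as np
import pandas as pd

# Hypothetical weekly series for one store/department pair (synthetic values).
dates = pd.date_range("2010-02-05", periods=110, freq="7D")
sales = pd.DataFrame({
    "Date": dates,
    "Weekly_Sales": np.arange(110, dtype=float),
})

# Naive feature: sales from the same week one year (52 weeks) earlier.
sales["prev_year_sales"] = sales["Weekly_Sales"].shift(52)

# Smoothed feature: centered 3-week rolling mean of last year's sales.
# Both the rolling mean and the window length are assumptions.
sales["smoothed_prev_year"] = (
    sales["prev_year_sales"]
    .rolling(window=3, center=True, min_periods=1)
    .mean()
)
print(sales.iloc[58:63])
```

With many store/department pairs in one table, the same shift and rolling steps would be applied within each group (for example via `groupby`), so that one series never leaks into another.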

Numerical Features
Scale:
1. Consumer Price Index (CPI)
2. Fuel price
3. Temperature
4. Markdowns
Dummy encode:
1. Department
2. Holiday
3. Store type
4. Date
5. Unemployment rate
6. Store size

Dummy encoding
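Dummy (one-hot) encoding turns each distinct value of a categorical column into its own 0/1 column. A small sketch using `pandas.get_dummies`; the column names and values are made up for illustration:

```python
import pandas as pd

# Illustrative categorical columns (names and values are assumptions).
df = pd.DataFrame({
    "Dept": [1, 2, 1],
    "Type": ["A", "B", "A"],
    "IsHoliday": [False, True, False],
})

# One indicator column per distinct value of each listed column.
encoded = pd.get_dummies(df, columns=["Dept", "Type", "IsHoliday"])
print(encoded)
```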

Date Encoding
The Date column can be binary encoded to create the following features:
- Hour of the day (24 boolean features)
- Day of the week (7 boolean features)
- Day of the month (up to 31 boolean features)
- Month of the year (12 boolean features)
- Year (as many boolean features as there are distinct years in your dataset)
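One possible way to build these boolean features with pandas is to extract the date parts and dummy encode them. The three example dates are arbitrary; hour of the day is left out because this sketch assumes date-only (weekly) timestamps:

```python
import pandas as pd

# Three arbitrary example dates (all happen to be Fridays).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2010-02-05", "2010-11-26", "2011-12-30"]),
})

# Extract the calendar parts, then dummy encode each part.
parts = pd.DataFrame({
    "dow": df["Date"].dt.dayofweek,   # Monday=0 ... Sunday=6
    "dom": df["Date"].dt.day,
    "month": df["Date"].dt.month,
    "year": df["Date"].dt.year,
})
date_features = pd.get_dummies(parts, columns=["dow", "dom", "month", "year"])
print(date_features.columns.tolist())
```

Only values that actually occur get a column, so when every record falls on the same weekday the day-of-week feature is constant and carries no information.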

Gradient Descent
In stochastic gradient descent (SGD), the gradient of the loss is estimated one sample at a time, and the model is updated along the way with a decreasing strength schedule (the learning rate). SGD allows minibatch (online/out-of-core) learning with scikit-learn's partial_fit method.
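A minimal sketch of minibatch learning with `partial_fit`, using scikit-learn's `SGDRegressor` on synthetic data. The data, the batch size of 100, and the 20 passes are assumptions chosen for illustration, not settings from the talk:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem with known weights (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

# SGD is sensitive to feature scale, so standardize first.
scaler = StandardScaler().fit(X)
model = SGDRegressor(random_state=0)

# Feed the data in chunks; each partial_fit call updates the same model.
for epoch in range(20):
    for start in range(0, len(X), 100):
        batch = slice(start, start + 100)
        model.partial_fit(scaler.transform(X[batch]), y[batch])

print(model.coef_)
```

In a genuinely out-of-core setting the chunks would be read from disk one at a time instead of sliced from an in-memory array.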

Gradient Descent
- Gradient descent: consider the entire dataset at each step.
- Stochastic gradient descent: at each step, pick one random data point and continue as if the entire dataset were just that one point.
- Minibatch SGD: at each step, pick a small subset of data points and continue as if the entire dataset were just this subset.
http://www.libfm.org/
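The three variants differ only in how many points feed each gradient estimate. A NumPy sketch for least-squares linear regression, where `batch_size` selects the variant; the fixed learning rate, step count, and noiseless synthetic data are simplifying assumptions (a decreasing schedule is not modeled here):

```python
import numpy as np

def gradient(w, X, y):
    """Gradient of the mean squared error of the linear model X @ w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def descend(X, y, batch_size, steps=2000, lr=0.05, seed=0):
    """Run gradient descent, estimating each gradient from batch_size points."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        if batch_size >= len(y):
            idx = np.arange(len(y))  # full-batch gradient descent
        else:
            idx = rng.choice(len(y), size=batch_size, replace=False)
        w -= lr * gradient(w, X[idx], y[idx])
    return w

# Noiseless synthetic data, so every variant can reach the same weights.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(500, 2))
y = X @ np.array([3.0, -2.0])

w_batch = descend(X, y, batch_size=len(y))  # gradient descent
w_sgd = descend(X, y, batch_size=1)         # stochastic gradient descent
w_mini = descend(X, y, batch_size=32)       # minibatch SGD
print(w_batch, w_sgd, w_mini)
```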

Gradient Descent http://www.onmyphd.com/?p=gradient.descent

libfm
Factorization machines (FM) are a generic approach that can mimic most factorization models through feature engineering. In this way, factorization machines combine the generality of feature engineering with the superiority of factorization models in estimating interactions between categorical variables of large domain.
http://www.libfm.org/
Haven't tried: https://github.com/coreylynch/pyfm
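To make the idea of factorized interactions concrete, here is a NumPy sketch of the prediction of a second-order factorization machine as it is commonly defined: a bias, a linear term, and pairwise interactions whose weights are dot products of per-feature latent vectors. It is not libFM's code or API; the function names and random parameters are hypothetical:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order FM prediction using the linear-time reformulation.

    x: (n,) features, w0: bias, w: (n,) linear weights,
    V: (n, k) latent vectors, one row per feature.
    """
    linear = w0 + w @ x
    # sum_{i<j} <V[i], V[j]> x_i x_j
    #   = 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ]
    s = V.T @ x
    s_sq = (V ** 2).T @ (x ** 2)
    return linear + 0.5 * np.sum(s ** 2 - s_sq)

def fm_predict_naive(x, w0, w, V):
    """Same model, written as an explicit loop over all feature pairs."""
    total = w0 + w @ x
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            total += (V[i] @ V[j]) * x[i] * x[j]
    return total

# Random parameters purely for checking that both forms agree.
rng = np.random.default_rng(0)
n, k = 6, 3
x = rng.normal(size=n)
w0, w, V = 0.1, rng.normal(size=n), rng.normal(size=(n, k))
print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))
```

Because each pairwise weight comes from two latent vectors rather than a free parameter, the model can estimate an interaction between two categorical values that rarely or never co-occur in training, which is what makes it attractive for dummy-encoded store and department features.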

Evaluation
1. Smoothed last year's sales
2. Ridge regression
3. Factorization machines

Please Donate to numfocus.org