Understanding a Fan Base Beyond the Ballpark: A statistical analysis of the transportation to major sporting events in NYC


Paper Track: Business of Sports
Paper ID: 1562

1. Abstract

The sports industry has a distinct advantage over many other B2C industries in that its customers, or fans, routinely make irrational decisions fueled by passion. As sports organizations increasingly adopt loyalty programs, apps, and new technologies in an effort to understand their fans' interaction, involvement, and habits inside the stadium, we extend the focus of our analysis outside the stadium walls. Our analysis uses openly available data for the NYC region to better understand event-related transportation decisions. Understanding these decisions and incorporating a data-driven approach could enable sports organizations to tailor outreach, predict engagement levels for the fan base, and improve the overall fan experience.

We analyzed five open source data sets: 2014 NYC Yellow and Green Cab Trip Information, NYC Uber data, NYC demographic data, NYC weather data, and relevant baseball game information for the New York Mets and New York Yankees 2014 seasons. From there, we conducted exploratory data analysis, using geolocation and time data to observe taxi and transportation flow. We applied both unsupervised and supervised machine learning algorithms to determine clusters of activity to and from some of NYC's largest observed sporting events and to gain a better understanding of fan behavior leading up to and after the game. Based on the lessons learned from these clusters, we employed a set of statistical models to determine feature importance and to estimate potential origins and destinations for future events given the observable variables.

2. Introduction

Throughout the 2014 baseball season, over 66,000 fans took private yellow or green taxis to New York Yankees and New York Mets home games. Even more interesting, these fans came from over 200 unique zip codes spanning multiple boroughs, illustrating the reach of both teams throughout the New York City area. Given that New York City is world renowned for the extent of its public transportation system, the fact that fans view a taxi as a viable alternative illustrates consumers' willingness to use private transportation to attend sporting events.

Delving deeper into the yellow and green NYC taxi data, Figure 1 illustrates an increased willingness to pay among fans attending baseball games in New York. On average, fans of both the New York Yankees and the New York Mets are willing to pay higher fares to arrive at the ballpark. Interestingly, for cab trips terminating in the vicinity of Citi Field, we observe multiple peaks in the fare distribution. This distribution warranted further investigation to determine its root cause.

Figure 1: Illustration of Cab Fares by Classified Destination

We observed a median total fare of approximately $16.50 for Yankees games; for Mets games, $ . This compares to the median general NYC cab total fare of $ . To us, this represents the premium that fans put on going to sporting events. This willingness to spend represents a clear opportunity for teams to further own the fan experience beyond the stadium.

3. Data and Preparation

Initially, we used exploratory data analysis to better understand baseball fans' taxi origins and the demographics of those areas. New York City provides the perfect platform for this analysis, given its two fan bases with similar means to both arrive at and leave baseball games.

The raw dataset was obtained from the NYC cab website [1]. We obtained data from March through September of 2014, as we were looking at the 2014 seasons for both teams. When combined, the raw dataset had over 81 million rows with the following relevant columns: pickup_datetime, dropoff_datetime, passenger_count, trip_distance, pickup_longitude, pickup_latitude, dropoff_longitude, dropoff_latitude, fare_amount, total_amount. To filter this dataset, we kept trips whose longitude and latitude fell within a 0.5 mile radius of Yankee Stadium ( , ) and Citi Field ( , ). Next, we went to Baseball Reference [2] and Retrosheet [3] to obtain all relevant baseball information, including game start and end times. Using start times, we retained all cabs arriving from two hours before the start to one hour after, as we assume some fans arrive late. Conversely, we considered taxis that left from 90 minutes before the end of the game up to two hours after, as many fans remain in the area immediately following a game. This greatly reduced the raw dataset to the following value counts:

Next, we obtained the respective zip codes and boroughs from the latitude and longitude data, which allowed us to consider Census demographic information, which is reported by zip code. We retained zip codes with n>100 rides, so that a few rides would not skew the data. Interestingly, there were plenty of taxis coming straight from major transit hubs such as LaGuardia and JFK airports. This left us with a total of 66 unique zip codes.

Next, we applied various unsupervised learning algorithms to the data, which is useful for finding clusters without target variables. The most effective algorithm that we found was k-means, which is widely used across many different sectors. k-means assigns each point to its nearest centroid by Euclidean distance and then treats clustering as an optimization problem, with the goal of minimizing the within-cluster sum of squared errors (also called cluster inertia). One of the main criticisms of k-means is that the number of clusters must be specified beforehand. We initially used an x-means algorithm, which automatically selects the number of clusters (the k) by optimization; however, similar to marketing functions that often run campaigns of different complexity, we decided to use the elbow method with k-means. Generally, elbow methods mitigate some drawbacks of clustering algorithms and give a visual representation of the number of clusters and their corresponding distortion.

Figure 2: k-means cluster analysis
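The elbow analysis described in this section can be illustrated with a small, self-contained sketch. It is not the authors' code: the two-dimensional points are synthetic stand-ins for zip-code feature vectors, the helper names (`farthest_first_init`, `kmeans`, `elbow_curve`) are hypothetical, and the deterministic farthest-first seeding is an assumption made here so the example is reproducible. The sketch runs Lloyd's algorithm for several values of k and records the inertia for each, which is the quantity plotted in an elbow chart.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the first
    point, then repeatedly add the point farthest from the chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (centroids, inertia)."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        updated = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    # Inertia: sum of squared distances from each point to its nearest centroid.
    inertia = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, inertia


def elbow_curve(points, max_k):
    """Inertia for k = 1..max_k; the 'elbow' is where the drop flattens."""
    return {k: kmeans(points, k)[1] for k in range(1, max_k + 1)}


# Synthetic data: three well-separated blobs (made-up values, not taxi data).
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
points = [
    (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
    for cx, cy in blob_centers
    for _ in range(20)
]

curve = elbow_curve(points, max_k=6)
for k, inertia in curve.items():
    print(k, round(inertia, 1))
```

On data like this, the inertia falls sharply up to k = 3 and only slightly afterwards, which is the visual cue the elbow method relies on. Real demographic features would normally be standardized first so that no single column dominates the Euclidean distance.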

In Figure 2, there are two distinct elbows, at k = 3 and at k = 6. Next, we analyzed each of the clusters, as shown in Figures 3 and 4.

Figure 3: Income, Ethnicity, Marital Status by Cluster

Figure 4: Education by Cluster

Next, we visually analyzed the arrival patterns of each cluster. In Figures 5 and 6, we observe New York Yankees and New York Mets traffic by departure location for each taxi that arrived near the stadium. The points are color coded by the cluster group they belong to.

4. Data Visualization

Mets:

Figure 5: Departure Location by Cluster for Mets Games

Yankees:

Figure 6: Departure Location by Cluster for Yankees Games

Based on these visualizations, we can clearly see the battleground areas for Mets and Yankees fans. While Yankees fans show a greater density, especially in expected areas such as the Bronx, there are similar concentrations of the second cluster in Midtown and Long Island City.

5. Predicting traffic flow

While understanding the types of fans through the aggregate season taxi counts is important, it is also important to understand and predict the number of taxis coming to any individual game. By predicting the traffic flow, a team can potentially extend the game experience to the hours before the game, when fans are beginning their journey to the stadium. An interesting aspect of fan experience is that the simple act of arriving at the stadium can greatly impact the likelihood that a fan will return. Based on this assumption, we tried different machine learning algorithms to determine how a team can best predict taxi flow to a baseball game; the best-performing algorithm was a random forest. We used an input dataset with the following columns: team power ranking, opponent power ranking, month of season, wind speed, game promotion, team rank, day or night, attendance, day of week, number of game streak, start time hour, games behind, weather summary, precipitation intensity, and temperature.

All of the columns were numeric except for day of week (M/T/W/R/F/Sat/Sun), day/night, weather summary (clear/rain), and game promotion (Y/N). To get our target variable of taxi rides, we took the raw taxi dataset and grouped it by date and by whether the trip was a pickup or a dropoff; we then applied a count and merged the result with the baseball dataset, leaving us with the total number of taxis coming to the respective stadiums. We trained a random forest of 10,000 trees for each team and calculated the mean absolute error. The results were as follows:

Mets Train/Test Random Forest mean absolute error:
Yankees Train/Test mean absolute error:
Mets Out of Bag mean absolute error:
Yankees Out of Bag mean absolute error:

One of the drawbacks is that a random forest is tough to interpret, as it is implicitly a black box machine learning algorithm. Nonetheless, the increase in accuracy is enough to justify its use. Below is a histogram showing the residuals from an out-of-bag random forest regression. While the train/test evaluation with a 25% test split performed better, given the small size of the dataset (only one season), the less biased estimate is obtained from the out-of-bag predictions. Overall, this gives us confidence that, given a large taxi flow, teams can use open source and attainable data to estimate taxi flow, understand where taxis are coming from, and understand the type of fans that ride them to games.

Figure 6: Histogram of random forest residuals
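The construction of the per-game target variable, combined with the radius and arrival-window filter from Section 3, might be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the stadium coordinates are placeholders (the source does not give them), the trips are made-up records that reuse the dataset's column names, the helper names are hypothetical, and at most one game per calendar day is assumed.

```python
import math
from collections import Counter
from datetime import datetime, timedelta

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def is_game_arrival(trip, stadium, game_start, radius_miles=0.5):
    """True if the dropoff is within the radius of the stadium and falls between
    two hours before and one hour after the game start."""
    distance = haversine_miles(
        trip["dropoff_latitude"], trip["dropoff_longitude"], stadium[0], stadium[1]
    )
    earliest = game_start - timedelta(hours=2)
    latest = game_start + timedelta(hours=1)
    return distance <= radius_miles and earliest <= trip["dropoff_datetime"] <= latest


def arrivals_per_game(trips, stadium, game_starts):
    """Count qualifying dropoffs per game date (assumes one game per day)."""
    start_by_date = {start.date(): start for start in game_starts}
    counts = Counter({day: 0 for day in start_by_date})
    for trip in trips:
        day = trip["dropoff_datetime"].date()
        start = start_by_date.get(day)
        if start is not None and is_game_arrival(trip, stadium, start):
            counts[day] += 1
    return counts


# Placeholder coordinates, NOT the real stadium location.
stadium = (40.0, -73.0)
game_starts = [datetime(2014, 7, 1, 19, 5), datetime(2014, 7, 2, 19, 5)]

# Made-up trips: near and on time, far away, near but too early, near and slightly late.
trips = [
    {"dropoff_latitude": 40.001, "dropoff_longitude": -73.0, "dropoff_datetime": datetime(2014, 7, 1, 18, 30)},
    {"dropoff_latitude": 40.03, "dropoff_longitude": -73.0, "dropoff_datetime": datetime(2014, 7, 1, 18, 45)},
    {"dropoff_latitude": 40.001, "dropoff_longitude": -73.0, "dropoff_datetime": datetime(2014, 7, 1, 14, 0)},
    {"dropoff_latitude": 40.001, "dropoff_longitude": -73.0, "dropoff_datetime": datetime(2014, 7, 1, 19, 35)},
]

counts = arrivals_per_game(trips, stadium, game_starts)
for day in sorted(counts):
    print(day, counts[day])
```

The resulting per-date counts are what would be merged with the game-level features (attendance, weather, opponent ranking, and so on) before fitting a regression model. A departure count would follow the same pattern, using pickup coordinates and a window around the game's end time.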

One of the benefits of using a random forest is the insight it gives into the feature importance of the input variables. Feature importance can be calculated as the average decrease in impurity attributable to a feature across all the decision trees in the forest. The average feature importance across both teams is shown in the visualization below. Here, we explore the top 12 features, as everything else proved to be of minimal importance.

Figure 7: Feature Importance of Random Forest Model

A cursory review of feature importance reveals some obvious conclusions: attendance ranks highly, for example, because better-attended games bring more taxis through sheer volume. However, we found it interesting that temperature was ranked as very important, while other potentially inclement weather variables, such as rain, did not seem to have an impact. Below we compare two of these variables: temperature, which has high predictive value, and rain, which has low predictive value. Visualizing cabs by team, we notice distinct differences between their distributions, despite a similar number of taxis (rainy-day cabs had about 10% more pickups). In particular, for Mets fans, there is a thick concentration of Midtown pickups on hot days that is absent on rainy days, as Figure 8 shows.
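To make the impurity-based notion of feature importance described above more concrete, here is a minimal single-split sketch. It is a simplification, not the authors' model: it uses variance as the impurity measure (the usual choice for regression trees), looks only at the best root split per feature rather than a full forest, and the temperature, rain, and ride-count values are invented for illustration. The function names are hypothetical.

```python
def variance(values):
    """Population variance, used here as the regression impurity measure."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_impurity_decrease(feature, target):
    """Largest variance reduction achievable by one threshold split on `feature`."""
    parent = variance(target)
    levels = sorted(set(feature))
    best = 0.0
    # Candidate thresholds are midpoints between consecutive distinct values,
    # so both child nodes are always non-empty.
    for lo, hi in zip(levels, levels[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(feature, target) if x <= threshold]
        right = [y for x, y in zip(feature, target) if x > threshold]
        weighted_child = (len(left) * variance(left) + len(right) * variance(right)) / len(target)
        best = max(best, parent - weighted_child)
    return best


def normalized_importance(features, target):
    """Scale each feature's impurity decrease so the importances sum to 1."""
    raw = {name: best_impurity_decrease(column, target) for name, column in features.items()}
    total = sum(raw.values())
    return {name: (value / total if total else 0.0) for name, value in raw.items()}


# Invented toy data: ride counts jump on hot days and barely move with rain.
features = {
    "temperature": [60, 65, 70, 75, 80, 85, 90, 95],
    "rain": [0, 1, 0, 1, 0, 1, 0, 1],
}
rides = [100, 102, 101, 103, 150, 152, 151, 153]

importance = normalized_importance(features, rides)
for name, score in sorted(importance.items(), key=lambda item: -item[1]):
    print(name, round(score, 4))
```

In a full random forest, decreases like these are typically weighted by the share of samples reaching each node, summed per feature over every split in a tree, and then averaged over all trees. A feature that is rarely chosen for splitting, or that only yields small reductions, ends up with a low score, which is one way a rain indicator could rank far below temperature.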

Figure 8: Total Cabs by type of weather (NYM cabs in blue, NYY cabs in orange)

Interestingly, the same trends do not apply to the Yankees fan base. Nonetheless, the difference between these two graphs can hold considerable value for an organization; for further discussion, see Section 6 below. These nuances can potentially be explained. One might conclude, for instance, that when it is raining heavily, it is often hard to get a cab in general; when the weather is hotter, however, consumers may be willing to spend the extra money to avoid the Manhattan heat, while competition for taxis is not as fierce. Also of interest is that the opponent's power ranking proved to be more influential than the home team's; having a grasp of the opponent clearly helps to determine how many fans will choose to take taxis to the ballpark.

6. Takeaways

Throughout this paper, we have noted salient trends and made specific predictions about cab flow, using demographic, baseball, and weather data to buttress our analyses. Nonetheless, it is important to note what organizations can do with this type of information.

Not only can teams use this type of information to predict cab flow, they can also connect these insights to custom promotions. For example, noting the surfeit of cab rides coming out of Midtown on hot days, the New York Mets might offer an air-conditioned party bus that picks up at known or predicted cab hotspots, leveraging their data to better connect with their fans. In addition, they can make use of the battleground areas, such as Long Island City, to focus their marketing and cultivate deeper fan relationships in these areas. Finally, organizations can home in on demographic-tinged clusters to predict not only where certain fans will travel from, but also what type of fan (and what purchasing power they are likely to have) comes from each area, in order to provide those areas with custom messages and marketing.

7. Conclusion

The rise of analytics and data collection has led to new business opportunities in sports that never existed before. While teams have historically looked to control the fan experience in the stadium, we believe the next natural step is to use analytics to extend the fan experience outside the stadium and before the game. Through a deeper understanding of these transport-related decisions, organizations can begin to understand the passion that fans exhibit and provide unparalleled access, experiences, and offerings that take advantage of this data. Organizations can ensure a memorable experience from the moment a fan begins to make their way to the ballpark and again as they continue with their day after the game. The potential to use analytics to shape the fan experience is a new and exciting step in the evolving landscape of sports business analytics; teams at the forefront of these advances will reap considerable benefits as they connect further with their fan base in all aspects of the game day experience.

References

[1]
[2]
[3]



More information

ALGEBRA. sequence, term, nth term, consecutive, rule, relationship, generate, predict, continue increase, decrease finite, infinite

ALGEBRA. sequence, term, nth term, consecutive, rule, relationship, generate, predict, continue increase, decrease finite, infinite ALGEBRA Pupils should be taught to: Generate and describe sequences As outcomes, Year 7 pupils should, for example: Use, read and write, spelling correctly: sequence, term, nth term, consecutive, rule,

More information

Climate and Weather. This document explains where we obtain weather and climate data and how we incorporate it into metrics:

Climate and Weather. This document explains where we obtain weather and climate data and how we incorporate it into metrics: OVERVIEW Climate and Weather The climate of the area where your property is located and the annual fluctuations you experience in weather conditions can affect how much energy you need to operate your

More information

PROBABILITY SECOND EDITION

PROBABILITY SECOND EDITION PROBABILITY SECOND EDITION Table of Contents How to Use This Series........................................... v Foreword..................................................... vi Basics 1. Probability All

More information

ROOT: A data mining tool from CERN What can actuaries do with it?

ROOT: A data mining tool from CERN What can actuaries do with it? ROOT: A data mining tool from CERN What can actuaries do with it? Ravi Kumar, Senior Manager, Deloitte Consulting LLP Lucas Lau, Senior Consultant, Deloitte Consulting LLP Southern California Casualty

More information

Colour Image Segmentation Technique for Screen Printing

Colour Image Segmentation Technique for Screen Printing 60 R.U. Hewage and D.U.J. Sonnadara Department of Physics, University of Colombo, Sri Lanka ABSTRACT Screen-printing is an industry with a large number of applications ranging from printing mobile phone

More information

Please be sure to save a copy of this activity to your computer!

Please be sure to save a copy of this activity to your computer! Thank you for your purchase Please be sure to save a copy of this activity to your computer! This activity is copyrighted by AIMS Education Foundation. All rights reserved. No part of this work may be

More information

An innovative approach combining industrial process data analytics and operator participation to implement lean energy programs: A Case Study

An innovative approach combining industrial process data analytics and operator participation to implement lean energy programs: A Case Study An innovative approach combining industrial process data analytics and operator participation to implement lean energy programs: A Case Study Philippe Mack, Pepite SA Joanna Huddleston, Pepite SA Bernard

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

Analysis of Competitive Edge College Advisors Google Adwords Campaign 6/23/15 Carter Jensen

Analysis of Competitive Edge College Advisors Google Adwords Campaign 6/23/15 Carter Jensen Analysis of Competitive Edge College Advisors Google Adwords Campaign 6/23/15 Carter Jensen I. Introduction Competitive Edge College Advisors (CECA) would like to maximize revenue per click by developing

More information

Five Ways Retailers Can Profit from Customer Intelligence

Five Ways Retailers Can Profit from Customer Intelligence Five Ways Retailers Can Profit from Customer Intelligence Use predictive analytics to reach your best customers. An Apption Whitepaper Tel: 1-888-655-6875 Email: info@apption.com www.apption.com/customer-intelligence

More information

Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data

Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data Knowledge Discovery and Data Mining Unit # 2 1 Structured vs. Non-Structured Data Most business databases contain structured data consisting of well-defined fields with numeric or alphanumeric values.

More information

Attend Part 1 (2-3pm) to get 1 point extra credit. Polo will announce on Piazza options for DL students.

Attend Part 1 (2-3pm) to get 1 point extra credit. Polo will announce on Piazza options for DL students. Attend Part 1 (2-3pm) to get 1 point extra credit. Polo will announce on Piazza options for DL students. Data Science/Data Analytics and Scaling to Big Data with MathWorks Using Data Analytics to turn

More information

Grades 7-8 Mathematics Training Test Answer Key

Grades 7-8 Mathematics Training Test Answer Key Grades -8 Mathematics Training Test Answer Key 04 . Factor 6x 9. A (3x 9) B 3(x 3) C 3(3x ) D 6(x 9) Option A is incorrect because the common factor of both terms is not and the expression is not factored

More information

Introduction to Clustering

Introduction to Clustering Introduction to Clustering Yumi Kondo Student Seminar LSK301 Sep 25, 2010 Yumi Kondo (University of British Columbia) Introduction to Clustering Sep 25, 2010 1 / 36 Microarray Example N=65 P=1756 Yumi

More information

Fast Analytics on Big Data with H20

Fast Analytics on Big Data with H20 Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,

More information

"SEO vs. PPC The Final Round"

SEO vs. PPC The Final Round "SEO vs. PPC The Final Round" A Research Study by Engine Ready, Inc. Examining The Role Traffic Source Plays in Visitor Purchase Behavior January 2008 Table of Contents Introduction 3 Definitions 4 Methodology

More information

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD Predictive Analytics Techniques: What to Use For Your Big Data March 26, 2014 Fern Halper, PhD Presenter Proven Performance Since 1995 TDWI helps business and IT professionals gain insight about data warehousing,

More information

Paper 232-2012. Getting to the Good Part of Data Analysis: Data Access, Manipulation, and Customization Using JMP

Paper 232-2012. Getting to the Good Part of Data Analysis: Data Access, Manipulation, and Customization Using JMP Paper 232-2012 Getting to the Good Part of Data Analysis: Data Access, Manipulation, and Customization Using JMP Audrey Ventura, SAS Institute Inc., Cary, NC ABSTRACT Effective data analysis requires easy

More information

Take value-add on a test drive. Explore smarter ways to evaluate phone data providers.

Take value-add on a test drive. Explore smarter ways to evaluate phone data providers. White Paper Take value-add on a test drive. Explore smarter ways to evaluate phone data providers. Employing an effective debt-collection strategy with the right information solutions provider helps increase

More information

California Treasures High-Frequency Words Scope and Sequence K-3

California Treasures High-Frequency Words Scope and Sequence K-3 California Treasures High-Frequency Words Scope and Sequence K-3 Words were selected using the following established frequency lists: (1) Dolch 220 (2) Fry 100 (3) American Heritage Top 150 Words in English

More information

Omatics User s Guide

Omatics User s Guide Omatics User s Guide Web Interface V2.1 User s Guide Table of Contents I. Introduction... 2 II. The Omatics Interface... 3 III. Omatics Functionality... 4 A. Launching Omatics... 4 B. Viewing current vehicle

More information

EVsdrop. Services & Deliverables. Event Listening, Learning & Insights

EVsdrop. Services & Deliverables. Event Listening, Learning & Insights EVsdrop Services & Deliverables Event Listening, Learning & Insights 1 I. Background & How it works Themes 2 Background Performance Research is a market research and intelligence company that specializes

More information

Consumption of OData Services of Open Items Analytics Dashboard using SAP Predictive Analysis

Consumption of OData Services of Open Items Analytics Dashboard using SAP Predictive Analysis Consumption of OData Services of Open Items Analytics Dashboard using SAP Predictive Analysis (Version 1.17) For validation Document version 0.1 7/7/2014 Contents What is SAP Predictive Analytics?... 3

More information

Monday Morning Data Mining

Monday Morning Data Mining Monday Morning Data Mining Tim Ruhe Statistische Methoden der Datenanalyse Outline: - data mining - IceCube - Data mining in IceCube Computer Scientists are different... Fakultät Physik Fakultät Physik

More information

Data Analysis of Trends in iphone 5 Sales on ebay

Data Analysis of Trends in iphone 5 Sales on ebay Data Analysis of Trends in iphone 5 Sales on ebay By Wenyu Zhang Mentor: Professor David Aldous Contents Pg 1. Introduction 3 2. Data and Analysis 4 2.1 Description of Data 4 2.2 Retrieval of Data 5 2.3

More information

Statistics and Probability

Statistics and Probability Statistics and Probability TABLE OF CONTENTS 1 Posing Questions and Gathering Data. 2 2 Representing Data. 7 3 Interpreting and Evaluating Data 13 4 Exploring Probability..17 5 Games of Chance 20 6 Ideas

More information

Predictive Analytics

Predictive Analytics Predictive Analytics How many of you used predictive today? 2015 SAP SE. All rights reserved. 2 2015 SAP SE. All rights reserved. 3 How can you apply predictive to your business? Predictive Analytics is

More information

CENTRAL PARK TEMPERATURE THREE RADICALLY DIFFERENT US GOVERNMENT VERSIONS O

CENTRAL PARK TEMPERATURE THREE RADICALLY DIFFERENT US GOVERNMENT VERSIONS O CENTRAL PARK TEMPERATURE THREE RADICALLY DIFFERENT US GOVERNMENT VERSIONS O ur national centers regard station data as critical to measure recent climate change. The raw observations are taken from the

More information

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification

More information

Customer Life Time Value

Customer Life Time Value Customer Life Time Value Tomer Kalimi, Jacob Zahavi and Ronen Meiri Contents Introduction... 2 So what is the LTV?... 2 LTV in the Gaming Industry... 3 The Modeling Process... 4 Data Modeling... 5 The

More information

Quick Start. Creating a Scoring Application. RStat. Based on a Decision Tree Model

Quick Start. Creating a Scoring Application. RStat. Based on a Decision Tree Model Creating a Scoring Application Based on a Decision Tree Model This Quick Start guides you through creating a credit-scoring application in eight easy steps. Quick Start Century Corp., an electronics retailer,

More information

Retail / E-commerce. Turning Big Data (and Little) Into Actionable Intelligence and Customer Profitability. Case Study ebook. Unlocking Profitability.

Retail / E-commerce. Turning Big Data (and Little) Into Actionable Intelligence and Customer Profitability. Case Study ebook. Unlocking Profitability. shop shop shop shop Retail / E-commerce Turning Big Data (and Little) Into Actionable Intelligence and Customer Profitability Part 3 in a series of 5 ebooks on intelligent customer engagement Case Study

More information

InfiniteInsight 6.5 sp4

InfiniteInsight 6.5 sp4 End User Documentation Document Version: 1.0 2013-11-19 CUSTOMER InfiniteInsight 6.5 sp4 Toolkit User Guide Table of Contents Table of Contents About this Document 3 Common Steps 4 Selecting a Data Set...

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Math Content by Strand 1

Math Content by Strand 1 Patterns, Functions, and Change Math Content by Strand 1 Kindergarten Kindergarten students construct, describe, extend, and determine what comes next in repeating patterns. To identify and construct repeating

More information

Executive Summary. Viability of the Return of a Major League Baseball Franchise to Montreal (the Expos )

Executive Summary. Viability of the Return of a Major League Baseball Franchise to Montreal (the Expos ) Executive Summary Viability of the Return of a Major League Baseball Franchise to Montreal (the Expos ) November 2013 Table of Contents 1. CONTEXT AND OBJECTIVES... 3 2. RESEARCH METHODS... 5 3. KEY RESULTS...

More information

Environmental Remote Sensing GEOG 2021

Environmental Remote Sensing GEOG 2021 Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class

More information

How To Make A Credit Risk Model For A Bank Account

How To Make A Credit Risk Model For A Bank Account TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző csaba.fozo@lloydsbanking.com 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

More information

PREDICTIVE ANALYTICS VS. HOTSPOTTING

PREDICTIVE ANALYTICS VS. HOTSPOTTING PREDICTIVE ANALYTICS VS. HOTSPOTTING A STUDY OF CRIME PREVENTION ACCURACY AND EFFICIENCY EXECUTIVE SUMMARY For the last 20 years, Hot Spots have become law enforcement s predominant tool for crime analysis.

More information

«VISUALIZATION OF POTENTIAL CUSTOMERS»

«VISUALIZATION OF POTENTIAL CUSTOMERS» «VISUALIZATION OF POTENTIAL CUSTOMERS» Cubas Saiz, Tinguaro. Pérez Bello, Miguel. Rodríguez Pardo, Guillermo. Team: ETSII ULL Motivations We love to innovate in developing software and this contest gives

More information

Guide to PanAm Agent and Online Booking Tool Services!

Guide to PanAm Agent and Online Booking Tool Services! Guide to PanAm Agent and Online Booking Tool Services Sections: 1. Getting Started with PanAm 2. Booking with An Agent 3. TripCase 4. Online Booking Tool - Logging In & Completing Your Profile 5. Book

More information

Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product

Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product Sagarika Prusty Web Data Mining (ECT 584),Spring 2013 DePaul University,Chicago sagarikaprusty@gmail.com Keywords:

More information

Classification of Titanic Passenger Data and Chances of Surviving the Disaster Data Mining with Weka and Kaggle Competition Data

Classification of Titanic Passenger Data and Chances of Surviving the Disaster Data Mining with Weka and Kaggle Competition Data Proceedings of Student-Faculty Research Day, CSIS, Pace University, May 2 nd, 2014 Classification of Titanic Passenger Data and Chances of Surviving the Disaster Data Mining with Weka and Kaggle Competition

More information