Understanding a Fan Base Beyond the Ballpark: A statistical analysis of the transportation to major sporting events in NYC
- Marsha Dorsey
- 7 years ago
Paper Track: Business of Sports
Paper ID: 1562

1. Abstract

The sports industry has a distinct advantage over many other B2C industries in that its customers, or fans, routinely make irrational decisions fueled by passion. As sports organizations increasingly adopt loyalty programs, apps, and new technologies in an effort to understand their fans' interaction, involvement, and habits inside the stadium, we extend the focus of our analysis outside the stadium walls. Our analysis uses openly available data for the NYC region to better understand event-related transportation decisions. Understanding these decisions and incorporating a data-driven approach could enable sports organizations to tailor outreach, predict engagement levels for the fan base, and improve the overall fan experience.

We analyzed five open-source data sets: 2014 NYC Yellow and Green Cab Trip Information, NYC Uber data, NYC demographic data, NYC weather data, and relevant baseball game information for the New York Mets and New York Yankees 2014 seasons. From there, we conducted exploratory data analysis, using geolocation and time data to observe taxi and transportation flow. We applied both unsupervised and supervised machine learning algorithms to determine clusters of activity to and from some of NYC's largest observed sporting events and to gain a better understanding of fan behavior leading up to and after the game. Based on the lessons learned from these machine-learned clusters, we employed a set of statistical models to determine feature importance and to estimate potential origins and destinations for future events given the observable variables.

2. Introduction

Throughout the 2014 baseball season, over 66,000 fans took private yellow or green taxis to New York Yankees and New York Mets home games. Even more interesting, these fans came from over 200 unique zip codes spanning multiple boroughs, illustrating the reach of both teams throughout the New York City area. Given that New York City is world renowned for the extent of its public transportation system, the fact that fans would view a taxi as a viable alternative illustrates consumers' willingness to take private transportation to attend sporting events.

Delving deeper into yellow and green NYC taxi data, Figure 1 illustrates the increased willingness to pay demonstrated by fans attending baseball games in New York. As shown, on average, fans of both the New York Yankees and the New York Mets are willing to pay higher fares to arrive at the ballpark. Interestingly, when we look at cab fares for trips ending in the vicinity of Citi Field, we observe multiple peaks in the fare paid. This distribution warranted further investigation to
determine its root cause.

Figure 1: Illustration of Cab Fares by Classified Destination

We observed a median total fare of approximately $16.50 for Yankees games and, for Mets games, $ . This compares with a median general NYC cab total fare of $ . To us, this represents the premium that fans put on going to sporting events. This willingness to spend represents a clear opportunity for teams to further own the fan experience beyond the stadium.

3. Data and Preparation

Initially, we used exploratory data analysis to better understand where baseball fans' taxi trips originate and the demographics of those areas. New York City provides the perfect platform for this analysis, given its two fan bases with similar means of arriving at and leaving baseball games. The raw dataset was obtained from the NYC cab website [1]. We obtained data from March through September of 2014, as we were looking at the 2014 seasons for both teams. When combined, the raw dataset had over 81 million rows with the following relevant columns: pickup_datetime, dropoff_datetime, passenger_count, trip_distance, pickup_longitude, pickup_latitude, dropoff_longitude, dropoff_latitude, fare_amount, total_amount.

To filter this dataset, we kept trips whose coordinates fell within a 0.5-mile radius of Yankee Stadium ( , ) and Citi Field ( , ). Next, we went to Baseball Reference [2] and Retrosheet [3] to obtain all relevant baseball information, including game start and end times. Using start times, we retained all cabs arriving between two hours before the start and one hour after it, as we assume some fans arrive late. Conversely, we considered taxis that left a game from 90 minutes before the end of the game and up to two
hours after, as many fans remain in the area immediately following a game. This greatly reduced the raw dataset to the following value counts:

Next, we obtained the respective zip codes and boroughs from the latitude and longitude data, which allowed us to consider Census demographic information, which is organized by zip code. We retained only zip codes with more than 100 rides (n > 100), so that a handful of rides would not skew the data. Interestingly, there were plenty of taxis coming straight from major transit hubs such as LaGuardia and JFK airports. This left us with a total of 66 unique zip codes.

Next, we applied various unsupervised learning algorithms to the data, which is useful for finding clusters when there is no target variable. The most effective algorithm we found was k-means, which is widely used across many different sectors. k-means assigns each observation to its nearest cluster center by Euclidean distance and then solves an optimization problem whose goal is to minimize the within-cluster sum of squared errors (also called cluster inertia). One of the main criticisms of k-means is that the number of clusters must be specified beforehand. We initially used an x-means algorithm, which automatically selects the number of clusters (the k) by optimization; however, similar to marketing functions that often run campaigns of different complexity, we decided to use the elbow method with k-means. Generally, elbow methods mitigate some drawbacks of clustering algorithms and give a visual representation of the number of clusters and their corresponding distortion.

Figure 2: k-means cluster analysis
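The elbow procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the two-dimensional synthetic points stand in for the per-zip-code demographic features, and the function names (`kmeans`, `elbow_curve`), the number of restarts, and the random seeds are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k observed points
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # New centroid is the coordinate-wise mean of its members.
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # assignments have stabilised
        centroids = updated
    labels = assign(points, centroids)
    # Inertia: within-cluster sum of squared errors.
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


def elbow_curve(points, k_values, n_restarts=5):
    """Best inertia found for each k over a few random restarts."""
    return {
        k: min(kmeans(points, k, seed=s)[2] for s in range(n_restarts))
        for k in k_values
    }


# Synthetic stand-in data: three well-separated blobs (an assumption).
data_rng = random.Random(42)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [
    (data_rng.gauss(cx, 1.0), data_rng.gauss(cy, 1.0))
    for cx, cy in centers
    for _ in range(30)
]

curve = elbow_curve(points, range(1, 7))
for k, inertia in curve.items():
    print(k, round(inertia, 1))
```

Plotting inertia against k and looking for the point where the curve flattens gives the elbow; with real demographic features, the columns would normally be standardized first so that no single variable dominates the Euclidean distance.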
In Figure 2, there are two distinct areas where we observe elbows: at 3 and at 6 clusters. Next, we analyzed each of the clusters, as shown in Figures 3 and 4.

Figure 3: Income, Ethnicity, Marital Status by Cluster

Figure 4: Education by Cluster

Next, we visually analyzed the arrival patterns of each cluster. In Figures 5 and 6, we observe New York Mets and New York Yankees traffic by departure location for each taxi that arrived near the stadium. The points are color coded by the cluster group to which they belong.

4. Data Visualization

Mets:
Figure 5: Departure Location by Cluster for Mets Games

Yankees:

Figure 6: Departure Location by Cluster for Yankees Games

Based on these visualizations, we can clearly see the battleground areas for Mets and Yankees fans. While Yankees fans show a greater density, especially in expected areas such as the Bronx, there are similar concentrations of the second cluster in Midtown and Long Island City.

5. Predicting traffic flow

While understanding the types of fans through the aggregate season taxi volume is important, it is also valuable to understand and predict the number of taxis coming to any individual game. By predicting the traffic flow, a team can potentially extend the game experience to the hours before the game, when fans are beginning their journey to the stadium. An interesting aspect of the fan experience is that the simple act of arriving at the stadium can greatly affect the likelihood that a fan will return. Based on this assumption, we compared several machine learning algorithms for predicting taxi flow to a baseball game; the best performing was a random forest. We used an input dataset with the following columns: team power ranking, opponent power ranking, month of season, wind speed, game promotion, team rank, day or night, attendance, day of week, number of game streak, start time hour, games behind, weather summary, precipitation intensity, and temperature.
All of the columns were numeric except for day of week (M/T/W/R/F/Sat/Sun), day/night, weather summary (clear/rain), and game promotion (Y/N). To get our target variable of taxi rides, we took the raw taxi dataset and grouped it by date and by whether the trip was a pickup or a dropoff; a count was then applied and merged with the baseball dataset, leaving us with the total number of taxis coming to the respective stadiums. We trained a random forest of exactly 10,000 trees for each team and calculated the mean absolute error. The results were as follows:

Mets Train/Test Random Forest mean absolute error:
Yankees Train/Test mean absolute error:
Mets Out of Bag mean absolute error:
Yankees Out of Bag mean absolute error:

One drawback is that a random forest is hard to interpret, as it is effectively a black-box machine learning algorithm. Nonetheless, the increase in accuracy is enough to justify its use. Below is a histogram showing the residuals from an out-of-bag random forest regression. Although the train/test evaluation with a 25% split performed better, the dataset is small and covers only one season, so the less biased estimate is obtained from the out-of-bag predictions. Overall, this gives us confidence that, given a large taxi flow, teams can use open-source and attainable data to estimate taxi flow, understand where taxis are coming from, and understand the type of fans that ride them to games.

Figure 6: Histogram of random forest residuals
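The two evaluation schemes compared above, a 25% hold-out split and out-of-bag (OOB) prediction, can be illustrated with scikit-learn. This is a hedged sketch rather than the authors' pipeline: the data are synthetic, only three of the listed inputs are mimicked, the variable names are hypothetical, and 500 trees are used instead of 10,000 to keep the example quick.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_games = 81  # roughly one home season (assumption)

# Hypothetical stand-ins for a few of the input columns.
attendance = rng.integers(20000, 45000, size=n_games)
temperature = rng.uniform(45, 95, size=n_games)
is_night = rng.integers(0, 2, size=n_games)  # day/night encoded as 0/1
X = np.column_stack([attendance, temperature, is_night])

# Synthetic target: taxi count per game (made-up relationship plus noise).
y = 0.01 * attendance + 2.0 * temperature + 30 * is_night + rng.normal(0, 20, size=n_games)

# Scheme 1: hold out 25% of games for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
holdout_model = RandomForestRegressor(n_estimators=500, random_state=0)
holdout_model.fit(X_train, y_train)
holdout_mae = mean_absolute_error(y_test, holdout_model.predict(X_test))

# Scheme 2: fit on every game and score each game using only the trees
# whose bootstrap sample did not contain it (out-of-bag prediction).
oob_model = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
oob_model.fit(X, y)
oob_mae = mean_absolute_error(y, oob_model.oob_prediction_)

# Residuals of the kind one would plot as a histogram.
residuals = y - oob_model.oob_prediction_

print("hold-out MAE:", round(holdout_mae, 2))
print("out-of-bag MAE:", round(oob_mae, 2))
```

With a single season of games, the hold-out score depends heavily on which games land in the test set, whereas the OOB score uses every game for both fitting and evaluation, which is one reason to prefer it on a dataset this small.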
One of the benefits of using a random forest is that it provides an understanding of the feature importance of the input variables. Feature importance can be calculated by averaging the decrease in impurity attributable to each feature across all the decision trees in the forest. The average feature importance across both teams is shown in the data visualization below. Here, we explore the top 12 features, as everything else proved to be of minimal importance.

Figure 7: Feature Importance of Random Forest Model

A cursory review of feature importance reveals some obvious conclusions: attendance is important, for example, because better-attended games bring more volume. However, we found it interesting that temperature was viewed as very important, while other potentially inclement weather variables, such as rain, did not seem to have an impact. Below we compare two of these variables, temperature (high predictive value) and rain (low predictive value). Visualizing cabs by team, we notice distinct differences between their distributions, despite similar numbers of taxis (rainy-day cabs had about 10% more pickups). In particular, for Mets fans, there is a thick concentration of Midtown pickups on hot days that is absent from rainy days:
Figure 8: Total Cabs by type of weather (NYM cabs in blue, NYY cabs in orange)

Interestingly, the same trends do not apply to the Yankees fan base. Nonetheless, the difference between these two graphs can hold considerable value for an organization; for further discussion, see Section 6 below. These nuances can potentially be explained. One might conclude, for instance, that when it is raining heavily, it is often hard to get a cab at all; when the weather is hotter, however, consumers may be willing to spend the extra money to avoid the Manhattan heat while competition for taxis is not as fierce. Also of interest is that the opponent's power ranking proved to be more influential than the home team's; having a grasp of the opponent clearly helps to determine how many fans will choose to take taxis to the ballpark.

6. Takeaways

Throughout this paper, we have noted salient trends and made specific predictions about cab flow, using demographic, baseball, and weather data to buttress our analyses. Nonetheless, it is important to note what organizations can do with this type of information.
Not only can teams use this type of information to predict cab flow, they can also connect these insights to custom promotions. For example, noting the surfeit of cab rides coming out of Midtown on hot days, the New York Mets might offer an air-conditioned party bus that picks up at known or predicted cab hotspots, leveraging their data to better connect with their fans. In addition, they can make use of the battleground areas, such as Long Island City, to focus their marketing and cultivate deeper fan relationships in these areas. Finally, organizations can home in on demographic-tinged clusters to predict not only where certain fans will travel from, but also what type of fan (and what purchasing power they are likely to have) comes from each area, in order to provide those areas with custom messages and marketing.

7. Conclusion

The rise of analytics and data collection has led to business opportunities in sports that never existed before. While teams have historically looked to control the fan experience in the stadium, we believe the next natural step is to use analytics to extend the fan experience outside the stadium and before the game. Through a deeper understanding of these transport-related decisions, organizations can begin to understand the passion that fans exhibit and provide unparalleled access, experiences, and offerings that take advantage of this data. Organizations can ensure a memorable experience from the moment fans begin to make their way to the ballpark and again as they continue with their day after the game. The potential to use analytics to shape the fan experience is a new and exciting step in the evolving landscape of sports business analytics; teams at the forefront of these advances will reap considerable benefits as they connect further with their fan base in all aspects of the game-day experience.
References

[1]
[2]
[3]
More informationIntroduction to Clustering
Introduction to Clustering Yumi Kondo Student Seminar LSK301 Sep 25, 2010 Yumi Kondo (University of British Columbia) Introduction to Clustering Sep 25, 2010 1 / 36 Microarray Example N=65 P=1756 Yumi
More informationFast Analytics on Big Data with H20
Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,
More information"SEO vs. PPC The Final Round"
"SEO vs. PPC The Final Round" A Research Study by Engine Ready, Inc. Examining The Role Traffic Source Plays in Visitor Purchase Behavior January 2008 Table of Contents Introduction 3 Definitions 4 Methodology
More informationPredictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD
Predictive Analytics Techniques: What to Use For Your Big Data March 26, 2014 Fern Halper, PhD Presenter Proven Performance Since 1995 TDWI helps business and IT professionals gain insight about data warehousing,
More informationPaper 232-2012. Getting to the Good Part of Data Analysis: Data Access, Manipulation, and Customization Using JMP
Paper 232-2012 Getting to the Good Part of Data Analysis: Data Access, Manipulation, and Customization Using JMP Audrey Ventura, SAS Institute Inc., Cary, NC ABSTRACT Effective data analysis requires easy
More informationTake value-add on a test drive. Explore smarter ways to evaluate phone data providers.
White Paper Take value-add on a test drive. Explore smarter ways to evaluate phone data providers. Employing an effective debt-collection strategy with the right information solutions provider helps increase
More informationCalifornia Treasures High-Frequency Words Scope and Sequence K-3
California Treasures High-Frequency Words Scope and Sequence K-3 Words were selected using the following established frequency lists: (1) Dolch 220 (2) Fry 100 (3) American Heritage Top 150 Words in English
More informationOmatics User s Guide
Omatics User s Guide Web Interface V2.1 User s Guide Table of Contents I. Introduction... 2 II. The Omatics Interface... 3 III. Omatics Functionality... 4 A. Launching Omatics... 4 B. Viewing current vehicle
More informationEVsdrop. Services & Deliverables. Event Listening, Learning & Insights
EVsdrop Services & Deliverables Event Listening, Learning & Insights 1 I. Background & How it works Themes 2 Background Performance Research is a market research and intelligence company that specializes
More informationConsumption of OData Services of Open Items Analytics Dashboard using SAP Predictive Analysis
Consumption of OData Services of Open Items Analytics Dashboard using SAP Predictive Analysis (Version 1.17) For validation Document version 0.1 7/7/2014 Contents What is SAP Predictive Analytics?... 3
More informationMonday Morning Data Mining
Monday Morning Data Mining Tim Ruhe Statistische Methoden der Datenanalyse Outline: - data mining - IceCube - Data mining in IceCube Computer Scientists are different... Fakultät Physik Fakultät Physik
More informationData Analysis of Trends in iphone 5 Sales on ebay
Data Analysis of Trends in iphone 5 Sales on ebay By Wenyu Zhang Mentor: Professor David Aldous Contents Pg 1. Introduction 3 2. Data and Analysis 4 2.1 Description of Data 4 2.2 Retrieval of Data 5 2.3
More informationStatistics and Probability
Statistics and Probability TABLE OF CONTENTS 1 Posing Questions and Gathering Data. 2 2 Representing Data. 7 3 Interpreting and Evaluating Data 13 4 Exploring Probability..17 5 Games of Chance 20 6 Ideas
More informationPredictive Analytics
Predictive Analytics How many of you used predictive today? 2015 SAP SE. All rights reserved. 2 2015 SAP SE. All rights reserved. 3 How can you apply predictive to your business? Predictive Analytics is
More informationCENTRAL PARK TEMPERATURE THREE RADICALLY DIFFERENT US GOVERNMENT VERSIONS O
CENTRAL PARK TEMPERATURE THREE RADICALLY DIFFERENT US GOVERNMENT VERSIONS O ur national centers regard station data as critical to measure recent climate change. The raw observations are taken from the
More informationApplied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification
More informationCustomer Life Time Value
Customer Life Time Value Tomer Kalimi, Jacob Zahavi and Ronen Meiri Contents Introduction... 2 So what is the LTV?... 2 LTV in the Gaming Industry... 3 The Modeling Process... 4 Data Modeling... 5 The
More informationQuick Start. Creating a Scoring Application. RStat. Based on a Decision Tree Model
Creating a Scoring Application Based on a Decision Tree Model This Quick Start guides you through creating a credit-scoring application in eight easy steps. Quick Start Century Corp., an electronics retailer,
More informationRetail / E-commerce. Turning Big Data (and Little) Into Actionable Intelligence and Customer Profitability. Case Study ebook. Unlocking Profitability.
shop shop shop shop Retail / E-commerce Turning Big Data (and Little) Into Actionable Intelligence and Customer Profitability Part 3 in a series of 5 ebooks on intelligent customer engagement Case Study
More informationInfiniteInsight 6.5 sp4
End User Documentation Document Version: 1.0 2013-11-19 CUSTOMER InfiniteInsight 6.5 sp4 Toolkit User Guide Table of Contents Table of Contents About this Document 3 Common Steps 4 Selecting a Data Set...
More informationII. DISTRIBUTIONS distribution normal distribution. standard scores
Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,
More informationMath Content by Strand 1
Patterns, Functions, and Change Math Content by Strand 1 Kindergarten Kindergarten students construct, describe, extend, and determine what comes next in repeating patterns. To identify and construct repeating
More informationExecutive Summary. Viability of the Return of a Major League Baseball Franchise to Montreal (the Expos )
Executive Summary Viability of the Return of a Major League Baseball Franchise to Montreal (the Expos ) November 2013 Table of Contents 1. CONTEXT AND OBJECTIVES... 3 2. RESEARCH METHODS... 5 3. KEY RESULTS...
More informationEnvironmental Remote Sensing GEOG 2021
Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class
More informationHow To Make A Credit Risk Model For A Bank Account
TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző csaba.fozo@lloydsbanking.com 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions
More informationBig Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
More informationPREDICTIVE ANALYTICS VS. HOTSPOTTING
PREDICTIVE ANALYTICS VS. HOTSPOTTING A STUDY OF CRIME PREVENTION ACCURACY AND EFFICIENCY EXECUTIVE SUMMARY For the last 20 years, Hot Spots have become law enforcement s predominant tool for crime analysis.
More information«VISUALIZATION OF POTENTIAL CUSTOMERS»
«VISUALIZATION OF POTENTIAL CUSTOMERS» Cubas Saiz, Tinguaro. Pérez Bello, Miguel. Rodríguez Pardo, Guillermo. Team: ETSII ULL Motivations We love to innovate in developing software and this contest gives
More informationGuide to PanAm Agent and Online Booking Tool Services!
Guide to PanAm Agent and Online Booking Tool Services Sections: 1. Getting Started with PanAm 2. Booking with An Agent 3. TripCase 4. Online Booking Tool - Logging In & Completing Your Profile 5. Book
More informationData Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product
Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product Sagarika Prusty Web Data Mining (ECT 584),Spring 2013 DePaul University,Chicago sagarikaprusty@gmail.com Keywords:
More informationClassification of Titanic Passenger Data and Chances of Surviving the Disaster Data Mining with Weka and Kaggle Competition Data
Proceedings of Student-Faculty Research Day, CSIS, Pace University, May 2 nd, 2014 Classification of Titanic Passenger Data and Chances of Surviving the Disaster Data Mining with Weka and Kaggle Competition
More information