Football Match Result Predictor Website

Electronics and Computer Science
Faculty of Physical and Applied Sciences
University of Southampton

Matthew Tucker
Tuesday 13th December 2011

Football Match Result Predictor Website

Project supervisor: Dr Alex Rogers
Second examiner: Professor Mahesan Niranjan

A project progress report submitted for the award of MEng Computer Science

Abstract

Predicting the outcome of English Premier League football matches poses an interesting challenge: realistically, no predictor can get every match right. Despite this, there is a lot of money to be made by creating a predictor with a good degree of accuracy and betting on the results. This project will build a football match predictor using one or more Artificial Intelligence classification algorithms. The predictor will be hosted as a website. The site will display predictions for upcoming Premier League fixtures, together with an archive of previous predictions alongside the final outcome of each match. At this stage, the required dataset is in place, and a simple Multi-Layer Perceptron with two features has been successfully implemented on the website. The predictor needs to be made considerably more accurate (and therefore more complex) by adding many more features, in particular data about which players are involved in a match. Users will be able to interact with the site by selecting the players they think will play, thus altering the prediction calculated for the match. The website will also compare the calculated prediction with bookmaker odds and show where it is best to place bets.

Contents

Abstract
1 Enhanced Project Description
2 Background Research and Literature Review
  2.1 Research Into Existing Football Predictors
    2.1.1 Data Sets Used in Previous Predictors
    2.1.2 Success Rates of Algorithms
    2.1.3 Features Selected
    2.1.4 Markov Models
    2.1.5 Summary
  2.2 Research Into Classification Algorithms
  2.3 Current Bookmaker Odds Success
    2.3.1 Converting Bookmaker Odds Into Percentages
    2.3.2 Analysing Bookmaker's Prediction Accuracy
    2.3.3 Data Analysis and Project Accuracy Aims
3 Proposed Final Design
  3.1 Languages to be Used and Website Structure
  3.2 Predictor Design
  3.3 Website
    3.3.1 Website Homepage
    3.3.2 Individual Match Page
4 Account of Work to Date
  4.1 Preliminary Work and Setting Up
  4.2 Research Completed
  4.3 Implementing a Basic Predictor
5 Plan of Remaining Work
References
Appendix
  A Gantt Chart
  B Website Mock-Ups

1 Enhanced Project Description

The website will be hosted at http://predictor.footybrain.co.uk/ and implemented in Python¹ using the Django framework² and a MySQL database³. The data needed for the predictions will come from the author's own personal football data collection, which has been maintained electronically for over three years. For each match, the predictor's final output will be a three-way percentage result: the likelihood of a home win, a draw and an away win.

To set this predictor apart from existing ones, it will take the two teams' starting line-ups into account. These generally become available from various media sources around 40 minutes before kick-off. This will allow the predictor to adjust for any notable absentees or surprise selections before the match starts. Before the starting line-ups are available, a prediction can still be obtained by using probable line-ups. The prediction for a match will therefore vary depending on the line-ups given to the predictor for both teams.

The project will also look at how to maximise winnings from bookmakers by comparing their odds with the calculated predictions. For example, the predictor and the bookmakers may both consider a home win the most likely outcome. However, the predictor may assign a much higher chance to a draw than the bookmaker's odds imply, in which case it may be worthwhile placing some money on a draw [1].

2 Background Research and Literature Review

2.1 Research Into Existing Football Predictors

Several papers exist on football predictors. They use many different classification algorithms, including Bayesian Networks, Neural Networks (primarily the Multi-Layer Perceptron), Decision Trees and the K-Nearest Neighbour (KNN) algorithm.
¹ http://python.org/
² http://www.djangoproject.com/
³ http://www.mysql.com/

2.1.1 Data Sets Used in Previous Predictors

The data sets ranged from the 2006 World Cup tournament (64 matches) [2], to one full season of Premier League football (380 matches) [3], and even two full seasons based on a single team (76 Tottenham Hotspur matches) [4]. [4] and [5] concluded that a larger data set would have made their predictors more accurate, although it might also have increased the complexity of their solutions. This project has three full seasons of Premier League data available, plus the season currently in progress (1140 matches, rising to 1520 at the end of the current season). This should greatly help the accuracy of the predictor and help prevent the overtraining that is easy to fall into with a small data set [6].

2.1.2 Success Rates of Algorithms

The papers that considered multiple algorithms seemed to agree that a Neural Network trained with back-propagation (such as a Multi-Layer Perceptron with five hidden layers) provided the best results, achieving accuracies of around 66% [3], [5]. Bayesian Networks appeared reasonably successful, whereas decision trees (such as J48 and Random Forest) achieved much poorer results [3], [4]. [2], which studied the 2006 World Cup, claimed an accuracy of 76.9% using a Neural Network. However, their method discounted draws entirely, making no attempt to predict them. It was also based on such a small test set (13 matches) that this figure is unlikely to be representative of what can be achieved.

2.1.3 Features Selected

The features selected in the papers were generally the form of the teams and the head-to-head record (previous matches between the two teams) [3], [5]. [2] also considered more general match statistics such as ball possession, shots on goal and fouls committed. This data could prove useful, but it is not included in this project's data set. [2] and [4] used an expert to rank the strengths of players and teams. This is subjective, as it depends on the expert's opinion rather than on actual data, and [2] even found that the expert's opinions had little effect on accuracy. Expert opinion will not be used in this project, as the dataset contains enough data to provide the useful features needed for a successful algorithm. Only [7] looked into the individual players in a team, and in the case of one player it even analysed his playing position.
[3] and [5] noted that they believed they could achieve greater accuracy by modelling every player taking part, but did not have the data to do so.

2.1.4 Markov Models

Away from classification algorithms, [1] used Markov Chain Monte Carlo methods to predict the outcomes of a season of Premier League matches in the 1990s. The features used were the attacking and defensive strengths of the two teams, calculated from a large set of previous results. These were used to identify surprising match results and to predict positions in the final league table, as well as to look into betting.

2.1.5 Summary

The papers analysed would have benefited from a larger data set. This project has one available, which puts it at an advantage. Neural Networks trained with back-propagation (Multi-Layer Perceptrons) with around five hidden layers have been the most successful algorithms in general. Features determined by an expert are not accurate enough. It will be better to use features based on the available data, such as the players selected for each team.

2.2 Research Into Classification Algorithms

Many classification algorithms are available, and the main ones have been used and analysed in the previous section. A multi-class classifier is needed, as there are three classes into which the result can be classified (home win, draw and away win). By contrast, a binary classifier can only classify instances into two classes [8]. Any binary classifier can be converted into a multi-class classifier using the one-against-all strategy, in which one classifier is trained for each class to separate it from all the other classes [9]. This will be needed for any binary classifier that is implemented. The previous section showed that Multi-Layer Perceptrons and Bayesian Networks were the most successful algorithms for football predictors, with decision trees being much less effective. A tool such as Weka⁴ can be used to run the data set through many different algorithms, such as those mentioned above, to find the most effective one for the data set being used.

2.3 Current Bookmaker Odds Success

In order to set a realistic target for the accuracy of the predictor being built in this project, it is necessary to research the current standard of the bookmakers, as it is they that this project is trying to beat. This is made simple by the ready availability of archives of the odds that bookmakers set before matches⁵.

2.3.1 Converting Bookmaker Odds Into Percentages

Bookmakers set odds for a match days in advance, and these serve as their prediction for the match.
However, bookmakers express their predictions as odds rather than percentages. Fractional odds are written as one number to another (x-y): a winning stake of y returns a profit of x. The lowest odds indicate the likeliest outcome, and the odds for the possible outcomes are not constrained to add up to any particular value. A method is therefore needed to convert the odds into percentages that add up to 100.

⁴ A collection of machine learning algorithms for data mining tasks, written in Java. The Weka website can be found at: http://www.cs.waikato.ac.nz/ml/weka/
⁵ The data source for all the bookmaker odds in this section is: http://www.football-data.co.uk/englandm.php

Table 1 illustrates the method with an example⁶. There are three steps:
1. Compute y/(x+y) for each of the three outcomes, expressed as a percentage.
2. Add these figures together. The total will inevitably be greater than 100.
3. Divide each figure by this total, so that the normalised figures add up to 100%.

The amount by which the implied percentages exceed 100 is called the overround⁷. It acts as insurance that the bookmaker will generally make money whatever the outcome.

Manchester City 4 v 0 Swansea City - 15/08/11
Outcome    Odds     Implied percentage          Normalised percentage
Home Win   29-100   100/(29+100)*100 = 77.52    (77.52/105.21)*100 = 73.68%
Draw       4-1      1/(4+1)*100 = 20.00         (20.00/105.21)*100 = 19.01%
Away Win   12-1     1/(12+1)*100 = 7.69         (7.69/105.21)*100 = 7.31%
Total               105.21                      100.00%

Table 1: Bet365 odds converted into percentages for Manchester City v Swansea City, a match from the opening round of fixtures of the 2011/12 Premier League season.

2.3.2 Analysing Bookmaker's Prediction Accuracy

Two figures have been calculated from these normalised percentages to measure the success of the bookmakers over two seasons⁸. The first, shown in Table 2, is the percentage of predictions that were correct. The bookmaker's prediction is taken to be the outcome with the highest normalised percentage (the lowest odds) of the three possibilities. In Table 1, therefore, the bookmaker's prediction is a home win. This prediction is then compared with the actual outcome of the match. In Table 1 a home win did occur, so the bookmaker's prediction was correct.

Bookmaker      2000/01   2010/11
Interwetten    25.00%    50.79%
Sportingbet    26.58%    50.79%
William Hill   27.11%    50.26%

Table 2: Bookmakers' correct predictions as percentages, for data from the entire 2000/01 and 2010/11 FA English Premier League seasons.
⁶ Data taken from: http://www.football-data.co.uk/mmz4281/1112/e0.csv
⁷ http://betting.football-data.co.uk/overround.php
⁸ 2000/01 data from: http://www.football-data.co.uk/mmz4281/0001/e0.csv, 2010/11 data from: http://www.football-data.co.uk/mmz4281/1011/e0.csv
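The conversion described in Section 2.3.1 can be sketched in a few lines of Python, the language chosen for the predictor. This is a minimal illustration under stated assumptions, not the project's code. The function names `implied_percentage` and `normalise_odds` are hypothetical, and the odds are assumed to be supplied as (x, y) pairs for fractional odds written x-y.

```python
def implied_percentage(x, y):
    """Implied percentage chance of fractional odds written x-y.

    Odds of x-y return a profit of x for a stake of y, so the implied
    probability is y / (x + y).
    """
    return 100.0 * y / (x + y)


def normalise_odds(odds):
    """Convert fractional odds [(x, y), ...] into percentages summing to 100.

    Returns (normalised, total), where total is the sum of the implied
    percentages; total - 100 is the bookmaker's overround.
    """
    implied = [implied_percentage(x, y) for x, y in odds]
    total = sum(implied)
    normalised = [100.0 * p / total for p in implied]
    return normalised, total


# Bet365 odds from Table 1: home win 29-100, draw 4-1, away win 12-1.
percentages, total = normalise_odds([(29, 100), (4, 1), (12, 1)])
print([round(p, 2) for p in percentages])  # [73.68, 19.01, 7.31]
print(round(total, 2))                     # 105.21
```

Unlike Table 1, the sketch does not round the intermediate figures, but the normalised results still agree with the table to two decimal places.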

The second measure, shown in Table 3, takes the normalised percentage that the bookmaker assigned to the outcome that actually occurred. In Table 1 a home win occurred, which the bookmaker confidently predicted at 73.68%. Averaging this figure over all the matches in a season shows how strongly the bookmakers predicted the actual results.

Bookmaker      2000/01   2010/11
Interwetten    29.66%    39.63%
Sportingbet    30.23%    40.06%
William Hill   29.56%    40.07%

Table 3: Bookmakers' average normalised percentage for the actual outcome of matches, for data from the entire 2000/01 and 2010/11 FA English Premier League seasons.

2.3.3 Data Analysis and Project Accuracy Aims

Two very interesting points emerge from Table 2 and Table 3. The first is that ten years ago, the bookmakers' accuracy (by both measures) was below 33%. With only three possible outcomes, random guessing would be expected to be right about a third of the time. It is therefore surprising that the bookmakers did worse than this, given that the data is not random and should hold some correlation. The 2010/11 figures are far more respectable, which leads to the second point: the bookmakers have improved significantly in ten years, roughly doubling their accuracy by the measure shown in Table 2.

The minimum aim for this project has been set at the bookmakers' current success rate: 50% by the measure in Table 2 and 40% by the measure in Table 3. However, these odds were collected from the bookmakers a day or two before each match⁹, so the bookmakers could not take the final line-ups into account. This predictor will use the line-ups and may gain an advantage as a result¹⁰. If the bookmakers can be surpassed, the next target will be the accuracy of the other predictors discussed earlier, around 66%.

⁹ Odds for weekend games were taken on Friday afternoons, and on Tuesday afternoons for midweek games. http://www.football-data.co.uk/notes.txt
¹⁰ A small study (of around 20 matches) examined how bookmakers' odds change between a day before the match and 30 minutes before kick-off. It showed that the normalised percentage can fluctuate by one or two percentage points for some of the outcomes. However, the direction of this fluctuation does not appear to bear any correlation with the actual outcome of the match.
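The two accuracy measures behind Tables 2 and 3 can be expressed as two short functions. This is a sketch under stated assumptions, not the author's code:

- Each match is represented as a hypothetical (percentages, result) pair.
- The normalised percentages are ordered home win, draw, away win.
- The result is coded as "H", "D" or "A".

The second match in the example season is invented for illustration.

```python
OUTCOMES = ("H", "D", "A")  # home win, draw, away win


def favourite_accuracy(matches):
    """Table 2 measure: percentage of matches in which the outcome with the
    highest normalised percentage (the favourite) actually occurred."""
    correct = 0
    for percentages, result in matches:
        # Index of the largest percentage; on a tie the earlier outcome wins.
        favourite = max(range(len(OUTCOMES)), key=lambda i: percentages[i])
        if OUTCOMES[favourite] == result:
            correct += 1
    return 100.0 * correct / len(matches)


def mean_actual_percentage(matches):
    """Table 3 measure: average normalised percentage that was assigned
    to the outcome that actually occurred."""
    total = sum(percentages[OUTCOMES.index(result)] for percentages, result in matches)
    return total / len(matches)


# Hypothetical two-match season: the Table 1 match plus an invented draw.
season = [
    ([73.68, 19.01, 7.31], "H"),
    ([45.00, 30.00, 25.00], "D"),
]
print(favourite_accuracy(season))                # 50.0
print(round(mean_actual_percentage(season), 2))  # 51.84

# A predictor giving every outcome 100/3% scores 33.33 on the second
# measure whatever happens: the random baseline mentioned in Section 2.3.3.
uniform = [([100.0 / 3] * 3, result) for _, result in season]
print(round(mean_actual_percentage(uniform), 2))  # 33.33
```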

3 Proposed Final Design

3.1 Languages to be Used and Website Structure

The predictor will be hosted at predictor.footybrain.co.uk and will be written in Python 2.7 using the Django framework. Python is being used for its extensive range of mathematical and machine learning libraries, which will assist with the construction of a predictor. Django is the most common web framework for Python¹¹ and follows the MVC (Model-View-Controller) pattern. For development, Instant Django¹² has been selected to allow local development, with code committed to the server only at stable points. This makes development easier and prevents broken code from being uploaded to the server.

The data and statistics will be displayed at footybrain.co.uk. A MySQL relational database will be used to store all of the data. Pages to display the statistics will be written in PHP¹³ using the CodeIgniter framework¹⁴. PHP has been chosen for this because it is the most common language for websites¹⁵ and no complex calculations are needed, just a large number of database queries and a way to display the results. CodeIgniter is a popular¹⁶, well-documented and simple PHP framework that assists with querying and follows the MVC pattern. This side of the site is developed by committing code to the server (using SVN) into a password-protected development folder at footybrain.co.uk/alpha.

A second MySQL database has been created for the predictor. It stores all the features for each match that the classification algorithm will need. This database can be populated by PHP and MySQL scripts that access the statistics database and compute new values for the required features. This approach has been taken because it is sensible to separate the predictor data from the raw statistics data: the statistics database can then be read-only, to preserve the data.
¹¹ http://zipxap.com/kurtsblog/wp-content/uploads/2009/09/python.png
¹² The Instant Django website has been removed from the internet within the last year, but Instant Django can still be downloaded from http://s3.amazonaws.com/instant.django/django.exe. The original site and its documentation can be accessed through a web archive at the following link: http://web.archive.org/web/20110202180216/http://www.instantdjango.com/
¹³ http://www.php.net/
¹⁴ http://codeigniter.com/
¹⁵ http://w3techs.com/technologies/overview/programming_language/all
¹⁶ http://www.availsoftsolutions.com/wp-content/uploads/2011/11/phpframeworktrend.jpg

3.2 Predictor Design

The exact algorithm to be implemented has yet to be decided. It will depend on running promising features through Weka to find the best algorithm, but it is likely to be a Neural Network or a Bayesian Network. The predictor will need to be a multi-class classifier, to allow for the three different results. The features of the predictor are yet to be finalised (Weka will be used to determine these), but they will be heavily based around the players playing for both teams in the match. Previous predictors have not yet fully explored this, and it is considered important: each team is more than just a name, and its success depends on the eleven players representing that name in any given match.

The predictor will store its weights and any other values it needs in a database table. It will then calculate a new prediction simply by feeding a match's data through the algorithm with the saved weights. This avoids recalculating the weights every time a prediction is made, improving efficiency. The weights will be updated automatically by a cron script that runs whenever new matches have been completed and recalculates the weights to reflect them.

3.3 Website

3.3.1 Website Homepage

The website's homepage will display the next round of Premier League fixtures alongside the calculated predictions. Predictions will be shown several days before a match, so a probable line-up will be used if the team line-ups have not yet been released. The prediction will be recalculated if the line-ups differ once they are actually released. The user will be able to click on a link for each match to view more detail about its prediction. The homepage will also show a list of recently completed matches with the predictions made and the final results. A mock-up of the homepage can be seen in the appendix in Figure 2.

3.3.2 Individual Match Page

Each match will have an individual page showing all the bookmaker odds collected for that game. This page will also show the calculation of where it is best to place bets (bookmaker and outcome).
A mock-up of this can be seen in the appendix in Figure 3. Beneath this, on the same page, will be a graphical representation of the probable line-ups for the two teams. The user will be able to change the line-ups to the players they think will play, which will in turn update the prediction. A mock-up of this can be seen in the appendix in Figure 4. This interactive part of the site will be coded in JavaScript using jQuery¹⁷. Its similarity to highly popular games such as fantasy football should draw users in.

¹⁷ http://jquery.com/
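The report does not yet specify how the "where best to place bets" calculation in Section 3.3.2 will work. The sketch below assumes one common approach; it is not taken from the design. For each outcome, it takes the longest odds offered across the bookmakers. It then computes the expected profit per unit stake using the predictor's own probability: p * decimal_odds - 1.

A few things are hypothetical:
- the function names;
- the second bookmaker and its odds;
- the predictor probabilities.

The Bet365 odds are those from Table 1.

```python
OUTCOMES = ("home win", "draw", "away win")


def decimal_odds(x, y):
    """Fractional odds x-y as decimal odds (total return per unit stake)."""
    return 1.0 + float(x) / y


def best_bets(model_probs, bookmaker_odds):
    """Suggest where to bet on each outcome.

    model_probs: the predictor's probabilities for (home win, draw, away win).
    bookmaker_odds: {bookmaker name: [(x, y) for home win, draw, away win]}.
    Returns (outcome, bookmaker, decimal odds, expected profit per unit stake)
    tuples, most profitable first. A positive expected profit means the
    predictor rates the outcome as more likely than the odds imply.
    """
    suggestions = []
    for i, outcome in enumerate(OUTCOMES):
        # Longest odds offered for this outcome across all bookmakers.
        bookmaker, odds = max(
            ((name, decimal_odds(*prices[i])) for name, prices in bookmaker_odds.items()),
            key=lambda pair: pair[1],
        )
        expected_profit = model_probs[i] * odds - 1.0
        suggestions.append((outcome, bookmaker, odds, expected_profit))
    return sorted(suggestions, key=lambda s: s[3], reverse=True)


odds = {
    "Bet365": [(29, 100), (4, 1), (12, 1)],    # from Table 1
    "Bookmaker B": [(1, 3), (7, 2), (10, 1)],  # hypothetical
}
# Hypothetical predictor output: a home win is still favoured, but the draw
# is rated more likely than the bookmakers' odds suggest.
for outcome, bookmaker, price, ev in best_bets((0.60, 0.30, 0.10), odds):
    print("%-8s %-11s %5.2f %+.2f" % (outcome, bookmaker, price, ev))
```

With these inputs, the draw at Bet365 comes out on top, with an expected profit of +0.50 per unit staked. This mirrors the example in Section 1. A positive figure is only meaningful if the predictor's probabilities are better calibrated than the bookmakers' own.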

4 Account of Work to Date

The work completed so far is shown in the Gantt chart in the appendix, in Figure 1, for weeks 0-11.

4.1 Preliminary Work and Setting Up

A domain, footybrain.co.uk, has been registered to host the data and the predictor for this project. The author's football statistics are stored in flat text files. A custom program written in Pascal and Delphi reads, adds to and edits these files; the author created this program alone over three years ago. The database on the server has been designed in a similar format to the text files holding the original data. PHP and MySQL scripts have been written to read these text files and populate the database. The scripts also allow new data to be added to the database. The subdomain predictor.footybrain.co.uk has been set up solely to host the predictor and the work for this project.

On the data side of the website, pages have been coded to search for and display the data: individual players¹⁸, clubs¹⁹ and matches²⁰ each have their own page showing the relevant data. Only the current season's data has been made available, to protect the author's entire data collection from being scraped. Currently the pages have no styling but comply with the XHTML 1.0 standard²¹.

4.2 Research Completed

Research into the bookmakers' predictions was undertaken first, followed by research into existing sports predictors and classification algorithms. The Weka software tool has been used to help decide which algorithm to implement. Weka can quickly connect to the predictor database to help identify which features are most distinctive and most correlated with the match result. It can also identify which algorithms produce the highest success rate. This has saved a lot of time and allowed a quicker implementation of the Python predictor.
¹⁸ An example player (Emmanuel Adebayor): http://footybrain.co.uk/player/index/103
¹⁹ An example club (Arsenal): http://footybrain.co.uk/club/index/581
²⁰ An example match (Blackburn Rovers v Wolverhampton Wanderers - 13/08/2011): http://footybrain.co.uk/match/index/9655
²¹ This is certified at: http://validator.w3.org/check?uri=footybrain.co.uk

4.3 Implementing a Basic Predictor

Initially, only a couple of features were trialled in Weka. The main feature studied was the form of the two teams competing in a match. Form is a guide to how a team has been performing recently. In football, form is traditionally shown as a team's last six games, but there is no particular reason why it has to be the last six. Several different values for form were therefore computed, ranging from the last three to the last nine matches played. Some of these weighted the form so that the most recent match counted for more than, say, the sixth most recent. Others counted the matches within a certain number of days instead of a fixed number of matches. In total, sixteen different values for form were calculated for each team in each match, and each was run through Weka to find the best one.

The Multi-Layer Perceptron fared well in Weka and had proved popular in the earlier research, so it was implemented in Python using just two input features. These were the best value for form found in Weka (weighted form over the last nine games) for each of the two teams. The resulting Multi-Layer Perceptron produced the same accuracy as Weka had with the same features²². It has one hidden layer with three hidden nodes and, initially, one output. The data set used was the three complete seasons available (2008/09-2010/11). The first two seasons (760 matches) were used as training data and the last season (380 matches) as test data. With just these two features, the Multi-Layer Perceptron achieved a success rate of 47.11%, which is already not far off the bookmakers' current standard (50%)!

5 Plan of Remaining Work

The schedule for the remaining work is shown in the Gantt chart in the appendix, in Figure 1, for weeks 17-32. With a basic predictor already implemented, the aim is to have a near-final predictor working before the end of February. This will be done by using Weka to help identify the features (and algorithm) and then implementing them in Python. Work on the functionality and appearance of the website should then be completed by the Easter break.
This leaves time for any tasks that overrun, and leaves the Easter break for writing up the final report and preparing for the viva.

²² This is hosted at: http://predictor.footybrain.co.uk/mlp. The page trains and tests the Multi-Layer Perceptron on the data each time it is loaded, hence the delay in loading, and then displays the success rate.
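The form variants described in Section 4.3 could be computed along the following lines: the last N matches, optionally weighted towards recent matches, or the matches within a window of days. This is a sketch, not the author's implementation. The report does not say how results are scored or weighted, so three choices here are assumptions:

- results are scored as league points (3 for a win, 1 for a draw, 0 for a loss);
- weights are linear, with the most recent match weighted N and the oldest weighted 1;
- a team with no earlier matches gets a form of 0.0.

The match history is hypothetical.

```python
from datetime import date, timedelta

# Assumed scoring: league points per result, from the team's point of view.
POINTS = {"W": 3, "D": 1, "L": 0}


def form(results, before, last_n=6, weighted=False, within_days=None):
    """Average points per game from a team's matches played before `before`.

    results: list of (match date, "W"/"D"/"L") for one team.
    last_n: use at most this many of the most recent matches (None = no limit).
    weighted: if True, the most recent match has weight n, the next n - 1, ...,
        and the oldest weight 1 (an assumed linear scheme).
    within_days: if set, only matches in the preceding window of this many
        days are used.
    """
    earlier = sorted((r for r in results if r[0] < before),
                     key=lambda r: r[0], reverse=True)  # most recent first
    if within_days is not None:
        cutoff = before - timedelta(days=within_days)
        earlier = [r for r in earlier if r[0] >= cutoff]
    if last_n is not None:
        earlier = earlier[:last_n]
    if not earlier:
        return 0.0  # assumption: a team with no earlier matches gets form 0.0
    n = len(earlier)
    weights = [n - i for i in range(n)] if weighted else [1] * n
    total = sum(w * POINTS[res] for w, (_, res) in zip(weights, earlier))
    return float(total) / sum(weights)


# Hypothetical results for one team.
history = [(date(2011, 8, 13), "W"), (date(2011, 8, 20), "D"), (date(2011, 8, 27), "L")]
kick_off = date(2011, 9, 10)
print(round(form(history, kick_off, last_n=3), 3))                 # 1.333
print(round(form(history, kick_off, last_n=3, weighted=True), 3))  # 0.833
print(form(history, kick_off, last_n=None, within_days=21))        # 0.5
```

Different combinations of `last_n`, `weighted` and `within_days` would give a family of variants like the sixteen described. Under the assumptions above, the best-performing variant in the report corresponds roughly to `last_n=9, weighted=True`.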

References

[1] H. Rue and O. Salvesen, "Prediction and retrospective analysis of soccer matches in a league," 1997. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=c85cf60ec1472266a41ef2f0436a9806?doi=10.1.1.46.2009&rep=rep1&type=pdf
[2] K.-Y. Huang and W.-L. Chang, "A neural network method for prediction of 2006 world cup football game," in Neural Networks (IJCNN), The 2010 International Joint Conference on, July 2010, pp. 1-8.
[3] R.-K. Balla, "Soccer match result prediction using neural networks," Aug 2007.
[4] A. Joseph, N. E. Fenton, and M. Neil, "Predicting football results using bayesian nets and other machine learning techniques," Know.-Based Syst., vol. 19, pp. 544-553, November 2006. [Online]. Available: http://dl.acm.org/citation.cfm?id=1222216.1222263
[5] J. Hucaljuk and A. Rakipovic, "Predicting football scores using machine learning techniques," in MIPRO, 2011 Proceedings of the 34th International Convention, May 2011, pp. 1623-1627.
[6] W. Byrne, "Generalization and maximum likelihood from small data sets," in Neural Networks for Signal Processing [1993] III. Proceedings of the 1993 IEEE-SP Workshop, Sep 1993, pp. 197-206.
[7] A. Joseph, N. E. Fenton, and M. Neil, "Predicting football results using bayesian nets and other machine learning techniques," 2005. [Online]. Available: http://www.dcs.qmw.ac.uk/~norman/papers/spurs-2.pdf
[8] C.-W. Hsu and C.-J. Lin, "A comparison of methods for multiclass support vector machines," IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 415-425, Mar 2002.
[9] C. Watkins and J. Weston, "Multi class support vector machine," May 1998. [Online]. Available: http://coitweb.uncc.edu/~jfan/multi-label5.pdf

A Gantt Chart

Figure 1: Gantt chart for the entire project. The weeks shown in the header row are numbered in line with the university's numbering of the academic year's weeks. The chart was created in Microsoft Project but has been reproduced in Microsoft Excel to save space.

B Website Mock-Ups

Figure 2: Mock-up of how the homepage of Footybrain will look.

Figure 3: Mock-up of a page with the predictor odds compared to bookmaker odds, and suggestions of where to place bets.

Figure 4: Mock-up of the section in which the user can edit the starting line-ups. It will be displayed beneath the section shown in Figure 3, on the same page of the site.