Tsakonas A.D., Dounias G.D., Shtovba S.D. Forecasting football match outcomes with support vector machines // Herald of Zhytomyr Engineering-Technological Institute, 2003. The article is published in Ukrainian.

UDC 681.3

A. Tsakonas, Ph.D. Student, University of the Aegean, Chios, Greece
G. Dounias, Ph.D., Lecturer, University of the Aegean, Chios, Greece
S. Shtovba, Ph.D., Assistant Professor, Vinnitsa State Technical University, Ukraine

FORECASTING FOOTBALL MATCH OUTCOMES WITH SUPPORT VECTOR MACHINES

Abstract. The article proposes a method for forecasting the results of football games, based on the soft computing technology of automatic learning with a separating-hyperplane machine (support vector machine). The forecasting model developed in the article takes into account the following team features: the difference in the number of unavailable leading players; the difference in the teams' playing dynamics; the difference in team class; the home-field factor; and the results of head-to-head meetings between the teams. Testing shows that the proposed model provides good agreement between the predicted and the actual results of football matches, which allows us to recommend the separating-hyperplane machine as a promising approach for forecasting the results of various sports championships.

1. Introduction

The prediction of sport game results corresponds to an interesting real-world application of modern decision making and forecasting, while it can also be considered a good benchmark problem for testing diverse techniques of extrapolation and prediction under the difficult conditions of limited available statistics and uncertainty of the influence factors. By terms and methodologies such as intelligent techniques, soft computing, or computational learning [1] we mean, in fact, a large variety of new powerful techniques for intelligent data analysis, which provide a suitable way of handling the complexity, uncertainty and fuzziness of real-world problems.
The aim of the present paper is to demonstrate an example of how to predict football game winners by applying one such modern intelligent technique, namely Support Vector Machines (SVM). Data representing the Ukrainian football championship during the last 10 years are used for the creation and testing of the intelligent prognostic models applied within this paper.

2. The problem statement

The task of creating football winner prediction models can be reduced to finding a functional mapping of the form:

y = f(x_1, x_2, ..., x_n), y ∈ {d_1, d_2, d_3},   (1)

where x = (x_1, x_2, ..., x_n) denotes a vector of features (i.e. influence factors), such as team level, climate conditions, playing place, results of past games, etc.; y denotes the football game result, taking one of the terms: d_1 - «host team's win», d_2 - «draw», d_3 - «guest team's win». For the needs of the SVM application the problem is re-stated as follows:

y = f(x_1, x_2, ..., x_n), y ∈ {-1, 1},   (2)

where x denotes the same vector as previously; y denotes the football game result: y = -1 corresponds to the result «host team will not win» and y = 1 corresponds to the result «guest team will not win».

3. Feature selection
The features carrying the major influence on game prediction always correspond to a subjective choice of each decision maker; nevertheless, there are some common aspects taken into account by all decision makers. According to [2], the features finally taken into account are the following:
- difference of infirmity factors (the number of traumatised and disqualified players of the host team minus the same number for the guest team);
- difference of dynamics profiles (the score of the host team over the five last games minus the score of the guest team over the five last games);
- difference of ranks (the host team's rank minus the guest team's rank in the current championship);
- host factor (computed as HP/HG - GP/GG, where HP denotes the total home points of the host team in the current championship; HG is the number of home games played by the host team; GP is the total guest points of the guest team in the current championship; GG is the number of guest games played by the guest team);
- personal score (the goal difference over all games between the two teams involved within the last 10 years).
Note that the above features do not contain confidential information, and it is easy for the decision maker to learn the feature values before the game.

4. Support Vector Machines

SVMs [3] correspond to a relatively new computational intelligence technique related to the machine learning concept. SVMs are used in pattern recognition as well as in regression estimation and linear operator inversion. SVMs have interesting attributes that differ from those of other computational intelligence techniques such as neural networks: SVMs are always able to find a global minimum, and they have a simple geometric interpretation. SVMs are also capable of handling large numbers of data points or attributes, and their learning is comparable in speed with that of neural networks. More specifically, in order to estimate a classification function such as:

f: R^n → {±1},   (3)

the most important step is to select the estimate f from a class of functions with suitably restricted capacity of the learning machine.
Small capacities may not be sufficient to approximate complex functions, while large capacities may fail to generalize, which is the effect of what is called overfitting. In contrast to the neural network approach, where the early stopping method is used to avoid overfitting, in SVMs overfitting is limited according to the statistical theory of learning from small samples [3].

The simplest decision functions are the linear ones. In the case of SVM, the implementation of linear functions corresponds to finding a large-margin separation between the two classes. This margin is the minimum distance from the training data points to the separation surface. The procedure that finds the maximum-margin separation is a convex quadratic problem [4]. An additional parameter enables the SVM to misclassify some outlying training data in order to obtain a larger margin for the rest of the training data, while the optimization remains a quadratic problem.

If we transform the input data into a feature space F using a map such as:

φ: R^n → F,   (4)

then a linear learning machine is extended to a non-linear one. In SVMs the latter procedure is applied implicitly. What we have to supply is the dot product of pairs of data points φ(x_i)·φ(x_j) in the feature space F. Thus, to compute these dot products, we supply the so-called kernel functions that define the feature space via:

K(x_i, x_j) = φ(x_i)·φ(x_j).   (5)

We do not need to know φ, because the mapping is performed implicitly. SVMs can also learn which of the features implied by the kernel are distinctive for the two classes. The selection of an appropriate kernel function may boost the learning process.

4.1. The SVM algorithm

As assumed in section 2, we are given a training set S = {(x_1, y_1), ..., (x_ℓ, y_ℓ)}, where each point x_i = (x_i1, x_i2, ..., x_in) belongs to R^n, and y_i ∈ {-1, 1} is a label that identifies the class of the point x_i.
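The kernel identity of Eq. (5) is easy to check numerically. The sketch below is our own illustration, not taken from the paper (which, as section 5 notes, uses the plain dot product as kernel): for the quadratic kernel K(x, z) = (x·z)² on R², an explicit feature map is φ(x) = (x1², √2·x1·x2, x2²), and the kernel value coincides with the dot product of the mapped points.

```python
import math

# Numerical check of Eq. (5) for the quadratic kernel K(x, z) = (x . z)^2.
# This kernel and feature map are a standard textbook example, used here
# only as an illustration; they are not the kernel used in the paper.

def kernel(x, z):
    """K(x, z) = (x . z)^2 for x, z in R^2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    """Explicit feature map phi: R^2 -> R^3 realizing the quadratic kernel."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    """Plain dot product."""
    return sum(ui * vi for ui, vi in zip(u, v))

x, z = (1.0, 2.0), (3.0, -1.0)
# The two values agree up to floating-point rounding:
print(kernel(x, z), dot(phi(x), phi(z)))
```

Because the kernel computes this dot product directly, the mapped vectors φ(x) never have to be materialized during training, which is exactly why the mapping of Eq. (4) can remain implicit.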
The goal is to determine a function

f(x) = w·φ(x) + b,   (6)

where w = (w_1, w_2, ..., w_m) and b are the parameters of the separating hyperplane, and φ(x) = (φ_1(x), ..., φ_m(x)) corresponds to a mapping from R^n into a feature space R^m. This is the standard Reproducing Kernel Hilbert Space mapping used for kernel learning machines [5].
According to Statistical Learning Theory [3], in order to obtain a function with controllable generalization capability, we need to control the Vapnik-Chervonenkis dimension of the function through structural risk minimization. SVMs are a practical implementation of this idea. The formulation of the SVM leads to the following quadratic programming problem [5]:

Problem P1: Minimize (1/2)·w·w + C·Σ_i ξ_i, subject to y_i·(w·φ(x_i) + b) ≥ 1 - ξ_i, ξ_i ≥ 0, i = 1, 2, ..., ℓ,   (7)

where C is a positive penalty coefficient for a misclassification. The solution w* of this problem is given by the equation:

w* = Σ_i α*_i·y_i·φ(x_i),

where α* = (α*_1, α*_2, ..., α*_ℓ) is the solution of the following Dual Problem:

Problem P2: Maximize -(1/2)·αᵀDα + Σ_i α_i, subject to Σ_i y_i·α_i = 0; 0 ≤ α_i ≤ C, i = 1, 2, ..., ℓ,

where D is a matrix such that:

D_ij = y_i·y_j·φ(x_i)·φ(x_j).   (8)

By combining equations (6) and (7), the solution of Problem P1 is given by:

f(x) = Σ_i y_i·α*_i·φ(x_i)·φ(x) + b*.   (9)

The points x_i for which α*_i > 0 are called Support Vectors (SVs). They are the points that are either misclassified by the computed separating function or lie closer to the separating surface than a minimum distance, the margin of the solution [5]. In many applications they form a small subset of the training points. For certain choices of the mapping φ(x) we can express the dot product in the feature space defined by the φ's as φ(x_i)·φ(x_j) = K(x_i, x_j), where K is called the kernel of the Reproducing Kernel Hilbert Space defined by the φ's [5].

We may observe that the spatial complexity of Problem P2 depends only on the number of training points and is independent of the dimensionality of the feature space. This observation allows us to extend the method to feature spaces of infinite dimension [3]. In practice, however, because of memory and speed requirements, Problem P2 places limitations on the size of the training set [6].

5. Results

Although our problem is actually a multi-class classification (predict the winner, with three possible outcomes: host team's win, guest team's win, draw), little or no research has been done on one-step multi-class formulations [7]. Thus we solve this classification problem as a common regression problem, where the SVM algorithm has to minimize the mean square error.
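Before turning to the predicted outcomes, a minimal special case of Problem P1 may help fix intuition (this sketch is ours and is not part of the paper's experiments): for exactly two training points, one per class, with the identity map φ(x) = x, both points become support vectors, and the maximum-margin hyperplane has a closed form as the perpendicular bisector of the segment joining them.

```python
# Closed-form maximum-margin hyperplane for two training points, one per
# class. Both points are support vectors. This is only an intuition-building
# special case; the general problem requires the QP solution of Problem P2.

def two_point_svm(x_pos, x_neg):
    """Return (w, b) with w . x_pos + b = +1 and w . x_neg + b = -1:
    the hyperplane through the midpoint, normal to the difference vector."""
    d = [p - n for p, n in zip(x_pos, x_neg)]            # x+ - x-
    norm2 = sum(di * di for di in d)                     # ||x+ - x-||^2
    w = [2.0 * di / norm2 for di in d]
    mid = [(p + n) / 2.0 for p, n in zip(x_pos, x_neg)]  # midpoint
    b = -sum(wi * mi for wi, mi in zip(w, mid))
    return w, b

w, b = two_point_svm((2.0, 1.0), (0.0, -1.0))
f_pos = w[0] * 2.0 + w[1] * 1.0 + b       # +1 by construction
f_neg = w[0] * 0.0 + w[1] * (-1.0) + b    # -1 by construction
print(w, b, f_pos, f_neg)
```

By construction f evaluates to +1 and -1 on the two support vectors, so the margin 1/|w| is half the distance between the points; with more points and slack variables ξ_i, the same geometry is recovered only through the quadratic program above.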
Then, in order to get the predicted outcome, the following rules are applied to the denormalized forecasted values:
- if forecasted_value ≥ 0, consider a positive-or-zero score result, «guest team will not win»;
- if forecasted_value < 0, consider a negative score result, «host team will not win».

While SVM classification must be applied between two classes, we choose to ignore the draw case, treating it as a special (no-winner) case, and keep the sign of the output as the indicator of the predicted class. The algorithm was given as input a set of 105 training data records, and the SVM was tested on 70 test data records [2]. All data were normalized to the [-1, 1] range. The software applied was mySVM [8]. We selected as kernel function the dot function (simple multiplication), as we had no evidence for the appropriateness of other, more complex functions. We also set the capacity parameter of the SVM equal to C = 1000. This parameter has to be positive; its value is then divided by the number of examples that are used for training. The other important parameter (see section 4) is the insensitivity, known as epsilon, which is a constant by which the prediction may deviate from the functional value without being penalized. In the algorithm it sets both a positive (epsilon+) and a negative (epsilon-) insensitivity. Here we set epsilon = 0.01.
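The two thresholding rules above can be sketched as follows (the sign rule is the paper's; the sample forecast values are invented for illustration):

```python
def outcome_from_forecast(v):
    """Map a denormalized forecast value to the predicted class, following
    the paper's rule: v >= 0 -> «guest team will not win»;
                      v <  0 -> «host team will not win»."""
    return "guest team will not win" if v >= 0 else "host team will not win"

# Illustrative forecast values (not taken from the paper's test set):
for v in [0.31, -0.07, 0.0, -0.52]:
    print(v, "->", outcome_from_forecast(v))

# The paper reports 43 correct predictions out of 70 test examples:
accuracy = 43 / 70 * 100
print(f"accuracy = {accuracy:.1f}%")   # prints "accuracy = 61.4%"
```

This rule is what turns the regression output of the SVM back into the two-class decision of Eq. (2).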
The algorithm statistics are presented in detail in Table 1 and are explained as follows. Support Vectors is the number of support vectors produced. Bounded SVs is the number of support vectors at the upper bound; next, the minimum and the maximum values of the alphas are shown. |w| is the 2-norm of the hyperplane vector, and VCdim is an estimator of the Vapnik-Chervonenkis dimension computed from the last two values. (w_1, ..., w_5) is the hyperplane vector for the five attributes, and b is the additional constant of the hyperplane. The following results were obtained after 377 iterations: Train Set mean square error: 0.0597589; Test Set mean square error: 0.05367684. By applying the classification rules described in the previous paragraph, we received the following results: correct predictions on the Test Set: 43 out of 70 examples (accuracy 61.4%).

Table 1 - Support vector learning output statistics

    Parameter          Value
    Support Vectors    97
    Bounded SVs        90
    min SV             -9.7087379
    max SV             9.7087379
    |w|                0.8035
    VCdim              <= 2.3774434
    w_1                0.570
    w_2                -0.00445
    w_3                0.8758
    w_4                0.838793
    w_5                0.0998453
    b                  0.0638468

In order to compare our model with other approaches, Table 2 presents results obtained by other computational intelligence approaches in previous work [2]. Those results were obtained for a prediction that includes the draw result of the matches, so their quotation here is indicative. Also, the results for the fuzzy model and the neural network include the classification score on a 175-element set (training and testing sets together). These results can help, however, to draw general conclusions on the effectiveness of the method on this data set.

Table 2 - Comparison of the SVM model with other approaches

    Model                        Correct classification
    Fuzzy model                  64 % (both sets)
    Neural network               64 % (both sets)
    Genetic programming model    64.8 % (test set)
    Support Vector Machines      61.4 % (test set)

6. Conclusions - Further Research

This paper briefly demonstrates the application of modern statistical or entropy-based approaches, such as Support Vector Machines.
The latter, a relatively new computational intelligence approach, was implemented on the ±1 outcome basis that is common in SVM theory, with positive values corresponding to a «guest team will not win» outcome and negative values to a «host team will not win» outcome. These first results presented in the paper are indicative of the usability of SVMs, denoting the competitiveness of this approach among other intelligent approaches to data-driven forecasting and decision making. Further research in this domain may involve hybrid computational intelligence schemes (see [9] for a detailed review), as such approaches have proved in many cases capable of capturing nearly stochastic or chaotic processes while offering a high classification and prediction rate.

References

1. Zadeh L. Applied Soft Computing Foreword // Applied Soft Computing, 2001, Vol. 1, P. 1-2.
2. Tsakonas A., Dounias G., Shtovba S., Vivdyuk V. Soft Computing-Based Result Prediction of Football Games // Proceedings of the First International Conference on Inductive Modelling, Lviv, 2002, Vol. 3, P. 5-.
3. Vapnik V.N. Statistical learning theory.- Wiley-Interscience, 1998.- 736 p.
4. Vapnik V.N. The nature of statistical learning theory.- Springer-Verlag, 1999.- 304 p.
5. Cortes C., Vapnik V. Support Vector Networks // Machine Learning, 1995, Vol. 20, P. 273-297.
6. Evgeniou T., Pontil M. Support Vector Machines with Clustering for Training with Very Large Datasets.- Springer: Lecture Notes in Computer Science, Vol. 2308, 2002.- P. 346-354.
7. Boser B., Guyon I., Vapnik V.N. A training algorithm for optimal margin classifiers // Computational Learning Theory, 1992, Vol. 5, P. 144-152.
8. Rüping S. mySVM-Manual. Technical Report.- University of Dortmund, Computer Science Department, 2000.
9. Tsakonas A., Dounias G. Hybrid Computational Intelligence Schemes in Complex Domains: An Extended Review.- Springer: Lecture Notes in Computer Science, Vol. 2308, 2002.- P. 494-511.

Tsakonas Athanasios Demetrios, MSc, PhD Student, University of the Aegean, Chios, Greece. Scientific interests: Computational Intelligence, Decision Making, Wavelets, Chaos Theory. Tel.: (30937) 89399. E-mail: tsakonas@stt.aegean.gr.

Dounias Georgios D., PhD, Lecturer, University of the Aegean, Chios, Greece. Scientific interests: Computational Intelligence, Decision Making, Wavelets, Medical Applications of Artificial Intelligence. Tel.: (307)-94408. E-mail: g.dounias@aegean.gr.

Shtovba Serhiy Dmytrovych, PhD, Assistant Professor, Vinnitsa State Technical University, Vinnitsa, Ukraine. Scientific interests: Fuzzy Logic, Genetic Algorithms, Decision Making, Reliability, Quality Control. Tel.: (043)-440430. E-mail: serg@faksu.vstu.vinnica.ua.

The article was received by the editors in 2002.

A. Tsakonas, G. Dounias, S. Shtovba. Forecasting football match outcomes with support vector machines. A soft computing method for result prediction of football games, based on machine learning techniques such as support vector machines, is proposed in the article. The model takes into account the following features of the football teams: difference of infirmity factors; difference of dynamics profiles; difference of ranks; host factor; personal score of the teams. Testing shows that the proposed model achieves a satisfactory estimation of the actual game outcomes. The work concludes with the recommendation of the support vector machine technique as a powerful approach to the creation of result prediction models for diverse sports championships.