Study on the Application of Data Mining-Based BP Neural Network Forecasting Model in Physical Training

Department of Physical Education, Hunan Normal University, yeychen@yeah.net

Abstract

Data mining technology can extract potentially useful information and knowledge from large amounts of incomplete, noisy, fuzzy and random data in practical applications. Building on data mining technology and on the feed-forward calculation, weight adjustment and learning algorithm of the BP neural network, this paper uses the BP neural network to establish a forecasting model for the comprehensive sports performance of athletes. Physical training data of the high jump team of a physical culture institute in Hubei Province from 2009 to 2011 were used to verify the accuracy of the forecasting model and to analyze the causes of its errors. The model approximates the measured sports performance closely, which gives the coach a basis for assessing the training status of the athletes.

Keywords: Data Mining, Neural Network, Physical Training

1. Introduction

Since the adoption of the reform and opening-up policy, China's sports achievements have attracted worldwide attention. In particular, with the increasing application of database technology, a massive amount of sports training data has been accumulated in the field of sports [1]. How to use these data effectively to find patterns and serve decision-makers has become one of the focuses of domestic and foreign research. With the development of competitive sports in China, physical training is becoming more scientific, and decision-making and forecasting in physical training have become more difficult. The purpose of data mining is to extract the required answers rapidly and effectively from the data in sports statistic reports.
The back propagation algorithm of the artificial neural network has a strong function approximation capacity. A forecasting model based on the BP neural network can be used to forecast the comprehensive sports performance of athletes and to assist physical training [2].

2. Data Mining Technology

2.1. Generation of the Data Mining Technology

With the rapid development of information technology, databases have continued to grow and the volume of data has increased. Large amounts of data tend to confuse people's judgment, and traditional query methods cannot satisfy the requirements of data mining. To make better use of the data, extract the useful information behind them and help decision makers, new data analysis technology has been developed. At the 1989 International Joint Conference on Artificial Intelligence, the concept of KDD (Knowledge Discovery in Databases) was proposed, which refers to the discovery of knowledge in data. Later, a new information technology, data mining, emerged [3].

The generation of data mining technology reflects the evolution of database technology. It is the result not only of the long-term study and development of database technology, but also of the natural evolution of information technology. At present, data mining is an emerging interdisciplinary field that draws on knowledge from many areas, such as data warehouse technology, artificial intelligence, genetic algorithms, neural networks, advanced mathematics, statistics and data visualization [4].

International Journal of Digital Content Technology and its Applications (JDCTA), Volume 6, Number 17, September 2012, doi:10.4156/jdcta.vol6.issue17.22
2.2. The Concept of Data Mining Technology

Data mining technology refers to the process of extracting hidden, unknown and potentially useful information and knowledge from large amounts of incomplete, noisy, fuzzy and random data. It is an interdisciplinary field of application. By analyzing each data item, a pattern can be found in a large amount of data. The tasks of data mining include correlation analysis, cluster analysis, classification analysis, exception analysis, special group analysis and evolution analysis [5].

This definition consists of the following four aspects:
(1) The data source must be real, massive and noisy;
(2) The discovered knowledge must be of interest to the user;
(3) The discovered knowledge must be acceptable and understandable, and should be easy to express and use;
(4) The discovered knowledge is not required to be universal; it only needs to hold under certain preconditions and constraints and to apply to a certain field.

Data mining consists of the following three steps:
(1) Data preparation: select the required data from the related data source (data warehouse or data mart), and integrate them into the datasets used in data mining through data cleansing and transformation;
(2) Pattern seeking: use certain methods (database technology, artificial intelligence technology, decision trees, statistical analysis, etc.) to find the patterns contained in the datasets;
(3) Pattern expression: express the patterns in a way that the user can understand (such as visualization), and provide decision support.

2.3. Process of Data Mining

Data mining mainly consists of the following steps. First, the purpose of data mining should be identified. Second, the required data should be obtained, and initial data integration and inspection should be conducted. Third, data preprocessing should be conducted, including data cleansing, data integration, data transformation and data reduction. Fourth, methods such as decision trees, classification, clustering, rough sets, association rules and sequential patterns should be chosen, with corresponding algorithms and appropriate parameters, to analyze the data. Fifth, the results of data mining should be tested and verified, which usually requires visualization technology. Finally, the knowledge should be interpreted, and the knowledge obtained through analysis should be integrated into the organizational structure of the operational information system. Because data mining involves a great deal of preparation and planning work, about 80% of the time and energy is spent in the data preparation phase. Therefore, adequate preparation should be done before the data mining analysis itself [6].

Figure 1 shows the classic structure chart of the data mining system. According to Figure 1, a classic data mining system is built on databases, data warehouses, the world-wide web and other information repositories. The data enter the database or data warehouse server through data cleansing, integration and selection; the data mining work is then completed by the data mining engine, the knowledge base and pattern evaluation; finally, the results are fed back to the user through the user interface.
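The data preprocessing step described above (cleansing followed by transformation) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the procedure used in this study: the function name `clean_and_scale`, the rule of dropping incomplete records, the choice of min-max scaling and the sample values are all hypothetical.

```python
def clean_and_scale(records):
    """Drop records with missing values, then min-max scale each column to [0, 1]."""
    # Data cleansing: keep only complete records (an assumed, simple rule).
    complete = [r for r in records if all(v is not None for v in r)]
    if not complete:
        return []
    # Data transformation: min-max scaling per column.
    columns = list(zip(*complete))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    scaled = []
    for r in complete:
        scaled.append([
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(r, lows, highs)
        ])
    return scaled


# Hypothetical raw records: three made-up indicators per athlete.
raw = [
    [11.2, 62.0, 1.95],
    [10.8, None, 2.01],  # missing measurement: removed during cleansing
    [11.6, 58.5, 1.90],
    [10.9, 65.0, 2.04],
]
prepared = clean_and_scale(raw)
print(prepared)
```

Scaling every indicator to a common range is one common way to keep indicators with large numeric values from dominating the later analysis; other transformations would fit the same step.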
Figure 1. Classic Structure Chart of the Data Mining System (components shown: user interface; pattern evaluation; knowledge base; data mining engine; data warehouse server; data cleansing, integration and selection; world-wide web; other information repositories)

3. Application of the Data Mining Technology in China's Sports

With the wide application of databases and network technology, data mining has been widely applied in various industries, such as marketing, finance and insurance, communications networks and product manufacturing. In 2008, Beijing successfully held the Olympic Games, in which the Chinese athletes won 51 gold medals, and for the first time China ranked No. 1 in the summer Olympic gold medal table. In these Olympic Games, the Chinese athletes won 100 medals in total, which shows that China's sports have reached the advanced international level.

Data mining technology has been widely used in China's sports research. A search of the China Journal Full-text Database, the China Academic Journal Network Publishing Database, the China Important Conference Paper Full-text Database and the China Excellent Master and Doctoral Dissertation Full-text Database shows that since 2002 more than 40 papers have been published on the application of data mining in sports science. Applications of data mining in sports include physical fitness test data, competitor information, sports training monitoring, game strategies and sports information management.

In his paper "Study on the Application of Data Mining Technology in Technique and Tactics Analysis of Pingpong Match", Gao Hongge obtained the technique and tactics parameters of every single action of excellent Chinese athletes in every round of important international games, and introduced the mining results of the FP-growth algorithm in the technique and tactics analysis of table tennis matches [7].
In addition, he also introduced the K-Means dynamic clustering algorithm, its application and realization process and its mining results, and developed the software "Technique and Tactics Statistic Analysis System of Pingpong Match".

In "Study and Realization of the Agent-Based Decision Support System of Physical Training Management", Mei Cheng (2008) introduced the decision support system, agents and the related theory and technology of MAS, studied data mining and association rule mining, and obtained an improved M-Apriori algorithm with higher execution efficiency that is suited to the physical fitness information of students; this algorithm was applied in the decision support system [8]. In accordance with the characteristics of physical training management, a multi-agent system was designed, and a multi-agent-based DSS for physical training management was obtained.

In "Study and Realization of the Decision Support System Based on Data Mining", Chi Dianwei (2008) analyzed the basic theory, methods, technology and algorithms of association rule mining, made appropriate improvements to the Apriori algorithm, and used an index structure to store the transaction data, which increases the speed of reading valid data from the database. The ID3 decision tree algorithm was also improved to increase its running speed. An auxiliary decision support system for college student physical training was established, based on basic data such as the physical performance and physical examination reports of the students of Nanchang University. By applying the model built on the improved data mining algorithms, a corresponding training plan can be made in accordance with the rules in the knowledge base and the physical condition of the students, which provides effective decision support for the physical training of college students.

In "Application of the Data Mining Technology in Modern Sports Researches", Zhu Weidong [9] (2010) used questionnaires completed by the speed-type athletes of a certain province to collect 12,528 data items on 343 master-level, level-one and level-two athletes. After the database was processed, the information and data sheet of this group was obtained. Data mining software and the ID3 algorithm were then used to obtain the decision tree of the basic information. Finally, the Apriori algorithm was used to obtain the association rules of the basic information and data of the excellent athletes in this group, which showed that these athletes were generally older, had trained for a long time and showed no significant gender difference.
In "Mining and Analysis of China's Sports Statistical Data", Wang Fan [10] (2011) drew on the data in the 2001-2009 China's Sports Statistical Yearbook and combined literature review, mathematical statistics and data mining. Grey forecasting analysis, grey correlation analysis, cluster analysis and factor analysis were used to establish a forecasting model of the number of world champions, to discuss the impact of various factors on that number (such as expenditure on excellent sports teams, expenditure on sports infrastructure, the number of national excellent athletes, and the total number of coaches and referees of various levels), and to cluster and rank the sports development of the regions of China. The results showed that data mining is very helpful in discovering potential associations in massive data and in analyzing the data in greater depth, and that the time series trend of human resource input is closer to the reference series of the number of world champions than that of expenditure input.

4. Application of Neural Network in Physical Training

4.1. Introduction of Neural Network

An artificial neural network (ANN for short), or neural network (NN for short), is a complicated network system formed by a large number of mutually connected simple neurons, which has strong adaptability, nonlinear mapping capability, robustness and fault tolerance. Research on artificial neural networks started in the 1940s. In 1943, the psychologist W. McCulloch worked with the mathematical logician W. Pitts to propose the famous MP Model. Later, in 1982, the American biophysicist Hopfield proposed the Hopfield Model, which provided an explicit method to determine the equilibrium and stable states of the artificial neural network. The artificial neural network reflects many basic features of human brain function and is a highly complicated nonlinear system.
It was proposed on the basis of studies in modern neurobiology and cognitive science of how humans process information, and it tries to simulate the way in which the human nervous system processes and memorizes information in order to design a kind of signal processing system with the characteristics of the human brain.

The back propagation algorithm (BP for short) is a basic method for training artificial neural networks, which was developed by D. E. Rumelhart, J. L. McClelland and their research team in 1986. It can realize arbitrary nonlinear mapping between input and output, and it has a strong function approximation capacity. Among the three types of neural networks (BP, RBF and CMAC), the neural network based on the BP algorithm has the widest application [11]. The BP network can learn and store a large number of input-output pattern mappings without the mathematical equation that describes the mapping being known in advance. Its learning rule is to use the steepest descent method to continuously adjust the weights and threshold values of the network through back propagation, so that the sum of squared errors is minimized. The topological structure of the BP network model includes the input layer, hidden layer and output layer, as shown in Figure 2.
Figure 2. BP Neural Network (input layer, hidden layer, output layer)

4.2. Feed-forward Calculation of BP Neural Network

During the learning phase, assume that the network has $M$ input nodes, $q$ nodes in the hidden layer and $L$ nodes in the output layer. The total input of the $j$-th node in the hidden layer is:

$$net_j = \sum_{i=1}^{M} \omega_{ij} O_i \tag{1}$$

The output of the $j$-th node is:

$$O_j = f(net_j) \tag{2}$$

In which $f(net_j)$ is the excitation function:

$$f(net_j) = \frac{1}{1 + e^{-(net_j + \theta_j)/\theta_0}} \tag{3}$$

Where $\theta_j$ is the threshold value, whose sign determines whether the excitation function is shifted to the left or to the right along the horizontal axis, and $\theta_0$ determines the shape of the sigmoid function: when $\theta_0$ is small, the function is close to a step function, and when $\theta_0$ is large, the function is flatter. Taking $\theta_0 = 1$, the above formula gives:

$$f'(net_j) = f(net_j)\,[1 - f(net_j)] \tag{4}$$

The output $O_j$ of the $j$-th node is transmitted forward to the $k$-th node through the weighting coefficient $\omega_{jk}$, and the total input of the $k$-th node in the output layer is:

$$net_k = \sum_{j=1}^{q} \omega_{jk} O_j \tag{5}$$

The actual network output of the $k$-th node in the output layer is:

$$O_k = f(net_k) \tag{6}$$

If the output value is inconsistent with the expected value $d_k$, the error information is propagated back from the output layer, and the connection weights are modified during the propagation process so that the output is as close to the target output value $d_k$ as possible. After the network connection weights have been modified for sample $p$, another training sample is presented and learned in the same way.

4.3. Weight Adjustment of BP Neural Network

Assume that the quadratic error function of the input-output pattern pair of each sample $p$ is defined as:

$$E_p = \frac{1}{2} \sum_{k=1}^{L} (d_{pk} - O_{pk})^2 \tag{7}$$

Then, the total system error is:

$$E = \frac{1}{2} \sum_{p=1}^{P} \sum_{k=1}^{L} (d_{pk} - O_{pk})^2 = \sum_{p=1}^{P} E_p \tag{8}$$

Where $P$ is the number of sample pattern pairs, and $L$ is the number of network output nodes.

The weight coefficients should be adjusted in the direction opposite to the gradient of the function $E$, so that the network output approaches the expected output. The modified formula of the weight coefficient is:

$$\Delta\omega_{jk} = -\eta \frac{\partial E}{\partial \omega_{jk}} \tag{9}$$

Where $\eta$ is the learning rate and $\eta > 0$.

$$\frac{\partial E}{\partial \omega_{jk}} = \frac{\partial E}{\partial net_k} \cdot \frac{\partial net_k}{\partial \omega_{jk}} \tag{10}$$

The back propagation error signal $\delta_k$ is defined as:

$$\delta_k = -\frac{\partial E}{\partial net_k} = -\frac{\partial E}{\partial O_k} \cdot \frac{\partial O_k}{\partial net_k} \tag{11}$$

$$\frac{\partial E}{\partial O_k} = -(d_k - O_k) \tag{12}$$

$$\frac{\partial O_k}{\partial net_k} = \frac{\partial f(net_k)}{\partial net_k} = f'(net_k) \tag{13}$$

We can deduce that:

$$\delta_k = (d_k - O_k)\, f'(net_k) = O_k (1 - O_k)(d_k - O_k) \tag{14}$$

Because:

$$\frac{\partial net_k}{\partial \omega_{jk}} = \frac{\partial}{\partial \omega_{jk}} \left( \sum_{j=1}^{q} \omega_{jk} O_j \right) = O_j \tag{15}$$
Therefore, the modified formula for the weight coefficient of any neuron in the output layer is:

$$\Delta\omega_{jk} = \eta\, \delta_k O_j = \eta\, (d_k - O_k)\, f'(net_k)\, O_j \tag{16}$$

i.e.,

$$\Delta\omega_{jk} = \eta\, O_k (1 - O_k)(d_k - O_k)\, O_j \tag{17}$$

The variation of the weight coefficient of the hidden layer is:

$$\Delta\omega_{ij} = -\eta \frac{\partial E}{\partial \omega_{ij}} = -\eta \frac{\partial E}{\partial net_j} \cdot \frac{\partial net_j}{\partial \omega_{ij}} = -\eta \frac{\partial E}{\partial net_j} O_i = \eta \left( -\frac{\partial E}{\partial O_j} \right) f'(net_j)\, O_i = \eta\, \delta_j O_i \tag{18}$$

Because:

$$-\frac{\partial E}{\partial O_j} = \sum_{k=1}^{L} \left( -\frac{\partial E}{\partial net_k} \right) \frac{\partial net_k}{\partial O_j} = \sum_{k=1}^{L} \left( -\frac{\partial E}{\partial net_k} \right) \frac{\partial}{\partial O_j} \left( \sum_{j=1}^{q} \omega_{jk} O_j \right) = \sum_{k=1}^{L} \delta_k \omega_{jk} \tag{19}$$

the error signal of the hidden node is:

$$\delta_j = -\frac{\partial E}{\partial net_j} = f'(net_j) \sum_{k=1}^{L} \delta_k \omega_{jk} \tag{20}$$

Substituting the sample index $p$ into the formulas, we obtain:

For the output node $k$:

$$\Delta_p \omega_{jk} = \eta\, f'(net_{pk})(d_{pk} - O_{pk})\, O_{pj} = \eta\, O_{pk}(1 - O_{pk})(d_{pk} - O_{pk})\, O_{pj} \tag{21}$$

For the hidden node $j$:

$$\Delta_p \omega_{ij} = \eta\, f'(net_{pj}) \left( \sum_{k=1}^{L} \delta_{pk} \omega_{jk} \right) O_{pi} = \eta\, O_{pj}(1 - O_{pj}) \left( \sum_{k=1}^{L} \delta_{pk} \omega_{jk} \right) O_{pi} \tag{22}$$

Where $O_{pk}$ is the output of the output node $k$, $O_{pj}$ is the output of the hidden node $j$, and $O_{pi}$ is the output of the input node $i$.

The modified formula of the network connection weight can be written as:

$$\omega_{ij}(t+1) = \omega_{ij}(t) + \eta\, \delta_j O_i + \alpha\, [\omega_{ij}(t) - \omega_{ij}(t-1)] \tag{23}$$

Where $\alpha$ is the smoothing factor, and $0 < \alpha < 1$.

4.3. Computation Steps of the Learning Algorithm of BP Neural Network

(1) Set all weights $\omega_{ij}$, $\omega_{jk}$ and the threshold value $\theta$ to random numbers in $[-1, 1]$; assign the input vector $X = (x_1, x_2, \ldots, x_n)$ used in learning and training and the expected output vector $D = (d_0, d_1, \ldots, d_l)$. The neuron output in the middle layer is $O_j = f(net_j)$, and that in the output layer is $O_k = f(net_k)$.

(2) The error signal between the expected output and the actual output of the network is propagated back from the output layer through the middle layer to the input layer, and the connection weights are modified layer by layer. Calculate the difference between the actual output and the target value:

$$E_p = \frac{1}{2} \sum_{k=1}^{L} (d_{pk} - O_{pk})^2 \tag{24}$$

Calculate the weight adjustment amounts:

$$\Delta_p \omega_{jk} = \eta\, f'(net_{pk})(d_{pk} - O_{pk})\, O_{pj} = \eta\, O_{pk}(1 - O_{pk})(d_{pk} - O_{pk})\, O_{pj} \tag{25}$$

$$\Delta_p \omega_{ij} = \eta\, f'(net_{pj}) \left( \sum_{k=1}^{L} \delta_{pk} \omega_{jk} \right) O_{pi} = \eta\, O_{pj}(1 - O_{pj}) \left( \sum_{k=1}^{L} \delta_{pk} \omega_{jk} \right) O_{pi} \tag{26}$$

(3) The network is trained by alternating forward propagation of the pattern and back propagation of the error, with the weights updated after each pass:

$$\omega(t+1) = \omega(t) + \Delta\omega \tag{27}$$

(4) In the learning and convergence process, the global error of the network moves towards its minimum: the error between the target value and the current output value is calculated once again, and when the error satisfies the requirement, the learning process is over; otherwise, return to step (2) and continue the iteration [12].

4.4. Use BP Neural Network to Establish the Forecasting Model

Because the input layer of the BP neural network serves as a buffer, the data can be loaded directly into the forecasting model. The number of nodes in the input layer of the model equals the dimension of the data source. The results calculated by the forecasting model are delivered through the output layer.
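The error signals used in the weight adjustment formulas can be checked numerically before such a model is trusted. The following Python sketch is an illustration and not part of the original experiment. It assumes a tiny 3-2-1 network, $\theta_0 = 1$, no threshold terms and arbitrary sample values, and it compares the gradients implied by $\delta_k = O_k(1 - O_k)(d_k - O_k)$ and $\delta_j = O_j(1 - O_j)\sum_k \delta_k \omega_{jk}$ with finite differences of the quadratic error $E_p$. All function and variable names are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Excitation function f(net), assuming theta_0 = 1 and no threshold."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_ih, w_ho):
    """Feed-forward pass: returns hidden outputs O_j and network outputs O_k."""
    hidden = [sigmoid(sum(w_ih[i][j] * x[i] for i in range(len(x))))
              for j in range(len(w_ho))]
    output = [sigmoid(sum(w_ho[j][k] * hidden[j] for j in range(len(hidden))))
              for k in range(len(w_ho[0]))]
    return hidden, output


def sample_error(x, d, w_ih, w_ho):
    """Quadratic error E_p = 1/2 * sum_k (d_k - O_k)^2."""
    _, output = forward(x, w_ih, w_ho)
    return 0.5 * sum((dk - ok) ** 2 for dk, ok in zip(d, output))


def analytic_gradients(x, d, w_ih, w_ho):
    """dE/dw obtained from the back-propagated error signals."""
    hidden, output = forward(x, w_ih, w_ho)
    delta_k = [o * (1 - o) * (dk - o) for o, dk in zip(output, d)]
    delta_j = [h * (1 - h) * sum(dk * w for dk, w in zip(delta_k, w_ho[j]))
               for j, h in enumerate(hidden)]
    # Delta w = eta * delta * O = -eta * dE/dw, hence dE/dw = -delta * O.
    grad_ih = [[-delta_j[j] * x[i] for j in range(len(hidden))] for i in range(len(x))]
    grad_ho = [[-delta_k[k] * hidden[j] for k in range(len(delta_k))]
               for j in range(len(hidden))]
    return grad_ih, grad_ho


def numeric_gradient(x, d, w_ih, w_ho, weights, r, c, eps=1e-6):
    """Central finite difference of E_p with respect to weights[r][c]."""
    original = weights[r][c]
    weights[r][c] = original + eps
    e_plus = sample_error(x, d, w_ih, w_ho)
    weights[r][c] = original - eps
    e_minus = sample_error(x, d, w_ih, w_ho)
    weights[r][c] = original
    return (e_plus - e_minus) / (2 * eps)


rng = random.Random(0)
n_in, n_hid, n_out = 3, 2, 1
w_ih = [[rng.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_in)]
w_ho = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_hid)]
x, d = [0.2, 0.7, 0.5], [0.9]  # arbitrary illustrative sample

grad_ih, grad_ho = analytic_gradients(x, d, w_ih, w_ho)
max_diff = 0.0
for i in range(n_in):
    for j in range(n_hid):
        numeric = numeric_gradient(x, d, w_ih, w_ho, w_ih, i, j)
        max_diff = max(max_diff, abs(grad_ih[i][j] - numeric))
for j in range(n_hid):
    for k in range(n_out):
        numeric = numeric_gradient(x, d, w_ih, w_ho, w_ho, j, k)
        max_diff = max(max_diff, abs(grad_ho[j][k] - numeric))
print(f"largest analytic/numeric gradient gap: {max_diff:.2e}")
```

A gap close to zero indicates that the closed-form error signals agree with the slope of the error surface, which is the property the steepest descent update relies on.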
The physical training data of the athletes of the high jump team of a physical culture institute in Hubei Province from 2009 to 2011 were selected, and the comprehensive sports performance of the athletes was forecast from the data of 20 physical training programs. Because there were 20 physical training programs, the dimension of the input layer in the BP neural network forecasting model was set to 20 and the dimension of the output layer was set to 1, the output being the forecast comprehensive sports performance of the athlete. The error signal between the expected output and the actual output of the network was propagated back from the output layer through the middle layer to the input layer, and the connection weights were modified layer by layer. A larger hidden layer increases the processing capability of the model, but the more neurons there are in the hidden layer, the more iterations are needed for convergence, which increases the complexity and duration of training. In this forecast, the number of neurons in the hidden layer was set to 8.
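A model with this 20-8-1 structure can be sketched in a few dozen lines of Python. The sketch below is a minimal illustration under stated assumptions, not the implementation used in this study: the class name `BPNetwork`, the learning rate `eta`, the smoothing factor `alpha`, the 200 training passes and the 36/4 split are all hypothetical, and the training data are synthetic stand-ins already scaled to $[0, 1]$, because the original athlete records are not available. Thresholds are handled as an extra weight fed by a constant input of 1.

```python
import math
import random


def sigmoid(x):
    """Excitation function f(net), assuming theta_0 = 1."""
    return 1.0 / (1.0 + math.exp(-x))


class BPNetwork:
    """Three-layer BP network with per-sample updates and a momentum term."""

    def __init__(self, n_in, n_hid, n_out, eta=0.2, alpha=0.5, seed=0):
        rng = random.Random(seed)
        # The extra last row of each matrix is the threshold, fed by a constant 1.
        self.w_ih = [[rng.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_in + 1)]
        self.w_ho = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_hid + 1)]
        self.step_ih = [[0.0] * n_hid for _ in range(n_in + 1)]
        self.step_ho = [[0.0] * n_out for _ in range(n_hid + 1)]
        self.eta, self.alpha = eta, alpha

    def forward(self, x):
        xb = list(x) + [1.0]
        hidden = [sigmoid(sum(v * row[j] for v, row in zip(xb, self.w_ih)))
                  for j in range(len(self.w_ih[0]))]
        hb = hidden + [1.0]
        output = [sigmoid(sum(v * row[k] for v, row in zip(hb, self.w_ho)))
                  for k in range(len(self.w_ho[0]))]
        return xb, hb, output

    def predict(self, x):
        return self.forward(x)[2]

    def train_sample(self, x, d):
        """One forward pass plus one back-propagation update; returns E_p."""
        xb, hb, output = self.forward(x)
        delta_k = [o * (1 - o) * (dk - o) for o, dk in zip(output, d)]
        # Hidden deltas use the output-layer weights from before this update.
        delta_j = [hb[j] * (1 - hb[j]) * sum(dk * w for dk, w in zip(delta_k, self.w_ho[j]))
                   for j in range(len(hb) - 1)]
        self._update(self.w_ho, self.step_ho, hb, delta_k)
        self._update(self.w_ih, self.step_ih, xb, delta_j)
        return 0.5 * sum((dk - o) ** 2 for dk, o in zip(d, output))

    def _update(self, w, last_step, inputs, deltas):
        # w(t+1) = w(t) + eta * delta * O + alpha * [w(t) - w(t-1)]
        for a, inp in enumerate(inputs):
            for b, delta in enumerate(deltas):
                step = self.eta * delta * inp + self.alpha * last_step[a][b]
                w[a][b] += step
                last_step[a][b] = step


# Synthetic stand-in data: 40 athletes, 20 indicators already scaled to [0, 1].
rng = random.Random(1)
samples = []
for _ in range(40):
    x = [rng.random() for _ in range(20)]
    score = 0.3 + 0.4 * sum(x[:5]) / 5  # hypothetical target inside (0, 1)
    samples.append((x, [score]))
train, test = samples[:36], samples[36:]  # assumed split, for illustration only

net = BPNetwork(n_in=20, n_hid=8, n_out=1)
history = []
for epoch in range(200):  # fewer passes than in the paper, to keep the sketch fast
    errors = [net.train_sample(x, d) for x, d in train]
    history.append(sum(errors) / len(errors))

test_error = sum(0.5 * (d[0] - net.predict(x)[0]) ** 2 for x, d in test) / len(test)
print(f"mean training error: first pass {history[0]:.5f}, last pass {history[-1]:.5f}")
print(f"mean test error: {test_error:.5f}")
```

Because the output node is a sigmoid, a real target such as a jump height in metres would have to be scaled into $(0, 1)$ before training and scaled back afterwards; the synthetic targets above sidestep that step.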
4.5. Analysis of the Forecasting Result

During the experiment, the sample data were divided into two groups: training data and test data. The training data were used to train the BP neural network in order to determine the weights, while the test data were used to test the trained BP neural network. The training data accounted for 90% of the total, while the test data accounted for 20%. The outputs for the training data and test data obtained after 2500 training iterations of the BP neural network model are shown in Table 1 and Table 2 respectively.

Table 1. Output of Training Data

                      Target Value   Output Value   Error (%)
Lowest Score (m)      1.9675         1.9677          0.0102
Highest Score (m)     2.0467         2.0468          0.0049
Average Score (m)     1.9865         1.9863         -0.0101
Mean Square Error     1.2153         1.2149         -0.0329

Table 2. Output of Test Data

                      Target Value   Output Value   Error (%)
Lowest Score (m)      1.9687         1.9689          0.0102
Highest Score (m)     2.0459         2.0461          0.0098
Average Score (m)     1.9834         1.9837          0.0151
Mean Square Error     1.2147         1.2150          0.0247

A comparison of Table 1 and Table 2 shows that the BP neural network model has good learning capability and adaptability and approximates the sports performance closely, with relative errors below 0.04% in every row of both tables, so it merits wider use. However, the forecasting results of the BP neural network model also show some distortion, mainly for two reasons: (1) the data sample was of limited size; (2) errors occurred while the physical training data of the athletes were being tested and recorded. Therefore, when a BP neural network is used for data mining, noisy data should be further eliminated to improve the sample.

5. Conclusion

(1) With the flourishing of China's sports, a large amount of physical training data has been accumulated in the field of sports, and how to process these data and find their patterns is an urgent problem.
(2) By analyzing each data item, data mining technology can find patterns in massive data and discover the useful information hidden behind the data, and in this way help decision makers. Data mining technology has been widely applied in the field of sports.

(3) The artificial neural network (ANN) has strong adaptability, nonlinear mapping capability, robustness and fault tolerance, and the back propagation algorithm has a strong function approximation capacity.

(4) The forecasting model based on the BP neural network can be used to forecast the comprehensive sports performance of athletes. By applying this forecasting technology to the physical training data of the athletes of the high jump team of a physical culture institute in Hubei Province from 2009 to 2011, we find that the model has good learning capability and adaptability and approximates the sports performance closely, which gives the coach a basis for assessing the training status of the athletes.

6. References

[1] Ginger Smith, Andrea Cahn and Sybil Ford, "Sports Commerce and Peace: The Special Case of the Special Olympics", Journal of Business Ethics, Vol. 89, No. 4, pp. 587-602, 2009.
[2] Guangdong Tian, Jiangwei Chu, Yumei Liu, Pengfei Cui, Guangming Qiao, "Disassembly Probability Analysis Based on Neural Networks", AISS, Vol. 4, No. 7, pp. 248-255, 2012.
[3] Mingchang Liu, Hewang Liu, "Research on Application of Association Rule Mining in Chinese Athletes Nutritional and Biochemical Indexes Monitoring", JDCTA, Vol. 6, No. 7, pp. 174-180, 2012.
[4] Yi Liu, Shengwu Xiong, "A Fine-grained Parallel Multi-Objective Genetic Algorithm for Stadium Evacuation Route Assignment", JDCTA, Vol. 6, No. 8, pp. 302-310, 2012.
[5] Zhu Jinwei, Ju Shiguang, Xin Yan, "Data Mining Based Approach to Preprocessing TCM Data Set", Computer Engineering, Vol. 32, No. 15, pp. 280-283, 2006.
[6] Li Xiaoyi, Xu Zhaodi, "Arithmetic Analysis of Association Rules Mining", Journal of Liaoning Technical University, Vol. 25, No. 2, pp. 318-320, 2006.
[7] Yanqiang GE, Xiangzheng WANG, Qingsheng LI, "A improved Harmony Search Algorithm based on Association Rules", IJACT, Vol. 3, No. 9, pp. 122-128, 2011.
[8] He Yueshun, Du Ping, "The Research of Landslide Monitoring and Pre-warning Based on Association Rules Mining", JCIT, Vol. 6, No. 9, pp. 89-95, 2011.
[9] Mao Jie, Mei Yan, "Application of Grey ART Clustering Analysis in Monitoring Physiological and Chemical Index of Competitive Sports", Journal of Wuhan Institute of Physical Education, Vol. 39, No. 10, pp. 50-52, 2005.
[10] Li Xinwu, "A New Clustering Segmentation Algorithm of 3D Medical Data Field Based on Data Mining", JDCTA, Vol. 4, No. 4, pp. 174-181, 2010.
[11] Cairong Wu, Huaxing Huang, "Evaluation and Research on Sports Psychology based on BP Neural Network Model", AISS, Vol. 4, No. 10, pp. 355-363, 2012.
[12] Fuwei Zhang, Qin Su, "Application of Wavelet Neural Network into the Sustainable Development of Power Market", AISS, Vol. 4, No. 10, pp. 269-277, 2012.