DATA MINING FOR SMALL STUDENT DATA SET KNOWLEDGE MANAGEMENT SYSTEM FOR HIGHER EDUCATION TEACHERS
|
|
|
- Wilfrid Montgomery
- 10 years ago
- Views:
Transcription
1 DATA MINING FOR SMALL STUDENT DATA SET KNOWLEDGE MANAGEMENT SYSTEM FOR HIGHER EDUCATION TEACHERS Srečko Natek International School for Social and Business Studies, Slovenia Moti Zwilling College of Law and Business, Israel Abstract: Higher education teachers are often curious whether students will be successful or not. Before or during a course they try to estimate the percentage of successful students. But is it possible to predict the success rate of students enrolled in their course? Are there any specific student characteristics, known to a teacher, which can be associated with the student success rate? Is there any relevant student data available to teachers on the basis of which they could predict the student success rate? The answers to the above research questions can generally be obtained with data mining tools. Unfortunately, data mining algorithms work best with large data sets, while student data, available to er education teachers, is extremely limited and clearly falls into the category of small data sets. Thus, the study focuses on data mining for small student data sets and aims to answer the above research questions using data and comparative analysis using data mining tools normally available to er education teachers. The conclusions of this study are very promising and will encourage teachers to incorporate data mining tools as an important part of their er education knowledge management systems. Keywords: data mining, knowledge management system, predicting student s success rate, small data set 1379
2 1. DATA MINING FOR SMALL DATA SET AS PART OF HIGHER EDUCATION TEACHER S KNOWLEDGE MANAGEMENT SYSTEM In knowledge management process, data mining technique can be used to extract and discover the valuable and meaningful knowledge from a large amount of data. Nowadays,data mining has given a great deal of concern and attention in the information industry and in society as a whole. This technique is an approach that is currently receiving great attention in data analysis and it has been recognized as a newly emerging analysis tool (Osei-Bryson, 2010; Park, 2001; Sinha, 2008; Tso & Yau, 2007; Wan, 2009; Zanakis, 2005;Zhuang et al., 2009). There are many areas which adapted this approach to solve their problems such as in finance, medical, marketing, stock, telecommunication, manufacturing, health care, customer relationship and etc. However, the data mining application has not attracted much attention from people in relation to Educational Systems. In particular, Data mining become very popular among researches because so many standalone or desktop data mining tools are available e.g. Microsoft Excel, SPSS, Weka, Protégé as Knowledge Acquisition System and Rapid Miner. Higher education institutions promote knowledge management as incentive environment for their teachers (e.g. Gomezelj & Biloslavo & Trnavčevič, 2010, p. 49). So data mining is recognized as contemporary tool for building knowledge management systems (Jashapara, 20 pp ). A knowledge management approach to data mining process, otherwise associated with business intelligence technology, brings some new sinergy and added value to data mining solutions (Wang & Wang, 2008, pp ). This study was focused on er education teacher s knowledge management systems as part of theirs educational and research work. Unfortunately most data mining technique work best with very large samples. Andonie, reported that some data mining technique, e.g. neural networks may not be able to accomplish the learning task as small datasets cannot provide enough data to fill the gaps between too small samples (Andonie, 2010, pp ). Several authors conclude that small datasets limit the scope of data mining technique (Yuan & Fine, 1998, pp ). In real-life there are many situations, where relatively small data sets are normal, at least from the point of view of users. The student s data of one course are good example of such situation. Even if the course attend relatively large group of students the relevant data are considered to be small dataset. Andonie even suggested that there is no universal optimal solution to the problem of small datasets (Andonie, 2010, pp. 283). So what possibilities are available to the er education teachers if they want to explore how the new course s students will be successful, using contemporary but available data mining tools. Student s data, available to er education teachers are extremely limited and clearly fall into the category of small data set. Nevertheless the research fols the induction learning rules from examples as one of the primary characteristics of machine learning (Becerra & Gonzales & Sabherwal, 2004, p. 220). Described approach is very popular and well supported by data mining tools, used in the study. The research goal is to focus on small student data set data mining, trying to answer various teacher s questions, using the data and data mining tools, normally available to average er education teachers. For er education teachers the most important outcome are relatively reliable students success prediction and user friendly experience. Bukowitz and Williams (Bukowitz &Williams, 2000, p. 5) advocate how technology is blurring the distinction between types of knowledge from unknown knowledge to known knowledge. The study supports their conceptual model as data mining on students data sets discovered some interesting key influencer and Final grade prediction. 1380
3 2. BUILDING STUDENT DATA MINING MODEL 2.1 Building student data mining model for MS Excel Table Tools data mining Several authors suggest different steps for building data mining model, focusing mainly on data transformation process. Turban advocate data preparation model with data consolidation, data cleaning, data transformation and data reduction phases (Turban & Sharda & Delen, 20, p. 209). Berry and Linoff defined comprehensive data mining building model as an iterative process with several steps (Berry & Linoff, 2000, p. 48). The adopted model was used to establish the student data mining model for International School for Social and Business Studies from Slovenia: - Data requirements identification: er education teacher require to predict the students success in a specific course. - Data acquizition: - course: Informatics on Bechelor Study Programmes Economy in Contemporary Society, - academic years number of students: 2010/ (42 students), 20/12 (32 students) and 2012/13 (32 students) total 106 students, - the data was imported from web application Novis Higher Education Information System Teacher Module, - imported to MS Excel Table Tools, - with Data Mining add-in and MS SQL Server, installed on laptop computer. - Validate, Explore, Clean and Transpose Data: - delete irrelevant columns (e.g. group, remark, individual activities points, registration number), - translation from Slovene to English (columns title and content), - adjusting column order (logical order for success prediction, the last culmn is predictable»final grade«, - status transformation: from several status type to yes/no value. - Employment transformation: from several status to yes/no value. - student name transformation to number. - Add Derived Variables: - aditional column»study year«- aditional culumn derived form student name:»gender«. - merge full and part students with additional column»type of study«. - Create Model Data Sets: - Student data sets columns (Table 1): - Study year (2010/, 20/12, 2012/13) - Student (order number 1 106) - Gender (female, male) - Student year of birth (e.g. 1988) - Employment (no, yes) - Status, e.g. sport etc. (no, yes) - Registration (first, repeat) - Type of study (full, part ) - Exam condition (no, yes) - Activities points (0 50) - Exam points (0 50) - Final points (0 50) - Final grade (1 10) prediction column. - Informatics and training data predict final grade.xls: students row, - for 2010/ and 20/12 74 row of training data, - for 2012/13 32 row without predictable»final grade«data - Informatics actual data.xls 32 row of actual»final grade«data - The data set form small number of examples but their content was estimated to be relevant for machine learning (Han & Kamber, & Pei, 2012, p.24). - Choose Data Mining Technology: - MS offer three possibilities of data mining level: - Basic: using MS Excel Table tools Analyze Data Mining features; 1381
4 - Intermediate: using MS Excel Data Mining Adds- in features, - Expert: using MS SQL Server Data Mining capabilities. - All levels od data mining are using algorithms from MS SQL Server Data mining, with different user interface, different technique and number of parameters, used to manage data mining process. - For research purpose Higher education teachers, we choose the basic level. - Choose Data Mining modeling Technique: - MS Table tools offer several Data Mining Technique: Analyze Key influencers, Detect Categories, Fill from Examples, Forecast, Highlight Exceptions, Scenario Analysis, Prediction Calculation and Shoping Basket Analysis, - Some analysis are not suitable for given student data sets, e.g. Forecat, Scenario Analysis and Shoping Basket due to data or content limitation. - Among other technique, the most usable technique are Key Influencer and Fill from Examples, which we used for the foling research. - Train Model: - Using Fill from example technique, we run analysis several, choosing»final grade«as prediction culumn (e.g. Selecting column containing the examples), - Choosing different combination of columns: we found out the most relevant results by omitting student number (not relevant at all) and Exam points columns (directly define the final grade...). - Choose Best Model evaluate the results: - Comparing the results of prediction for 2012/13 student»final grade«with the actual Final grade of the same student we can choose the best parameters for Fill from example data mining technique. - The final Data mining Fill from example predictive model in now ready for future student»final grade«prediction, based on already known instances. The model can be used by confidence, until the foling assumption is true (Berry & Linoff, 2000, pp ): - The past is good predictor of the future (if the student generation are changed their attitude and behavior may vary from the past and create different student patern. - The data is available we assume the data will be always available for teachers. - The data contains what we want to predict the data mining model, we built contains the relevant data. Table 1: Students data sets Study year Student Gender Year of birth Employment Status (sport...) Registration Type of study Exam Condition Activities points (50) Exam points (50) Final Points (100) Final grade (10) female 1988 no no first full 2 male 1990 no no first full 3 female 1990 no no first full 4 female 1990 no no first full 5 female 1989 no no first full 6 male 1990 no no first full 7 female 1990 no no first full 8 female 1990 no yes first full 9 male 1990 no no first full 106 female 1990 no no first part yes yes yes yes yes yes yes yes yes yes Building student data mining model with Weka The Data was also built using the WEKA Data Mining tool, were the foling steps were conducted: - Create Model Data Sets: 1382
5 - The data was saved as described above, except from the last column ("The predited class") - The Final Grade. This attribute was transformed into an ordinal attribue using the foling index: for values between 8-10 the "High" value was assigned, For Values between 6-7 the value "Medium" was assigned and for er values than 6 the "Low" value was assigned. (In the case of decision tree M5P the class attribute remained the same as numerical). - Choose Data Mining Technology: - The Weka tool was chosen. - Choose Data Mining modeling Technique: - 3 Decision Trees techniques: J48, M5P and RepTree were selected. (In the case of M5P no conversaion of values for the class attribute was neccesary and the Final grade values were remained as numbers. - Choose Best Model evaluate the results: - Prediction rate accuracy for each trained model were evaluated in comparison with the related test set and the actual data respectivly. 3. STUDENT DATA MINING RESULTS 3.1 Key influencer for student Final grade : At the beginning of the study, we wanted to find out which data (variables) are most powerful influencer for final grade prediction. Key Influencer Analysis is the right and useful tool for the job. Table 2 exhibitds the most powerful relative impact for different variables. For example, the last study year ( ), where Final grade is empty (we want to predict it) show 100 relative impact to favour empty column. Similar, final points strongly favour Final grade, and Activity points or Exam points favour Final grade. The results suggested to eliminate some of this columns from final prediction or at least try several combination of columns to produce relevant results. Table 2: Key influencer Report for Final grade Key Influencers Report for 'Final grade _10_' Key Influencers and their impact over the values of 'Final grade _10_' Filter by 'Column' or 'Favors' to see how various columns influence 'Final grade _10_' Column Value Favors Relative Impact Study year <Empty> 100 Final Points _100_ Activities points _50_ Final Points _100_ Final Points _100_ Exam points _50_ Final Points _100_ Exam points _50_ Final Points _100_ Exam points _50_ Final Points _100_ Exam points _50_ Final Points _100_ Fill From Example Student Final grade prediction for 2012/13: The next step of study try to predict Student Final grade for missing 2012/13 by using Fill from example data mining technique, based on machine learning using previous examples from 2010/ 1383
6 and 20/12. For more relevant result Student, Exam points and Final points were eliminated from analysis. The Pattern Report in Table 3 shows diversity of influences. The results justified our decision to eliminate powerful key influencer from analysis. Table 3: Fill from example Analyse for Student Final grade Pattern Report for 'Final grade _10_' Key Influencers and their impact over the values of 'Final grade _10_' Filter by 'Column' or 'Favors' to see how various columns influence 'Final grade _10_' Column Value Favors Relative Impact Type of study part Activities points _50_ 43,539-47, Registration repeat Study year 2 13 Type of study part 4 17 Gender male 6 33 Study year Activities points _50_ 33,000-38, Status _sport yes 7 36 Employment yes 7 31 Activities points _50_ 38,902-41, Year of birth 1.967, , Activities points _50_ 43,539-47, Activities points _50_ 43,539-47, Type of study part 9 24 Employment no 9 10 Activities points _50_ 41,221-43, Prediction for student Final grade for 2012/13 was the final part of data mining analysis, using Fill From Example Data Mining technique. The results are shown in table 4. The prediction based on data - knowledge, hidden in the student data. Algorithms use the data from year 2010/ and 20/12 for data mining prediction model (training data) and try to predict the missing student final grade data for 2012/13. As noticed all the student s Final grade are positive (above 6) because the majority of previous results are positive too. Table 4: Final grade prediction for 2012/13 Study year Stud ent Gender Year of birth Employmen t Status (sport...) Registra tion Type of study Exam Condition Activities points (50) Exam points (50) Final Points (100) Final grade (10) Final grade _10 Extende d female 1990 no no first full yes male 1990 no no first full yes male 1992 no no first full yes male 1992 no no first full yes male 1990 no no first full yes male 1983 no no first full yes female 1991 no no first full yes female 1973 yes no first full yes female 1989 no no first full yes female 1989 no no first full yes
7 female 1990 no no first full yes female 1976 yes no first full yes female 1987 no no first full yes male 1992 no no first full yes female 1992 no no first full yes female 1992 no no first full yes male 1992 no no first full yes male 1992 no no first full yes male 1992 no no first full yes female 1992 no no first full yes male 1991 no no first full yes male 1987 no no first full yes female 1990 no no repeat full yes female 1991 no no first full yes female 1968 yes no first full yes female 1986 no no first full yes male 1989 no no first full yes male 1990 no no first part no female 1988 no no first part yes female 1969 no no first part yes female 1988 no no first part yes female 1990 no no first part yes Interpretation: - Interpretation is final but vital phase of data mining (Negnevitsky, 20, p. 367). To evaluate the relevance of data mining student Final grade prediction the study compared the actual student s Final grade with predicted one, shown in table 5. The foling conclusions can be calculated and discussed: - 32 students total, 20 students with actual Final grade (Exam), - data mining predict all 32 students Final grade, included 12 (37,5%) without actual Final grade, - with zero Final grade tolerance, the prediction was successful for 7 Final grade (21,8%), - with +- 1 Final grade tolerance the prediction was successful for Final grade (43,3%), - the positive Final grade prediction was successful for 13 of 32 students (40,6%), - but if 12 students (without actual Final grade ) are omitted, the positive Final grade prediction was successful for 13 of 20 students (65%). Table 5: Actual and prediction Final grade comparison Study year Student Final grade (10) Prediction
8 The study s results are not very clear. Final interpretation, that the positive Final grade prediction was successful for 13 of 20 students (65%) indicate the potential of data mining technique even for small student data sets. If larger data set or even more relevant student data were available the result relevance would improve, particularly when carefully combining parameters and columns in data mining analysis. 3.4 Key influencer for student Final grade attribute using the Weka Tool: REPTree model: Training Phase: - Correctly Classified Instances: %, (66); - Incorrectly Classified Instances: %, (2). Figure 1: REPTree Decision Tree Model (Constructed in the train phase) Testing Phase: - Correctly Classified Instances: 100%, (20/20) J48 model: Training Phase: 1386
9 - Correctly Classified Instances: %, (67); - Incorrectly Classified Instances: %, (1). Figure 2: J48 Decision Tree Model (Constructed in the train phase) Testing Phase: - Correctly Classified Instances: 90%, (18/20). M5P model: - Correlation coefficient: Relative absolute error: % Figure 3: M5P Decision Tree Model (Constructed in the train phase) Table 6: Final grade prediction for 2012/13 Using 2 Decision Tree Methods analysis with Weka. Study year Student Final grade (10) Prediction- J48 Prediction REPTree () () () () () () 1387
10 () () () () () () () () () () () () () () 4. CONCLUSIONS The study explores the possibility to predict the success rate of students enrolled in a course using contemporary data mining tools normally available to er education teachers. The research clearly proves that the available desktop data mining tools have matured in terms of their usability and ease of use, and provide usable results without extensive investment. The results of the study show that student data, available to er education teachers via export/import features, carries enough student-specific characteristics in the sense of hidden knowledge which can be successfully associated with student success rates. Despite the well-known fact that data mining algorithms work best on large data sets, the study focused on student data available to er education teachers which is extremely limited and clearly falls into the category of a small data set. The results show that small student data sets in the specific data mining analysis did not limit the use of data mining tools. Moreover were the weka tool, that was used as comparative analysis to MS Excel, showed that by using decision tree models a prediction accuracy (especially with the REPTree model) is obtained (the accuracy was verified during the test phase). M5P regression tree didn't perform as well as the other decision trees since it requires more assumptions the formers. In fact, The REPTree is a fast decision tree learner which builds a decision/regression tree using information gain as the splitting criterion, and prunes it using reduced error pruning. The model only sorts values for numeric attributes once. Missing values are dealt with using C4.5 s method of using fractional instances. Therefore, it may be deduced that since the predicted attribute was transformed into an ordinary value, REPTree was less sensitive to missing values than J48 thus prediction accuracy on test set was better performed. However on Train set it obtained a slightly less performance than J48 (~97% v.s. ~98%). J48 is a slightly modified C4.5 algorithm that generates a classification-decision tree for the given data-set by recursive partitioning of data. The decision is grown using Depth-first strategy. The algorithm considers all the possible tests that can split the data set and selects a test that gives the best information gain. Concerning the evaluated data J48 was less accurate than REPTree, which can be postulated since it is a more sensitive algorithm than REPTree. 1388
11 In general, the study answers the research questions and supports the promising conclusions that data mining for small data sets has a real potential to become a serious part of er education teachers knowledge management systems. According to Nonaka s process model of a knowledgebased firm (Nonaka & Toyama & Hirata, 2008, p. 27), it is very important that organisations explore tacit knowledge, not only in their employees heads. Somes it is more important to explore the knowledge hidden in an organisation s data and transform it into explicit knowledge to improve the organisation, in our study the quality of the educational process. The authors hope that these conclusions will encourage teachers to incorporate data mining tools into their daily work. REFERENCE LIST 1. Andonie, R. (2010). Extreme Data Mining: Inference from Small Datasets. Int. J. Of Compputers, Communications & Control, 5(3). 2. Becerra-Fernandez, I., & Gonzales, A., & Sabherwal, R. (2004). Knowledge Management, Challenges, Solutions, and Technologies. Pearson Prentice Hall. 3. Berry, M., & Linoff, G. (2000). Mastering Data Mining. The Art and Science of Customer Relationship Management. Wiley. 4. Bukowitz, W., R., & Williams, R., L. (2000). The Knowledge Management Fieldbook. Prentice Hall. 5. Gomezelj Omerzel, D. & Biloslavo, R. & Trnavčevič, A. (2010). Management znanja v visokošolskih zavodih. Univerza na Primorskem, Fakulteta za management Koper. 6. Han, J., & Kamber, M., & Pei, J. (2012). Data Mining, Concepts and Techniques. Morgan Kaufmann. Jashapara, A. (20). Knowledge Management, An Integrated Approach. Prentice Hall. 2. Ed. 7. Negnevitsky, M. (20). Artificial Inteligence, A Guide to Intelligent Systems. Pearson Education Limited. 3. Ed. 8. Nonaka, I., & Toyama, R., & Hirata, T. (2008). Managing F, A Process Theory of Knowledge-Based Firm. Palgrave Macmillan. 9. Osei-Bryson, K.-M. (2010). Towards supporting expert evaluation of clustering results using a data mining process model. Information Sciences, 180(3), Sinha, A. P., & Zhao, H. (2008). Incorporating domain knowledge into data mining classifiers: An application in indirect lending. Decision Support System, 46(1), Tso, G. K. F., & Yau, K. K. W. (2007). Predicting electricity energy consumption: A comparison of regression analysis, decision tree and neural networks. Energy, 32, Turban, E., & Sharda, R. & Delen, D. (20). Decision Support and Business Intelligent Systems. Pearson. 9.ed. 13. Wang, H., & Wang, S. (2008). A knowledge management approach to data mining process for business intelligence. Industrial Management & Data Systems, 108(5). 14. Yuan, J. L., & Fine, T. (1998). Neural-network design for small training sets of dimension. IEEE Transactions on Neural Networks, Wan, S., & Lei, T.C. (2009). A knowledge-based decision support system to analyze the debris-f problems at Chen-Yu-Lan River, Taiwan. Knowledge-Based System, 22(8), Zanakis, S. H., Fernandez, I.B. (2005). Competitiveness of nations: A knowledge discovery examination. European Journal of Operational Research, 166(1), Zhuang, Z. Y., Churilov, L., Burstein, F., & Sikaris, K. (2009). Combining data mining and case-based reasoning for intelligent decision support for pathology ordering by general practitioners. European Journal of Operational Research, 195,
Data Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania [email protected] Over
Data Mining Classification Techniques for Human Talent Forecasting
Data Mining Classification Techniques for Human Talent Forecasting Hamidah Jantan 1, Abdul Razak Hamdan 2 and Zulaiha Ali Othman 2 1 1 Faculty of Computer and Mathematical Sciences UiTM, Terengganu, 23000
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
How To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
Quality Control of National Genetic Evaluation Results Using Data-Mining Techniques; A Progress Report
Quality Control of National Genetic Evaluation Results Using Data-Mining Techniques; A Progress Report G. Banos 1, P.A. Mitkas 2, Z. Abas 3, A.L. Symeonidis 2, G. Milis 2 and U. Emanuelson 4 1 Faculty
Experiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
Comparison of K-means and Backpropagation Data Mining Algorithms
Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and
In this presentation, you will be introduced to data mining and the relationship with meaningful use.
In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine
Introduction to Data Mining
Introduction to Data Mining José Hernández ndez-orallo Dpto.. de Systems Informáticos y Computación Universidad Politécnica de Valencia, Spain [email protected] Horsens, Denmark, 26th September 2005
How To Use Data Mining For Knowledge Management In Technology Enhanced Learning
Proceedings of the 6th WSEAS International Conference on Applications of Electrical Engineering, Istanbul, Turkey, May 27-29, 2007 115 Data Mining for Knowledge Management in Technology Enhanced Learning
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, [email protected] Abstract: Independent
Dr. U. Devi Prasad Associate Professor Hyderabad Business School GITAM University, Hyderabad Email: [email protected]
96 Business Intelligence Journal January PREDICTION OF CHURN BEHAVIOR OF BANK CUSTOMERS USING DATA MINING TOOLS Dr. U. Devi Prasad Associate Professor Hyderabad Business School GITAM University, Hyderabad
Data quality in Accounting Information Systems
Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET
DATA MINING TECHNIQUES AND STOCK MARKET Mr. Rahul Thakkar, Lecturer and HOD, Naran Lala College of Professional & Applied Sciences, Navsari ABSTRACT Without trading in a stock market we can t understand
Data Mining and Business Intelligence CIT-6-DMB. http://blackboard.lsbu.ac.uk. Faculty of Business 2011/2012. Level 6
Data Mining and Business Intelligence CIT-6-DMB http://blackboard.lsbu.ac.uk Faculty of Business 2011/2012 Level 6 Table of Contents 1. Module Details... 3 2. Short Description... 3 3. Aims of the Module...
Prerequisites. Course Outline
MS-55040: Data Mining, Predictive Analytics with Microsoft Analysis Services and Excel PowerPivot Description This three-day instructor-led course will introduce the students to the concepts of data mining,
Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing
www.ijcsi.org 198 Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing Lilian Sing oei 1 and Jiayang Wang 2 1 School of Information Science and Engineering, Central South University
DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.
DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,
Enhanced Boosted Trees Technique for Customer Churn Prediction Model
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V5 PP 41-45 www.iosrjen.org Enhanced Boosted Trees Technique for Customer Churn Prediction
PREDICTING STUDENTS PERFORMANCE USING ID3 AND C4.5 CLASSIFICATION ALGORITHMS
PREDICTING STUDENTS PERFORMANCE USING ID3 AND C4.5 CLASSIFICATION ALGORITHMS Kalpesh Adhatrao, Aditya Gaykar, Amiraj Dhawan, Rohit Jha and Vipul Honrao ABSTRACT Department of Computer Engineering, Fr.
Prediction of Stock Performance Using Analytical Techniques
136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Erkan Er Abstract In this paper, a model for predicting students performance levels is proposed which employs three
What is Data Mining? Data Mining (Knowledge discovery in database) Data mining: Basic steps. Mining tasks. Classification: YES, NO
What is Data Mining? Data Mining (Knowledge discovery in database) Data Mining: "The non trivial extraction of implicit, previously unknown, and potentially useful information from data" William J Frawley,
FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS
FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS Breno C. Costa, Bruno. L. A. Alberto, André M. Portela, W. Maduro, Esdras O. Eler PDITec, Belo Horizonte,
Web Mining as a Tool for Understanding Online Learning
Web Mining as a Tool for Understanding Online Learning Jiye Ai University of Missouri Columbia Columbia, MO USA [email protected] James Laffey University of Missouri Columbia Columbia, MO USA [email protected]
EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH
EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH SANGITA GUPTA 1, SUMA. V. 2 1 Jain University, Bangalore 2 Dayanada Sagar Institute, Bangalore, India Abstract- One
Studying Auto Insurance Data
Studying Auto Insurance Data Ashutosh Nandeshwar February 23, 2010 1 Introduction To study auto insurance data using traditional and non-traditional tools, I downloaded a well-studied data from http://www.statsci.org/data/general/motorins.
A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH
205 A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH ABSTRACT MR. HEMANT KUMAR*; DR. SARMISTHA SARMA** *Assistant Professor, Department of Information Technology (IT), Institute of Innovation in Technology
Towards applying Data Mining Techniques for Talent Mangement
2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore Towards applying Data Mining Techniques for Talent Mangement Hamidah Jantan 1,
Database Marketing, Business Intelligence and Knowledge Discovery
Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski
DATA MINING, DIRTY DATA, AND COSTS (Research-in-Progress)
DATA MINING, DIRTY DATA, AND COSTS (Research-in-Progress) Leo Pipino University of Massachusetts Lowell [email protected] David Kopcso Babson College [email protected] Abstract: A series of simulations
BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL
The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University
NEURAL NETWORKS IN DATA MINING
NEURAL NETWORKS IN DATA MINING 1 DR. YASHPAL SINGH, 2 ALOK SINGH CHAUHAN 1 Reader, Bundelkhand Institute of Engineering & Technology, Jhansi, India 2 Lecturer, United Institute of Management, Allahabad,
City University of Hong Kong. Information on a Course offered by Department of Management Sciences with effect from Semester A in 2010 / 2011
City University of Hong Kong Information on a Course offered by Department of Management Sciences with effect from Semester A in 200 / 20 Part I Course Title: Enterprise Data Mining Course Code: MS4224
Data Mining Applications in Higher Education
Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2
Predicting required bandwidth for educational institutes using prediction techniques in data mining (Case Study: Qom Payame Noor University)
260 IJCSNS International Journal of Computer Science and Network Security, VOL.11 No.6, June 2011 Predicting required bandwidth for educational institutes using prediction techniques in data mining (Case
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College
Foundations of Business Intelligence: Databases and Information Management
Foundations of Business Intelligence: Databases and Information Management Content Problems of managing data resources in a traditional file environment Capabilities and value of a database management
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
Foundations of Business Intelligence: Databases and Information Management
Foundations of Business Intelligence: Databases and Information Management Problem: HP s numerous systems unable to deliver the information needed for a complete picture of business operations, lack of
Index Contents Page No. Introduction . Data Mining & Knowledge Discovery
Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.
Healthcare Measurement Analysis Using Data mining Techniques
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik
Data Mining Techniques
15.564 Information Technology I Business Intelligence Outline Operational vs. Decision Support Systems What is Data Mining? Overview of Data Mining Techniques Overview of Data Mining Process Data Warehouses
City University of Hong Kong. Information on a Course offered by Department of Information Systems with effect from Semester B in 2013 / 2014
City University of Hong Kong Information on a Course offered by Department of Information Systems with effect from Semester B in 2013 / 2014 Part I Course Title: Course Code: Course Duration: Business
Application of Predictive Model for Elementary Students with Special Needs in New Era University
Application of Predictive Model for Elementary Students with Special Needs in New Era University Jannelle ds. Ligao, Calvin Jon A. Lingat, Kristine Nicole P. Chiu, Cym Quiambao, Laurice Anne A. Iglesia
Explanation-Oriented Association Mining Using a Combination of Unsupervised and Supervised Learning Algorithms
Explanation-Oriented Association Mining Using a Combination of Unsupervised and Supervised Learning Algorithms Y.Y. Yao, Y. Zhao, R.B. Maguire Department of Computer Science, University of Regina Regina,
City University of Hong Kong. Information on a Course offered by the Department of Management Sciences with effect from Semester A in 2012 / 2013
City University of Hong Kong Information on a Course offered by the Department of Management Sciences with effect from Semester A in 2012 / 2013 Part I Course Title: Customer Relationship Management with
Neural Networks in Data Mining
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V6 PP 01-06 www.iosrjen.org Neural Networks in Data Mining Ripundeep Singh Gill, Ashima Department
Course 103402 MIS. Foundations of Business Intelligence
Oman College of Management and Technology Course 103402 MIS Topic 5 Foundations of Business Intelligence CS/MIS Department Organizing Data in a Traditional File Environment File organization concepts Database:
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Operations Research and Knowledge Modeling in Data Mining
Operations Research and Knowledge Modeling in Data Mining Masato KODA Graduate School of Systems and Information Engineering University of Tsukuba, Tsukuba Science City, Japan 305-8573 [email protected]
131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10
1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom
KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa Email: [email protected]
KNIME TUTORIAL Anna Monreale KDD-Lab, University of Pisa Email: [email protected] Outline Introduction on KNIME KNIME components Exercise: Market Basket Analysis Exercise: Customer Segmentation Exercise:
Application of Data Mining Methods in Health Care Databases
6 th International Conference on Applied Informatics Eger, Hungary, January 27 31, 2004. Application of Data Mining Methods in Health Care Databases Ágnes Vathy-Fogarassy Department of Mathematics and
Random forest algorithm in big data environment
Random forest algorithm in big data environment Yingchun Liu * School of Economics and Management, Beihang University, Beijing 100191, China Received 1 September 2014, www.cmnt.lv Abstract Random forest
Foundations of Business Intelligence: Databases and Information Management
Chapter 6 Foundations of Business Intelligence: Databases and Information Management 6.1 2010 by Prentice Hall LEARNING OBJECTIVES Describe how the problems of managing data resources in a traditional
TIM 50 - Business Information Systems
TIM 50 - Business Information Systems Lecture 15 UC Santa Cruz March 1, 2015 The Database Approach to Data Management Database: Collection of related files containing records on people, places, or things.
A Framework for Dynamic Faculty Support System to Analyze Student Course Data
A Framework for Dynamic Faculty Support System to Analyze Student Course Data J. Shana 1, T. Venkatachalam 2 1 Department of MCA, Coimbatore Institute of Technology, Affiliated to Anna University of Chennai,
Automatic Resolver Group Assignment of IT Service Desk Outsourcing
Automatic Resolver Group Assignment of IT Service Desk Outsourcing in Banking Business Padej Phomasakha Na Sakolnakorn*, Phayung Meesad ** and Gareth Clayton*** Abstract This paper proposes a framework
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA D.Lavanya 1 and Dr.K.Usha Rani 2 1 Research Scholar, Department of Computer Science, Sree Padmavathi Mahila Visvavidyalayam, Tirupati, Andhra Pradesh,
How To Teach Knowledge Management
STATE UNIVERSITY OF NEW YORK COLLEGE OF TECHNOLOGY CANTON, NEW YORK MINS/CITA 430 Data and Knowledge Management Prepared by: Charles Fenner Revised by: Eric Cheng CANINO SCHOOL OF ENGINEERING TECHNOLOGY
Standardization of Components, Products and Processes with Data Mining
B. Agard and A. Kusiak, Standardization of Components, Products and Processes with Data Mining, International Conference on Production Research Americas 2004, Santiago, Chile, August 1-4, 2004. Standardization
USE OF DATA MINING TO DERIVE CRM STRATEGIES OF AN AUTOMOBILE REPAIR SERVICE CENTER IN KOREA
USE OF DATA MINING TO DERIVE CRM STRATEGIES OF AN AUTOMOBILE REPAIR SERVICE CENTER IN KOREA Youngsam Yoon and Yongmoo Suh, Korea University, {mryys, ymsuh}@korea.ac.kr ABSTRACT Problems of a Korean automobile
5.5 Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall. Figure 5-2
Class Announcements TIM 50 - Business Information Systems Lecture 15 Database Assignment 2 posted Due Tuesday 5/26 UC Santa Cruz May 19, 2015 Database: Collection of related files containing records on
not possible or was possible at a high cost for collecting the data.
Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day
The Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data
Data Mining System, Functionalities and Applications: A Radical Review
Data Mining System, Functionalities and Applications: A Radical Review Dr. Poonam Chaudhary System Programmer, Kurukshetra University, Kurukshetra Abstract: Data Mining is the process of locating potentially
E-Learning Using Data Mining. Shimaa Abd Elkader Abd Elaal
E-Learning Using Data Mining Shimaa Abd Elkader Abd Elaal -10- E-learning using data mining Shimaa Abd Elkader Abd Elaal Abstract Educational Data Mining (EDM) is the process of converting raw data from
International Journal of Advance Research in Computer Science and Management Studies
Volume 2, Issue 12, December 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
A THREE-TIERED WEB BASED EXPLORATION AND REPORTING TOOL FOR DATA MINING
A THREE-TIERED WEB BASED EXPLORATION AND REPORTING TOOL FOR DATA MINING Ahmet Selman BOZKIR Hacettepe University Computer Engineering Department, Ankara, Turkey [email protected] Ebru Akcapinar
An Empirical Study of Application of Data Mining Techniques in Library System
An Empirical Study of Application of Data Mining Techniques in Library System Veepu Uppal Department of Computer Science and Engineering, Manav Rachna College of Engineering, Faridabad, India Gunjan Chindwani
IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES
IDENTIFYING BANK FRAUDS USING CRISP-DM AND DECISION TREES Bruno Carneiro da Rocha 1,2 and Rafael Timóteo de Sousa Júnior 2 1 Bank of Brazil, Brasília-DF, Brazil [email protected] 2 Network Engineering
College of Health and Human Services. Fall 2013. Syllabus
College of Health and Human Services Fall 2013 Syllabus information placement Instructor description objectives HAP 780 : Data Mining in Health Care Time: Mondays, 7.20pm 10pm (except for 3 rd lecture
Data Mining. Knowledge Discovery, Data Warehousing and Machine Learning Final remarks. Lecturer: JERZY STEFANOWSKI
Data Mining Knowledge Discovery, Data Warehousing and Machine Learning Final remarks Lecturer: JERZY STEFANOWSKI Email: [email protected] Data Mining a step in A KDD Process Data mining:
A General Approach to Incorporate Data Quality Matrices into Data Mining Algorithms
A General Approach to Incorporate Data Quality Matrices into Data Mining Algorithms Ian Davidson 1st author's affiliation 1st line of address 2nd line of address Telephone number, incl country code 1st
Syllabus. HMI 7437: Data Warehousing and Data/Text Mining for Healthcare
Syllabus HMI 7437: Data Warehousing and Data/Text Mining for Healthcare 1. Instructor Illhoi Yoo, Ph.D Office: 404 Clark Hall Email: [email protected] Office hours: TBA Classroom: TBA Class hours: TBA
PREDICTIVE DATA MINING ON WEB-BASED E-COMMERCE STORE
PREDICTIVE DATA MINING ON WEB-BASED E-COMMERCE STORE Jidi Zhao, Tianjin University of Commerce, [email protected] Huizhang Shen, Tianjin University of Commerce, [email protected] Duo Liu, Tianjin
Use of Data Mining in the field of Library and Information Science : An Overview
512 Use of Data Mining in the field of Library and Information Science : An Overview Roopesh K Dwivedi R P Bajpai Abstract Data Mining refers to the extraction or Mining knowledge from large amount of
Evaluating an Integrated Time-Series Data Mining Environment - A Case Study on a Chronic Hepatitis Data Mining -
Evaluating an Integrated Time-Series Data Mining Environment - A Case Study on a Chronic Hepatitis Data Mining - Hidenao Abe, Miho Ohsaki, Hideto Yokoi, and Takahira Yamaguchi Department of Medical Informatics,
Prediction of Heart Disease Using Naïve Bayes Algorithm
Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant
Customer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
Chapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT Learning Objectives
Chapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT Learning Objectives Describe how the problems of managing data resources in a traditional file environment are solved
APPLICATION OF DATA MINING TECHNIQUES FOR BUILDING SIMULATION PERFORMANCE PREDICTION ANALYSIS. email [email protected]
Eighth International IBPSA Conference Eindhoven, Netherlands August -4, 2003 APPLICATION OF DATA MINING TECHNIQUES FOR BUILDING SIMULATION PERFORMANCE PREDICTION Christoph Morbitzer, Paul Strachan 2 and
A Data Warehouse Design for A Typical University Information System
(JCSCR) - ISSN 2227-328X A Data Warehouse Design for A Typical University Information System Youssef Bassil LACSC Lebanese Association for Computational Sciences Registered under No. 957, 2011, Beirut,
Big Data with Rough Set Using Map- Reduce
Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,
International Journal of Computer Trends and Technology (IJCTT) volume 4 Issue 8 August 2013
A Short-Term Traffic Prediction On A Distributed Network Using Multiple Regression Equation Ms.Sharmi.S 1 Research Scholar, MS University,Thirunelvelli Dr.M.Punithavalli Director, SREC,Coimbatore. Abstract:
Keywords Data Mining, Knowledge Discovery, Direct Marketing, Classification Techniques, Customer Relationship Management
Volume 4, Issue 6, June 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Simplified Data
Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier
International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing
Predicting Student Performance by Using Data Mining Methods for Classification
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 13, No 1 Sofia 2013 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2013-0006 Predicting Student Performance
ANALYSIS OF WEBSITE USAGE WITH USER DETAILS USING DATA MINING PATTERN RECOGNITION
ANALYSIS OF WEBSITE USAGE WITH USER DETAILS USING DATA MINING PATTERN RECOGNITION K.Vinodkumar 1, Kathiresan.V 2, Divya.K 3 1 MPhil scholar, RVS College of Arts and Science, Coimbatore, India. 2 HOD, Dr.SNS
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS Abstract D.Lavanya * Department of Computer Science, Sri Padmavathi Mahila University Tirupati, Andhra Pradesh, 517501, India [email protected]
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
