DATA MINING AND STATISTICAL METHODS USED FOR SCANNING CATEGORICAL DATA. Murtadha M Hamad Al-Anbar University-College of computers
|
|
|
- Terence Holmes
- 9 years ago
- Views:
Transcription
1 ISSN: DATA MINING AND STATISTICAL METHODS USED FOR SCANNING CATEGORICAL DATA Murtadha M Hamad Al-Anbar University-College of computers Received: 1/5/007 Accepted: 30/8/007 ABSTRACT: It has been shown that data mining uncovers patterns in data using predictive techniques. These patterns play a critical role in decision making because they reveal areas for process improvement. Statistical techniques such as Chi-square test for association are widely used in the medical field. Yet, the interpretation of some of the results approached by the use of this statistical techniques is seems to be a very difficult task. The type of association is often non-linear and hence will mask the important part of the use of this technique. In this research work a new approach is adopted by scanning the raw data for any possible association (linear or non-linear). More data mining methods and statistical inference were the base tools of this research work. Keywords: Chi-square test, Data mining, Kappa measures, Linear and non-linear association, Categorical data. 1. INTRODUCTION In the cross-classification of categorical data, researchers are always interested in the sense of searching the cross-classification table for any potential relation ship between groups of the cross classified variables. Such a relationship is statistically denoted as an association. Data mining often concerns with the meaning and quality of the information embedded in any given set of data. Ideas from information measurements and statistical analysis will be merged in order to establish a linkage between these two tools [1]. Such a linkage will allow the users to handle a straightforward interpretation as to unmask the type and degree of association, and hence will enhance the meaning of the results obtained. Most analysts separate data mining software into two groups: data mining tools and data mining applications. Data mining tools provide a number of techniques that can be applied to any business problem. Regardless of whether we are aware of them, our daily lives are influenced by data mining
2 applications. For example, almost every financial transaction is processed by a data mining application to detect fraud. Both data mining tools and data mining applications are valuables, however. Increasingly organizations as data mining tools and data mining applications together in a integrated environment for predictive analytic. Assume a concept of event patterns as an embodiment of information. Consider a set of mutually exclusive random variables {Xi : i = 1 k}. an instantiation of any sub set of variables in X referred to as an event pattern. In this research, light will be shed only on discrete, finite, multi-valued random variables. Using multi-valued discrete variables to represent a physical phenomenon, a concept, or an object, is common in a variety of fields such as medicine, business and economics. For example, in a medical diagnosis problem, gender and condition may be two variables of interest. A particular patient always has one and only one gender, meanwhile could be a located to none, one or more disease(s) []. The concept of data mining passed upon data patterns is to identify events patterns that are either statistically significant or not. One approach towards identifying statistical significant information is passed on event association [3]. Significant association may be determined by statistical hypothesis test passed on mutual information measure or residual analysis. as reported elsewhere, mutual information with regard to information theory is asymptotically distributed as chisquare distribution. this result has been extended elsewhere [4] to model residual analysis as a normally distributed random variable. In doing so, statistical hypothesis test passed on residual analysis may be used as a conceptual tool to discover data patterns with significant event associations. Another results discovered recently [5] is an algebraic linkage between information measure and statistical analysis that suggests yet another approach for detecting events association passed on symbol probability ratio..data Handling and Algorithm as to clarify the algorithm with real set of data that were part of the data set collected by holmquest et al [6] in an investigation into observer reliability in the histological classification of carcinoma in situ and related lesions of the uterine cervix [7], will be used (table 1). 1: negative : A typical squamous 3: Carcinoma in situ 4: Squamous carcinoma with early stromal invasion. 5: Invasive carcinoma The algorithm
3 Before they are plunged into the process of calculating the chisquare statistics and its relevant kappa [8] it is useful to state the gradual steps of the computer algorithm from the beginning till the handling of the decision statement. The following steps illustrate the detailed algorithm: 1. A database containing information about patients relevant to this research must be available. In this context a virtual medical database has been prepared to handle the real data considered in this research. A table containing the required in formation must be identified. In this context a virtual table has been designed to involve the following fields: - Patients number - Age - Sex - Date of admission - Complain - Symptoms - Mass - Pathologist 1 - Pathologist The table contained information about the real patients as posed by the example mentioned in everitt B. S. [6]. 3. Cases with negative mass findings has been ignored in the study. Only those cases with positive mass findings were involved. This procedure has been done by the use of filter statement available within the Microsoft database program. 4. A different procedure has been implemented to handle the cross-classification table of beliefs from pathologist 1 and (Table 1). 5. A crystal report containing the cross classified table and the results of both chi-square statistics and kappa has been designed. The report also contained a decision statement the magnitude of kappa based on the comparison of the calculated value of kappa with its theoretical range of values. The detailed procedure for calculating chi-square test and kappa is given according to the following: a. The value of Chi-sqaure statistics calculation: m k ( Oy E ) y x = i = 1 j = 1 E y where m is the number of rows and k is the number of columns for the cross classified table: O is the observed frequency and E is the expected frequency calculated by multiplying the corresponding row and column totals and divide the result by the grand total. b. Calculation kappa : According to the data of table 1, the following components are going to be calculated: n p = C u / T= i= = 0.635
4 p 0 n ( TiTn / T ) 100 = = = % i 1 6* 7 6* * * *3 118 Kappa = P P O 0 1 P0 = = % = 0.3 Therefore In order to calculate the variance of kappa, the proportion of each entry of table 1 will be divided by 118 (table ) 1 P n u Var( Kappa) = 4 1 ( 1 0) i = T P + Sd = Var( Kappa) = [( 1 P ) ( P P )( 1 P )] c i j n n ( 1 P0 ) Pn ( Pi + Pj ) ( P0 Pc Pc + P0 ) i= 1 j= 1 0 = % confidence intervals=kappa ±1.98sd / T =(0.33,0.61) 3.Data model In this research, a data table has been done which contains the various number of the different diagnosis cases and the number of the cases dealt with (118) as the shown in table (1). As figure 1 show: 1. Some statistical tools were used as helping devices, such as chi-square statistics and kappa metric in table (1) for obtaining table (). Then, the standard deviation was calculated.. A database under the name (pathologists.mdb) was built and designed, which contains (118) records corresponding with the cases. The data base is expandable to include further number of records. Filtering of the data base records has been carried out in order to deal with the cases of positive mass under study. The results of 1 and, and the use of the table below [8]:
5 Were conducive to a report containing the essential information which has a role in making the decision that leads to diagnosis of the infection level of the studied cases. 4.Discussion Results In this paper, 118 cases of factual data have been dealt with, as shown in table (1) and data base (pathologists.mdb). After examining the statistical concepts, it has been noticed that using the weighted Kappa Metric is important in satisfactorily classifying and partitioning the data groups of the above data base according to the studied cases. it has also been noticed that Kappa Metric has an active role and important indicator relative to the data observed by ( pathologists). Despite the complexities that accompany the calculation, the role of Kappa Metric is greater when the observes are more than two. From algorithm (step 5) we got the following: Sd = Var( Kappa) = %Confidenceinmtervals = Kappa ± 1.98Sd / T = ( 0.33,0.61) From table 3 we got the optimal value (0.47) of strength agreement: Moderate Kappa= 0.47 and hence a moderate linear association is detected This messages one of six possible that the program may revealed according to the value of Kappa as stated in table 3. The message motional below the crystal report is the actual output of the program according to the value of Kappa (0.47). The 95% confidence intervals indicated that the value of Kappa will never be out of range (0.33, 0.61). 5.Conclusion The study has reached the following conclusions: 1- Using the statistical concepts and tools as supporting tools in dealing with data mining has a role in uncovering data which are not easily revealed by normal methods. - Several interesting results are found in this research. First, the concept of data patterns allows us to visualize any early step of data mining has being a process of finding significant event associations. This process lends itself to a set of data patterns for discovering an inference model; where such a model encapsulates significant behavior of the data as measured by statistical analysis as well as information measure. 3- Data mining uncovers patterns in data using predictive techniques. These patterns play a critical role in decision making because they reveal areas for processes improvement. Using data mining, organizations can increase the profitability of their interactions with customers, detect fraud, and improve risk management. The
6 patterns uncovered using data mining help organizations make better and timelier decisions. 6.Refernces [1] Sy B. K., "Pattern-based Inference Approach for Data Mining", Queen College, Department of Computer Science, [] Sy B. K. & D. Sher, "An abstraction Theory framework for Probabilistic Inference", Proc. Of the workshop on spatial and temporal Interaction, Nov [3] Goldstein D, Ghosh D, Conlon E, "Statistical issues in the clustering of gene expression data", 00, 1: [4] Wrong A.K.C., "High order pattern discovery from Discrete valued Data", IEEE Trans. On Knowledge and Data Engineering, 9(6): ,1997. [5] Sy B. K., "Informationtheoretical and statistical approaches for independence Test ",Proc. Of the international S-PLUS Users Conference, published by Mathsoft Inc., Washington D.C., Oct [6] N.D. Holmquist & O.D. Williams "Variabililty in classification of carcinoma in Situ of the Uterine Cervix", Archives of pathology, [7] B.S. Everitt, "statistical methods for medical Investigations", First published in great Britain [8] S. Swift, "Consensus clustering and functional interpretation of gene-expression data ",Genome Biology, Nov. 004.
7 Create & Design (pathologist.mdb) using DB program Create table 1 with observed frequencies Statistical processing (preprocessing) calculating Chi-square Kappa measures Obtain table Using weighted-kappa guideline Report with decision statement Report with decision statement End Figure 1: Suggested Data Model for current work
8 Table 1: observed frequencies of biopsy slides classified by two pathologist according to most involved lesion of the uterine cervix. Pathologist Pathologist Total Ti Total T T Table : proportion of each entry as compared to the grand total T i T j Table 3: Evaluation of observed Kappa values Weighted_kappa Strength of agreement 0.00 Poor Slight Fair Moderate Substantial Almost perfect
9 الخلاصة طرق التحري والا حصاء المستخدمة في مسح البيانات الفي وية مرتضى محمد حمد E. mail : [email protected] لقد تبين ان اسلوب التحري عن البيانات يكشف لنا الكثير من الانماط الغير المعروفة والتي لها دورها الهام في عملية صنع القرار. ان هذه الانماط يمكن ان تلعب دورا مهما في ازالة الستار عن المساحات التي يمكن من خلالها تحسين عملية صنع القرار. التقنية الاحصاي ية كا ختبار chi-square لتحديد نوع الاقتران كثيرة الاستعمال في الحقل الطبي. رغم ذلك تفسير البعض من النتاي ج التي جرى حسابها باستعمال هذه التقنية الاحصاي ية تبدو مهمة صعبة جدا. ان نوع الاقتران في اغلب الاحيان يكون لاخطيا ولذلك سيخفي الجزء المهم لاستعمال هذه التقنية. في هذا البحث تم التعامل باسلوب جديد من خلال مسح البيانات الاولية لاي اقتران محتمل (خطي او لاخطي). طريقة التحري عن البيانات والاستدلال الاحصاي ي تعتبر من الادوات الاساسية لهذا البحث.
Vol. 35, No. 3, Sept 30,2000 ملخص تعتبر الخوارزمات الجينية واحدة من أفضل طرق البحث من ناحية األداء. فبالرغم من أن استخدام هذه الطريقة ال يعطي الحل
AIN SHAMS UNIVERSITY FACULTY OF ENGINEERING Vol. 35, No. 3, Sept 30,2000 SCIENTIFIC BULLETIN Received on : 3/9/2000 Accepted on: 28/9/2000 pp : 337-348 GENETIC ALGORITHMS AND ITS USE WITH BACK- PROPAGATION
Is it statistically significant? The chi-square test
UAS Conference Series 2013/14 Is it statistically significant? The chi-square test Dr Gosia Turner Student Data Management and Analysis 14 September 2010 Page 1 Why chi-square? Tests whether two categorical
In this presentation, you will be introduced to data mining and the relationship with meaningful use.
In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
Fairfield Public Schools
Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity
Data Mining: An Overview. David Madigan http://www.stat.columbia.edu/~madigan
Data Mining: An Overview David Madigan http://www.stat.columbia.edu/~madigan Overview Brief Introduction to Data Mining Data Mining Algorithms Specific Eamples Algorithms: Disease Clusters Algorithms:
Data Mining Project Report. Document Clustering. Meryem Uzun-Per
Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September
STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and
Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table
Testing Research and Statistical Hypotheses
Testing Research and Statistical Hypotheses Introduction In the last lab we analyzed metric artifact attributes such as thickness or width/thickness ratio. Those were continuous variables, which as you
Least Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
CLASSIFYING SERVICES USING A BINARY VECTOR CLUSTERING ALGORITHM: PRELIMINARY RESULTS
CLASSIFYING SERVICES USING A BINARY VECTOR CLUSTERING ALGORITHM: PRELIMINARY RESULTS Venkat Venkateswaran Department of Engineering and Science Rensselaer Polytechnic Institute 275 Windsor Street Hartford,
Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
Chapter 4. Probability and Probability Distributions
Chapter 4. robability and robability Distributions Importance of Knowing robability To know whether a sample is not identical to the population from which it was selected, it is necessary to assess the
Evaluating the Area of a Circle and the Volume of a Sphere by Using Monte Carlo Simulation. Abd Al -Kareem I. Sheet * E.mail: kareem59y@yahoo.
Iraqi Journal of Statistical Science (14) 008 p.p. [48-6] Evaluating the Area of a Circle and the Volume of a Sphere by Using Monte Carlo Simulation Abd Al -Kareem I. Sheet * E.mail: [email protected]
DHL Data Mining Project. Customer Segmentation with Clustering
DHL Data Mining Project Customer Segmentation with Clustering Timothy TAN Chee Yong Aditya Hridaya MISRA Jeffery JI Jun Yao 3/30/2010 DHL Data Mining Project Table of Contents Introduction to DHL and the
Easily Identify Your Best Customers
IBM SPSS Statistics Easily Identify Your Best Customers Use IBM SPSS predictive analytics software to gain insight from your customer database Contents: 1 Introduction 2 Exploring customer data Where do
Machine Learning and Data Mining. Fundamentals, robotics, recognition
Machine Learning and Data Mining Fundamentals, robotics, recognition Machine Learning, Data Mining, Knowledge Discovery in Data Bases Their mutual relations Data Mining, Knowledge Discovery in Databases,
Chi-square test Fisher s Exact test
Lesson 1 Chi-square test Fisher s Exact test McNemar s Test Lesson 1 Overview Lesson 11 covered two inference methods for categorical data from groups Confidence Intervals for the difference of two proportions
Text Analytics. A business guide
Text Analytics A business guide February 2014 Contents 3 The Business Value of Text Analytics 4 What is Text Analytics? 6 Text Analytics Methods 8 Unstructured Meets Structured Data 9 Business Application
Association Between Variables
Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)
CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
Statistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
How To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
3/17/2009. Knowledge Management BIKM eclassifier Integrated BIKM Tools
Paper by W. F. Cody J. T. Kreulen V. Krishna W. S. Spangler Presentation by Dylan Chi Discussion by Debojit Dhar THE INTEGRATION OF BUSINESS INTELLIGENCE AND KNOWLEDGE MANAGEMENT BUSINESS INTELLIGENCE
Information Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
A Data Mining Study of Weld Quality Models Constructed with MLP Neural Networks from Stratified Sampled Data
A Data Mining Study of Weld Quality Models Constructed with MLP Neural Networks from Stratified Sampled Data T. W. Liao, G. Wang, and E. Triantaphyllou Department of Industrial and Manufacturing Systems
Big Data: Rethinking Text Visualization
Big Data: Rethinking Text Visualization Dr. Anton Heijs [email protected] Treparel April 8, 2013 Abstract In this white paper we discuss text visualization approaches and how these are important
Protein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
Simulating Chi-Square Test Using Excel
Simulating Chi-Square Test Using Excel Leslie Chandrakantha John Jay College of Criminal Justice of CUNY Mathematics and Computer Science Department 524 West 59 th Street, New York, NY 10019 [email protected]
ORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM
ORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM IRANDOC CASE STUDY Ammar Jalalimanesh a,*, Elaheh Homayounvala a a Information engineering department, Iranian Research Institute for
White Paper April 2006
White Paper April 2006 Table of Contents 1. Executive Summary...4 1.1 Scorecards...4 1.2 Alerts...4 1.3 Data Collection Agents...4 1.4 Self Tuning Caching System...4 2. Business Intelligence Model...5
Section 12 Part 2. Chi-square test
Section 12 Part 2 Chi-square test McNemar s Test Section 12 Part 2 Overview Section 12, Part 1 covered two inference methods for categorical data from 2 groups Confidence Intervals for the difference of
RMTD 404 Introduction to Linear Models
RMTD 404 Introduction to Linear Models Instructor: Ken A., Assistant Professor E-mail: [email protected] Phone: (312) 915-6852 Office: Lewis Towers, Room 1037 Office hour: By appointment Course Content
Enhanced Boosted Trees Technique for Customer Churn Prediction Model
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V5 PP 41-45 www.iosrjen.org Enhanced Boosted Trees Technique for Customer Churn Prediction
Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.
Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing
CHAPTER 14 ORDINAL MEASURES OF CORRELATION: SPEARMAN'S RHO AND GAMMA
CHAPTER 14 ORDINAL MEASURES OF CORRELATION: SPEARMAN'S RHO AND GAMMA Chapter 13 introduced the concept of correlation statistics and explained the use of Pearson's Correlation Coefficient when working
REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION
REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION Pilar Rey del Castillo May 2013 Introduction The exploitation of the vast amount of data originated from ICT tools and referring to a big variety
Analyzing Research Data Using Excel
Analyzing Research Data Using Excel Fraser Health Authority, 2012 The Fraser Health Authority ( FH ) authorizes the use, reproduction and/or modification of this publication for purposes other than commercial
Linear Models in STATA and ANOVA
Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples
An Algorithm for Data Hiding in Binary Images. Eman Th. Sedeek Al-obaidy Veterinary Medicine College University of Mosul
Raf. J. of Comp. & Math s., Vol. 5, No. 2, 2008 An Algorithm for Data Hiding in Binary Images Eman Th. Sedeek Al-obaidy Veterinary Medicine College University of Mosul Received on: 1/10/2007 Accepted on:
DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.
DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,
Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Applications Anya Yarygina Boris Novikov Introduction Data mining applications Data mining system products and research prototypes Additional themes on data mining Social
Implications of Big Data for Statistics Instruction 17 Nov 2013
Implications of Big Data for Statistics Instruction 17 Nov 2013 Implications of Big Data for Statistics Instruction Mark L. Berenson Montclair State University MSMESB Mini Conference DSI Baltimore November
A Divided Regression Analysis for Big Data
Vol., No. (0), pp. - http://dx.doi.org/0./ijseia.0...0 A Divided Regression Analysis for Big Data Sunghae Jun, Seung-Joo Lee and Jea-Bok Ryu Department of Statistics, Cheongju University, 0-, Korea [email protected],
Analysing Questionnaires using Minitab (for SPSS queries contact -) [email protected]
Analysing Questionnaires using Minitab (for SPSS queries contact -) [email protected] Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:
Introduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
Chapter 20: Data Analysis
Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification
Data Mining: Overview. What is Data Mining?
Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,
Pentaho Data Mining Last Modified on January 22, 2007
Pentaho Data Mining Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at www.pentaho.org
Abbas S. Tavakoli, DrPH, MPH, ME 1 ; Nikki R. Wooten, PhD, LISW-CP 2,3, Jordan Brittingham, MSPH 4
1 Paper 1680-2016 Using GENMOD to Analyze Correlated Data on Military System Beneficiaries Receiving Inpatient Behavioral Care in South Carolina Care Systems Abbas S. Tavakoli, DrPH, MPH, ME 1 ; Nikki
How To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
The Cost of Annoying Ads
The Cost of Annoying Ads DANIEL G. GOLDSTEIN Microsoft Research and R. PRESTON MCAFEE Microsoft Corporation and SIDDHARTH SURI Microsoft Research Display advertisements vary in the extent to which they
MBA 611 STATISTICS AND QUANTITATIVE METHODS
MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain
Application of Predictive Model for Elementary Students with Special Needs in New Era University
Application of Predictive Model for Elementary Students with Special Needs in New Era University Jannelle ds. Ligao, Calvin Jon A. Lingat, Kristine Nicole P. Chiu, Cym Quiambao, Laurice Anne A. Iglesia
MBA 8473 - Data Mining & Knowledge Discovery
MBA 8473 - Data Mining & Knowledge Discovery MBA 8473 1 Learning Objectives 55. Explain what is data mining? 56. Explain two basic types of applications of data mining. 55.1. Compare and contrast various
A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic
A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic Report prepared for Brandon Slama Department of Health Management and Informatics University of Missouri, Columbia
Simple Regression Theory II 2010 Samuel L. Baker
SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the
Extend Table Lens for High-Dimensional Data Visualization and Classification Mining
Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du [email protected] University of British Columbia
IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA
CALIFORNIA STATE UNIVERSITY, LOS ANGELES INFORMATION TECHNOLOGY SERVICES IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA Summer 2013, Version 2.0 Table of Contents Introduction...2 Downloading the
not possible or was possible at a high cost for collecting the data.
Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day
Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product
Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product Sagarika Prusty Web Data Mining (ECT 584),Spring 2013 DePaul University,Chicago [email protected] Keywords:
Standardization and Its Effects on K-Means Clustering Algorithm
Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03
CLUSTER ANALYSIS. Kingdom Phylum Subphylum Class Order Family Genus Species. In economics, cluster analysis can be used for data mining.
CLUSTER ANALYSIS Introduction Cluster analysis is a technique for grouping individuals or objects hierarchically into unknown groups suggested by the data. Cluster analysis can be considered an alternative
Model-Based Cluster Analysis for Web Users Sessions
Model-Based Cluster Analysis for Web Users Sessions George Pallis, Lefteris Angelis, and Athena Vakali Department of Informatics, Aristotle University of Thessaloniki, 54124, Thessaloniki, Greece [email protected]
Learning is a very general term denoting the way in which agents:
What is learning? Learning is a very general term denoting the way in which agents: Acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);
Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control
Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Andre BERGMANN Salzgitter Mannesmann Forschung GmbH; Duisburg, Germany Phone: +49 203 9993154, Fax: +49 203 9993234;
Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics
Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This
COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments
Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for
A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH
205 A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH ABSTRACT MR. HEMANT KUMAR*; DR. SARMISTHA SARMA** *Assistant Professor, Department of Information Technology (IT), Institute of Innovation in Technology
Calculating, Interpreting, and Reporting Estimates of Effect Size (Magnitude of an Effect or the Strength of a Relationship)
1 Calculating, Interpreting, and Reporting Estimates of Effect Size (Magnitude of an Effect or the Strength of a Relationship) I. Authors should report effect sizes in the manuscript and tables when reporting
Survival Analysis of the Patients Diagnosed with Non-Small Cell Lung Cancer Using SAS Enterprise Miner 13.1
Paper 11682-2016 Survival Analysis of the Patients Diagnosed with Non-Small Cell Lung Cancer Using SAS Enterprise Miner 13.1 Raja Rajeswari Veggalam, Akansha Gupta; SAS and OSU Data Mining Certificate
Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
Hexaware E-book on Predictive Analytics
Hexaware E-book on Predictive Analytics Business Intelligence & Analytics Actionable Intelligence Enabled Published on : Feb 7, 2012 Hexaware E-book on Predictive Analytics What is Data mining? Data mining,
Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP
Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation
ISSN: 2321-7782 (Online) Volume 3, Issue 4, April 2015 International Journal of Advance Research in Computer Science and Management Studies
ISSN: 2321-7782 (Online) Volume 3, Issue 4, April 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)
Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the
A Stock Pattern Recognition Algorithm Based on Neural Networks
A Stock Pattern Recognition Algorithm Based on Neural Networks Xinyu Guo [email protected] Xun Liang [email protected] Xiang Li [email protected] Abstract pattern respectively. Recent
Comparing Methods to Identify Defect Reports in a Change Management Database
Comparing Methods to Identify Defect Reports in a Change Management Database Elaine J. Weyuker, Thomas J. Ostrand AT&T Labs - Research 180 Park Avenue Florham Park, NJ 07932 (weyuker,ostrand)@research.att.com
January 26, 2009 The Faculty Center for Teaching and Learning
THE BASICS OF DATA MANAGEMENT AND ANALYSIS A USER GUIDE January 26, 2009 The Faculty Center for Teaching and Learning THE BASICS OF DATA MANAGEMENT AND ANALYSIS Table of Contents Table of Contents... i
Introduction to Analysis of Variance (ANOVA) Limitations of the t-test
Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only
STATISTICA. Financial Institutions. Case Study: Credit Scoring. and
Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT
Prediction of Heart Disease Using Naïve Bayes Algorithm
Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,
STATISTICS FOR PSYCHOLOGISTS
STATISTICS FOR PSYCHOLOGISTS SECTION: STATISTICAL METHODS CHAPTER: REPORTING STATISTICS Abstract: This chapter describes basic rules for presenting statistical results in APA style. All rules come from
What is Data Mining? Data Mining (Knowledge discovery in database) Data mining: Basic steps. Mining tasks. Classification: YES, NO
What is Data Mining? Data Mining (Knowledge discovery in database) Data Mining: "The non trivial extraction of implicit, previously unknown, and potentially useful information from data" William J Frawley,
Introduction to Longitudinal Data Analysis
Introduction to Longitudinal Data Analysis Longitudinal Data Analysis Workshop Section 1 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development Section 1: Introduction
Bill Burton Albert Einstein College of Medicine [email protected] April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1
Bill Burton Albert Einstein College of Medicine [email protected] April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1 Calculate counts, means, and standard deviations Produce
Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL
Paper SA01-2012 Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL ABSTRACT Analysts typically consider combinations
Decision Support System For A Customer Relationship Management Case Study
61 Decision Support System For A Customer Relationship Management Case Study Ozge Kart 1, Alp Kut 1, and Vladimir Radevski 2 1 Dokuz Eylul University, Izmir, Turkey {ozge, alp}@cs.deu.edu.tr 2 SEE University,
Grid Density Clustering Algorithm
Grid Density Clustering Algorithm Amandeep Kaur Mann 1, Navneet Kaur 2, Scholar, M.Tech (CSE), RIMT, Mandi Gobindgarh, Punjab, India 1 Assistant Professor (CSE), RIMT, Mandi Gobindgarh, Punjab, India 2
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati [email protected], [email protected]
Practical Calculation of Expected and Unexpected Losses in Operational Risk by Simulation Methods
Practical Calculation of Expected and Unexpected Losses in Operational Risk by Simulation Methods Enrique Navarrete 1 Abstract: This paper surveys the main difficulties involved with the quantitative measurement
