DATA MINING FOR HADITH CLASSIFICATION

BY

KAWTHER A. ALDHLAN

A thesis submitted in fulfilment of the requirement for the degree of Doctor of Philosophy in Information and Communication Technology

Kulliyyah of Information and Communication Technology
International Islamic University Malaysia

FEBRUARY 2013
ABSTRACT

The Holy Qur'an and the Hadith are the two fundamental sources of legislation and law in the Muslim community. Together with the Islamic books, these resources can serve as the sole authoritative source of knowledge and wisdom. They have also given rise to a large collection of analysis and interpretation texts, which could provide a gold standard for artificial intelligence (AI) experiments in knowledge extraction and knowledge representation. The growing interest in automating access to the Islamic resources (the Qur'an, the Sunnah and the books of tradition) has motivated researchers to look for mechanisms that can represent and discover the knowledge these resources contain. Extracting Islamic knowledge is the focal point of the present study, and three famous books of Hadith science form its corpus. The study explores a new approach to classifying Hadith according to its degree of validity (Sahih, Hasan, Da'eef and Maudoo') using data mining techniques. The proposed Hadith classifier (HC) model was built through a learning process and is represented as a tree structure. The attributes of the instances were obtained directly from the source books; attributes not mentioned in these books were recorded as null (missing) values. A novel mechanism, based on the methods used to investigate the Isnad in Hadith science, was employed to handle these missing data. Representing or extracting Islamic knowledge is a critical step because it may affect the lives of Muslims; the results of the research were therefore compared with the source books and with the point of view of an expert in Hadith science. The extracted knowledge shed light on the differences between the methods of Al-Imam Al-Bukhari, Al-Termithi and Al-Albani in takhareej Al-Hadith.

The findings showed that the proposed missing data detector (MDD) method had a significant effect on the performance of the HC: the correct classification rate (CCR) rose sharply from 50.1502% before using the MDD to 97.597% after applying it. Furthermore, the favorable results of comparing the HC against a naïve Bayes classifier indicated that decision tree (DT) modeling is a viable approach to classifying Hadith because of its excellent performance, ease of implementation, and ease of rule induction and result interpretation.
APPROVAL PAGE

The thesis of Kawther Binti Ali ALdhlan has been approved by the following:

Akram M. Zeki, Supervisor
Ahmed M. Zeki, Co-Supervisor
Tengku Mohd Bin Tengku Sembok, Internal Examiner
Imad Fakhri Taha Alshaikhli, Internal Examiner
Hassanin M. Al-Barhamtoshy, External Examiner
Abdul Kabir Hussain Solihu, Chairman
DECLARATION

I hereby declare that this thesis is the result of my own investigations, except where otherwise stated. I also declare that it has not been previously or concurrently submitted as a whole for any other degrees at IIUM or other institutions.

Kawther Binti Ali Dhlan Al-Dhlan

Signature ..........  Date ..........
INTERNATIONAL ISLAMIC UNIVERSITY MALAYSIA

DECLARATION OF COPYRIGHT AND AFFIRMATION OF FAIR USE OF UNPUBLISHED RESEARCH

Copyright 2013 by Kawther Binti Ali Al-Dhlan. All rights reserved.

DATA MINING FOR HADITH CLASSIFICATION

No part of this unpublished research may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise without prior written permission of the copyright holder except as provided below.

1. Any material contained in or derived from this unpublished research may only be used by others in their writing with due acknowledgement.
2. IIUM or its library will have the right to make and transmit copies (print or electronic) for institutional and academic purposes.
3. The IIUM library will have the right to make, store in a retrieval system and supply copies of this unpublished research if requested by other universities and research libraries.

Affirmed by Kawther Binti Ali Dhlan Aldhlan

Signature ..........  Date ..........
To My Beloved Mother (May Allah Bless Her)
To My Father
To My Husband and My Family
ACKNOWLEDGEMENTS

In the name of Allah, the Most Beneficent, the Most Merciful. All praise and gratitude to Almighty Allah S.W.T. for giving me the opportunity to undertake and complete this study.

I am very grateful to have had an exceptional doctoral committee. Without their guidance and support, I would not have been able to reach this far in my academic endeavor. First and foremost, my deepest gratitude goes to Dr. Akram M. Zeki, my major supervisor and my mentor. His dedication and patience in guiding me helped me get through this study and made me what I am today. My deepest gratitude also goes to Dr. Ahmed M. Zeki, my co-supervisor, who contributed greatly in assisting me to complete my work. I would like to extend my thanks to the examiners' committee for their valued notes.

My personal thanks also go to all lecturers and staff of the Kulliyyah of Information and Communication Technology, whose wisdom has shaped my way of looking at things in one way or another. I am also grateful to the University of Hail (UOH) for giving me this priceless opportunity to further my study in this doctoral program, and I offer special thanks to Associate Prof. Dr. Naser Ismail of the Islamic department in the College of Education at UOH for his efforts and guidance.

Finally, I would like to extend my gratitude to my beloved husband, Hamad Alreshidi, for his understanding, collaboration and undivided support. To my children, Solaf, Turki and Wed: may this success become an inspiration for their future. And to my mother (God bless her), father, sisters and brothers: thank you for your unfailing faith, encouragement and prayers.
TABLE OF CONTENTS

Abstract
Abstract in Arabic
Approval Page
Declaration Page
Copyright Page
Dedication
Acknowledgement
List of Tables
List of Figures
List of Abbreviations
List of Symbols

CHAPTER ONE: INTRODUCTION
  1.1 The Importance of Hadith
  1.2 Background of the Study
  1.3 Statement of the Problem
  1.4 Research Objectives
  1.5 Research Questions
  1.6 Research Hypotheses
  1.7 Significance of the Study
  1.8 Research Model
    1.8.1 The Research Framework
    1.8.2 Data Representation Models
      1.8.2.1 Hadith Structure
      1.8.2.2 The Representation of Hadith Structure
  1.9 Research Approach
  1.10 Organization of the Thesis

CHAPTER TWO: OVERVIEW OF HADITH SCIENCE
  2.1 Introduction
  2.2 Definition of Hadith
  2.3 Component of Hadith
  2.4 Hadith Verification
  2.5 Classification of Hadith
  2.6 Summary

CHAPTER THREE: LITERATURE REVIEW
  3.1 Introduction
  3.2 Overview of Data Mining
    3.2.1 What Is a Data Mining?
    3.2.2 Data Mining Approaches
      3.2.2.1 Supervised learning
      3.2.2.2 Unsupervised Clustering
      3.2.2.3 Semi-supervised learning
    3.2.3 How to Mine Data
      3.2.3.1 The Knowledge Discovery Process
    3.2.4 Applications of Data Mining
    3.2.5 Why Data Mining
  3.3 The role of ICT in Islam
  3.4 Applications of DM in Hadith Science
  3.5 Summary

CHAPTER FOUR: RESEARCH METHODOLOGY
  4.1 Introduction
  4.2 Research Framework
  4.3 Research Procedures
  4.4 The Sample of the Study
  4.5 Evaluation Strategy
    4.5.1 Confusion Matrix
    4.5.2 Correct Classification Rate
    4.5.3 Error Rate
    4.5.4 Sensitivity
    4.5.5 Specificity
    4.5.6 Precision
    4.5.7 F-Measure
    4.5.8 Receiver Operating Characteristic
    4.5.9 Area under the Curve
  4.6 Summary

CHAPTER FIVE: DATA PRE-PROCESSING AND ALGORITHMS
  5.1 Introduction
  5.2 Corpus of the Study
  5.3 Data Pre-Processing
  5.4 Attributes Selection
  5.5 The Missing Data Detector Method (MDD)
  5.6 Decision Tree Construction
    5.6.1 Select the Attributes
      5.6.1.1 Entropy (Information Impurity)
      5.6.1.2 Information Gain Criterion
      5.6.1.3 Gain ratio
    5.6.2 Pruning Option
      5.6.2.1 Pre-Pruning
      5.6.2.2 Post-Pruning
      5.6.2.3 Comparisons of pre-pruning and post-pruning
    5.6.3 Rule Induction Based on the Pruned Tree
  5.7 C4.5 Algorithm
  5.8 Summary

CHAPTER SIX: MODEL EVALUATION AND DISCUSSION
  6.1 Introduction
  6.2 Implementation
  6.3 Experiment Procedures
    6.3.1 Training Procedures
    6.3.2 Testing Procedures
      6.3.2.1 Trial#1: Test Phase with (33.3%) of the Sample size
      6.3.2.2 Trial#2: Test Phase with (33.3%) of the Sample size
      6.3.2.3 Trial#3: Test Phase with (11.1%) of the Sample size
      6.3.2.4 Trial#4: Test Phase with (11.1%) of the Sample size
    6.3.3 Comparing the Results of HC with the Expert Point View
  6.4 The Comparison between the HC and Naïve Bayes Classifier
  6.5 Summary

CHAPTER SEVEN: CONCLUSION AND FUTURE WORKS
  7.1 Overview
  7.2 Summary of the proposed method
  7.3 Summary of Findings
  7.4 Comparative study with literature
  7.5 Conclusion
  7.6 Contributions
  7.7 Future Directions

BIBLIOGRAPHY
APPENDIX A
GLOSSARY
LIST OF TABLES

Table No.  Title
3.1   Summarization of the previous works in Hadith
4.1   The attributes of the sample set
4.2   No. of Hadith in Al-Bukhari collection
4.3   Al-Bukhari narrators' grade
4.4   Confusion matrix
4.5   AUC index and its effectiveness for discrimination
5.1   Results of pre-processing phase for Al-Hadith in Figure 5.4
5.2   Hadith attributes according to (Al-Suyutih, 1965; Tahan, 1996)
5.3   The Tracing Table of the Example (1)
5.4   Hadith terms used to indicate the narrator's reliability
5.5   The Tracing Table of the Example (2)
5.6   Hadith terms used to indicate the narrator's retention (preservation)
5.7   The Tracing Table of the Example (3)
5.8   The Tracing Table of the Example (4)
5.9   The rank of the attributes according to the entropy value
5.10  The rank of the attributes according to the information gained Method
5.11  The rank of the attributes according to gain ratio method
6.1   The information gained of the Hadith features
6.2   The evaluation results of Hadith Classifier (training phase)
6.3   The evaluation results of HC of trial#1
6.4   The evaluation results of HC of trial#2
6.5   The evaluation results of HC of trial#3
6.6   The evaluation results of HC of trial#4
6.7   The RS Between the Results of HC and the Expert Point View
6.8   The comparison between Naïve Bayes classifier and HC classifier
7.1   Comparative Study with the literature
LIST OF FIGURES

Figure No.  Title
1.1   The research framework
1.2   The proposed Hadith Classifier HC
1.3   Hadith components
1.4   Bottom-Up approach
1.5   Top-Down approach
2.1   Components of Hadith
2.2   Hadith classification
2.3   Mursal مرسل (hurried) case
2.4   Munqati منقطع (broken) case
2.5   Mu'allaq معلق (hanging) case
3.1   The boundary hyperplane in support vector machine
3.2   Illustration of decision tree with replication
3.3   An overview of the knowledge discovery process (KDP)
4.1   Hadith classification without handling missing data
4.2   The process of missing data detector MDD
4.3   The proposed Hadith Classifier
4.4   The research procedures
4.5   The first stage of the experiment (Training phase)
4.6   The second stage of the experiment (Testing phase I)
4.7   The second stage of the experiment (Testing phase II)
4.8   AL-Ahadith in Sahih Al-Bukhari
4.9   Summary of Al-Bukhari narrators' grade
4.10  AL-Ahadith in Jami'u AL-Termithi
4.11  AL-Ahadith in Silsilat Al-AHadith Al-Dae'ifah w' Al-Mawdu'ah
4.12  The confusion matrix of the training model
4.13  An ROC curve and different points of significance
5.1   Example of Sahih Hadith
5.2   Example of Hasan Hadith
5.3   Example of Da'eef Hadith
5.4   Example of Maudoo' Hadith
5.5   The metadata of the Hadith narrators in narrator table
5.6   The relationships between the narrators in the teacher table
5.7   The flowchart of the first method
5.8   The flowchart of the second method
5.9   The flowchart of the narrators' reliability method (Part I)
5.10  The flowchart of the narrators' reliability method (Part II)
5.11  The flowchart of the method of determining the narrator's preservation
5.12  The flowchart of the method that determines the value of the Isnad defective
5.13  Part of DT tree for HC
6.1   The results of identification narrators of the Isnad chain
6.2   The induced rules after parsing the path from the root to the leaf node
6.3   The decision tree of the target Hadith Classifier
6.4   The confusion matrix of the training model
6.5   ROC curves of the classes in Hadith classifier (Training Phase)
6.6   Test procedures
6.7   The confusion matrix of Hadith Classifier of trial#1
6.8   ROC curves of the classes in HC for trial#1
6.9   Isnad identification Process for 33.3% of test dataset
6.10  Classification process of 33.3% of test dataset
6.11  The confusion matrix of Hadith Classifier of trial#2
6.12  ROC curves of the classes in HC for trial#2
6.13  The confusion matrix of Hadith Classifier of trial#3
6.14  ROC curves of the classes in HC for trial#3
6.15  Isnad identification process for trial#4
6.16  Classification process of trial#4
6.17  The confusion matrix of Hadith Classifier of trial#4
6.18  ROC curves of the classes in Hadith Classifier for trial#4
6.19  ROC curves of the naïve Bayes classifier
6.20  CCR of naïve Bayes classifier and DT classifier
7.1   Summary of the research approach
LIST OF ABBREVIATIONS

Acc     Accuracy rate
ANN     Artificial Neural Network
AUC     Area Under the Curve
CART    Classification And Regression Trees
CBR     Case-Based Reasoning
CCR     Correct Classification Rate
CCRE    Correct Classification Rate according to the Expert
CCRHC   Correct Classification Rate according to the HC
DFT     Document Frequency Thresholding
DM      Data Mining
DOB     Date Of Birth
DOD     Date Of Death
DTs     Decision Trees
ER      Error Rate
FN      False Negative
FP      False Positive
GA      Genetic Algorithms
HC      Hadith Classifier
ICT     Information Communication Technology
ID      Identification
ID3     Iterative Dichotomiser 3
KDP     Knowledge Discovery Process
K-NN    K-Nearest Neighbor
MAX     Maximum
MDD     Missing Data Detector
MIN     Minimum
NB      Naïve Bayes
NCP     Number of Correct Predictions
NOC     Total Number of Cases
NOCR    Number of Corrected Rules
NOP     Total Number of Predictions
NOR     Number of Rules
NWP     Number of Wrong Predictions
RI      Rule Induction
RS      Rate of Similarity
ROC     Receiver Operating Characteristic
SN      Sensitivity
SNoW    Sparse Network of Winnows
SP      Specificity
SVD     Singular Value Decomposition
SVM     Support Vector Machines
TF/IDF  Term Frequency / Inverse Document Frequency
TN      True Negative
TNR     True Negative Rate
TP      True Positive
TPR     True Positive Rate
UML     Unified Modeling Language
VC      Vapnik Chervonenkis
VSM     Vector Space Model
WWW     World Wide Web
LIST OF SYMBOLS

C          Corpus
CCR        Correct Classification Rate
ER         Error Rate
FN         False Negative
FP         False Positive
H_i        Tested Hadith
I_entropy  Information entropy
IG         Information Gain
J          The total number of classes
K          Attributes
L          Dataset
M          Feature
N          Training instance / sample size
NCP        Number of Correct Predictions
NOC        Total Number of Cases
NOP        Total Number of Predictions
NWP        Number of Wrong Predictions
P(i)       Probability
R_cv       The V-fold cross validation accuracy
R_te       Validation accuracy
SP         Specificity
TN         True Negative
TNR        True Negative Rate
TP         True Positive
TPR        True Positive Rate
V          Subsets
v_best     The best attribute
x_i        Instance
Y_i        Class
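As a rough illustration of how the evaluation symbols listed above relate to one another, the sketch below computes CCR, ER, sensitivity (SN) and specificity (SP) from the four confusion-matrix counts of a single class. It assumes the standard textbook definitions (CCR as the percentage of correct predictions, ER as its complement); the class name, function names and example counts are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts for one class treated as 'positive' (hypothetical helper)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn


def ccr(c: ConfusionCounts) -> float:
    """Correct classification rate, in percent (assumed: correct / total)."""
    return 100.0 * (c.tp + c.tn) / c.total


def error_rate(c: ConfusionCounts) -> float:
    """Error rate, in percent (assumed: wrong / total)."""
    return 100.0 * (c.fp + c.fn) / c.total


def sensitivity(c: ConfusionCounts) -> float:
    """True positive rate: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True negative rate: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    counts = ConfusionCounts(tp=40, fp=10, tn=45, fn=5)
    print(f"CCR = {ccr(counts):.1f}%  ER = {error_rate(counts):.1f}%")
    print(f"SN = {sensitivity(counts):.3f}  SP = {specificity(counts):.3f}")
```

For a four-class problem such as Sahih, Hasan, Da'eef and Maudoo', these per-class quantities would be computed once per class by treating that class as positive and the remaining classes as negative.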
CHAPTER ONE
INTRODUCTION

1.1 THE IMPORTANCE OF HADITH

The Qur'an is the last divine book, revealed by Allah as a declaration and guidance to mankind. It is an explanation of all things and a means for people to be rightly guided. Many verses of the Qur'an command obedience to the Prophet of Allah:

(قل أطيعوا الله وأطيعوا الرسول فإن تولوا فإنما عليه ما حمل وعليكم ما حملتم وإن تطيعوه تهتدوا وما على الرسول إلا البلاغ المبين) (سورة النور 54)

Say: "Obey Allah and obey the Messenger, but if you turn away, he (Messenger Muhammad) is only responsible for the duty placed on him (i.e. to convey Allah's Message) and you for that placed on you. If you obey him, you shall be on the right guidance. The Messenger's duty is only to convey (the message) in a clear way (i.e. to preach in a plain way)." (The Holy Qur'an 24: 54) [1]

This is a significant point because the Qur'an can only be fully understood by following the Sunnah of the Prophet. Sunnah means the actions, sayings and silent permissions (or disapprovals) of the Prophet. The word Sunnah is also used to refer to religious duties that are optional. Furthermore, Sunnah means the recorded sayings (Hadith) of Prophet Muhammad. In this sense, Muslims believe that the Sunnah of the Prophet Muhammad is the second of the two revealed fundamental sources of Islam, after the Holy Qur'an (Hasan, 2004). It is impossible to understand the Qur'an without reference to the Hadith, and it is impossible to explain Hadith without relating it to the Qur'an. Where the Qur'an gives Muslims a broad framework for how they should live, the Hadith gives them specific information. For instance:

The Qur'an commands Muslims to pray; Prophet Muhammad explained when and how to pray:

(صلوا كما رأيتموني أصلي). أخرجه الشيخان

Perform your prayer in the same manner you had seen me doing [Reported by Al-Bukhari & Muslim]. [2]

The Qur'an commands Muslims to make Hajj; Prophet Muhammad explained how to perform Hajj:

(خذوا عني مناسككم) أخرجه مسلم

Take from me your rituals (of Hajj) [Reported by Muslim]. [3]

Moreover, Allah has informed us in Surah Al-e Imran that the Prophet had the characteristic of teaching the Qur'an and purifying mankind:

(لقد من الله على المؤمنين إذ بعث فيهم رسولا من أنفسهم يتلو عليهم آياته ويزكيهم ويعلمهم الكتاب والحكمة وإن كانوا من قبل لفي ضلال مبين) (سورة آل عمران 164)

Indeed Allah conferred a great favor on the believers when He sent among them a Messenger (Muhammad) from among themselves, reciting unto them His Verses (the Qur'an), and purifying them (from sins by their following him), and instructing them (in) the Book (the Qur'an) and Al-Hikmah [the wisdom and the Sunnah of the Prophet (i.e. his legal ways, statements, acts of worship, etc.)], while before that they had been in manifest error (The Holy Qur'an 3: 164)

[1] Translation of the meaning of the Noble Qur'an was taken from http://www.dar ussalam.com/thenoblequran/, The Noble Qur'an.
[2] Translation of the Hadith was taken from http://3refe.com/vb/showthread.php?t=160957, the official website of Dr. Al-Ereefi.
[3] Translation of the Hadith was taken from http://en.wathakker.net/articles/print/584, wathakker.com.
It is worth drawing attention to the phrase "teaching the Book and the wisdom", which emphasizes the relationship between the Qur'an and the Hadith and identifies the Hadith as the second source of knowledge and wisdom. The Hadith is therefore an important source of reference for the development of Islamic law. The Muslim community recorded Prophet Muhammad's words and actions for posterity, and as the number of these reported conversations grew exponentially in the century after his death, the community developed sophisticated methods to evaluate their veracity and to determine which traditions were reliable and which were clearly fraudulent. While the early collections of Hadith often contained Hadith of questionable origin, collections of authenticated Hadith called Sahih (sound, true, correct) were gradually compiled by many Hadith scholars such as Al-Bukhari, Muslim, Al-Tirmidhi, Al-Nasa'i, Ibnu Majah, Abu Daud, Al-Darimi, Malek and Ibnu Hanbal (Hadithtraditional books, 2009).

1.2 BACKGROUND OF THE STUDY

Nowadays, many studies are published and much software is developed to serve the prophetic tradition through several channels that help students and scholars find a tremendous amount of information in the simplest and easiest way. Furthermore, such software offers quick electronic search in place of manual search (Aldhlan et al., 2010). Most initial studies concentrated on transferring Hadith source books, such as the Al-Bukhari collection, the Muslim collection and other books, into databases, either on websites or as software. However, it has become necessary to find smart approaches that can adopt Hadith literature rather than simply storing Hadith resources on a compact disk or websites with all probabilities of having negative