A CLOSED DOMAIN TEXT-TO-SPEECH SYNTHESIS SYSTEM FOR ARABIC LANGUAGE USING DATA MINING CLASSIFICATION TECHNIQUE
A Closed Domain Text-To-Speech Synthesis System for Arabic Language using Data Mining Classification Technique
by
Maisa M. Al-Khudair
Dr. Natheer Y. Khasawneh (Advisor)
Thesis submitted in partial fulfillment of the requirements for the degree of M.Sc. in Computer Engineering
At The Faculty of Graduate Studies
Jordan University of Science and Technology
August, 2009
A CLOSED DOMAIN TEXT-TO-SPEECH SYNTHESIS SYSTEM FOR ARABIC LANGUAGE USING DATA MINING CLASSIFICATION TECHNIQUE
by
Maisa M. Al-Khudair
Signature of Author ...
Committee Member / Signature and Date
Dr. Natheer Y. Khasawneh (Chairman) ...
Dr. Hassan Najadat (Member) ...
Dr. Mohammad A. Al-Jarrah (External Examiner, YU) ...
August, 2009
DEDICATION
To my parents and my family.
ACKNOWLEDGMENTS
First of all, thanks to ALLAH, who gave me the power, the effort, and the enthusiasm to start and complete this work. I would like to thank my supervisor, Dr. Natheer Khasawneh, for his support, encouragement, valuable hints, and, most importantly, his patience with me. Thanks also go to the discussion committee members, Dr. Natheer Khasawneh, Dr. Hassan Najadat and Dr. Mohammad Al-Jarrah, for agreeing to participate and for their constructive comments and suggestions. Thanking my parents would not be enough; thank you for your support and patience with me. I would like to thank both of you, Noor and Arwa, for your ultimate support and care. I would also especially like to thank Mr. Harri Saarikoski for his great help, and Ms. Amani Tawalbeh from the Yarmouk FM radio staff, represented by the general manager, Mr. Bashar Qabalan, and the vice manager, Mr. Al-Samadi. I would also like to thank the Amman Net radio staff, especially Mr. Mohammad Al-Ersan, for their great assistance. Finally, I would like to thank all my friends, including my friends from JUST, for their endless support and for being with me in the hard situations I faced while working on this thesis.
TABLE OF CONTENTS
DEDICATION
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF APPENDICES
ABSTRACT
Chapter One: Introduction
  Overview of Text-To-Speech System
  Motivation and Thesis Methodology
  Thesis Outline
Chapter Two: Related Work
  Text-To-Speech System
  Text-To-Speech System Architecture
  Concatenative Synthesis
  Formant Synthesis
  Corpus-Based Synthesis
  Speech Database Reduction
  Diphone Unit
  Word and Syllable Units
  Multi-form Units
  Arabic Language Database
  Text Categorization
  Pre-processing Document
  Text Categorization Algorithms
  K-Nearest Neighbor Classifier (KNN)
  Support Vector Machines (SVM)
  Decision Trees
  Constructing C4.5 Classifier
  Neural Networks
  Arabic Language
  Arabic Text-To-Speech Systems
  Arabic Text Categorization
Chapter Three: Architecture of a Closed Domain Text-To-Speech Synthesis System for Arabic Language using Data Mining Clustering Technique
  Problem Definition
  The Proposed Approach
  Arabic Text Categorization
  Collecting Text Data
  Document Preprocessing
  Word Level Vector Generating
  WEKA Results
  Arabic Text-To-Speech System Building
  Building Speech Database
  Voice Waveform Selection
  Voice Waveform Concatenation
Chapter Four: Testing the Arabic Text-To-Speech System
  Test Group
  Method
  Test and Evaluation Results
  Perception of the Sentences
  Perception of the Sentences using Word Correctness
  Business Category Test
  Politics Category Test
  Sport Category Test
  System Test
  Perception of the Sentences using Precision and Recall Measures
  Business Category Test
  Politics Category Test
  Sport Category Test
  System Test
  Results of Mean Opinion Score Method
  Naturalness
  Speed
  Sound Quality
  Pronunciation
  Intelligibility
  Stress/Intonation
Chapter Five: Conclusions and Future Work
References
Appendix A: Test Questionnaire
Appendix B: Arabic Speech Test Sentences Samples
Appendix C: Test Listening Results Sample
Arabic Abstract
LIST OF FIGURES
Figure  Description
2.1  Text-to-speech system architecture
2.2  Corpus-based synthesis components
3.1  The proposed system architecture
3.2  Arabic text categorization process steps
3.3  An example of document preprocessing
3.4  Sample ARFF file using the tf*idf weighting scheme
3.5  Confusion matrix of J48 classifier
3.6  WEKA J48 classification tree
3.7  Results of classifying test data documents using four weighting schemes
3.8  Speech synthesis process
3.9  Steps of the sound concatenation method
4.1  Perception of business sentences: first and second listening
4.2  Perception of politics sentences: first and second listening
4.3  Perception of sport sentences: first and second listening
4.4  Perception of system sentences: first and second listening
4.5  Precision and recall measures for the perception of business sentences: first and second listening
4.6  Precision and recall measures for the perception of politics sentences: first and second listening
4.7  Precision and recall measures for the perception of sport sentences: first and second listening
4.8  Precision and recall measures for the perception of system sentences: first and second listening
4.9  Naturalness of the voice
4.10  Speed of the speech
4.11  Quality of sound
4.12  Pronunciation mistakes
4.13  Pronunciation effect on understanding some words
4.14  Concentration level needed for understanding
4.15  Annoying level of the speech
4.16  Voice understanding
4.17  The difficulty in understanding the voice
4.18  The intonation of the system
4.19  The stress of the system
4.20  The intonation differences of the system
LIST OF TABLES
Table  Description
3.1  Text documents distribution of three categories
3.2  Arabic stop words samples
3.3  Results of classifying test data documents using four weighting methods
3.4  Results of reducing the number of words of three categories using various threshold values
4.1  Age distribution of the listeners
4.2  Listeners' remarks on system pronunciation
LIST OF APPENDICES
Appendix  Description
A  Test Questionnaire
B  Arabic Speech Test Sentences Samples
C  Test Listening Results Sample
ABSTRACT
A CLOSED DOMAIN TEXT-TO-SPEECH SYNTHESIS SYSTEM FOR ARABIC LANGUAGE USING DATA MINING CLASSIFICATION TECHNIQUE
by
Maisa M. Al-Khudair
Interest in text-to-speech systems has been increasing lately because they are becoming more important in areas such as entertainment, assistance for handicapped people and human-machine interaction [1]. Text-to-speech systems are implemented using speech units of different sizes, such as diphones and syllables. A corpus-based system, which builds speech by concatenating pre-recorded sounds, has proven to be the best text-to-speech approach because it gives high intelligibility and produces more natural sound. Its drawback is storage: more memory is needed to store the pre-recorded sounds, which increases the complexity of processing the speech store. In this thesis, we present a text-to-speech system that evaluates the effect of reducing the speech database inventory size for an Arabic text-to-speech system. A data mining technique was used to generate a classification model that classifies new text documents entering the system and determines the "bag of words" used to build the speech corpus for the overall system, which covers three main domains (i.e. categories): business, politics and sport. To evaluate the effect of reducing the speech database on listeners, we grouped the database sounds into distinct groups, each containing a different number of recorded sound files; the fewer sound waves a group contains, the smaller the speech database. When the system generates a new spoken sentence, the speech database groups are searched first to detect whether a pre-recorded sound file exists for each required word in the sentence. If the sound file is not found, a signal processing tool is used to synthesize the missing sound wave. The output sentence is generated by concatenating both the pre-recorded and the synthesized sound waves. A subjective listening test showed that reducing the speech database size by 29% lowers recognition correctness by 0.57%, while reducing it by 86% lowers recognition correctness by 1.29%.
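The lookup-and-fallback step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: waveforms are plain lists of float samples, `build_sentence` is a hypothetical name, the dictionary stands in for one speech-database group, and the `synthesize` callable stands in for the signal-processing tool, whose interface is not described here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Waveform = List[float]  # mono samples; placeholder for audio loaded from WAV files


def build_sentence(
    words: Sequence[str],
    database: Dict[str, Waveform],
    synthesize: Callable[[str], Waveform],
    pause: int = 0,
) -> Tuple[Waveform, List[str]]:
    """Concatenate one waveform per word, preferring pre-recorded sounds.

    `database` stands in for one speech-database group (word -> recording) and
    `synthesize` for the signal-processing tool; both are hypothetical
    placeholders. Returns the output samples and the words that were synthesized.
    """
    output: Waveform = []
    synthesized: List[str] = []
    for i, word in enumerate(words):
        # Search the pre-recorded group first.
        wave = database.get(word)
        if wave is None:
            # Missing from the group: fall back to synthesis.
            wave = synthesize(word)
            synthesized.append(word)
        if i > 0:
            output.extend([0.0] * pause)  # optional silence between words
        output.extend(wave)
    return output, synthesized


# Usage with toy data: a small group that lacks the word "في".
group = {"قال": [0.1, 0.2], "جميل": [0.3]}
samples, missing = build_sentence(["قال", "في", "جميل"], group, lambda w: [0.5, 0.5], pause=1)
print(len(samples), missing)
```

A smaller group leaves more words to the synthesis fallback, which is the trade-off the listening test measures. A real system would also need matching sample rates and some smoothing at word boundaries, which this sketch omits.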
Chapter One: Introduction
1.1 Overview of Text-To-Speech System
Interest in text-to-speech systems dates back to the 1960s. Much work has been presented in this field of language technology since then, from research at Bell Labs to commercial systems such as Microsoft Agent. A text-to-speech system can be defined as a system that takes written text in any language as input and produces the corresponding speech sounds as output [1, 2]. Text-to-speech systems based on a corpus-based approach have gained more interest in the last decade [22]. A corpus-based system concatenates prerecorded sounds to convert written text into speech that listeners can understand. Corpus-based speech synthesis systems have been shown to give higher naturalness and intelligibility. A drawback of these systems is the larger database storage (i.e. inventory) they need to accommodate the growing number of prerecorded sounds. They also require more processing, which makes them more costly [22].
Work on speech database reduction has been introduced by many researchers and covers languages such as Thai and English [9, 13]. Some of this work was evaluated using subjective and objective tests, but no similar work that considers database reduction for the Arabic language could be found. Existing speech database reduction approaches used various speech unit sizes (e.g. diphones, syllables and words), which led to different results. While some researchers concluded that a larger database corpus would provide higher text-to-speech sound quality, others claimed that such conclusions are not necessarily correct [9]. Building an Arabic text-to-speech system that uses corpus-based synthesis and reduces the speech database size is therefore a challenging task to investigate, and it gains more importance because no previous studies of it exist in the literature.
1.2 Motivation and Thesis Methodology
Text-to-speech systems in general are receiving increasing attention around the world, and many languages (e.g. English, French and German) have already been addressed by such systems. Adding Arabic to these languages and improving the quality of the produced voices would be an important improvement. An Arabic text-to-speech system with high-quality output would be useful for special groups in society, such as handicapped people, by making many of the text documents that media sites produce (e.g. news, weather and sport web sites) accessible. Minimizing the size of the database used to store speech units is a goal for limited-size devices such as handheld devices and PDAs. The achievable database size is affected by the size of the units selected for the synthesis process [9]. Reducing the space used to store speech data can make text-to-speech systems feasible on limited-size devices. No previous work that deals with reducing the speech database for the Arabic language was found in the literature. In this thesis, we produce a solution to this problem that uses the data classification technique to process the text data and produce the necessary
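Processing the text data for classification usually means turning each preprocessed document into a weighted word vector. The lists of figures and tables mention a tf*idf-weighted ARFF file, four weighting schemes and WEKA's J48 classifier; the sketch below shows only tf*idf, and it assumes the common variant tf * log(N / df), since the exact formula used in the thesis is not given in this excerpt. `tfidf_vectors` is a hypothetical helper, and the tokens are placeholders for Arabic words.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: raw term frequency times log(N / df), where N is the
    number of documents and df the number of documents containing the term.
    """
    n_docs = len(docs)
    # Document frequency: count each term once per document.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


# Toy corpus of already-preprocessed documents; tokens are placeholders.
docs = [
    ["market", "shares", "market", "report"],  # business
    ["match", "goal", "report"],               # sport
    ["minister", "vote", "report"],            # politics
]
for vec in tfidf_vectors(docs):
    print(vec)
```

A term that occurs in every document, such as `report` here, gets weight 0, so it cannot help separate the business, politics and sport categories. Vectors like these would then be written to an ARFF file and passed to a classifier.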
Appendix B: Arabic Speech Test Sentences Samples
Category: Business, Business, Politics, Politics, Sport, Sport
Sentence: قال قال تمثل مغنية تامة حتى على نسبة صالح في فيفا كامل ما على خاصة في ل جميل بي
Appendix C: Test Listening Results Sample
1st Listening (Politics)
ExpNo  UserID  Category  FileID  Variation  Correct  None  Absent  Percent Correct
1      One     Politics  2_7_
2      Two     Politics  5_4_
3      Three   Politics  1_1_
4      Four    Politics  4_5_
5      Five    Politics  7_2_
6      Six     Politics  3_6_
7      Seven   Politics  6_3_
8      One     Politics  2_1_
9      Two     Politics  5_5_
10     Three   Politics  1_2_
11     Four    Politics  4_6_
12     Five    Politics  7_3_
13     Six     Politics  3_7_
14     Seven   Politics  6_4_
15     One     Politics  2_2_
16     Two     Politics  5_6_
17     Three   Politics  1_3_
18     Four    Politics  4_7_
19     Five    Politics  7_4_
20     Six     Politics  3_1_
21     Seven   Politics  6_5_
22     One     Politics  2_3_
23     Two     Politics  5_7_
24     Three   Politics  1_4_
25     Four    Politics  4_1_
26     Five    Politics  7_5_
27     Six     Politics  3_2_
28     Seven   Politics  6_6_
29     One     Politics  2_4_
30     Two     Politics  5_1_
31     Three   Politics  1_5_
32     Four    Politics  4_2_
33     Five    Politics  7_6_
34     Six     Politics  3_3_
35     Seven   Politics  6_7_
36     One     Politics  2_5_
37     Two     Politics  5_2_
38     Three   Politics  1_6_
39     Four    Politics  4_3_
40     Five    Politics  7_7_
41     Six     Politics  3_4_
42     Seven   Politics  6_1_
43     One     Politics  2_6_
44     Two     Politics  5_3_
45     Three   Politics  1_7_
46     Four    Politics  4_4_
47     Five    Politics  7_1_
48     Six     Politics  3_5_
49     Seven   Politics  6_2_
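The Correct, None and Absent columns above are not defined in this excerpt, while the table of contents lists word correctness and precision and recall as perception measures. The sketch below shows one common way to score a listener's transcription against the played sentence; the definitions, the `perception_scores` name and the example words are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, List


def perception_scores(reference: List[str], heard: List[str]) -> Dict[str, float]:
    """Score one listener transcription against the sentence that was played.

    Assumed definitions (illustrative, not taken from the thesis):
      correct   = words present in both lists (multiset match, order ignored)
      precision = correct / number of words the listener wrote
      recall    = correct / number of words in the played sentence
    """
    # Counter & Counter keeps the minimum count of each shared word.
    correct = sum((Counter(reference) & Counter(heard)).values())
    precision = correct / len(heard) if heard else 0.0
    recall = correct / len(reference) if reference else 0.0
    return {"correct": float(correct), "precision": precision, "recall": recall}


# Hypothetical example: a four-word sentence; the listener wrote three words, two right.
scores = perception_scores(["w1", "w2", "w3", "w4"], ["w1", "w2", "w9"])
print(scores)
```

Under these assumptions, the recall value multiplied by 100 would play the role of a word-correctness percentage for one sentence.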
More informationBEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
More informationAn Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
More informationApplication of Data mining in Medical Applications
Application of Data mining in Medical Applications by Arun George Eapen A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Applied Science
More informationComparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations
Volume 3, No. 8, August 2012 Journal of Global Research in Computer Science REVIEW ARTICLE Available Online at www.jgrcs.info Comparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations
More informationMachine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
More informationA design of the transcoder to convert the VoiceXML documents into the XHTML+Voice documents
A design of the transcoder to convert the VoiceXML documents into the XHTML+Voice documents JIEUN KIM, JIEUN PARK, JUNSUK PARK, DONGWON HAN Computer & Software Technology Lab, Electronics and Telecommunications
More informationTHE CASE FOR VALUE MANAGEMENT TO BE INCLUDED IN EVERY CONSTRUCTION PROJECT DESIGN PROCESS
THESIS KAKITANGAN THE CASE FOR VALUE MANAGEMENT TO BE INCLUDED IN EVERY CONSTRUCTION PROJECT DESIGN PROCESS By FOTOSTAT TIDAK DIBEMARKAN AINIJAAPAR This dissertation is submitted in partial fulfillment
More informationCorpus Driven Malayalam Text-to-Speech Synthesis for Interactive Voice Response System
Corpus Driven Malayalam Text-to-Speech Synthesis for Interactive Voice Response System Arun Soman, Sachin Kumar S., Hemanth V. K., M. Sabarimalai Manikandan, K. P. Soman Centre for Excellence in Computational
More informationCreating voices for the Festival speech synthesis system.
M. Hood Supervised by A. Lobb and S. Bangay G01H0708 Creating voices for the Festival speech synthesis system. Abstract This project focuses primarily on the process of creating a voice for a concatenative
More informationHow To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
More informationThe University of Jordan
The University of Jordan Master in Web Intelligence Non Thesis Department of Business Information Technology King Abdullah II School for Information Technology The University of Jordan 1 STUDY PLAN MASTER'S
More informationON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION
ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical
More informationPractical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
More informationBIDM Project. Predicting the contract type for IT/ITES outsourcing contracts
BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an
More informationA Secured Approach to Credit Card Fraud Detection Using Hidden Markov Model
A Secured Approach to Credit Card Fraud Detection Using Hidden Markov Model Twinkle Patel, Ms. Ompriya Kale Abstract: - As the usage of credit card has increased the credit card fraud has also increased
More informationQuality Control of National Genetic Evaluation Results Using Data-Mining Techniques; A Progress Report
Quality Control of National Genetic Evaluation Results Using Data-Mining Techniques; A Progress Report G. Banos 1, P.A. Mitkas 2, Z. Abas 3, A.L. Symeonidis 2, G. Milis 2 and U. Emanuelson 4 1 Faculty
More informationMasters in Information Technology
Computer - Information Technology MSc & MPhil - 2015/6 - July 2015 Masters in Information Technology Programme Requirements Taught Element, and PG Diploma in Information Technology: 120 credits: IS5101
More informationSentiment Analysis of Movie Reviews and Twitter Statuses. Introduction
Sentiment Analysis of Movie Reviews and Twitter Statuses Introduction Sentiment analysis is the task of identifying whether the opinion expressed in a text is positive or negative in general, or about
More informationTABLE OF CONTENTS CHAPTER TITLE PAGE
viii TABLE OF CONTENTS CHAPTER TITLE PAGE TITLE PAGE DECLARATION DEDICATION ACKNOWLEDGEMENT ABSTRACT ABSTRAK TABLE OF CONTENTS LIST OF TABLES LIST OF FIGURES LIST OF APPENDICES I II III IV VI VII VIII
More informationStatistical Validation and Data Analytics in ediscovery. Jesse Kornblum
Statistical Validation and Data Analytics in ediscovery Jesse Kornblum Administrivia Silence your mobile Interactive talk Please ask questions 2 Outline Introduction Big Questions What Makes Things Similar?
More informationMACHINE LEARNING BASICS WITH R
MACHINE LEARNING [Hands-on Introduction of Supervised Machine Learning Methods] DURATION 2 DAY The field of machine learning is concerned with the question of how to construct computer programs that automatically
More informationIT services for analyses of various data samples
IT services for analyses of various data samples Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical
More informationMANAGING QUEUE STABILITY USING ART2 IN ACTIVE QUEUE MANAGEMENT FOR CONGESTION CONTROL
MANAGING QUEUE STABILITY USING ART2 IN ACTIVE QUEUE MANAGEMENT FOR CONGESTION CONTROL G. Maria Priscilla 1 and C. P. Sumathi 2 1 S.N.R. Sons College (Autonomous), Coimbatore, India 2 SDNB Vaishnav College
More informationBIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376
Course Director: Dr. Kayvan Najarian (DCM&B, kayvan@umich.edu) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.
More informationAUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS
AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS PIERRE LANCHANTIN, ANDREW C. MORRIS, XAVIER RODET, CHRISTOPHE VEAUX Very high quality text-to-speech synthesis can be achieved by unit selection
More informationBisecting K-Means for Clustering Web Log data
Bisecting K-Means for Clustering Web Log data Ruchika R. Patil Department of Computer Technology YCCE Nagpur, India Amreen Khan Department of Computer Technology YCCE Nagpur, India ABSTRACT Web usage mining
More information