A CLOSED DOMAIN TEXT-TO-SPEECH SYNTHESIS SYSTEM FOR ARABIC LANGUAGE USING DATA MINING CLASSIFICATION TECHNIQUE


A Closed Domain Text-To-Speech Synthesis System for Arabic Language using Data Mining Classification Technique

by
Maisa M. Al-Khudair

Dr. Natheer Y. Khasawneh (Advisor)

Thesis submitted in partial fulfillment of the requirements for the degree of M.Sc. in Computer Engineering

At The Faculty of Graduate Studies
Jordan University of Science and Technology

August, 2009

A CLOSED DOMAIN TEXT-TO-SPEECH SYNTHESIS SYSTEM FOR ARABIC LANGUAGE USING DATA MINING CLASSIFICATION TECHNIQUE

by
Maisa M. Al-Khudair

Signature of Author

Committee Member (Signature and Date)
Dr. Natheer Y. Khasawneh (Chairman)
Dr. Hassan Najadat (Member)
Dr. Mohammad A. Al-Jarrah (External Examiner, YU)

August, 2009

DEDICATION

To my parents and my family.

ACKNOWLEDGMENTS

First of all, thanks to ALLAH who gave me the power, the effort, and the enthusiasm to start and complete this work. I would like to thank my supervisor Dr. Natheer Khasawneh for his support, encouragement, valuable hints, and most importantly for his patience with me. Thanks also go to the discussion committee members, Dr. Natheer Khasawneh, Dr. Hassan Najadat and Dr. Mohammad Al-Jarrah, for agreeing to participate and for their constructive comments and suggestions.

Thanking my parents would not be enough; thank you for your support and patience with me. I would like to thank both of you, Noor and Arwa, for your ultimate support and care.

I would also like to especially thank Mr. Harri Saarikoski for his great help, and Ms. Amani Tawalbeh from the Yarmouk FM radio staff, represented by general manager Mr. Bashar Qabalan and vice manager Mr. Al-Samadi. I would also like to thank the Amman Net radio staff, especially Mr. Mohammad Al-Ersan, for their great assistance.

Finally, I would like to thank all my friends, including my friends from JUST, for their endless support and for being with me in the hard situations that I faced during the work on this thesis.

TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF APPENDICES
ABSTRACT

Chapter One: Introduction
  Overview of Text-To-Speech System
  Motivation and Thesis Methodology
  Thesis Outline

Chapter Two: Related Work
  Text-To-Speech System
  Text-To-Speech System Architecture
  Concatenative Synthesis
  Formant Synthesis
  Corpus-Based Synthesis
  Speech Database Reduction
  Diphone Unit
  Word and Syllable Units
  Multi-form Units
  Arabic Language Database
  Text Categorization
  Pre-processing Document
  Text Categorization Algorithms
  2.3.3 K-Nearest Neighbor Classifier (KNN)
  Support Vector Machines (SVM)
  Decision Trees
  Constructing C4.5 Classifier
  Neural Networks
  Arabic Language
  Arabic Text-To-Speech Systems
  Arabic Text Categorization

Chapter Three: Architecture of a Closed Domain Text-To-Speech Synthesis System for Arabic Language using Data Mining Clustering Technique
  3.1 Problem Definition
  The Proposed Approach
  Arabic Text Categorization
  Collecting Text Data
  Document Preprocessing
  Word Level Vector Generating
  WEKA Results
  Arabic Text-To-Speech System Building
  Building Speech Database
  Voice Waveform Selection
  Voice Waveform Concatenation

Chapter Four: Testing the Arabic Text-To-Speech System
  Test Group
  Method
  Test and Evaluation Results
  Perception of the Sentences
  Perception of the Sentences using Word Correctness
  Business Category Test
  Politics Category Test
  Sport Category Test
  System Test
  Perception of the Sentences using Precision and Recall Measures
  Business Category Test
  Politics Category Test
  Sport Category Test
  System Test
  Results of Mean Opinion Score Method
  Naturalness
  Speed
  Sound Quality
  Pronunciation
  Intelligibility
  Stress/Intonation

Chapter Five: Conclusions and Future Work

References
Appendix A: Test Questionnaire
Appendix B: Arabic Speech Test Sentences Samples
Appendix C: Test Listening Results Sample
Arabic Abstract

LIST OF FIGURES

Figure  Description
2.1   Text-to-speech system architecture
      Corpus-based synthesis components
      The proposed system architecture
      Arabic text categorization process steps
      An example of document preprocessing
      Sample arff file that considers using tf*idf weighting scheme
      Confusion matrix of J48 classifier
      WEKA J48 classification tree
      Results of classifying test data documents using four weighting schemes
3.8   Speech synthesis process
      Steps of sounds concatenation method
      Perception of business sentences-first and second listening
      Perception of politics sentences-first and second listening
      Perception of sport sentences-first and second listening
      Perception of system sentences-first and second listening
      Precision and recall measures for the perception of business sentences-first and second listening
4.6   Precision and recall measures for the perception of politics sentences-first and second listening
4.7   Precision and recall measures for the perception of sport sentences-first and second listening
4.8   Precision and recall measures for the perception of system sentences-first and second listening
4.9   Naturalness of the voice
      Speed of the speech
      Quality of sound
      Pronunciation mistakes
      Pronunciation effect on understanding some words
      Concentration level needed for understanding
4.15  Annoying level of the speech
      Voice understanding
      The difficulty in understanding the voice
      The intonation of the system
      The stress of the system
      The intonation differences of the system

LIST OF TABLES

Table  Description
3.1   Text documents distribution of three categories
      Arabic stop words samples
      Results of classifying test data documents using four weighting methods
      Results of reducing words number of three categories using various threshold values
4.1   Age distribution of the listeners
      Listeners remarks on system pronunciation

LIST OF APPENDICES

Appendix  Description
A   Test Questionnaire
B   Arabic Speech Test Sentences Samples
C   Test Listening Results Sample

ABSTRACT

A CLOSED DOMAIN TEXT-TO-SPEECH SYNTHESIS SYSTEM FOR ARABIC LANGUAGE USING DATA MINING CLASSIFICATION TECHNIQUE

by Maisa M. Al-Khudair

Interest in text-to-speech systems has been increasing lately because they are becoming more important in areas such as entertainment, helping handicapped people and improving machine-human interaction [1]. Text-to-speech systems are implemented using speech units of different sizes, such as diphones and syllables. A corpus-based system, which builds speech by concatenating pre-recorded sounds, has proven to be the best text-to-speech system because it gives high intelligibility and produces more natural sound. However, such a system needs more memory to store the pre-recorded sounds, which increases the complexity of processing the speech storage.

In this thesis, we present an Arabic text-to-speech system and evaluate the effect of reducing its speech database inventory size. A data mining technique was used to generate a classification model. The model classifies new text documents entering the system and determines the "bag of words" used to build the speech corpus for the overall system, which covers three main domains (i.e. categories): business, politics and sport.

To evaluate the effect of speech database reduction on listeners, we grouped the speech database sounds into distinct groups that contain different numbers of recorded sound files. The lower the number of sound waves in a group, the smaller the speech database. When the system generates a new spoken sentence, the speech database groups are searched first to check whether a pre-recorded sound file exists for each required word in the sentence. If the sound file is not found, a signal processing tool is used to synthesize the missing sound wave. The output sentence is generated by concatenating the pre-recorded and synthesized sound waves.

A subjective listening test showed that reducing the speech database size by 29% lowered recognition correctness by 0.57%, while reducing it by 86% lowered recognition correctness by 1.29%.
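The lookup-then-fallback step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the names `build_sentence` and `placeholder_synthesizer`, the use of lists of floats as waveforms, and the transliterated sample words are all assumptions made for illustration, and the placeholder fallback merely emits silence where the thesis uses a signal processing tool.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Waveform = List[float]  # assumption: a list of samples stands in for real audio


def build_sentence(
    words: Sequence[str],
    speech_db: Dict[str, Waveform],
    synthesize_missing: Callable[[str], Waveform],
) -> Tuple[Waveform, List[str]]:
    """Concatenate one waveform per word, preferring pre-recorded sounds."""
    sentence: Waveform = []
    sources: List[str] = []
    for word in words:
        recorded = speech_db.get(word)
        if recorded is not None:
            # A pre-recorded sound file exists for this word.
            sentence.extend(recorded)
            sources.append("recorded")
        else:
            # No recording found: fall back to a synthesized waveform.
            sentence.extend(synthesize_missing(word))
            sources.append("synthesized")
    return sentence, sources


def placeholder_synthesizer(word: str) -> Waveform:
    # Hypothetical stand-in for the signal processing tool:
    # one silent sample per character of the word.
    return [0.0] * len(word)


if __name__ == "__main__":
    # Hypothetical databases: the reduced one holds fewer recorded words.
    full_db = {"qala": [0.1, 0.2], "fi": [0.3], "fifa": [0.4, 0.5]}
    reduced_db = {"qala": [0.1, 0.2]}
    for name, db in (("full", full_db), ("reduced", reduced_db)):
        _, sources = build_sentence(
            ["qala", "fi", "fifa"], db, placeholder_synthesizer
        )
        print(name, sources)
```

With a smaller database, more words fall through to the fallback branch, which is the trade-off the listening test measures.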

Chapter One: Introduction

1.1 Overview of Text-To-Speech System

Interest in text-to-speech systems dates back to the 1960s. Since then, a great deal of work has been presented in this field of language technology, starting with research at Bell Labs and extending to commercial systems such as Microsoft Agent. A text-to-speech system takes written text in any language as input and produces the corresponding speech sounds as output [1, 2].

Text-to-speech systems that use a corpus-based approach have gained more interest in the last decade [22]. A corpus-based system concatenates prerecorded sounds to convert written text into speech that listeners can understand. Corpus-based speech synthesis systems have proved to give higher naturalness and intelligibility. A drawback of these systems is the larger database storage (i.e. inventory) needed to hold the growing number of prerecorded sounds. They also require more processing, which makes them more costly [22].

Many researchers have worked on speech database reduction, for languages including Thai and English [9, 13]. While some of this work was evaluated using subjective and objective tests, no similar work on database reduction for the Arabic language could be found. Existing speech database reduction approaches used various speech unit sizes (i.e. diphones, syllables and words), which led to different results. While some researchers concluded that a larger database corpus provides higher text-to-speech sound quality, others claimed that such conclusions are not necessarily correct [9].

Building an Arabic text-to-speech system that uses corpus-based synthesis and also reduces the speech database size is a challenging task to investigate. It gains more importance because no previous studies exist in the literature.

1.2 Motivation and Thesis Methodology

Text-to-speech systems are receiving increasing attention around the world, and many languages (e.g. English, French and German) are already covered by such systems. Adding Arabic to these languages and improving the quality of the produced voices would be an important improvement. An Arabic text-to-speech system with a high-quality output voice would be useful for particular groups in society, such as handicapped people. It would make many written documents that media sites produce (e.g. news, weather and sport web sites) accessible.

Minimizing the size of the database used to store speech units is a goal for limited-size devices such as handheld devices and PDAs. The database size is affected by the size of the units selected for the synthesis process [9]. Reducing the space used to store speech data can make text-to-speech systems usable on limited-size devices.

No previous work on reducing the speech database for the Arabic language was found in the literature. In this thesis, we present a solution to this problem that uses the data classification technique to process the text data and produce the necessary

Appendix B: Arabic Speech Test Sentences Samples

Category Sentence
Business Business Politics Politics Sport Sport
قال قال تمثل مغنية تامة حتى على نسبة صالح في فيفا كامل ما على خاصة في ل جميل بي

Appendix C: Test Listening Results Sample

1st Listening (Politics)

ExpNo  UserID  Category  FileID  Variation  Correct  None  Absent  Percent Correct
1      One     Politics  2_7_                                      %
2      Two     Politics  5_4_                                      %
3      Three   Politics  1_1_                                      %
4      Four    Politics  4_5_                                      %
5      Five    Politics  7_2_                                      %
6      Six     Politics  3_6_                                      %
7      Seven   Politics  6_3_                                      %
8      One     Politics  2_1_                                      %
9      Two     Politics  5_5_                                      %
10     Three   Politics  1_2_                                      %
11     Four    Politics  4_6_                                      %
12     Five    Politics  7_3_                                      %
13     Six     Politics  3_7_                                      %
14     Seven   Politics  6_4_                                      %
15     One     Politics  2_2_                                      %
16     Two     Politics  5_6_                                      %
17     Three   Politics  1_3_                                      %
18     Four    Politics  4_7_                                      %
19     Five    Politics  7_4_                                      %
20     Six     Politics  3_1_                                      %
21     Seven   Politics  6_5_                                      %
22     One     Politics  2_3_                                      %
23     Two     Politics  5_7_                                      %
24     Three   Politics  1_4_                                      %
25     Four    Politics  4_1_                                      %
26     Five    Politics  7_5_                                      %
27     Six     Politics  3_2_                                      %
28     Seven   Politics  6_6_                                      %
29     One     Politics  2_4_                                      %
30     Two     Politics  5_1_                                      %
31     Three   Politics  1_5_                                      %
32     Four    Politics  4_2_                                      %
33     Five    Politics  7_6_                                      %
34     Six     Politics  3_3_                                      %
35     Seven   Politics  6_7_                                      %
36     One     Politics  2_5_                                      %
37     Two     Politics  5_2_                                      %
38     Three   Politics  1_6_                                      %
39     Four    Politics  4_3_                                      %
40     Five    Politics  7_7_                                      %
41     Six     Politics  3_4_                                      %
42     Seven   Politics  6_1_                                      %
43     One     Politics  2_6_                                      %
44     Two     Politics  5_3_                                      %
45     Three   Politics  1_7_                                      %
46     Four    Politics  4_4_                                      %
47     Five    Politics  7_1_                                      %
48     Six     Politics  3_5_                                      %
49     Seven   Politics  6_2_                                      %


More information

Customer satisfaction towards service quality in 5star hotels industry in Paphos

Customer satisfaction towards service quality in 5star hotels industry in Paphos Neapolis University HEPHAESTUS Repository School of Economic Sciences and Business http://hephaestus.nup.ac.cy Master Degree Thesis 2012 Customer satisfaction towards service quality in 5star hotels industry

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015 RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering

More information

Text Analytics Illustrated with a Simple Data Set

Text Analytics Illustrated with a Simple Data Set CSC 594 Text Mining More on SAS Enterprise Miner Text Analytics Illustrated with a Simple Data Set This demonstration illustrates some text analytic results using a simple data set that is designed to

More information

Data Mining with Weka

Data Mining with Weka Data Mining with Weka Class 1 Lesson 1 Introduction Ian H. Witten Department of Computer Science University of Waikato New Zealand weka.waikato.ac.nz Data Mining with Weka a practical course on how to

More information

Introduction to Big Data Science

Introduction to Big Data Science Introduction to Big Data Science 13 th Period Project: Situation Awareness and Statistical Analysis On Big Data Big Data Science 1 Contents What is Situation Awareness (SA)? 3 Levels for SA Role of Data

More information

ElegantJ BI. White Paper. The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis

ElegantJ BI. White Paper. The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis ElegantJ BI White Paper The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis Integrated Business Intelligence and Reporting for Performance Management, Operational

More information

ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL

ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL International Journal Of Advanced Technology In Engineering And Science Www.Ijates.Com Volume No 03, Special Issue No. 01, February 2015 ISSN (Online): 2348 7550 ASSOCIATION RULE MINING ON WEB LOGS FOR

More information

ACKNOWLEDGMENT. I would like to thank Allah for giving me the patience to work hard and overcome all the

ACKNOWLEDGMENT. I would like to thank Allah for giving me the patience to work hard and overcome all the ACKNOWLEDGMENT I would like to thank Allah for giving me the patience to work hard and overcome all the research obstacles. My full gratitude is to Dr. Mohammed Al-Jarrah and Dr. Izzat Alsmadi for their

More information

Predicting Flight Delays

Predicting Flight Delays Predicting Flight Delays Dieterich Lawson jdlawson@stanford.edu William Castillo will.castillo@stanford.edu Introduction Every year approximately 20% of airline flights are delayed or cancelled, costing

More information

E-commerce Transaction Anomaly Classification

E-commerce Transaction Anomaly Classification E-commerce Transaction Anomaly Classification Minyong Lee minyong@stanford.edu Seunghee Ham sham12@stanford.edu Qiyi Jiang qjiang@stanford.edu I. INTRODUCTION Due to the increasing popularity of e-commerce

More information

MGT/B 296 Business Intelligence Technologies Data Mining Spring 2010

MGT/B 296 Business Intelligence Technologies Data Mining Spring 2010 MGT/B 296 Business Intelligence Technologies Data Mining Spring 2010 University of California, Davis Graduate School of Management Professor Yinghui (Catherine) Yang Room 3418, Gallagher Hall, UC Davis

More information

Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News

Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News Sushilkumar Kalmegh Associate Professor, Department of Computer Science, Sant Gadge Baba Amravati

More information

Football Match Winner Prediction

Football Match Winner Prediction Football Match Winner Prediction Kushal Gevaria 1, Harshal Sanghavi 2, Saurabh Vaidya 3, Prof. Khushali Deulkar 4 Department of Computer Engineering, Dwarkadas J. Sanghvi College of Engineering, Mumbai,

More information

Data Mining Solutions for the Business Environment

Data Mining Solutions for the Business Environment Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

More information

Classification of Engineering Consultancy Firms Using Self-Organizing Maps: A Scientific Approach

Classification of Engineering Consultancy Firms Using Self-Organizing Maps: A Scientific Approach International Journal of Civil & Environmental Engineering IJCEE-IJENS Vol:13 No:03 46 Classification of Engineering Consultancy Firms Using Self-Organizing Maps: A Scientific Approach Mansour N. Jadid

More information

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents

More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

Application of Data mining in Medical Applications

Application of Data mining in Medical Applications Application of Data mining in Medical Applications by Arun George Eapen A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Applied Science

More information

Comparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations

Comparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations Volume 3, No. 8, August 2012 Journal of Global Research in Computer Science REVIEW ARTICLE Available Online at www.jgrcs.info Comparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations

More information

Machine Learning using MapReduce

Machine Learning using MapReduce Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

More information

A design of the transcoder to convert the VoiceXML documents into the XHTML+Voice documents

A design of the transcoder to convert the VoiceXML documents into the XHTML+Voice documents A design of the transcoder to convert the VoiceXML documents into the XHTML+Voice documents JIEUN KIM, JIEUN PARK, JUNSUK PARK, DONGWON HAN Computer & Software Technology Lab, Electronics and Telecommunications

More information

THE CASE FOR VALUE MANAGEMENT TO BE INCLUDED IN EVERY CONSTRUCTION PROJECT DESIGN PROCESS

THE CASE FOR VALUE MANAGEMENT TO BE INCLUDED IN EVERY CONSTRUCTION PROJECT DESIGN PROCESS THESIS KAKITANGAN THE CASE FOR VALUE MANAGEMENT TO BE INCLUDED IN EVERY CONSTRUCTION PROJECT DESIGN PROCESS By FOTOSTAT TIDAK DIBEMARKAN AINIJAAPAR This dissertation is submitted in partial fulfillment

More information

Corpus Driven Malayalam Text-to-Speech Synthesis for Interactive Voice Response System

Corpus Driven Malayalam Text-to-Speech Synthesis for Interactive Voice Response System Corpus Driven Malayalam Text-to-Speech Synthesis for Interactive Voice Response System Arun Soman, Sachin Kumar S., Hemanth V. K., M. Sabarimalai Manikandan, K. P. Soman Centre for Excellence in Computational

More information

Creating voices for the Festival speech synthesis system.

Creating voices for the Festival speech synthesis system. M. Hood Supervised by A. Lobb and S. Bangay G01H0708 Creating voices for the Festival speech synthesis system. Abstract This project focuses primarily on the process of creating a voice for a concatenative

More information

How To Use Neural Networks In Data Mining

How To Use Neural Networks In Data Mining International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and

More information

The University of Jordan

The University of Jordan The University of Jordan Master in Web Intelligence Non Thesis Department of Business Information Technology King Abdullah II School for Information Technology The University of Jordan 1 STUDY PLAN MASTER'S

More information

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical

More information

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

More information

BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts

BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an

More information

A Secured Approach to Credit Card Fraud Detection Using Hidden Markov Model

A Secured Approach to Credit Card Fraud Detection Using Hidden Markov Model A Secured Approach to Credit Card Fraud Detection Using Hidden Markov Model Twinkle Patel, Ms. Ompriya Kale Abstract: - As the usage of credit card has increased the credit card fraud has also increased

More information

Quality Control of National Genetic Evaluation Results Using Data-Mining Techniques; A Progress Report

Quality Control of National Genetic Evaluation Results Using Data-Mining Techniques; A Progress Report Quality Control of National Genetic Evaluation Results Using Data-Mining Techniques; A Progress Report G. Banos 1, P.A. Mitkas 2, Z. Abas 3, A.L. Symeonidis 2, G. Milis 2 and U. Emanuelson 4 1 Faculty

More information

Masters in Information Technology

Masters in Information Technology Computer - Information Technology MSc & MPhil - 2015/6 - July 2015 Masters in Information Technology Programme Requirements Taught Element, and PG Diploma in Information Technology: 120 credits: IS5101

More information

Sentiment Analysis of Movie Reviews and Twitter Statuses. Introduction

Sentiment Analysis of Movie Reviews and Twitter Statuses. Introduction Sentiment Analysis of Movie Reviews and Twitter Statuses Introduction Sentiment analysis is the task of identifying whether the opinion expressed in a text is positive or negative in general, or about

More information

TABLE OF CONTENTS CHAPTER TITLE PAGE

TABLE OF CONTENTS CHAPTER TITLE PAGE viii TABLE OF CONTENTS CHAPTER TITLE PAGE TITLE PAGE DECLARATION DEDICATION ACKNOWLEDGEMENT ABSTRACT ABSTRAK TABLE OF CONTENTS LIST OF TABLES LIST OF FIGURES LIST OF APPENDICES I II III IV VI VII VIII

More information

Statistical Validation and Data Analytics in ediscovery. Jesse Kornblum

Statistical Validation and Data Analytics in ediscovery. Jesse Kornblum Statistical Validation and Data Analytics in ediscovery Jesse Kornblum Administrivia Silence your mobile Interactive talk Please ask questions 2 Outline Introduction Big Questions What Makes Things Similar?

More information

MACHINE LEARNING BASICS WITH R

MACHINE LEARNING BASICS WITH R MACHINE LEARNING [Hands-on Introduction of Supervised Machine Learning Methods] DURATION 2 DAY The field of machine learning is concerned with the question of how to construct computer programs that automatically

More information

IT services for analyses of various data samples

IT services for analyses of various data samples IT services for analyses of various data samples Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical

More information

MANAGING QUEUE STABILITY USING ART2 IN ACTIVE QUEUE MANAGEMENT FOR CONGESTION CONTROL

MANAGING QUEUE STABILITY USING ART2 IN ACTIVE QUEUE MANAGEMENT FOR CONGESTION CONTROL MANAGING QUEUE STABILITY USING ART2 IN ACTIVE QUEUE MANAGEMENT FOR CONGESTION CONTROL G. Maria Priscilla 1 and C. P. Sumathi 2 1 S.N.R. Sons College (Autonomous), Coimbatore, India 2 SDNB Vaishnav College

More information

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376 Course Director: Dr. Kayvan Najarian (DCM&B, kayvan@umich.edu) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.

More information

AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS

AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS PIERRE LANCHANTIN, ANDREW C. MORRIS, XAVIER RODET, CHRISTOPHE VEAUX Very high quality text-to-speech synthesis can be achieved by unit selection

More information

Bisecting K-Means for Clustering Web Log data

Bisecting K-Means for Clustering Web Log data Bisecting K-Means for Clustering Web Log data Ruchika R. Patil Department of Computer Technology YCCE Nagpur, India Amreen Khan Department of Computer Technology YCCE Nagpur, India ABSTRACT Web usage mining

More information