Meta-Interpretive Learning of Data Transformation Programs
|
|
|
- Charity Cox
- 10 years ago
- Views:
Transcription
1 Meta-Interpretive Learning of Data Transformation Programs Andrew Cropper, Alireza Tamaddoni-Nezhad, and Stephen H. Muggleton Department of Computing, Imperial College London Abstract. Data Transformation is an important part of data curation, which involves the maintenance of research data on a long-term basis with the aim of allowing its re-use. This activity, wide-spread in commercial and academic data analytics projects, is labour intensive, involving the manual construction and debugging of large numbers of small, special purpose data transformation programs. This paper investigates learning data transformation programs from examples provided by the user. Though small, such programs can be relatively complex, involving problem decomposition, repetition and recognition of context. These features together with the need to learn from a handful of user examples make this a hard task for standard ILP systems. We use a Meta-Interpretive Learning approach which supports these various requirements through predicate invention and the learning of recursive programs from small numbers of examples. In our approach we develop a general technique for applying Metagol DF to both semi-structured and unstructured data. Our experiments are conducted on tasks involving semi-structured medical patient data and un-structured ecological text. The results indicate that high levels of predictive accuracy can be achieved in these tasks from small number of training examples. 1 Introduction Suppose you are given a large number of patient records in a semi-structured format and are required to transform them to a given structured format as shown in Fig. 1. In order to avoid manually editing the transformations, you might decide to write a small program to extract the relevant fields from the input and produce them in the output. Fig. 1c illustrates the kind of Prolog transformation program for this task which can be generated automatically from a small number of examples such as those in Fig. 1a (see Section 5). In this program, predicate invention is used to introduce f1 f2 and the primitive background predicates find patient id/2, find int/2, and open interval/4 are used to identify the various fields to be transposed from input to output. P year lung disease: n/a, Diagnosis: Unknown P Diagnosis: carcinoma, lung disease: unknown P Diagnosis: pneumonia 55.9 (a) Input P Unknown P carcinoma P pneumonia (b) Output f(a,b):- f2(a,c), f1(c,b). f2(a,b):- find patient id(a,c), find int(c,b). f1(a,b):- open interval(a,b,[ :, ],[, n ]). f1(a,b):- open interval(a,b,[ :, ],[,, ]). (c) Learned program Fig. 1: Transformation of medical records from semi-structured format This paper uses the recently developed Meta-Interpretive Learning (MIL) framework [9, 3, 4]. MIL differs from most state-of-the-art ILP approaches by supporting predicate invention for problem decomposition
2 2 and the learning of recursive programs. MIL has [10] been shown to be capable of learning a simple robot strategy for building a stable wall from a small set of initial/final state pairs. The work in this paper uses the same general approach as that employed in [7] for building string transformation programs. Here we look at the more general problem of learning data transformation programs. The paper is arranged as follows. Section 2 describes related work. Section 3 describes the theoretical framework. Section 4 describes the framework implementation. Section 5 describes experiments in ecological and medical domains. Finally, Section 6 concludes the paper and details future work. 2 Related work In [2] the authors compared statistical and relational methods to extract facts from a MEDLINE corpus. The primary limitation of the statistical approach, they state, is its inability to express the linguistic structure of the text; by contrast, the relational approach allows these features to be encoded as background knowledge, as parse trees, specifically. The relational approach used an ILP system similar to FOIL [11] to learn extraction rules. Similar works include [5], who used the ILP system Aleph [13] to learn rules to extract relations from a MEDLINE corpus; and [1], who used the ILP system FOIL to learn rules to extract relations from Nature and New Scientist articles. These works focused on constructing the appropriate problem representation, including determining the necessary linguistic features to be included in the background knowledge. In contrast to these approaches, we use the state-of-the-art ILP system Metagol DF, which supports predicate invention, the learning of recursive theories, and positive-only learning, none of which is supported by FOIL nor Aleph. FlashExtract [6] is a framework to extract fields from documents, such as text files and web pages. In this framework, the user highlights one or two examples of each field in document and FlashExtract attempts to extract all other instances of such fields, arranging them in a structured format, such as a table. FlashExtract uses an inductive synthesis algorithm to synthesise the extraction program using a domain-specific language built upon a pre-specified algebra of few core operators (map, filter, merge, and pair). In contrast to FlashExtact, our approach allows for the inclusion of background knowledge and the learning of transformation operators [?]. Wu and Knoblock [15] recently describe a Programming-by-Example technique which iteratively learns data transformation programs by example. While the overall aims of their approach are similar to ours, their approach does not support automatic problem decomposition using predicate invention. 3 Framework MIL [9, 10] is a form of ILP based on an adapted Prolog meta-interpreter. Whereas a standard Prolog meta-interpreter attempts to prove a goal by repeatedly fetching first-order clauses whose heads unify with a given goal, a MIL learner attempts to prove a set of goals by repeatedly fetching higher-order metarules (Fig. 2b) whose heads unify with a given goal. The resulting meta-substitutions are saved in an abduction store, and can be re-used in later proofs. Following the proof of a set of goals, a hypothesis is formed by applying the meta-substitutions onto their corresponding metarules, allowing for a form of ILP which supports predicate invention and the learning of recursive theories. General formal framework In the general framework for data transformation we assumes that the user provides examples E of how data should be transformed. Each example e E consists of a pair d1, d2 where d1 D1 and d2 D2 are input and output data records respectively. Given background knowledge B, in the form of existing transformations and the user-provided examples E the aim of the learning is to generate a transformational function τ : D1 D2 such that B, τ = E.
3 4 Implementation Fig. 2a shows Metagol DF [7], an implementation of the MIL framework, similar in form to a standard Prolog meta-interpreter. The metarule set (Fig. 2b) is defined separately. 3 prove([], P rog, P rog). prove([atom As], P rog1, P rog2) : metarule(name, MetaSub, (Atom :- Body), Order), Order, abduce(metasub(name, MetaSub), P rog1, P rog3), prove(body, P rog3, P rog4), prove(as, P rog4, P rog2). (a) Prolog code for generalised meta-interpreter Name Metarule Order Base P (x, y) Q(x, y) P Q Chain P (x, y) Q(x, z), R(z, y) P Q, P R Curry P (x, y) Q(x, y, c 1, c 2 ) P Q TailRec P (x, y) Q(x, z), P (z, y) P Q, x z y (b) Metarules with associated ordering constraints, where is a pre-defined ordering over symbols in the signature. The letters P, Q, and R denote existentially quantified higherorder variables; x, y, and z denote universally quantified firstorder variables; and c 1 and c 2 denote existentially quantified first-order variables. Fig. 2: Metagol DF meta-interpreter (a) and metarules (b) 4.1 Transformation language The transformation framework 1 includes three predicates: find sublist/3, closed interval/4, and open interval/4. These predicates transform an input state to an output state. The find sublist/3 predicate finds a complete sublist in an input and writes it to the output. The closed interval/4 and open interval/4 predicates find a sublist denoted by start and end delimiters, and write it to the output. For example, if the input = [i,n,d,u,c,t,i,o,n], start delimiter = [n,d], and end delimiter = [t,i], then the predicate open interval/4 predicate writes [u,c] to the output, and the predicate closed interval/4 writes [n,d,u,c,t,i] to the output. The delimiters are the set of all sublists of length k in an input. The value of k is pre-specified as part of the background knowledge, and, as the experiments demonstrate in Section 5, affects the predictive accuracy of the learner. 5 Experiments This section details experiments in learning data transformation programs. We test the following null hypothesis: Metagol DF cannot learn data transformation programs with higher than default predictive accuracies. To test this hypothesis, we apply our framework in two domains: ecological scholarly papers, and medical patient records. In all experiments, we use the Chain and Curry metarules (Fig. 2b). 5.1 Ecological scholarly papers In this experiment, the aim is to learn programs to extract relations from natural language taken from ecological scholarly papers. Materials The dataset contains 32 positive real-world examples taken from ecological scholarly papers (adopted from [14]) of a natural language sentence paired with a list of relations to be extracted. Fig. 3 shows two example pairings, and the partial background knowledge used in the experiment, where natural language sentences are represented as lists of terms. We provide a complete set of known species, but we do not provide predation terms in the background knowledge, so Metagol must use the general purpose background predicates, described in Section 4, to learn a solution. To generate negative testing examples, we create a frequency distribution over input lengths from the training 1 Full code and experimental data are available at
4 4 examples and a frequency distribution over words and punctuation in the training examples. To create a negative testing example input, we randomly select a length n from the length distribution and then randomly select with replacement n words or punctuation characters. To create the negative testing output, we randomly select without replacement three words from the input. Harpalus rufipes eats large prey such as Lepidoptera. Bembidion lampros : In cereals the main food was Collembola. (a) Input examples Harpalus rufipes eats Lepidoptera Bembidion lampros food Collembola (b) Output examples find species(a,b):- known species(species), find sublist(a,b,species). known species([l,o,r,i,c,e,r,a,,p,i,l,i,c,o,r,n,i,s]). known species([h,a,r,p,a,l,u,s,,r,u,f,i,p,e,s]). (c) Partial background knowledge for the ecological experiment Fig. 3: Ecological input (a) and output (b) examples with the background knowledge (c) Methods For each m in the interval [1,5], we randomly select without replacement m positive training examples, 20 positive testing examples, and 20 negative testing examples. We learn using the positive training examples only, but we test with both positive and negative examples, and thus the default predictive accuracy is 50%. We average predictive accuracies and learning times over 20 trials. We repeat the experiment for delimiter sizes 1, 2, and 3. Results Fig. 4 shows that predictive accuracies improve with an increasing number of training examples, with over 80% predictive accuracy from two examples. In all cases, the predictive accuracies of learned solutions are greater than the default accuracy, and thus the null hypothesis is rejected. The results show that in this scenario there is no significant difference in predictive accuracy between delimiter sizes 1 and 2. For a delimiter size of 3, however, the predictive accuracy tails off because Metagol could not learn sufficiently discriminative delimiters under the size constraints. Fig. 4c show an example learned solution. Mean predictive accuracy default accuracy 0.4 Mean learning time (seconds) f(a,b):- f3(a,c), find species(c,b). f3(a,b):- find species(a,c), f2(c,b). f2(a,b):- closed interval(a,b,[f,o],[o,d]). f3(a,b):- find species(a,c), f1(c,b). f1(a,b):- closed interval(a,b,[e,a],[t,s]). (c) Learned program (a) Predictive accuracies (b) Learning times Fig. 4: Ecological learning results when varying number of training examples
5 5 5.2 Patient medical records In this experiment, the aim is to learn programs to extract values from patient medical records. Materials The dataset contains 16 positive patient medical records, modelled on real-world examples 2, paired with a list of values to be extracted. Figs. 1a and 1b show simplified input/output example pairings. The experimental dataset, however, contains one additional input and output value, which is a floating integer value. We provide the following background predicates: find int/2, find float/2, and find patient id/2. We do not, however, provide a predicate to identify the diagnosis field, so Metagol must use the general purpose background predicates, described in Section 4, to learn a solution. To generate negative testing examples, we use an almost identical approach to the ecological experiment, and details are omitted for brevity. Methods The methods are almost identical to ecological experiment, but we use 40 testing examples, half positive and half negative. Results Fig. 5 shows that predictive accuracies improve with an increasing number of training examples, with over 80% predictive accuracy from a single example. In all cases, the predictive accuracies of learned solutions are greater than the default accuracy, and thus the null hypothesis is rejected. The results suggest a difference in predictive accuracy between delimiter sizes 1, 2, and 3, with a delimiter size of two achieving the highest predictive accuracy. Fig. 5c shows an example solution for a delimiter size of 2. Mean predictive accuracy default accuracy 0.4 Mean learning time (seconds) f(a,b):- f2(a,c), f2(c,b). f2(a,b):- find patient id(a,c), find int(c,b). f2(a,b):- f1(a,c), find float(c,b). f1(a,b):- open interval(a,b,[ :, ],[, n ]). f1(a,b):- open interval(a,b,[ :, ],[,, ]). (c) Learned program (a) Predictive accuracies (b) Learning times Fig. 5: Medical record learning results when varying number of training examples 6 Conclusion and further work We have investigated learning programs which transform data from one format to another. A general framework for the problem is presented and experiments conducted on ecological and medical data sets indicate that MIL is capable of generating accurate data transformation programs from small numbers of user-provided examples. Further work This paper provides an initial investigation into an important new group of ILP applications relevant to real-world big data problems. Further work remains to achieve this potential. 2
6 6 In particular, in an extended paper we hope to perform comparisons with alternative ILP and other machine learning techniques on this problem. Although recursion and predicate invention appear important for data transformation problems, it is unclear how critical these are in either of the data sets studied. We would therefore like to study a broader set of datasets to understand this issue better. Several issues need to be studied further in scaling-up the work reported. To begin with, although training data required will be small, owing to the requirements of user-provided annotation, test data will typically consist of large numbers of instances. Running Prolog hypotheses on such data will be time-consuming, and we would like to investigate generating hypotheses in a scripting language, such as Python. For data transformation problems such as the Ecological data set we would also like to investigate the value of large scale background knowledge, which might provide deeper natural language analysis based on dictionaries, tokenisation, part-of-speech tagging, and specialised ontologies. We would also like to investigate Probabilistic ILP approaches [12], such as [8], which have the potential to not only provide probabilistic preferences over hypothesised programs, but also the potential of dealing with issues such as noise which are ubiquitous within real-world data. In the context of free-text data these approaches might also be integrated with finding highest probability parses. References 1. James S Aitken. Learning information extraction rules: An inductive logic programming approach. In ECAI, pages , Mark Craven, Johan Kumlien, et al. Constructing biological knowledge bases by extracting information from text sources. In ISMB, volume 1999, pages 77 86, A. Cropper and S.H. Muggleton. Learning efficient logical robot strategies involving composable objects. In Proceedings of the 24th International Joint Conference Artificial Intelligence (IJCAI 2015), In press. 4. A. Cropper and S.H. Muggleton. Logical minimisation of meta-rules within meta-interpretive learning. In Proceedings of the 24th International Conference on Inductive Logic Programming. Springer-Verlag, In press. 5. Mark Goadrich, Louis Oliphant, and Jude Shavlik. Learning ensembles of first-order clauses for recall-precision curves: A case study in biomedical information extraction. In Inductive logic programming, pages Springer, Vu Le and Sumit Gulwani. Flashextract: A framework for data extraction by examples. In ACM SIGPLAN Notices, volume 49, pages ACM, D. Lin, E. Dechter, K. Ellis, J.B. Tenenbaum, and S.H. Muggleton. Bias reformulation for one-shot function induction. In Proceedings of the 23rd European Conference on Artificial Intelligence (ECAI 2014), pages , Amsterdam, IOS Press. 8. S.H. Muggleton, D. Lin, J. Chen, and A. Tamaddoni-Nezhad. Metabayes: Bayesian meta-interpretative learning using higher-order stochastic refinement. In Gerson Zaverucha, Vitor Santos Costa, and Aline Marins Paes, editors, Proceedings of the 23rd International Conference on Inductive Logic Programming (ILP 2013), pages 1 17, Berlin, Springer-Verlag. LNAI S.H. Muggleton, D. Lin, N. Pahlavi, and A. Tamaddoni-Nezhad. Meta-interpretive learning: application to grammatical inference. Machine Learning, 94:25 49, S.H. Muggleton, D. Lin, and A. Tamaddoni-Nezhad. Meta-interpretive learning of higher-order dyadic datalog: Predicate invention revisited. Machine Learning, Published online: DOI /s y. 11. J.R. Quinlan and R.M Cameron-Jones. FOIL: a midterm report. In P. Brazdil, editor, Proceedings of the 6th European Conference on Machine Learning, volume 667 of Lecture Notes in Artificial Intelligence, pages Springer-Verlag, L. De Raedt, P. Frasconi, K. Kersting, and S.H. Muggleton, editors. Probabilistic Inductive Logic Programming. Springer-Verlag, Berlin, LNAI Ashwin Srinivasan. The Aleph Manual. University of Oxford, KD Sunderland. The diet of some predatory arthropods in cereal crops. Journal of Applied Ecology, 12(2): , Bo Wu and Craig A. Knoblock. An iterative approach to synthesize data transformation programs. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), 2015.
BET : An Inductive Logic Programming Workbench
BET : An Inductive Logic Programming Workbench Srihari Kalgi, Chirag Gosar, Prasad Gawde Ganesh Ramakrishnan, Kekin Gada, Chander Iyer, Kiran T.V.S, and Ashwin Srinivasan Department of Computer Science
Background knowledge-enrichment for bottom clauses improving.
Background knowledge-enrichment for bottom clauses improving. Orlando Muñoz Texzocotetla and René MacKinney-Romero Departamento de Ingeniería Eléctrica Universidad Autónoma Metropolitana México D.F. 09340,
Learning is a very general term denoting the way in which agents:
What is learning? Learning is a very general term denoting the way in which agents: Acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College
ML for the Working Programmer
ML for the Working Programmer 2nd edition Lawrence C. Paulson University of Cambridge CAMBRIDGE UNIVERSITY PRESS CONTENTS Preface to the Second Edition Preface xiii xv 1 Standard ML 1 Functional Programming
A View Integration Approach to Dynamic Composition of Web Services
A View Integration Approach to Dynamic Composition of Web Services Snehal Thakkar, Craig A. Knoblock, and José Luis Ambite University of Southern California/ Information Sciences Institute 4676 Admiralty
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Erkan Er Abstract In this paper, a model for predicting students performance levels is proposed which employs three
Special Topics in Computer Science
Special Topics in Computer Science NLP in a Nutshell CS492B Spring Semester 2009 Jong C. Park Computer Science Department Korea Advanced Institute of Science and Technology INTRODUCTION Jong C. Park, CS
Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016
Network Machine Learning Research Group S. Jiang Internet-Draft Huawei Technologies Co., Ltd Intended status: Informational October 19, 2015 Expires: April 21, 2016 Abstract Network Machine Learning draft-jiang-nmlrg-network-machine-learning-00
Doctor of Philosophy in Computer Science
Doctor of Philosophy in Computer Science Background/Rationale The program aims to develop computer scientists who are armed with methods, tools and techniques from both theoretical and systems aspects
Lecture 13 of 41. More Propositional and Predicate Logic
Lecture 13 of 41 More Propositional and Predicate Logic Monday, 20 September 2004 William H. Hsu, KSU http://www.kddresearch.org http://www.cis.ksu.edu/~bhsu Reading: Sections 8.1-8.3, Russell and Norvig
Web-Based Genomic Information Integration with Gene Ontology
Web-Based Genomic Information Integration with Gene Ontology Kai Xu 1 IMAGEN group, National ICT Australia, Sydney, Australia, [email protected] Abstract. Despite the dramatic growth of online genomic
How To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
Cassandra. References:
Cassandra References: Becker, Moritz; Sewell, Peter. Cassandra: Flexible Trust Management, Applied to Electronic Health Records. 2004. Li, Ninghui; Mitchell, John. Datalog with Constraints: A Foundation
2014/02/13 Sphinx Lunch
2014/02/13 Sphinx Lunch Best Student Paper Award @ 2013 IEEE Workshop on Automatic Speech Recognition and Understanding Dec. 9-12, 2013 Unsupervised Induction and Filling of Semantic Slot for Spoken Dialogue
Web Document Clustering
Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,
Database trends: XML data storage
Database trends: XML data storage UC Santa Cruz CMPS 10 Introduction to Computer Science www.soe.ucsc.edu/classes/cmps010/spring11 [email protected] 25 April 2011 DRC Students If any student in the class
Using LSI for Implementing Document Management Systems Turning unstructured data from a liability to an asset.
White Paper Using LSI for Implementing Document Management Systems Turning unstructured data from a liability to an asset. Using LSI for Implementing Document Management Systems By Mike Harrison, Director,
The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2
2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of
Natural Language to Relational Query by Using Parsing Compiler
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
The Optimality of Naive Bayes
The Optimality of Naive Bayes Harry Zhang Faculty of Computer Science University of New Brunswick Fredericton, New Brunswick, Canada email: hzhang@unbca E3B 5A3 Abstract Naive Bayes is one of the most
InvGen: An Efficient Invariant Generator
InvGen: An Efficient Invariant Generator Ashutosh Gupta and Andrey Rybalchenko Max Planck Institute for Software Systems (MPI-SWS) Abstract. In this paper we present InvGen, an automatic linear arithmetic
Introduction to Learning & Decision Trees
Artificial Intelligence: Representation and Problem Solving 5-38 April 0, 2007 Introduction to Learning & Decision Trees Learning and Decision Trees to learning What is learning? - more than just memorizing
Winter 2016 Course Timetable. Legend: TIME: M = Monday T = Tuesday W = Wednesday R = Thursday F = Friday BREATH: M = Methodology: RA = Research Area
Winter 2016 Course Timetable Legend: TIME: M = Monday T = Tuesday W = Wednesday R = Thursday F = Friday BREATH: M = Methodology: RA = Research Area Please note: Times listed in parentheses refer to the
DATA MINING IN FINANCE
DATA MINING IN FINANCE Advances in Relational and Hybrid Methods by BORIS KOVALERCHUK Central Washington University, USA and EVGENII VITYAEV Institute of Mathematics Russian Academy of Sciences, Russia
Towards Semantics-Enabled Distributed Infrastructure for Knowledge Acquisition
Towards Semantics-Enabled Distributed Infrastructure for Knowledge Acquisition Vasant Honavar 1 and Doina Caragea 2 1 Artificial Intelligence Research Laboratory, Department of Computer Science, Iowa State
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg March 1, 2007 The catalogue is organized into sections of (1) obligatory modules ( Basismodule ) that
Argumentación en Agentes Inteligentes: Teoría y Aplicaciones Prof. Carlos Iván Chesñevar
Argumentation in Intelligent Agents: Theory and Applications Carlos Iván Chesñevar [email protected] http://cs.uns.edu.ar/~cic cic Part 5 - Outline Main research projects in argumentation Main conferences
Government of Russian Federation. Faculty of Computer Science School of Data Analysis and Artificial Intelligence
Government of Russian Federation Federal State Autonomous Educational Institution of High Professional Education National Research University «Higher School of Economics» Faculty of Computer Science School
Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari [email protected]
Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari [email protected] Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content
Reusable Knowledge-based Components for Building Software. Applications: A Knowledge Modelling Approach
Reusable Knowledge-based Components for Building Software Applications: A Knowledge Modelling Approach Martin Molina, Jose L. Sierra, Jose Cuena Department of Artificial Intelligence, Technical University
Chapter 7. Functions and onto. 7.1 Functions
Chapter 7 Functions and onto This chapter covers functions, including function composition and what it means for a function to be onto. In the process, we ll see what happens when two dissimilar quantifiers
Course Outline Department of Computing Science Faculty of Science. COMP 3710-3 Applied Artificial Intelligence (3,1,0) Fall 2015
Course Outline Department of Computing Science Faculty of Science COMP 710 - Applied Artificial Intelligence (,1,0) Fall 2015 Instructor: Office: Phone/Voice Mail: E-Mail: Course Description : Students
A Mind Map Based Framework for Automated Software Log File Analysis
2011 International Conference on Software and Computer Applications IPCSIT vol.9 (2011) (2011) IACSIT Press, Singapore A Mind Map Based Framework for Automated Software Log File Analysis Dileepa Jayathilake
Graduate Co-op Students Information Manual. Department of Computer Science. Faculty of Science. University of Regina
Graduate Co-op Students Information Manual Department of Computer Science Faculty of Science University of Regina 2014 1 Table of Contents 1. Department Description..3 2. Program Requirements and Procedures
2110711 THEORY of COMPUTATION
2110711 THEORY of COMPUTATION ATHASIT SURARERKS ELITE Athasit Surarerks ELITE Engineering Laboratory in Theoretical Enumerable System Computer Engineering, Faculty of Engineering Chulalongkorn University
Implementation of hybrid software architecture for Artificial Intelligence System
IJCSNS International Journal of Computer Science and Network Security, VOL.7 No.1, January 2007 35 Implementation of hybrid software architecture for Artificial Intelligence System B.Vinayagasundaram and
CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht [email protected] 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht [email protected] 539 Sennott
Domain Classification of Technical Terms Using the Web
Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using
Improving Knowledge-Based System Performance by Reordering Rule Sequences
Improving Knowledge-Based System Performance by Reordering Rule Sequences Neli P. Zlatareva Department of Computer Science Central Connecticut State University 1615 Stanley Street New Britain, CT 06050
CENG 734 Advanced Topics in Bioinformatics
CENG 734 Advanced Topics in Bioinformatics Week 9 Text Mining for Bioinformatics: BioCreative II.5 Fall 2010-2011 Quiz #7 1. Draw the decompressed graph for the following graph summary 2. Describe the
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
Master s Program in Information Systems
The University of Jordan King Abdullah II School for Information Technology Department of Information Systems Master s Program in Information Systems 2006/2007 Study Plan Master Degree in Information Systems
Semantic annotation of requirements for automatic UML class diagram generation
www.ijcsi.org 259 Semantic annotation of requirements for automatic UML class diagram generation Soumaya Amdouni 1, Wahiba Ben Abdessalem Karaa 2 and Sondes Bouabid 3 1 University of tunis High Institute
The Relationship Between Precision-Recall and ROC Curves
Jesse Davis [email protected] Mark Goadrich [email protected] Department of Computer Sciences and Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, 2 West Dayton Street,
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data
Predicting the Stock Market with News Articles
Predicting the Stock Market with News Articles Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is
Modeling BPMN Diagrams within XTT2 Framework. A Critical Analysis**
AUTOMATYKA 2011 Tom 15 Zeszyt 2 Antoni Ligêza*, Tomasz Maœlanka*, Krzysztof Kluza*, Grzegorz Jacek Nalepa* Modeling BPMN Diagrams within XTT2 Framework. A Critical Analysis** 1. Introduction Design, analysis
Machine Learning and Data Mining. Fundamentals, robotics, recognition
Machine Learning and Data Mining Fundamentals, robotics, recognition Machine Learning, Data Mining, Knowledge Discovery in Data Bases Their mutual relations Data Mining, Knowledge Discovery in Databases,
Master of Science in Computer Science
Master of Science in Computer Science Background/Rationale The MSCS program aims to provide both breadth and depth of knowledge in the concepts and techniques related to the theory, design, implementation,
Extending Semantic Resolution via Automated Model Building: applications
Extending Semantic Resolution via Automated Model Building: applications Ricardo Caferra Nicolas Peltier LIFIA-IMAG L1F1A-IMAG 46, Avenue Felix Viallet 46, Avenue Felix Viallei 38031 Grenoble Cedex FRANCE
A View Integration Approach to Dynamic Composition of Web Services
A View Integration Approach to Dynamic Composition of Web Services Snehal Thakkar University of Southern California/ Information Sciences Institute 4676 Admiralty Way, Marina Del Rey, California 90292
Chapter 7 Uncomputability
Chapter 7 Uncomputability 190 7.1 Introduction Undecidability of concrete problems. First undecidable problem obtained by diagonalisation. Other undecidable problems obtained by means of the reduction
Analyzing & Debugging ILP Data Mining Query Execution
Analyzing & Debugging ILP Data Mining Query Execution Remko Tronçon Katholieke Universiteit Leuven Dept. of Computer Science Celestijnenlaan 200A B-3001 Leuven, Belgium [email protected] Gerda
Three types of messages: A, B, C. Assume A is the oldest type, and C is the most recent type.
Chronological Sampling for Email Filtering Ching-Lung Fu 2, Daniel Silver 1, and James Blustein 2 1 Acadia University, Wolfville, Nova Scotia, Canada 2 Dalhousie University, Halifax, Nova Scotia, Canada
Advanced Ensemble Strategies for Polynomial Models
Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer
A Business Process Services Portal
A Business Process Services Portal IBM Research Report RZ 3782 Cédric Favre 1, Zohar Feldman 3, Beat Gfeller 1, Thomas Gschwind 1, Jana Koehler 1, Jochen M. Küster 1, Oleksandr Maistrenko 1, Alexandru
Tensor Factorization for Multi-Relational Learning
Tensor Factorization for Multi-Relational Learning Maximilian Nickel 1 and Volker Tresp 2 1 Ludwig Maximilian University, Oettingenstr. 67, Munich, Germany [email protected] 2 Siemens AG, Corporate
µz An Efficient Engine for Fixed points with Constraints
µz An Efficient Engine for Fixed points with Constraints Kryštof Hoder, Nikolaj Bjørner, and Leonardo de Moura Manchester University and Microsoft Research Abstract. The µz tool is a scalable, efficient
Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research [email protected]
Introduction to Machine Learning Lecture 1 Mehryar Mohri Courant Institute and Google Research [email protected] Introduction Logistics Prerequisites: basics concepts needed in probability and statistics
Qualitative Modelling
Qualitative Modelling Ivan Bratko Faculty of Computer and Information Sc., University of Ljubljana Abstract. Traditional, quantitative simulation based on quantitative models aims at producing precise
Masters in Information Technology
Computer - Information Technology MSc & MPhil - 2015/6 - July 2015 Masters in Information Technology Programme Requirements Taught Element, and PG Diploma in Information Technology: 120 credits: IS5101
npsolver A SAT Based Solver for Optimization Problems
npsolver A SAT Based Solver for Optimization Problems Norbert Manthey and Peter Steinke Knowledge Representation and Reasoning Group Technische Universität Dresden, 01062 Dresden, Germany [email protected]
Masters in Computing and Information Technology
Masters in Computing and Information Technology Programme Requirements Taught Element, and PG Diploma in Computing and Information Technology: 120 credits: IS5101 CS5001 or CS5002 CS5003 up to 30 credits
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
6.2.8 Neural networks for data mining
6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural
Big Data & Scripting Part II Streaming Algorithms
Big Data & Scripting Part II Streaming Algorithms 1, Counting Distinct Elements 2, 3, counting distinct elements problem formalization input: stream of elements o from some universe U e.g. ids from a set
Integrating Cognitive Models Based on Different Computational Methods
Integrating Cognitive Models Based on Different Computational Methods Nicholas L. Cassimatis ([email protected]) Rensselaer Polytechnic Institute Department of Cognitive Science 110 8 th Street Troy, NY 12180
What is Learning? CS 391L: Machine Learning Introduction. Raymond J. Mooney. Classification. Problem Solving / Planning / Control
What is Learning? CS 391L: Machine Learning Introduction Herbert Simon: Learning is any process by which a system improves performance from experience. What is the task? Classification Problem solving
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati [email protected], [email protected]
A Case Retrieval Method for Knowledge-Based Software Process Tailoring Using Structural Similarity
A Case Retrieval Method for Knowledge-Based Software Process Tailoring Using Structural Similarity Dongwon Kang 1, In-Gwon Song 1, Seunghun Park 1, Doo-Hwan Bae 1, Hoon-Kyu Kim 2, and Nobok Lee 2 1 Department
Mining the Software Change Repository of a Legacy Telephony System
Mining the Software Change Repository of a Legacy Telephony System Jelber Sayyad Shirabad, Timothy C. Lethbridge, Stan Matwin School of Information Technology and Engineering University of Ottawa, Ottawa,
CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.
CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes
Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer
Machine Learning Chapter 18, 21 Some material adopted from notes by Chuck Dyer What is learning? Learning denotes changes in a system that... enable a system to do the same task more efficiently the next
Azure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
Effective Data Mining Using Neural Networks
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 8, NO. 6, DECEMBER 1996 957 Effective Data Mining Using Neural Networks Hongjun Lu, Member, IEEE Computer Society, Rudy Setiono, and Huan Liu,
72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD
72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD Paulo Gottgtroy Auckland University of Technology [email protected] Abstract This paper is
Sanjeev Kumar. contribute
RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 [email protected] 1. Introduction The field of data mining and knowledgee discovery is emerging as a
Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata
Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata Alessandra Giordani and Alessandro Moschitti Department of Computer Science and Engineering University of Trento Via Sommarive
Healthcare Measurement Analysis Using Data mining Techniques
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik
12: Analysis of Variance. Introduction
1: Analysis of Variance Introduction EDA Hypothesis Test Introduction In Chapter 8 and again in Chapter 11 we compared means from two independent groups. In this chapter we extend the procedure to consider
Learning outcomes. Knowledge and understanding. Competence and skills
Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges
CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing
CS Master Level Courses and Areas The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate
Intelligent Analysis of User Interactions in a Collaborative Software Engineering Context
Intelligent Analysis of User Interactions in a Collaborative Software Engineering Context Alejandro Corbellini 1,2, Silvia Schiaffino 1,2, Daniela Godoy 1,2 1 ISISTAN Research Institute, UNICEN University,
Collecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
