An Incrementally Trainable Statistical Approach to Information Extraction Based on Token Classification and Rich Context Models
|
|
- Vernon Randall
- 8 years ago
- Views:
Transcription
1 Dissertation (Ph.D. Thesis) An Incrementally Trainable Statistical Approach to Information Extraction Based on Token Classification and Rich Context Models Christian Siefkes Disputationen: 16th February 2007 Primary Supervisor: Prof. Dr. Heinz F. Schweppe Database and Information Systems Group Fachbereich Mathematik und Informatik Freie Universität Berlin
2 Supervisors: Prof. Dr. Heinz F. Schweppe Database and Information Systems Group Institute for Computer Science Freie Universität Berlin Prof. Dr. Bernhard Thalheim Systems for Information Management Institute of Computer Science and Applied Mathematics Christian-Albrechts-Universität zu Kiel This thesis has been supported by the German Research Society, Berlin-Brandenburg Graduate School in Distributed Information Systems (DFG grant no. GRK 316).
3 Dedicated to the memory of my parents, Uta Siefkes ( ) and Harm Siefkes ( )
4
5 Abstract Most of the information stored in digital form is hidden in natural language (NL) texts. While information retrieval (IR) helps to locate documents which might contain the facts needed, there is no way to answer queries. The purpose of information extraction (IE) is to find desired pieces of information in NL texts and store them in a form that is suitable for automatic querying and processing. The goal of this thesis has been the development and evaluation of a trainable statistical IE approach. This approach introduces new functionality not supported by current IE systems, such as support for incremental training to reduce the human training effort by allowing a more interactive workflow. The IE system introduced in this thesis is designed as a generic framework for statistical classification-based information extraction that allows modifying and exchanging all core components (such as classification algorithm, context representations, tagging strategies) independently of each other. The thesis includes a systematic analysis of switching one such component (the tagging strategies). Several new sources of information are explored for improving extraction quality. Especially we introduce rich tree-based context representations that combine document structure and generic XML markup with more conventional linguistic and semantic sources of information. Preparing these rich context representations makes it necessary to unify various and partially conflicting sources of information (such as structural markup and linguistic annotations) in XML-style trees. For this purpose, we develop a merging algorithm that can repair nesting errors and related problems in XML-like input. As the core of the classification-based IE approach, we introduce a generic classification algorithm (Winnow+OSB) that combines online learning with novel feature combination techniques. We show that this algorithm is not only suitable for information extraction, but also for other tasks such as text classification. Among other good results, the classifier was found to be one of the two best filters submitted for the 2005 Spam Filtering Task of the Text REtrieval Conference (TREC). The thesis includes a detailed evaluation of the resulting IE which shows that the results reached by our system are better than or competitive with those of other stateof-the-art IE systems. The evaluation includes an ablation study that measures the influence of various factors on the overall results and finds that all of them contribute to the good results of our system. It also includes an analysis of the utility of interactive incremental training that confirms that this newly introduced training regimen can be very helpful for reducing the human training effort. The quantitative evaluation is complemented with an analysis of the kinds of mistakes made during extraction and their likely causes that allows a better understanding of where and how we can expect further improvements in information extraction quality to be made and which limits might exist for information extraction systems in general.
6
7 Wir sehen ein kompliziertes Netz von Ähnlichkeiten, die einander übergreifen und kreuzen. Ähnlichkeiten im Großen und Kleinen. Ich kann diese Ähnlichkeiten nicht besser charakterisieren als durch das Wort Familienähnlichkeiten ; denn so übergreifen und kreuzen sich die verschiedenen Ähnlichkeiten, die zwischen den Gliedern einer Familie bestehen: Wuchs, Gesichtszüge, Augenfarbe, Gang, Temperament, etc. etc. Und ich werde sagen: die Spiele bilden eine Familie. [... ] Wie würden wir denn jemandem erklären, was ein Spiel ist? Ich glaube, wir werden ihm Spiele beschreiben, und wir könnten der Beschreibung hinzufügen: das, und Ähnliches, nennt man,spiele. Und wissen wir selbst denn mehr? Können wir etwa nur dem Anderen nicht genau sagen, was ein Spiel ist? Aber das ist nicht Unwissenheit. Wir kennen die Grenzen nicht, weil keine gezogen sind. Ludwig Wittgenstein, Philosophische Untersuchungen Leeloo: Hello. Korben Dallas: Oh, so you speak English now. Leeloo: Yes. I learned. The Fifth Element (1997)
8
9 Contents 1 Introduction Motivation and Goals Contributions Outline of this Work Acknowledgments I The Field of Information Extraction 13 2 Information Extraction Information Retrieval, Text Mining, and Other Related Areas Overview and Classification of Approaches Architecture and Workflow Tasks to Handle Architecture of a Typical IE System Active Learning and Incremental Learning Workflow Statistical Approaches Probabilistic Semantic Parsing Hidden Markov Models Maximum Entropy Markov Models and Conditional Random Fields Token Classification Fragment Classification and Bayesian Networks Non-Statistical Approaches Covering Algorithms Relational Rule Learners Wrapper Induction Hybrid Approaches Knowledge-based Approaches Comparison of Existing Approaches Types of Tasks Handled Types of Texts Handled Considered Features Tagging Requirements and Learning Characteristics
10 Contents II Analysis 47 7 Aims and Requirements Aims of Our Approach Further Requirements Chosen Approach Non-Goals Assumptions Novel Assumptions General Assumptions Suitability of Tasks Target Schemas and Input/Output Models Target Schemas Formats for Input Texts Input Formats for Answer Keys Serialization of Extracted Attribute Values III Algorithms and Models Modeling Information Extraction as a Classification Task Idea and Concept Tagging Strategies Classification Algorithm and Feature Combination Techniques The Winnow Classification Algorithm Feature Combination Techniques Alternative Classification Algorithms and Implementations Preprocessing and Context Representation Preprocessing Tree-based Context Representation Tokenization Merging Conflicting and Incomplete XML Markup Introduction and Motivation Types of Errors in XML-like Input Configurable Settings and Heuristics for Repair Algorithm Description Limitations Application in Our Approach Related Work Weakly Hierarchical Extraction 99 2
11 Contents 14.1 Introduction Inheritance Hierarchies of Attributes Strictly Hierarchical Approach and Related Problems Weakly Hierarchical Approach Integration into Information Extraction Approach IV Evaluation Evaluation Goals and Metrics Goals and Limitations of Quantitative Evaluation Evaluation Methodology Evaluation Metrics Text Classification Experiments Introduction Text Classification Setup for Spam Filtering Experimental Results on the SpamAssassin Corpus TREC Spam Filtering Challenge Concluding Remarks Extraction of Attribute Values Test Corpora Evaluation Results for the Seminar Announcements Corpus Evaluation Results for the Corporate Acquisitions Corpus Ablation Study and Utility of Incremental Training Ablation Study Utility of Interactive Incremental Training Comparison of Tagging Strategies Idea and Setup Comparison Results Analysis Weakly Hierarchical Extraction Experimental Setup Experimental Results Concluding Remarks Mistake Analysis Mistake Types Distribution of Mistakes Type Confusion Additional Manual Analysis Length Analysis
12 Contents V Conclusions Conclusion and Outlook Discussion of Results Summary of Contributions Future Work Bibliography 175 A Schema for Augmented Text 185 B Curriculum Vitae 189 C Zusammenfassung in deutscher Sprache 191 4
13 List of Tables 2.1 Applications of IR, IE, and Text Mining Overview of the Selected Approaches and Systems Example of External Answer Keys Properties of Tagging Strategies Labeling Example SBPH 5 Feature Combinations Containing f Features Generated by SBPH and OSB Regular Expressions Used for Tokenization Promotion and Demotion Factors Threshold Thickness Comparison of SBPH and OSB with Different Feature Storage Sizes Utility of Single Tokens (Unigrams) Sliding Window Size Preprocessing Comparison with Naive Bayes and CRM Results on the Seminar Corpus System Comparison on the Seminar Corpus (F-measure) Results on the Acquisitions Corpus System Comparison on the Acquisitions Corpus (F-measure) Ablation Study: Seminar Announcements Ablation Study: Corporate Acquisitions Results with Incremental Feedback Incremental Feedback: User Effort for Correcting the Training Set Incremental Feedback: User Effort for Correcting the Evaluation Set F-measure Percentages for Incremental Training F-measure Percentages for Batch Training Incremental Training: Significance of Changes Compared to IOB Batch Training: Significance of Changes Compared to IOB Recall Reached by Supertype Recognizers on Subtype Answer Keys Seminar Corpus: Length Distribution of Answer Keys Acquisitions Corpus: Length Distribution of Answer Keys
14
15 List of Figures 3.1 Tasks to Be Handled Architecture of a Typical IE System Sample Interface: Information Extraction from Messages Sample Input Text Sample Text with Inline Annotations Partial DOM Tree of a Simple HTML Document with Linguistic Annotations Processed File from the Seminar Announcements Corpus Inverted Subtree of the Elements Considered for a Context Representation Learning Curve for the best setting (Winnow 1.23,0.83,5%, 600,000 features, OSB 5 ) ROC curve for the best filters (Source: [Cor05, Fig. 2]) Results on the Seminar Corpus Seminar Corpus: Precision and Recall Improvements System Comparison: F-measure Averages on the Seminar Corpus Results on the Acquisitions Corpus Acquisitions Corpus: Precision and Recall Improvements System Comparison: F-measure Averages on the Acquisitions Corpus Ablation Study: Seminar Announcements Ablation Study: Corporate Acquisitions Incremental Feedback: Learning Curve (average precision, recall, and F-measure on all documents processed so far) Incremental Feedback: Correct, Missing, and Spurious Predictions in the Training Set Incremental Feedback: Correct, Missing, and Spurious Predictions in the Evaluation Set Seminar Corpus: Inheritance Hierarchy Acquisitions Corpus: Inheritance Hierarchy Seminar Corpus: F-measure Results Acquisitions Corpus: F-measure Results Acquisitions Corpus: Collapsing Short and Long Names Seminar Corpus: Temporal Predictions Seminar Corpus: Mistakes Combinations
16 List of Figures 21.2 Seminar Corpus: Distribution of Mistake Types Acquisitions Corpus: Mistakes Combinations Acquisitions Corpus: Distribution of Mistake Types Seminar Corpus: Confusion Matrix (expected type predicted type) Acquisitions Corpus: Confusion Matrix (expected type predicted type) Seminar Corpus: Precision and Recall by Token Length Seminar Corpus: F-Measure by Token Length Seminar Corpus: Weighted Averages by Token Length Acquisitions Corpus: Precision and Recall by Token Length Acquisitions Corpus: F-Measure by Token Length Acquisitions Corpus: Weighted Averages by Token Length
Search Engines Chapter 2 Architecture. 14.4.2011 Felix Naumann
Search Engines Chapter 2 Architecture 14.4.2011 Felix Naumann Overview 2 Basic Building Blocks Indexing Text Acquisition Text Transformation Index Creation Querying User Interaction Ranking Evaluation
More informationBing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. ~ Spring~r
Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures ~ Spring~r Table of Contents 1. Introduction.. 1 1.1. What is the World Wide Web? 1 1.2. ABrief History of the Web
More informationCombining Winnow and Orthogonal Sparse Bigrams for Incremental Spam Filtering
Combining Winnow and Orthogonal Sparse Bigrams for Incremental Spam Filtering Christian Siefkes 1, Fidelis Assis 2, Shalendra Chhabra 3, and William S. Yerazunis 4 1 Berlin-Brandenburg Graduate School
More informationSearch and Information Retrieval
Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search
More informationMachine Learning for natural language processing
Machine Learning for natural language processing Introduction Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 13 Introduction Goal of machine learning: Automatically learn how to
More informationProjektgruppe. Information Extraction An Incomplete Overview
Projektgruppe Henning Wachsmuth Information Extraction An Incomplete Overview 12. Mai 2010 1 Einführungsvorträge Verfassen von Seminarvortrag und paper Prof. Dr. Gregor Engels, Donnerstag 15.4., 16h-18h
More informationGETTING FEEDBACK REALLY FAST WITH DESIGN THINKING AND AGILE SOFTWARE ENGINEERING
GETTING FEEDBACK REALLY FAST WITH DESIGN THINKING AND AGILE SOFTWARE ENGINEERING Dr. Tobias Hildenbrand & Christian Suessenbach, SAP AG Entwicklertag Karlsruhe, 22 May 2014 Ich wollte Mitarbeiter so motivieren,
More informationCombining Winnow and Orthogonal Sparse Bigrams for Incremental Spam Filtering
Combining Winnow and Orthogonal Sparse Bigrams for Incremental Spam Filtering Christian Siefkes 1, Fidelis Assis 2, Shalendra Chhabra 3, and William S. Yerazunis 4 1 Berlin-Brandenburg Graduate School
More informationExperiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
More informationSearch Result Optimization using Annotators
Search Result Optimization using Annotators Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,
More informationJulia Englert, PhD Student. Curriculum Vitae
Julia Englert, PhD Student Curriculum Vitae Name: Nationality: Julia Valerie Englert German Date of Birth: April 14 th 1987 E-Mail: j.englert@uni-saarland.de Phone 0049-681-302-68563 Office Address: Saarland
More informationExemplar for Internal Achievement Standard. German Level 1
Exemplar for Internal Achievement Standard German Level 1 This exemplar supports assessment against: Achievement Standard 90885 Interact using spoken German to communicate personal information, ideas and
More informationPractical Applications of DATA MINING. Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING
Practical Applications of DATA MINING Sang C Suh Texas A&M University Commerce r 3 JONES & BARTLETT LEARNING Contents Preface xi Foreword by Murat M.Tanik xvii Foreword by John Kocur xix Chapter 1 Introduction
More informationData Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
More informationStefan Engelberg (IDS Mannheim), Workshop Corpora in Lexical Research, Bucharest, Nov. 2008 [Folie 1]
Content 1. Empirical linguistics 2. Text corpora and corpus linguistics 3. Concordances 4. Application I: The German progressive 5. Part-of-speech tagging 6. Fequency analysis 7. Application II: Compounds
More informationEmail Spam Detection Using Customized SimHash Function
International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email
More informationNatural Language to Relational Query by Using Parsing Compiler
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
More informationStefan Engelberg (IDS Mannheim), Workshop Corpora in Lexical Research, Bucharest, Nov. 2008 [Folie 1]
Content 1. Empirical linguistics 2. Text corpora and corpus linguistics 3. Concordances 4. Application I: The German progressive 5. Part-of-speech tagging 6. Fequency analysis 7. Application II: Compounds
More informationHighly Scalable Discriminative Spam Filtering
Highly Scalable Discriminative Spam Filtering Michael Brückner, Peter Haider, and Tobias Scheffer Max Planck Institute for Computer Science, Saarbrücken, Germany Humboldt Universität zu Berlin, Germany
More informationOverview. What is Information Retrieval? Classic IR: Some basics Link analysis & Crawlers Semantic Web Structured Information Extraction/Wrapping
Overview What is Information Retrieval? Classic IR: Some basics Link analysis & Crawlers Semantic Web Structured Information Extraction/Wrapping Hidir Aras, Digitale Medien 1 Agenda (agreed so far) 08.4:
More informationBrauchen die Digital Humanities eine eigene Methodologie?
Deutsche DH, Passau 26.03.2014 Brauchen die Digital Humanities eine eigene Methodologie? 26. März 2014 Heyer / Niekler / Wiedemann 1 Übersicht Aspekte der Operationalisierung geistes- und sozialwissenschaftlicher
More informationModule Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg March 1, 2007 The catalogue is organized into sections of (1) obligatory modules ( Basismodule ) that
More informationBayesian Spam Filtering
Bayesian Spam Filtering Ahmed Obied Department of Computer Science University of Calgary amaobied@ucalgary.ca http://www.cpsc.ucalgary.ca/~amaobied Abstract. With the enormous amount of spam messages propagating
More informationMit einem Auge auf den mathema/schen Horizont: Was der Lehrer braucht für die Zukun= seiner Schüler
Mit einem Auge auf den mathema/schen Horizont: Was der Lehrer braucht für die Zukun= seiner Schüler Deborah Löwenberg Ball und Hyman Bass University of Michigan U.S.A. 43. Jahrestagung für DidakEk der
More informationNetwork Big Data: Facing and Tackling the Complexities Xiaolong Jin
Network Big Data: Facing and Tackling the Complexities Xiaolong Jin CAS Key Laboratory of Network Data Science & Technology Institute of Computing Technology Chinese Academy of Sciences (CAS) 2015-08-10
More informationWeb Data Extraction: 1 o Semestre 2007/2008
Web Data : Given Slides baseados nos slides oficiais do livro Web Data Mining c Bing Liu, Springer, December, 2006. Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008
More informationUsability in SW-Engineering-Prozessen und in CMMI
Workshop USABILITY VDE Prüf- und Zertifizierungsinstitut Strategiekreis i-12 Usability in SW-Engineering-Prozessen und in CMMI Helmut Thoma Schweizer Informatik Gesellschaft Lehrbeauftragter Universität
More informationDetection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup
Network Anomaly Detection A Machine Learning Perspective Dhruba Kumar Bhattacharyya Jugal Kumar KaKta»C) CRC Press J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor
More informationOn Attacking Statistical Spam Filters
On Attacking Statistical Spam Filters Gregory L. Wittel and S. Felix Wu Department of Computer Science University of California, Davis One Shields Avenue, Davis, CA 95616 USA Paper review by Deepak Chinavle
More informationSimple Language Models for Spam Detection
Simple Language Models for Spam Detection Egidio Terra Faculty of Informatics PUC/RS - Brazil Abstract For this year s Spam track we used classifiers based on language models. These models are used to
More informationTS3: an Improved Version of the Bilingual Concordancer TransSearch
TS3: an Improved Version of the Bilingual Concordancer TransSearch Stéphane HUET, Julien BOURDAILLET and Philippe LANGLAIS EAMT 2009 - Barcelona June 14, 2009 Computer assisted translation Preferred by
More informationChapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
More informationDisambiguating Implicit Temporal Queries by Clustering Top Relevant Dates in Web Snippets
Disambiguating Implicit Temporal Queries by Clustering Top Ricardo Campos 1, 4, 6, Alípio Jorge 3, 4, Gaël Dias 2, 6, Célia Nunes 5, 6 1 Tomar Polytechnic Institute, Tomar, Portugal 2 HULTEC/GREYC, University
More informationFacilitating Business Process Discovery using Email Analysis
Facilitating Business Process Discovery using Email Analysis Matin Mavaddat Matin.Mavaddat@live.uwe.ac.uk Stewart Green Stewart.Green Ian Beeson Ian.Beeson Jin Sa Jin.Sa Abstract Extracting business process
More informationA STUDY IN USER-CENTRIC DATA INTEGRATION
A STUDY IN USER-CENTRIC DATA INTEGRATION Heiner Stuckenschmidt 1 2, Jan Noessner 1, and Faraz Fallahi 3 1 School of Business Informatics and Mathematics, University of Mannheim. 68159 Mannheim. Germany
More informationAn Open Platform for Collecting Domain Specific Web Pages and Extracting Information from Them
An Open Platform for Collecting Domain Specific Web Pages and Extracting Information from Them Vangelis Karkaletsis and Constantine D. Spyropoulos NCSR Demokritos, Institute of Informatics & Telecommunications,
More information8. Machine Learning Applied Artificial Intelligence
8. Machine Learning Applied Artificial Intelligence Prof. Dr. Bernhard Humm Faculty of Computer Science Hochschule Darmstadt University of Applied Sciences 1 Retrospective Natural Language Processing Name
More informationMining the Software Change Repository of a Legacy Telephony System
Mining the Software Change Repository of a Legacy Telephony System Jelber Sayyad Shirabad, Timothy C. Lethbridge, Stan Matwin School of Information Technology and Engineering University of Ottawa, Ottawa,
More informationInformation Discovery on Electronic Medical Records
Information Discovery on Electronic Medical Records Vagelis Hristidis, Fernando Farfán, Redmond P. Burke, MD Anthony F. Rossi, MD Jeffrey A. White, FIU FIU Miami Children s Hospital Miami Children s Hospital
More information01219211 Software Development Training Camp 1 (0-3) Prerequisite : 01204214 Program development skill enhancement camp, at least 48 person-hours.
(International Program) 01219141 Object-Oriented Modeling and Programming 3 (3-0) Object concepts, object-oriented design and analysis, object-oriented analysis relating to developing conceptual models
More informationLernsituation 9. Giving information on the phone. 62 Lernsituation 9 Giving information on the phone
Fachkunde 1, Lernfeld 2, Useful Office Vocabulary Lernsituation 9 Giving information on the phone Rolf astian, Managing Director of E Partners KG, recently visited the Promo World Fair in Düsseldorf, an
More informationUsability Evaluation of Modeling Languages
Usability Evaluation of Modeling Languages Bearbeitet von Christian Schalles 1. Auflage 2012. Taschenbuch. XXIII, 183 S. Paperback ISBN 978 3 658 00050 9 Format (B x L): 0 x 0 cm Gewicht: 275 g Weitere
More informationData Mining & Data Stream Mining Open Source Tools
Data Mining & Data Stream Mining Open Source Tools Darshana Parikh, Priyanka Tirkha Student M.Tech, Dept. of CSE, Sri Balaji College Of Engg. & Tech, Jaipur, Rajasthan, India Assistant Professor, Dept.
More informationVerteilte Systeme 3. Dienstevermittlung
VS32 Slide 1 Verteilte Systeme 3. Dienstevermittlung 3.2 Prinzipien einer serviceorientierten Architektur (SOA) Sebastian Iwanowski FH Wedel VS32 Slide 2 Prinzipien einer SOA 1. Definitionen und Merkmale
More informationProjektgruppe. Categorization of text documents via classification
Projektgruppe Steffen Beringer Categorization of text documents via classification 4. Juni 2010 Content Motivation Text categorization Classification in the machine learning Document indexing Construction
More informationElena Chiocchetti & Natascia Ralli (EURAC) Tanja Wissik & Vesna Lušicky (University of Vienna)
Elena Chiocchetti & Natascia Ralli (EURAC) Tanja Wissik & Vesna Lušicky (University of Vienna) VII Conference on Legal Translation, Court Interpreting and Comparative Legilinguistics Poznań, 28-30.06.2013
More informationStatistical Feature Selection Techniques for Arabic Text Categorization
Statistical Feature Selection Techniques for Arabic Text Categorization Rehab M. Duwairi Department of Computer Information Systems Jordan University of Science and Technology Irbid 22110 Jordan Tel. +962-2-7201000
More informationIntelligent Systems: Three Practical Questions. Carsten Rother
Intelligent Systems: Three Practical Questions Carsten Rother 04/02/2015 Prüfungsfragen Nur vom zweiten Teil der Vorlesung (Dimitri Schlesinger, Carsten Rother) Drei Typen von Aufgaben: 1) Algorithmen
More informationDoctor of Philosophy in Computer Science
Doctor of Philosophy in Computer Science Background/Rationale The program aims to develop computer scientists who are armed with methods, tools and techniques from both theoretical and systems aspects
More informationAP GERMAN LANGUAGE AND CULTURE EXAM 2015 SCORING GUIDELINES
AP GERMAN LANGUAGE AND CULTURE EXAM 2015 SCORING GUIDELINES Identical to Scoring Guidelines used for French, Italian, and Spanish Language and Culture Exams Interpersonal Writing: E-mail Reply 5: STRONG
More informationSOFTWARE ENGINEERING PROGRAM
SOFTWARE ENGINEERING PROGRAM PROGRAM TITLE DEGREE TITLE Master of Science Program in Software Engineering Master of Science (Software Engineering) M.Sc. (Software Engineering) PROGRAM STRUCTURE Total program
More informationMining. Practical. Data. Monte F. Hancock, Jr. Chief Scientist, Celestech, Inc. CRC Press. Taylor & Francis Group
Practical Data Mining Monte F. Hancock, Jr. Chief Scientist, Celestech, Inc. CRC Press Taylor & Francis Group Boca Raton London New York CRC Press is an imprint of the Taylor Ei Francis Group, an Informs
More informationCategorical Data Visualization and Clustering Using Subjective Factors
Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,
More informationError Log Processing for Accurate Failure Prediction. Humboldt-Universität zu Berlin
Error Log Processing for Accurate Failure Prediction Felix Salfner ICSI Berkeley Steffen Tschirpke Humboldt-Universität zu Berlin Introduction Context of work: Error-based online failure prediction: error
More informationName: Klasse: Datum: A. Was wissen Sie schon? What do you know already from studying Kapitel 1 in Vorsprung? True or false?
KAPITEL 1 Jetzt geht s los! Vor dem Anschauen A. Was wissen Sie schon? What do you know already from studying Kapitel 1 in Vorsprung? True or false? 1. Man sagt Hallo in Deutschland. 2. Junge Personen
More informationTIn 1: Lecture 3: Lernziele. Lecture 3 The Belly of the Architect. Basic internal components of the 8086. Pointers and data storage in memory
Mitglied der Zürcher Fachhochschule TIn 1: Lecture 3 The Belly of the Architect. Lecture 3: Lernziele Basic internal components of the 8086 Pointers and data storage in memory Architektur 8086 Besteht
More informationA: Ein ganz normaler Prozess B: Best Practices in BPMN 1.x. ITAB / IT Architekturbüro Rüdiger Molle März 2009
A: Ein ganz normaler Prozess B: Best Practices in BPMN 1.x ITAB / IT Architekturbüro Rüdiger Molle März 2009 März 2009 I T A B 2 Lessons learned Beschreibung eines GP durch das Business läßt Fragen der
More informationAnalysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News
Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News Sushilkumar Kalmegh Associate Professor, Department of Computer Science, Sant Gadge Baba Amravati
More informationSVM Based Learning System For Information Extraction
SVM Based Learning System For Information Extraction Yaoyong Li, Kalina Bontcheva, and Hamish Cunningham Department of Computer Science, The University of Sheffield, Sheffield, S1 4DP, UK {yaoyong,kalina,hamish}@dcs.shef.ac.uk
More informationCombining Global and Personal Anti-Spam Filtering
Combining Global and Personal Anti-Spam Filtering Richard Segal IBM Research Hawthorne, NY 10532 Abstract Many of the first successful applications of statistical learning to anti-spam filtering were personalized
More informationMicrosoft Certified IT Professional (MCITP) MCTS: Windows 7, Configuration (070-680)
Microsoft Office Specialist Office 2010 Specialist Expert Master Eines dieser Examen/One of these exams: Eines dieser Examen/One of these exams: Pflichtexamen/Compulsory exam: Word Core (Exam 077-881)
More informationA prototype infrastructure for D Spin Services based on a flexible multilayer architecture
A prototype infrastructure for D Spin Services based on a flexible multilayer architecture Volker Boehlke 1,, 1 NLP Group, Department of Computer Science, University of Leipzig, Johanisgasse 26, 04103
More informationDistributed Computing and Big Data: Hadoop and MapReduce
Distributed Computing and Big Data: Hadoop and MapReduce Bill Keenan, Director Terry Heinze, Architect Thomson Reuters Research & Development Agenda R&D Overview Hadoop and MapReduce Overview Use Case:
More informationClever Search: A WordNet Based Wrapper for Internet Search Engines
Clever Search: A WordNet Based Wrapper for Internet Search Engines Peter M. Kruse, André Naujoks, Dietmar Rösner, Manuela Kunze Otto-von-Guericke-Universität Magdeburg, Institut für Wissens- und Sprachverarbeitung,
More informationMaster of Science in Computer Science
Master of Science in Computer Science Background/Rationale The MSCS program aims to provide both breadth and depth of knowledge in the concepts and techniques related to the theory, design, implementation,
More informationSearch and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or
More informationLINGUISTIC SUPPORT IN "THESIS WRITER": CORPUS-BASED ACADEMIC PHRASEOLOGY IN ENGLISH AND GERMAN
ELN INAUGURAL CONFERENCE, PRAGUE, 7-8 NOVEMBER 2015 EUROPEAN LITERACY NETWORK: RESEARCH AND APPLICATIONS Panel session Recent trends in Bachelor s dissertation/thesis research: foci, methods, approaches
More informationEmail Spam Detection A Machine Learning Approach
Email Spam Detection A Machine Learning Approach Ge Song, Lauren Steimle ABSTRACT Machine learning is a branch of artificial intelligence concerned with the creation and study of systems that can learn
More informationMachine Learning for Naive Bayesian Spam Filter Tokenization
Machine Learning for Naive Bayesian Spam Filter Tokenization Michael Bevilacqua-Linn December 20, 2003 Abstract Background Traditional client level spam filters rely on rule based heuristics. While these
More informationExchange Synchronization AX 2012
Exchange Synchronization AX 2012 Autor... Pascal Gubler Dokument... Exchange Synchronization 2012 (EN) Erstellungsdatum... 25. September 2012 Version... 2 / 17.06.2013 Content 1 PRODUKTBESCHREIBUNG...
More informationCENG 734 Advanced Topics in Bioinformatics
CENG 734 Advanced Topics in Bioinformatics Week 9 Text Mining for Bioinformatics: BioCreative II.5 Fall 2010-2011 Quiz #7 1. Draw the decompressed graph for the following graph summary 2. Describe the
More informationThe Changing Global Egg Industry
Vol. 46 (2), Oct. 2011, Page 3 The Changing Global Egg Industry - The new role of less developed and threshold countries in global egg production and trade 1 - Hans-Wilhelm Windhorst, Vechta, Germany Introduction
More informationBuilding a Question Classifier for a TREC-Style Question Answering System
Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given
More informationWeb Document Clustering
Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,
More informationSemantic Video Annotation by Mining Association Patterns from Visual and Speech Features
Semantic Video Annotation by Mining Association Patterns from and Speech Features Vincent. S. Tseng, Ja-Hwung Su, Jhih-Hong Huang and Chih-Jen Chen Department of Computer Science and Information Engineering
More informationDATA is just like CRUDE. It s valuable, but if unrefined it cannot really be used.
Data is the new Oil DATA is just like CRUDE. It s valuable, but if unrefined it cannot really be used. Clive Humby "Digitale Informationsspeicher Im Meer der Daten" "Die Menschen produzieren immer mehr
More informationInteractive Dynamic Information Extraction
Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken
More informationLan, Mingjun and Zhou, Wanlei 2005, Spam filtering based on preference ranking, in Fifth International Conference on Computer and Information
Lan, Mingjun and Zhou, Wanlei 2005, Spam filtering based on preference ranking, in Fifth International Conference on Computer and Information Technology : CIT 2005 : proceedings : 21-23 September, 2005,
More informationOptimized Scheduling in Real-Time Environments with Column Generation
JG U JOHANNES GUTENBERG UNIVERSITAT 1^2 Optimized Scheduling in Real-Time Environments with Column Generation Dissertation zur Erlangung des Grades,.Doktor der Naturwissenschaften" am Fachbereich Physik,
More informationWeb Database Integration
Web Database Integration Wei Liu School of Information Renmin University of China Beijing, 100872, China gue2@ruc.edu.cn Xiaofeng Meng School of Information Renmin University of China Beijing, 100872,
More informationLeitfaden für die Antragstellung zur Förderung einer nationalen Biomaterialbankeninitiative
Seite 1 von 8 Leitfaden für die Antragstellung zur Förderung einer nationalen Biomaterialbankeninitiative Anträge zu Biomaterialbanken sind entsprechend den Vorgaben dieses Leitfadens zu erstellen (DIN
More informationA Content based Spam Filtering Using Optical Back Propagation Technique
A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT
More informationVorläufiges English Programme im akademischen Jahr 2015/2016 Preliminary English Programme in the Academic Year 2015/2016 *for English see below*
Vorläufiges English Programme im akademischen Jahr 2015/2016 Preliminary English Programme in the Academic Year 2015/2016 *for English see below* Im akademischen Jahr 2015/2016 freuen wir uns Ihnen erneut
More informationPerformance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati tnpatil2@gmail.com, ss_sherekar@rediffmail.com
More informationida.com excellence in dependable automation
IEC 61508 Maintenance Status IEC 61508 Maintenance Projekt ist aus dem zulässigen Zeitrahmen gelaufen Viele Baustellen auch durch neue Mitglieder (Frankreich, USA, IEC 61511 Team) Bestehende Anforderungen,
More informationVoraussetzungen/ Prerequisites *for English see below*
English Programme im akademischen Jahr 2013/2014 English Programme in the Academic Year 2013/2014 *for English see below* Im akademischen Jahr 2013/2014 freuen wir uns Ihnen erneut ein Programm mit englischsprachigen
More informationCourse 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing
More informationResearch Statement Immanuel Trummer www.itrummer.org
Research Statement Immanuel Trummer www.itrummer.org We are collecting data at unprecedented rates. This data contains valuable insights, but we need complex analytics to extract them. My research focuses
More informationChecklist Use this checklist to find out how much English you already know. Grundstufe 1 (Common European Framework: A1 Level)
Der XL Test: Was können Sie schon? Schätzen Sie Ihre Sprachkenntnisse selbst ein! Sprache: Englisch Mit der folgenden e haben Sie die Möglichkeit, Ihre Fremdsprachenkenntnisse selbst einzuschätzen. Die
More informationArchitecture of an Ontology-Based Domain- Specific Natural Language Question Answering System
Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System Athira P. M., Sreeja M. and P. C. Reghuraj Department of Computer Science and Engineering, Government Engineering
More informationLearning outcomes. Knowledge and understanding. Competence and skills
Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges
More informationExemplar for Internal Assessment Resource German Level 1. Resource title: Planning a School Exchange
Exemplar for internal assessment resource German 1.5A for Achievement Standard 90887! Exemplar for Internal Assessment Resource German Level 1 Resource title: Planning a School Exchange This exemplar supports
More informationSpam Filtering using Naïve Bayesian Classification
Spam Filtering using Naïve Bayesian Classification Presented by: Samer Younes Outline What is spam anyway? Some statistics Why is Spam a Problem Major Techniques for Classifying Spam Transport Level Filtering
More informationInnovative network monitoring technologies for hydraulically not separated large zones
Innovative network monitoring technologies for hydraulically not separated large zones MWM Martinek Water Management AUSTRIA What means Sisyphus to Water-Loss-Reduction? According to Greek Mythology Zeus
More informationSAP Sourcing/CLM Webcast Query & User/Contact Maintenance Web Service
SAP Sourcing/CLM Webcast Query & User/Contact Maintenance Web Service Vikram Shanmugasundaram / SAP Sourcing/CLM Center of Excellence Ed Dunne / SAP Sourcing/CLM Product Management November 2015 SAP Sourcing
More informationPPS Internet-Praktikum. Prof. Bernhard Plattner Institut für Technische Informatik und Kommunikationsnetze (TIK)
PPS Internet-Praktikum Prof. Bernhard Plattner Institut für Technische Informatik und Kommunikationsnetze (TIK) September 2011 Zielsetzung Von unserer Webpage: Das Ziel dieser PPS-Veranstaltung ist es,
More informationThemen der Praktikumsnachmittage. PPS Internet-Praktikum. Zielsetzung. Infrastruktur im ETF B5
PPS Internet-Praktikum Prof. Bernhard Plattner Institut für Technische Informatik und Kommunikationsnetze (TIK) Themen der Praktikumsnachmittage Aufbau und Analyse eines kleinen Netzwerks Routing Anwendungen
More informationCAS-ICT at TREC 2005 SPAM Track: Using Non-Textual Information to Improve Spam Filtering Performance
CAS-ICT at TREC 2005 SPAM Track: Using Non-Textual Information to Improve Spam Filtering Performance Shen Wang, Bin Wang and Hao Lang, Xueqi Cheng Institute of Computing Technology, Chinese Academy of
More informationCrowdclustering with Sparse Pairwise Labels: A Matrix Completion Approach
Outline Crowdclustering with Sparse Pairwise Labels: A Matrix Completion Approach Jinfeng Yi, Rong Jin, Anil K. Jain, Shaili Jain 2012 Presented By : KHALID ALKOBAYER Crowdsourcing and Crowdclustering
More informationContent visualization of scientific corpora using an extensible relational database implementation
. Content visualization of scientific corpora using an extensible relational database implementation Eleftherios Stamatogiannakis, Ioannis Foufoulas, Theodoros Giannakopoulos, Harry Dimitropoulos, Natalia
More information