Open Domain Question Answering via Semantic Enrichment

Save this PDF as:

Size: px
Start display at page:

Transcription

6 log L = E Q log P (θ i α) + E Q log P (Z i θ i ) + E Q log P (w q i, t t i a Z i, β Q, β T ) (10) + H(Q(θ, Z)) Figure 5: JQA Generative Model Draw a pattern label: Z i Mult(θ i). Draw a word for the question q i from V Q : w β Q Z i,:. Draw a type for the answer a i from V T : t β T Z i,:. We formulate the lielihood of observing all the question and its associated answer types as follows: L = P (q i, t i a α, β Q, β T ) Here, = θ i P (θ i α)p (q i, t i a θ i, β Q, β T ) dθ i (6) P (q i, t i a θ i, β Q, β T ) = P (Z i θ i ) P (w Z i, β Q ) P (t Z i, β T ) (7) Z i w q i t t i a Finally, we resort to the maximum lielihood estimation approach to optimize the parameters in the model: 5.1 Model Solution arg max log L (8) α,β Q,βT Variational Inference Due to the interdependence of the latent variables, their true posterior distributions are computationally intractable. We introduce a variational distribution Q [4] in which the latent variables are independent of each other to approximate their true posterior distribution, i.e., Q(θ, Z) = Q(θ)Q(Z), where θ = {θ i, i D} and Z = {Z i, i D}. According to the variational distribution, Q(θ i ) Dir(γ i ), Q(Z i ) Mult(ϕ i ), where γ i and ϕ i are K 1 variational parameters. Instead of directly maximizing L which is intractable, we can maximize the lower bound of the log lielihood under the variational distribution and Jensen s inequality: log L E Q logp (D, θ, Z α, β T, β Q ) + H(Q) = log L (9) Each term on the right-hand side of the above equation, is a function over the model parameters as shown in Eqn. 11 to Eqn. 14. E Q log P (θ i α) = D B(α) + (α 1)[ψ(γ i, ) ψ( (11) γ i, )] where B(α) = Γ(α ) Γ( is the normalization constant of the α ) Dirichlet distribution Dir(α). E Q log P (Z i θ i ) = ϕ i, [ψ(γ i, ) ψ( γ i, )] D The third term E Q log P (w q i, t t i a Z i, β Q, β T ) = (12) ϕ i ( Nw qi log β Q,w + log β,t T ) (13) w q i t t i a where Nw qi is the frequency of word w in question q i. The entropy term H(Q(θ, Z)) = E Q log Q(θ i γ i ) + E Q log Q(Z i ϕ i ) = [log B(γ i ) ϕ i log ϕ i (γ i, 1)(ψ(γ i, ) ψ( γ i, ))] (14) Parameter Estimation The model parameters are estimated by using the variational expectation-maximization (EM) algorithm. In E-step, we update the variational parameters {γ s, ϕ s} while in M- step, we update the model parameters α, β Q, and β T so that log L is maximized. Specifically, E-step updates the variational parameters according to Eqn. 15 and 16. ϕ i, exp( Nw qi log β Q,w + log β,t T w q i t t i a + ψ(γ i, ) ψ( γ i, ) 1) (15) γ i, = α + ϕ i, (16) During M-step, we maximize the lower bound over the parameter α, β Q, and β T, by utilizing the classic L-BFGS optimization algorithm [29]. The derivatives over the parameter α are calculated in Eqn. 17. where D denotes all the questions and their associated answer types. We expand the lower bound of the log lielihood as follows: log L α = D [ ψ(α ) + ψ( α )] + [ψ(γ i, ) ψ( γ i, )] (17)

Question Answering with Subgraph Embeddings Antoine Bordes Facebook AI Research 112 avenue de Wagram, Paris, France abordes@fb.com Sumit Chopra Facebook AI Research 770 Broadway, New York, USA spchopra@fb.com

Multi-source hybrid Question Answering system Seonyeong Park, Hyosup Shim, Sangdo Han, Byungsoo Kim, Gary Geunbae Lee Pohang University of Science and Technology, Pohang, Republic of Korea {sypark322,

Building a Question Classifier for a TREC-Style Question Answering System

Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given

Analyzing the Dynamics of Research by Extracting Key Aspects of Scientific Papers

Analyzing the Dynamics of Research by Extracting Key Aspects of Scientific Papers Sonal Gupta Christopher Manning Natural Language Processing Group Department of Computer Science Stanford University Columbia

Watson: An Overview of the DeepQA Project

Watson: An Overview of the DeepQA Project AI Magazine 2010 IMB TEAM: David Ferrucci, Eric Brown, Jennifer Chu-Carroll, James Fan, David Gondek, Aditya A. Kalyanpur, Adam Lally, J. William Murdock, Eric

Mavuno: A Scalable and Effective Hadoop-Based Paraphrase Acquisition System

Mavuno: A Scalable and Effective Hadoop-Based Paraphrase Acquisition System Donald Metzler and Eduard Hovy Information Sciences Institute University of Southern California Overview Mavuno Paraphrases 101

Structure Mapping for Jeopardy! Clues

Structure Mapping for Jeopardy! Clues J. William Murdock murdockj@us.ibm.com IBM T.J. Watson Research Center P.O. Box 704 Yorktown Heights, NY 10598 Abstract. The Jeopardy! television quiz show asks natural-language

Data Mining Yelp Data - Predicting rating stars from review text

Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University rchada@cs.stonybrook.edu Chetan Naik Stony Brook University cnaik@cs.stonybrook.edu ABSTRACT The majority

Subordinating to the Majority: Factoid Question Answering over CQA Sites

Journal of Computational Information Systems 9: 16 (2013) 6409 6416 Available at http://www.jofcis.com Subordinating to the Majority: Factoid Question Answering over CQA Sites Xin LIAN, Xiaojie YUAN, Haiwei

Finding Answers in Semi-Structured Data Joe DiFebo Sean Massung December 19, 2012 Abstract Using Wikipedia [5] as a knowledge base is a relatively new branch in the thought process of the Question Answering

Clustering Connectionist and Statistical Language Processing

Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised

The Prolog Interface to the Unstructured Information Management Architecture Paul Fodor 1, Adam Lally 2, David Ferrucci 2 1 Stony Brook University, Stony Brook, NY 11794, USA, pfodor@cs.sunysb.edu 2 IBM

End-to-End Sentiment Analysis of Twitter Data

End-to-End Sentiment Analysis of Twitter Data Apoor v Agarwal 1 Jasneet Singh Sabharwal 2 (1) Columbia University, NY, U.S.A. (2) Guru Gobind Singh Indraprastha University, New Delhi, India apoorv@cs.columbia.edu,

Customer Intentions Analysis of Twitter Based on Semantic Patterns

Customer Intentions Analysis of Twitter Based on Semantic Patterns Mohamed Hamroun mohamed.hamrounn@gmail.com Mohamed Salah Gouider ms.gouider@yahoo.fr Lamjed Ben Said lamjed.bensaid@isg.rnu.tn ABSTRACT

Question Answering for Dutch: Simple does it

Question Answering for Dutch: Simple does it Arjen Hoekstra Djoerd Hiemstra Paul van der Vet Theo Huibers Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, P.O.

Chapter 6. The stacking ensemble approach

82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

Domain Adaptive Relation Extraction for Big Text Data Analytics. Feiyu Xu

Domain Adaptive Relation Extraction for Big Text Data Analytics Feiyu Xu Outline! Introduction to relation extraction and its applications! Motivation of domain adaptation in big text data analytics! Solutions!

Personalization of Web Search With Protected Privacy

Personalization of Web Search With Protected Privacy S.S DIVYA, R.RUBINI,P.EZHIL Final year, Information Technology,KarpagaVinayaga College Engineering and Technology, Kanchipuram [D.t] Final year, Information

Identifying Focus, Techniques and Domain of Scientific Papers

Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 sonal@cs.stanford.edu Christopher D. Manning Department of

Statistical Machine Translation: IBM Models 1 and 2

Statistical Machine Translation: IBM Models 1 and 2 Michael Collins 1 Introduction The next few lectures of the course will be focused on machine translation, and in particular on statistical machine translation

I, ONLINE KNOWLEDGE TRANSFER FREELANCER PORTAL M. & A.

ONLINE KNOWLEDGE TRANSFER FREELANCER PORTAL M. Micheal Fernandas* & A. Anitha** * PG Scholar, Department of Master of Computer Applications, Dhanalakshmi Srinivasan Engineering College, Perambalur, Tamilnadu

Open Domain Information Extraction. Günter Neumann, DFKI, 2012

Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for

Automatic Web Page Classification

Automatic Web Page Classification Yasser Ganjisaffar 84802416 yganjisa@uci.edu 1 Introduction To facilitate user browsing of Web, some websites such as Yahoo! (http://dir.yahoo.com) and Open Directory

Introduction. A. Bellaachia Page: 1

Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

FINDING EXPERT USERS IN COMMUNITY QUESTION ANSWERING SERVICES USING TOPIC MODELS

FINDING EXPERT USERS IN COMMUNITY QUESTION ANSWERING SERVICES USING TOPIC MODELS by Fatemeh Riahi Submitted in partial fulfillment of the requirements for the degree of Master of Computer Science at Dalhousie

» A Hardware & Software Overview. Eli M. Dow <emdow@us.ibm.com:>

» A Hardware & Software Overview Eli M. Dow Overview:» Hardware» Software» Questions 2011 IBM Corporation Early implementations of Watson ran on a single processor where it took 2 hours

Natural Language to Relational Query by Using Parsing Compiler

Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,

Mining the Software Change Repository of a Legacy Telephony System

Mining the Software Change Repository of a Legacy Telephony System Jelber Sayyad Shirabad, Timothy C. Lethbridge, Stan Matwin School of Information Technology and Engineering University of Ottawa, Ottawa,

Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines

, 22-24 October, 2014, San Francisco, USA Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines Baosheng Yin, Wei Wang, Ruixue Lu, Yang Yang Abstract With the increasing

Chapter 8. Final Results on Dutch Senseval-2 Test Data

Chapter 8 Final Results on Dutch Senseval-2 Test Data The general idea of testing is to assess how well a given model works and that can only be done properly on data that has not been seen before. Supervised

Stanford s Distantly-Supervised Slot-Filling System

Stanford s Distantly-Supervised Slot-Filling System Mihai Surdeanu, Sonal Gupta, John Bauer, David McClosky, Angel X. Chang, Valentin I. Spitkovsky, Christopher D. Manning Computer Science Department,

Searching Questions by Identifying Question Topic and Question Focus

Searching Questions by Identifying Question Topic and Question Focus Huizhong Duan 1, Yunbo Cao 1,2, Chin-Yew Lin 2 and Yong Yu 1 1 Shanghai Jiao Tong University, Shanghai, China, 200240 {summer, yyu}@apex.sjtu.edu.cn

Text Analytics with Ambiverse. Text to Knowledge. www.ambiverse.com

Text Analytics with Ambiverse Text to Knowledge www.ambiverse.com Version 1.0, February 2016 WWW.AMBIVERSE.COM Contents 1 Ambiverse: Text to Knowledge............................... 5 1.1 Text is all Around

Volume 4, Issue 6, June 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Improving Web Image

The University of Amsterdam s Question Answering System at QA@CLEF 2007

The University of Amsterdam s Question Answering System at QA@CLEF 2007 Valentin Jijkoun, Katja Hofmann, David Ahn, Mahboob Alam Khalid, Joris van Rantwijk, Maarten de Rijke, and Erik Tjong Kim Sang ISLA,

Overview. Clustering. Clustering vs. Classification. Supervised vs. Unsupervised Learning. Connectionist and Statistical Language Processing

Overview Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes clustering vs. classification supervised vs. unsupervised

Distributed forests for MapReduce-based machine learning

Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication

Network Big Data: Facing and Tackling the Complexities Xiaolong Jin

Network Big Data: Facing and Tackling the Complexities Xiaolong Jin CAS Key Laboratory of Network Data Science & Technology Institute of Computing Technology Chinese Academy of Sciences (CAS) 2015-08-10

Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis

Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis Yue Dai, Ernest Arendarenko, Tuomo Kakkonen, Ding Liao School of Computing University of Eastern Finland {yvedai,

Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

The Scientific Data Mining Process

Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

Course: Model, Learning, and Inference: Lecture 5

Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.

Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System

Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System Athira P. M., Sreeja M. and P. C. Reghuraj Department of Computer Science and Engineering, Government Engineering

Research Statement Immanuel Trummer www.itrummer.org

Research Statement Immanuel Trummer www.itrummer.org We are collecting data at unprecedented rates. This data contains valuable insights, but we need complex analytics to extract them. My research focuses

TREC 2003 Question Answering Track at CAS-ICT

TREC 2003 Question Answering Track at CAS-ICT Yi Chang, Hongbo Xu, Shuo Bai Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China changyi@software.ict.ac.cn http://www.ict.ac.cn/

SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY

SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY G.Evangelin Jenifer #1, Mrs.J.Jaya Sherin *2 # PG Scholar, Department of Electronics and Communication Engineering(Communication and Networking), CSI Institute

Probabilistic topic models for sentiment analysis on the Web

University of Exeter Department of Computer Science Probabilistic topic models for sentiment analysis on the Web Chenghua Lin September 2011 Submitted by Chenghua Lin, to the the University of Exeter as

Interactive Dynamic Information Extraction

Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken

SemLinker, a Modular and Open Source Framework for Named Entity Discovery and Linking

SemLinker, a Modular and Open Source Framework for Named Entity Discovery and Linking Marie-Jean Meurs 1, Hayda Almeida 2, Ludovic Jean-Louis 3, Eric Charton 4 1 Université du Québec à Montréal, 2 Concordia

TREC 2007 ciqa Task: University of Maryland

TREC 2007 ciqa Task: University of Maryland Nitin Madnani, Jimmy Lin, and Bonnie Dorr University of Maryland College Park, Maryland, USA nmadnani,jimmylin,bonnie@umiacs.umd.edu 1 The ciqa Task Information

A Framework for Ontology-Based Knowledge Management System

A Framework for Ontology-Based Knowledge Management System Jiangning WU Institute of Systems Engineering, Dalian University of Technology, Dalian, 116024, China E-mail: jnwu@dlut.edu.cn Abstract Knowledge

Answering Questions Requiring Cross-passage Evidence Kisuh Ahn Dept. of Linguistics Hankuk University of Foreign Studies San 89 Wangsan-li, Mohyeon-myeon Yongin-si, Gyeongi-do, Korea kisuhahn@gmail.com

Automatic Knowledge Base Construction Systems. Dr. Daisy Zhe Wang CISE Department University of Florida September 3th 2014

Automatic Knowledge Base Construction Systems Dr. Daisy Zhe Wang CISE Department University of Florida September 3th 2014 1 Text Contains Knowledge 2 Text Contains Automatically Extractable Knowledge 3

Clustering Technique in Data Mining for Text Documents

Clustering Technique in Data Mining for Text Documents Ms.J.Sathya Priya Assistant Professor Dept Of Information Technology. Velammal Engineering College. Chennai. Ms.S.Priyadharshini Assistant Professor

Machine Learning. Mausam (based on slides by Tom Mitchell, Oren Etzioni and Pedro Domingos)

Machine Learning Mausam (based on slides by Tom Mitchell, Oren Etzioni and Pedro Domingos) What Is Machine Learning? A computer program is said to learn from experience E with respect to some class of

large-scale machine learning revisited Léon Bottou Microsoft Research (NYC)

large-scale machine learning revisited Léon Bottou Microsoft Research (NYC) 1 three frequent ideas in machine learning. independent and identically distributed data This experimental paradigm has driven

Dynamics of Genre and Domain Intents

Dynamics of Genre and Domain Intents Shanu Sushmita, Benjamin Piwowarski, and Mounia Lalmas University of Glasgow {shanu,bpiwowar,mounia}@dcs.gla.ac.uk Abstract. As the type of content available on the

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

PRODUCT REVIEW RANKING SUMMARIZATION

PRODUCT REVIEW RANKING SUMMARIZATION N.P.Vadivukkarasi, Research Scholar, Department of Computer Science, Kongu Arts and Science College, Erode. Dr. B. Jayanthi M.C.A., M.Phil., Ph.D., Associate Professor,

Search Result Optimization using Annotators

Search Result Optimization using Annotators Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,

ILLINOIS MATH SOLVER: Math Reasoning on the Web

ILLINOIS MATH SOLVER: Math Reasoning on the Web Subhro Roy and Dan Roth University of Illinois, Urbana Champaign {sroy9, danr}@illinois.edu Abstract There has been a recent interest in understanding text

Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior

Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior N.Jagatheshwaran 1 R.Menaka 2 1 Final B.Tech (IT), jagatheshwaran.n@gmail.com, Velalar College of Engineering and Technology,

Categorical Data Visualization and Clustering Using Subjective Factors

Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,

A Statistical Text Mining Method for Patent Analysis

A Statistical Text Mining Method for Patent Analysis Department of Statistics Cheongju University, shjun@cju.ac.kr Abstract Most text data from diverse document databases are unsuitable for analytical

Guest Editors Introduction: Machine Learning in Speech and Language Technologies

Guest Editors Introduction: Machine Learning in Speech and Language Technologies Pascale Fung (pascale@ee.ust.hk) Department of Electrical and Electronic Engineering Hong Kong University of Science and

The Battle for the Future of Data Mining Oren Etzioni, CEO Allen Institute for AI (AI2) March 13, 2014

The Battle for the Future of Data Mining Oren Etzioni, CEO Allen Institute for AI (AI2) March 13, 2014 Big Data Tidal Wave What s next? 2 3 4 Deep Learning Some Skepticism about Deep Learning Of course,

3 Paraphrase Acquisition. 3.1 Overview. 2 Prior Work

Unsupervised Paraphrase Acquisition via Relation Discovery Takaaki Hasegawa Cyberspace Laboratories Nippon Telegraph and Telephone Corporation 1-1 Hikarinooka, Yokosuka, Kanagawa 239-0847, Japan hasegawa.takaaki@lab.ntt.co.jp

Decision Making in IBM Watson Question Answering

Decision Making in IBM Watson Question ing Dr. J. William Murdock IBM Watson Research Center IBM, the IBM logo, ibm.com, Smarter Planet and the planet icon are trademarks of International Business Machines

arxiv: v1 [cs.ir] 20 Dec 2016

Classification and Learning-to-rank Approaches for Cross-Device Matching at CIKM Cup 2016 Nam Khanh Tran L3S Research Center - Leibniz Universität Hannover ntran@l3s.de arxiv:1612.07117v1 [cs.ir] 20 Dec

Robust Question Answering for Speech Transcripts: UPC Experience in QAst 2009

Robust Question Answering for Speech Transcripts: UPC Experience in QAst 2009 Pere R. Comas and Jordi Turmo TALP Research Center Technical University of Catalonia (UPC) {pcomas,turmo}@lsi.upc.edu Abstract

Lecture 6: The Bayesian Approach

Lecture 6: The Bayesian Approach What Did We Do Up to Now? We are given a model Log-linear model, Markov network, Bayesian network, etc. This model induces a distribution P(X) Learning: estimate a set

Citation Context Sentiment Analysis for Structured Summarization of Research Papers

Citation Context Sentiment Analysis for Structured Summarization of Research Papers Niket Tandon 1,3 and Ashish Jain 2,3 ntandon@mpi-inf.mpg.de, ashish.iiith@gmail.com 1 Max Planck Institute for Informatics,

Search and Information Retrieval

Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

Folksonomies versus Automatic Keyword Extraction: An Empirical Study

Folksonomies versus Automatic Keyword Extraction: An Empirical Study Hend S. Al-Khalifa and Hugh C. Davis Learning Technology Research Group, ECS, University of Southampton, Southampton, SO17 1BJ, UK {hsak04r/hcd}@ecs.soton.ac.uk

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov

Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05

Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification

CHAPTER 3 DATA MINING AND CLUSTERING

CHAPTER 3 DATA MINING AND CLUSTERING 3.1 Introduction Nowadays, large quantities of data are being accumulated. The amount of data collected is said to be almost doubled every 9 months. Seeking knowledge

Machine Learning using MapReduce

Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it

Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content

Topic and Trend Detection in Text Collections using Latent Dirichlet Allocation

Topic and Trend Detection in Text Collections using Latent Dirichlet Allocation Levent Bolelli 1, Şeyda Ertekin 2, and C. Lee Giles 3 1 Google Inc., 76 9 th Ave., 4 th floor, New York, NY 10011, USA 2

Tensor Factorization for Multi-Relational Learning

Tensor Factorization for Multi-Relational Learning Maximilian Nickel 1 and Volker Tresp 2 1 Ludwig Maximilian University, Oettingenstr. 67, Munich, Germany nickel@dbs.ifi.lmu.de 2 Siemens AG, Corporate

ONTOLOGIES A short tutorial with references to YAGO Cosmina CROITORU

ONTOLOGIES p. 1/40 ONTOLOGIES A short tutorial with references to YAGO Cosmina CROITORU Unlocking the Secrets of the Past: Text Mining for Historical Documents Blockseminar, 21.2.-11.3.2011 ONTOLOGIES

Cell Phone based Activity Detection using Markov Logic Network

Cell Phone based Activity Detection using Markov Logic Network Somdeb Sarkhel sxs104721@utdallas.edu 1 Introduction Mobile devices are becoming increasingly sophisticated and the latest generation of smart

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Usha Nandini D 1, Anish Gracias J 2 1 ushaduraisamy@yahoo.co.in 2 anishgracias@gmail.com Abstract A vast amount of assorted

Semi-Supervised Learning for Blog Classification

Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (2008) Semi-Supervised Learning for Blog Classification Daisuke Ikeda Department of Computational Intelligence and Systems Science,

Term extraction for user profiling: evaluation by the user

Term extraction for user profiling: evaluation by the user Suzan Verberne 1, Maya Sappelli 1,2, Wessel Kraaij 1,2 1 Institute for Computing and Information Sciences, Radboud University Nijmegen 2 TNO,

Micro blogs Oriented Word Segmentation System

Micro blogs Oriented Word Segmentation System Yijia Liu, Meishan Zhang, Wanxiang Che, Ting Liu, Yihe Deng Research Center for Social Computing and Information Retrieval Harbin Institute of Technology,

Predicting stocks returns correlations based on unstructured data sources

Predicting stocks returns correlations based on unstructured data sources Mateusz Radzimski, José Luis Sánchez-Cervantes, José Luis López Cuadrado, Ángel García-Crespo Departamento de Informática Universidad

Domain Classification of Technical Terms Using the Web

Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using

CHI DATABASE VISUALIZATION

CHI DATABASE VISUALIZATION Niko Vegt Introduction The CHI conference is leading within the field of interaction design. Thousands of papers are published for this conference in an orderly structure. These

Interactive Visual Text Analytics for Decision Making

Interactive Visual Text Analytics for Decision Making Shixia Liu Microsoft Research Asia 1 Text is Everywhere We use documents as primary information artifact in our lives Our access to documents has grown

Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND II. PROBLEM AND SOLUTION

Brian Lao - bjlao Karthik Jagadeesh - kjag Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND There is a large need for improved access to legal help. For example,

Web-Scale Extraction of Structured Data Michael J. Cafarella, Jayant Madhavan & Alon Halevy

The Deep Web: Surfacing Hidden Value Michael K. Bergman Web-Scale Extraction of Structured Data Michael J. Cafarella, Jayant Madhavan & Alon Halevy Presented by Mat Kelly CS895 Web-based Information Retrieval

Data mining knowledge representation

Data mining knowledge representation 1 What Defines a Data Mining Task? Task relevant data: where and how to retrieve the data to be used for mining Background knowledge: Concept hierarchies Interestingness

Distributed Regression For Heterogeneous Data Sets 1

Distributed Regression For Heterogeneous Data Sets 1 Yan Xing, Michael G. Madden, Jim Duggan, Gerard Lyons Department of Information Technology National University of Ireland, Galway Ireland {yan.xing,