Life of A Knowledge Base (KB)

Similar documents
Bayesian Networks. Mausam (Slides by UW-AI faculty)

Bayesian Networks. Read R&N Ch Next lecture: Read R&N

Cell Phone based Activity Detection using Markov Logic Network

The Basics of Graphical Models

Challenges and Lessons from NIST Data Science Pre-pilot Evaluation in Introduction to Data Science Course Fall 2015

Relational Dynamic Bayesian Networks: a report. Cristina Manfredotti

Automatic Knowledge Base Construction Systems. Dr. Daisy Zhe Wang CISE Department University of Florida September 3th 2014

3. The Junction Tree Algorithms

Statistical Models in Data Mining

Analytics on Big Data

Introduction to Data Mining

Bayesian networks - Time-series models - Apache Spark & Scala

Multi-Class and Structured Classification

Final Project Report

Conditional Random Fields: An Introduction

LECTURE 4. Last time: Lecture outline

Machine Learning.

Question 2 Naïve Bayes (16 points)

Message-passing sequential detection of multiple change points in networks

Course: Model, Learning, and Inference: Lecture 5

Introduction to Data Mining

Estimating Software Reliability In the Absence of Data

Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber

MACHINE LEARNING IN HIGH ENERGY PHYSICS

5 Directed acyclic graphs

Querying Past and Future in Web Applications

Machine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague.

Programming Tools based on Big Data and Conditional Random Fields

Classification Problems

Information Management course

Probability Generating Functions

Random access protocols for channel access. Markov chains and their stability. Laurent Massoulié.

CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science

Data, Measurements, Features

Introduction. A. Bellaachia Page: 1

Collective Behavior Prediction in Social Media. Lei Tang Data Mining & Machine Learning Group Arizona State University

An Introduction to Data Mining

Classification algorithm in Data mining: An Overview

Evaluation of Machine Learning Techniques for Green Energy Prediction

Statistical Machine Learning from Data

Tracking Groups of Pedestrians in Video Sequences

HIGH PERFORMANCE BIG DATA ANALYTICS

Data Mining on Social Networks. Dionysios Sotiropoulos Ph.D.

8. Machine Learning Applied Artificial Intelligence

University of Chicago Graduate School of Business. Business 41000: Business Statistics Solution Key

Model-based Synthesis. Tony O Hagan

The Data Mining Process

Data Mining - Evaluation of Classifiers

Naive-Bayes Classification Algorithm

Study Manual. Probabilistic Reasoning. Silja Renooij. August 2015

SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis

Agenda. Interface Agents. Interface Agents

The University of Jordan

Spark. Fast, Interactive, Language- Integrated Cluster Computing

Probabilistic Networks An Introduction to Bayesian Networks and Influence Diagrams

CSCI567 Machine Learning (Fall 2014)

A Learning Based Method for Super-Resolution of Low Resolution Images

Travis Goodwin & Sanda Harabagiu

Big Data Science. Prof. Lise Getoor University of Maryland, College Park. October 17, 2013

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

A survey on click modeling in web search

Supervised Learning (Big Data Analytics)

Principles of Data Mining by Hand&Mannila&Smyth

Sampling via Moment Sharing: A New Framework for Distributed Bayesian Inference for Big Data

Big Data Analytics. Lucas Rego Drumond

A hidden Markov model for criminal behaviour classification

Chapter 28. Bayesian Networks

IC05 Introduction on Networks &Visualization Nov

Probabilistic Assertions

Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014

Bayesian Networks and Classifiers in Project Management

Machine Learning and Statistics: What s the Connection?

Feature Engineering in Machine Learning

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Mining Social Network Graphs

Business Statistics 41000: Probability 1

Statistics Graduate Courses

An Overview of Predictive Analytics for Practitioners. Dean Abbott, Abbott Analytics

Introduction to Support Vector Machines. Colin Campbell, Bristol University

Part 2: Community Detection

Gibbs Sampling and Online Learning Introduction

Linear Threshold Units

Lecture Note 1 Set and Probability Theory. MIT Spring 2006 Herman Bennett

Christfried Webers. Canberra February June 2015

Graph Database Performance: An Oracle Perspective

Customer Classification And Prediction Based On Data Mining Technique

Big Data and Analytics: Challenges and Opportunities

Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors

STA 4273H: Statistical Machine Learning

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University

large-scale machine learning revisited Léon Bottou Microsoft Research (NYC)

Introduction to Markov Chain Monte Carlo

A CRF-based approach to find stock price correlation with company-related Twitter sentiment

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics

Resource management on Cloud systems with Machine Learning

Review. Bayesianism and Reliability. Today s Class

Knowledge-Based Probabilistic Reasoning from Expert Systems to Graphical Models

Lecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions

Predict Influencers in the Social Network

Transcription:

Life of A Knowledge Base (KB) A knowledge base system is a special kind of database management system to for knowledge base management. KB extraction: knowledge extraction using statistical models in NLP/ML literature KB expansion: knowledge inference using deductive rules and knowledge with uncertainty KB evolution: knowledge update given new evidence and new data sources KB integration: knowledge integration from multiple sources (e.g., human, data sets, models) 21

Uncertainty Management Where does Uncertainty come from? Inherent in the state-of-the-art NLP results Incorrect, conflicting data sources Derived facts and query results Uncertainty vs. NULL Uncertainty vs. MAP/majority voting Probability Theory should lay the foundation for uncertainty management Probabilistic Knowledge Base System 22

Probabilistic Knowledge Base System Statistical machine learning models and data processing and management systems are the backbones Knowledge & uncertainty Knowledge Integration Questions Knowledge & uncertainty Answers Knowledge Extraction Knowledge Expansion Knowledge Evolution 23

Web Does Not Know All For example, the Web has much information on: X Food contains Y Chemical Y Chemicals prevents Z Disease The Web has little information on: X Food prevents Z Disease We can infer knowledge in addition to those extracted from the Web using first-order rules First-order Rules can be automatically extracted from Web [Sherlock-Holmes] Uncertainties (e.g., rules, facts) needs to be populated, maintained and managed 24

Probabilistic KB inference using MLN A Markov logic network is a set of formulae with weights. Together with a finite set of constants, it defines a Markov network (i.e., factor graph). 0.859 Contains(Food, Chemical) :- IsMadeFrom(Food, Ingredient), Contains(Ingredient, Chemical) Made (F1,I1) Prevents (C1,D1) 0.855 Prevents(Food, Disease) :- Contains(Food, Chemical), Prevents(Chemical, Disease) Contains (I1,C1) Contains (F1,C1) Prevents (F1,D1) C = {F1, C1, D1, I1} 25

Explicit vs. Implicit Information

Does Obama live in the US?

Knowledge Expansion

Challenges Scalable inference for knowledge expansion Rule learning Inference rules, e.g.: PoliticianOf(Person,City) LivesIn(Person,City) RetriedFrom(Prof,School) WorkedFor(Prof,School) Constraint rules, e.g.: (functional dependencies, conditional FDs) Every person can only have one birth place Every city can only belong to one country 29

From Big Data to Big Knowledge

From Big Data to Big Knowledge Data Facts Rules

Rule Learning over Knowledge Bases: Association Rule Mining T1 Milk Eggs Chips Bread T2 Pizza Eggs Meat T3 Soda Eggs Milk Meat

Association Rules T1 Milk Eggs Chips Bread T2 Pizza Eggs Meat Eggs Milk? T3 Soda Eggs Milk Meat Meat Eggs?

Association Rule Metrics Support (B -> H) Pr(B ^ H) # transactions (B ^ H) / total # transactions Confidence (B -> H) Pr (H B) = Pr(H ^ B) / Pr(B) (Bayes Rule) # transactions (B ^ H) / # transactions (B) Lift (B -> H) Pr(H ^ B) / Pr(B)Pr(H) # transactions (B ^ H) x total # transactions / # trans (B) x # trans (H)

Association Rules T1 Milk Eggs Chips Bread T2 Pizza Eggs Meat Eggs Milk? Support = 2/3 Confidence = 2/3 T3 Soda Eggs Milk Meat Meat Eggs? Support = 2/3 Confidence = 1

AMIE: Association Rules in a KB x, y are instance and X, Y are classes Items -> p(x,y) memberof(x, y), isalliedwith(x, z), etc. Transactions -> R(x,y) = {p_i(x,y)}, i=1 N {isalliedwith(x,y), hasfather(x,y), etc.} Support of p(x,y) -> q(x,y) # (X,Y) : B & H Confidence of p(x,y) -> q(x,y) # (X, Y) : B & H / #(X, Y) : B

Example Rule Extending to X, Y, Z Items -> p(x,y), p(x,z) or p(y,z) Transactions -> R(x,y,z) Support of p(x,y) ^ q(x,y) -> r(x,y) # (X,Y,Z) : B & H Confidence of p(x,y) -> q(x,y) # (X,Y,Z) : B & H / #(X,Y,Z) : B Example Rule: isalliedwith (X, Y) ^ memberof (Y, Z) memberof (X, Z)

Querying ProbKB s SPARQL over JENA RDF store?? And?? Top-K answers with ranked confidence value 38

Summary IE/NLP techniques KBs and probabilistic KBs MLN model, inference and rule learning 39

Logistics Project proposal due Sept. 26 th, instruction of the proposal report will be forthcoming Group size min = 2, max = 4, medium = 3 Computing resources: AWS credits and department/uf computing facilities Example project ideas will be posted related to graph analysis, text/image retrieval and knowledge base construction AWS, Map-Reduce, SQL/Hive, nosql, Tableau.. coursera, UF CISE intro to DS spring 14

Probabilistic graphical models Graphical models Probabilistic models Directed (Bayesian networks) Undirected (Markov Random fields - MRFs)

Bayesian Networks Directed Acyclic Graph (DAG) Nodes are random variables Edges indicate causal influences Conditional independence: each RV is conditionally independent of other nodes given its parent nodes Burglary Earthquake Alarm JohnCalls MaryCalls

Conditional Probability Tables Each node has a conditional probability table (CPT) that gives the probability of each of its values given every possible combination of values for its parents (conditioning case). Roots (sources) of the DAG that have no parents are given prior probabilities. P(B).001 Burglary Earthquake P(E).002 Alarm B E P(A) T T.95 T F.94 F T.29 F F.001 A P(J) T.90 F.05 JohnCalls MaryCalls A P(M) T.70 F.01

Joint Distributions for Bayes Nets A Bayesian Network implicitly defines a joint distribution. P( x, x,... x ) n P( x Parents( X 1 2 n i i i 1 Example P( J M A B E) P( J A) P( M A) P( A B E) P( B) P( E) 0.9 0.7 0.001 0.999 0.998 0.00062 ))

Naïve Bayes as a Bayes Net Naïve Bayes is a simple Bayes Net Y X 1 X 2 Xn Priors P(Y) and conditionals P(X i Y) for Naïve Bayes provide CPTs for the network.

Markov Networks Undirected graph over a set of random variables, where an edge represents a dependency. The Markov blanket of a node, X, in a Markov Net is the set of its neighbors in the graph (nodes that have an edge connecting to X). Every node in a Markov Net is conditionally independent of every other node given its Markov blanket.

Distribution for a Markov Network The distribution of a Markov net is most compactly described in terms of a set of potential functions (a.k.a. factors, compatibility functions), φ k, for each clique, k, in the graph. For each joint assignment of values to the variables in clique k, φ k assigns a non-negative real value that represents the compatibility of these values. The joint distribution of a Markov network is then defined by: 1 P( x1, x2,... x ) n k ( x{ k} ) Z k Where x {k} represents the joint assignment of the variables in clique k, and Z is a normalizing constant that makes a joint distribution that sums to 1. Z ) x k k ( x{ k}

Sample Markov Network B A 1 T T 100 T F 1 F T 1 F F 200 Burglary Earthquake E A 2 T T 50 T F 10 F T 1 F F 200 Alarm J A 3 T T 75 T F 10 F T 1 F F 200 JohnCalls MaryCalls M A 4 T T 50 T F 1 F T 10 F F 200

Logistic Regression as a Markov Net Logistic regression is a simple Markov Net potential functions φ k (x{k}) instead of conditional probability tables P(X i Y) Y X 1 X 2 Xn But only models the conditional distribution, P(Y X) and not the full joint P(X,Y) Same as a discriminatively trained naïve Bayes.

Generative vs. Discriminative Generative models and are not directly designed to maximize the performance of sequence labeling. They model the joint distribution P(O,Q). Generative NLP models are trained to have an accurate probabilistic model of the underlying language, and not all aspects of this model benefit the sequence labeling task. Discriminative models (CRFs) are specifically designed and trained to maximize performance of labeling, which leads to more accurate results. They model the conditional distribution P(Q O).

Classification Models Y X 1 X 2 Xn Generative Naïve Bayes Conditional Y Discriminative X 1 X 2 Xn Logistic Regression