
comp4620/8620: Advanced Topics in AI
Foundations of Artificial Intelligence

Marcus Hutter
Australian National University
Canberra, ACT, 0200, Australia
http://www.hutter1.net/

Table of Contents

1. A SHORT TOUR THROUGH THE COURSE
2. INFORMATION THEORY & KOLMOGOROV COMPLEXITY
3. BAYESIAN PROBABILITY THEORY
4. ALGORITHMIC PROBABILITY & UNIVERSAL INDUCTION
5. MINIMUM DESCRIPTION LENGTH
6. THE UNIVERSAL SIMILARITY METRIC
7. BAYESIAN SEQUENCE PREDICTION
8. UNIVERSAL RATIONAL AGENTS
9. THEORY OF RATIONAL AGENTS
10. APPROXIMATIONS & APPLICATIONS
11. DISCUSSION

1. A SHORT TOUR THROUGH THE COURSE

Informal Definition of (Artificial) Intelligence

"Intelligence measures an agent's ability to achieve goals in a wide range of environments." [S. Legg and M. Hutter]

Emergent: Features such as the ability to learn and adapt, or to understand, are implicit in the above definition, as these capacities enable an agent to succeed in a wide range of environments.

The science of Artificial Intelligence is concerned with the construction of intelligent systems/artifacts/agents and their analysis.

What next? Substantiate all terms above: agent, ability, utility, goal, success, learn, adapt, environment, ...

Never trust a theory if it is not supported by an experiment (and, reversed: never trust an experiment if it is not supported by a theory).

Induction → Prediction → Decision → Action

Having or acquiring or learning or inducing a model of the environment an agent interacts with allows the agent to make predictions and utilize them in its decision process of finding a good next action.

Induction infers general models from specific observations/facts/data, usually exhibiting regularities or properties or relations in the latter.

Example:
Induction: Find a model of the world economy.
Prediction: Use the model to predict the future stock market.
Decision: Decide whether to invest assets in stocks or bonds.
Action: Trading large quantities of stocks influences the market.

Science ≈ Induction ≈ Occam's Razor

Grue Emerald Paradox:
Hypothesis 1: All emeralds are green.
Hypothesis 2: All emeralds found till y2020 are green, thereafter all emeralds are blue.

Which hypothesis is more plausible? H1! Justification?

Occam's razor: take the simplest hypothesis consistent with the data. It is the most important principle in machine learning and science.

Problem: How to quantify simplicity? Beauty? Elegance? Description Length!

[The Grue problem goes much deeper. This is only half of the story.]

Information Theory & Kolmogorov Complexity

Quantification/interpretation of Occam's razor: the shortest description of an object is its best explanation. The shortest program for a string on a Turing machine T leads to the best extrapolation, i.e. prediction:

  K_T(x) = min_p { l(p) : T(p) = x }

Prediction is best for a universal Turing machine U:

  Kolmogorov complexity(x) = K(x) = K_U(x) ≤ K_T(x) + c_T
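
K(x) itself is incomputable, but any real compressor gives an upper bound K(x) ≤ K_T(x) + c_T. A minimal sketch of this idea, using zlib as the stand-in "Turing machine T" (the choice of zlib is an assumption of this sketch, not part of the course material):

```python
import os
import zlib

def K_upper_bound(x: bytes) -> int:
    # Code length in bits of a zlib description of x: an upper bound
    # on K(x) up to the compressor-dependent constant c_T.
    return 8 * len(zlib.compress(x, 9))

print(K_upper_bound(b"ab" * 500))       # highly regular string: compresses well
print(K_upper_bound(os.urandom(1000)))  # random string: roughly incompressible
```

Regular strings get short codes and random strings do not, which is exactly the asymmetry Occam's razor exploits.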

Bayesian Probability Theory

Given (1): Models P(D|H_i) for the probability of observing data D when H_i is true.
Given (2): Prior probability over hypotheses P(H_i).
Goal: Posterior probability P(H_i|D) of H_i after having seen data D.
Solution: Bayes' rule:

  P(H_i|D) = P(D|H_i) · P(H_i) / Σ_i P(D|H_i) · P(H_i)

(1) Models P(D|H_i) are usually easy to describe (objective probabilities).
(2) But Bayesian probability theory does not tell us how to choose the prior P(H_i) (subjective probabilities).
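
Bayes' rule over a finite hypothesis class is a one-liner. A minimal sketch on a made-up coin-bias example (the hypothesis grid and data are illustrative assumptions, not from the slides):

```python
def posterior(priors, likelihoods):
    # P(H_i|D) = P(D|H_i) P(H_i) / sum_j P(D|H_j) P(H_j)
    joint = [p * l for p, l in zip(priors, likelihoods)]
    z = sum(joint)
    return [j / z for j in joint]

# Hypotheses H_i: the coin has bias theta in {0.1, ..., 0.9}; uniform prior.
thetas = [i / 10 for i in range(1, 10)]
priors = [1 / len(thetas)] * len(thetas)

# Data D: 8 heads in 10 flips, so P(D|theta) is proportional to theta^8 (1-theta)^2.
likelihoods = [t**8 * (1 - t)**2 for t in thetas]

print(max(zip(posterior(priors, likelihoods), thetas)))  # mass peaks at theta = 0.8
```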

Algorithmic Probability Theory

Epicurus: If more than one theory is consistent with the observations, keep all theories. ⇒ uniform prior over all H_i?

Refinement with Occam's razor, quantified in terms of Kolmogorov complexity:

  P(H_i) := 2^(-K_{T/U}(H_i))

Fixing T, we have a complete theory for prediction. Problem: How to choose T?
Choosing U, we have a universal theory for prediction. Observation: The particular choice of U does not matter much. Problem: Incomputable.
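
The prior 2^(-K(H_i)) can be imitated by replacing K with a compressed length, as before with zlib (the hypothesis strings below are invented for illustration; this is a sketch of the idea, not a computable version of the actual universal prior):

```python
import zlib

def occam_prior(hypotheses):
    # Weight each hypothesis by 2^(-description length in bits), then normalize.
    weights = [2.0 ** (-8 * len(zlib.compress(h.encode(), 9))) for h in hypotheses]
    z = sum(weights)
    return [w / z for w in weights]

H = ["all emeralds are green",
     "emeralds found till 2020 are green, thereafter emeralds are blue"]
print(occam_prior(H))  # the shorter hypothesis receives almost all prior mass
```

Note how this formalizes the grue discussion: H1 wins simply because it is shorter to describe.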

Inductive Inference & Universal Forecasting

Solomonoff combined Occam, Epicurus, Bayes, and Turing into one formal theory of sequential prediction.

M(x) = probability that a universal Turing machine outputs x when provided with fair coin flips on the input tape.

The a posteriori probability of y given x is M(y|x) = M(xy)/M(x).

Given ẋ_1, ..., ẋ_{t-1}, the probability of x_t is M(x_t | ẋ_1 ... ẋ_{t-1}).

Immediate applications:
- Weather forecasting: x_t ∈ {sun, rain}.
- Stock-market prediction: x_t ∈ {bear, bull}.
- Continuing number sequences in an IQ test: x_t ∈ ℕ.

Optimal universal inductive reasoning system!

The Minimum Description Length Principle

Approximations of Solomonoff's M, which is incomputable:

  M(x) ≈ 2^(-K_U(x)) (quite good)
  K_U(x) ≈ K_T(x) (very crude)

Predicting the y of highest M(y|x) is then approximately the same as MDL: predict the y of smallest K_T(xy).
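
Combining the two approximations gives a predictor one can actually run: choose the continuation that compresses best together with the history. A minimal sketch with zlib standing in for K_T (this also crudely approximates the Solomonoff predictor M(y|x) from the previous slide; the binary alphabet is an assumption of the sketch):

```python
import zlib

def C(s: bytes) -> int:
    # Code length in bytes under zlib, standing in for K_T.
    return len(zlib.compress(s, 9))

def mdl_predict(x: bytes, alphabet=(b"0", b"1")) -> bytes:
    # Predict the y minimizing K_T(xy); ties go to the first symbol.
    return min(alphabet, key=lambda y: C(x + y))

x = b"01" * 200          # history ends in '1', so the pattern suggests '0' next
print(mdl_predict(x))
# At this toy scale zlib's code lengths can tie; on longer or richer data the
# pattern-consistent continuation wins because it compresses better.
```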

Application: Universal Clustering

Question: When is object x similar to object y?
Universal solution: x is similar to y ⇔ x can be easily (re)constructed from y ⇔ K(x|y) := min{ l(p) : U(p, y) = x } is small.

Universal Similarity: symmetrize & normalize K(x|y).
Normalized compression distance: approximate K = K_U by K_T.
Practice: for T, choose a (de)compressor like lzw or gzip or bzip(2).
Multiple objects ⇒ similarity matrix ⇒ similarity tree.

Applications: Completely automatic reconstruction (a) of the evolutionary tree of 24 mammals based on complete mtDNA, (b) of the classification tree of 52 languages based on the Declaration of Human Rights, and (c) many others. [Cilibrasi&Vitanyi 05]
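
The normalized compression distance is short enough to state in full. A minimal sketch with zlib as the compressor T (the slide itself suggests lzw/gzip/bzip2; the test strings are made up):

```python
import os
import zlib

def C(s: bytes) -> int:
    return len(zlib.compress(s, 9))

def ncd(x: bytes, y: bytes) -> float:
    # NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    cx, cy = C(x), C(y)
    return (C(x + y) - min(cx, cy)) / max(cx, cy)

a = b"the quick brown fox jumps over the lazy dog " * 20
b = b"the quick brown fox leaps over the lazy cat " * 20
r = os.urandom(len(a))
print(ncd(a, b))  # small: the two texts share most of their structure
print(ncd(a, r))  # near 1: random data shares nothing with a
```

Computing ncd for all pairs of objects yields the similarity matrix from which the similarity tree is built.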

Sequential Decision Theory

Setup: For t = 1, 2, 3, 4, ..., given the sequence x_1, x_2, ..., x_{t-1}:
(1) predict/make decision y_t,
(2) observe x_t,
(3) suffer loss Loss(x_t, y_t),
(4) t ← t + 1, goto (1).

Goal: Minimize expected loss.

Greedy minimization of expected loss is optimal if:
- Important: decision y_t does not influence the environment (future observations).
- The loss function is known.

Problem: Expectation with respect to what?
Solution: With respect to the universal distribution M if the true distribution is unknown.

Example: Weather Forecasting

Observation: x_t ∈ X = {sunny, rainy}
Decision: y_t ∈ Y = {umbrella, sunglasses}

Loss        | sunny | rainy
umbrella    |  0.1  |  0.3
sunglasses  |  0.0  |  1.0

Taking umbrella/sunglasses does not influence future weather (ignoring the butterfly effect).
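
Step (1) of the decision loop above, instantiated on this loss table. The rain probability would come from a predictor such as M; here it is left as a free parameter (a minimal sketch, not code from the course):

```python
LOSS = {                                     # Loss(x_t, y_t)
    "umbrella":   {"sunny": 0.1, "rainy": 0.3},
    "sunglasses": {"sunny": 0.0, "rainy": 1.0},
}

def best_decision(p_rain: float) -> str:
    # Pick the y_t with minimal expected loss under P(rainy) = p_rain.
    def expected_loss(y):
        return (1 - p_rain) * LOSS[y]["sunny"] + p_rain * LOSS[y]["rainy"]
    return min(LOSS, key=expected_loss)

for p in (0.05, 0.2, 0.9):
    print(p, best_decision(p))
# The umbrella wins once 0.1(1-p) + 0.3p < 1.0p, i.e. for p > 1/8.
```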

Agent Model with Reward

If actions/decisions a influence the environment q:

[Figure: the agent (program p, with work tape) and the environment (program q, with work tape) interact in cycles; the agent sends actions y_1, y_2, y_3, ... to the environment, and the environment returns observation/reward pairs o_1 r_1, o_2 r_2, o_3 r_3, ... to the agent.]

Rational Agents in Known Environment

Setup: Known deterministic or probabilistic environment.
Fields: AI planning & sequential decision theory & control theory.

Greedy maximization of reward r (= -Loss) is no longer optimal. Example: Chess.
Exploration versus exploitation problem. Agent has to be farsighted.
Optimal solution: Maximize the future (expected) reward sum, called value.

Problem: Things drastically change if the environment is unknown.
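
Farsighted value maximization in a known probabilistic environment is a finite-horizon expectimax recursion. A minimal sketch on an invented two-state environment (the states, rewards, and probabilities below are assumptions chosen to show the exploration/exploitation trade-off):

```python
# P[state][action] = list of (probability, next_state, reward) outcomes.
P = {
    "calm": {"explore": [(0.5, "rich", 1.0), (0.5, "calm", 0.0)],
             "exploit": [(1.0, "calm", 0.3)]},
    "rich": {"explore": [(1.0, "calm", 0.0)],
             "exploit": [(1.0, "rich", 1.0)]},
}

def value(s: str, h: int) -> float:
    # Maximal expected reward sum over the next h steps.
    return 0.0 if h == 0 else max(q(s, a, h) for a in P[s])

def q(s: str, a: str, h: int) -> float:
    # Expected immediate reward plus value-to-go of action a in state s.
    return sum(p * (r + value(s2, h - 1)) for p, s2, r in P[s][a])

for a in P["calm"]:
    print(a, q("calm", a, 10))
# With a long horizon, exploring (to find the rewarding state) beats the
# myopically safe exploit action, even though exploring pays 0 half the time.
```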

Rational Agents in Unknown Environment

Additional problem: the (probabilistic) environment is unknown.
Fields: reinforcement learning and adaptive control theory.
Bayesian approach: mixture distribution ξ.

1. What performance does the Bayes-optimal policy imply? It does not necessarily imply self-optimization (Heaven&Hell example).
2. Computationally very hard problem.
3. Choice of horizon? Immortal agents are lazy.

Universal Solomonoff mixture ⇒ universal agent AIXI.
Represents a formal (mathematical, non-computable) solution to the AI problem?
Most (all?) AI problems are easily phrased within AIXI.

Computational Issues: Universal Search

Levin search: the fastest algorithm for inversion and optimization problems.
Theoretical application: if somebody found a non-constructive proof of P=NP, then Levin search would be a polynomial-time algorithm for every NP(-complete) problem.
Practical (OOPS) applications (J. Schmidhuber): mazes, Towers of Hanoi, robotics, ...
FastPrg: the asymptotically fastest and shortest algorithm for all well-defined problems.
Computable approximations of AIXI: AIXItl, AIξ, MC-AIXI-CTW, and ΦMDP.
Human Knowledge Compression Prize: 50,000€.
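
Levin search interleaves all programs, giving a program p of length l(p) a time budget of 2^(phase - l(p)) steps in each phase. A minimal sketch of that schedule on a toy inversion problem (the bitstring program space and the one-step `run` interpreter are assumptions of this sketch; a real implementation would time-share a universal machine):

```python
from itertools import product

def levin_search(run, is_solution, max_phase=24):
    # In phase i, each program p with l(p) < i gets 2**(i - l(p)) steps,
    # so the total work of phase i is roughly 2**i.
    for phase in range(1, max_phase + 1):
        for length in range(1, phase):
            for bits in product("01", repeat=length):
                program = "".join(bits)
                result = run(program, 2 ** (phase - length))
                if result is not None and is_solution(result):
                    return program, result
    return None

# Toy inversion problem: find n with n * n == 144, programs being binary
# numerals. This toy run() finishes in one step and ignores its budget.
def run(program, budget):
    return int(program, 2)

print(levin_search(run, lambda n: n * n == 144))  # -> ('1100', 12)
```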

Monte-Carlo AIXI Applications

Without providing any domain knowledge, the same agent is able to self-adapt to a diverse range of interactive environments. [VNH+11]

[Figure: normalised average reward per cycle (0 to 1) versus experience (100 to 1,000,000 cycles) for the environments Cheese Maze, Tiger, 4x4 Grid, TicTacToe, Biased RPS, Kuhn Poker, and Pacman, with the optimal reward as reference.]

www.youtube.com/watch?v=yfsmhtmgdke

Discussion at End of Course

- What has been achieved?
- Made assumptions.
- General and personal remarks.
- Open problems.
- Philosophical issues.

Exercises

1. [C10] What is the probability p that the sun will rise tomorrow?
2. [C15] Justify Laplace's rule (p = (n+1)/(n+2), where n = number of days the sun has risen in the past).
3. [C05] Predict the sequences:
   2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, ?
   3, 1, 4, 1, 5, 9, 2, 6, 5, 3, ?
   1, 2, 3, 4, ?
4. [C10] Argue in (1) and (3) for different continuations.

Introductory Literature

[HMU06] J. E. Hopcroft, R. Motwani, and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 3rd edition, 2006.

[RN10] S. J. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs, NJ, 3rd edition, 2010.

[LV08] M. Li and P. M. B. Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Springer, Berlin, 3rd edition, 2008.

[SB98] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.

[Leg08] S. Legg. Machine Super Intelligence. PhD thesis, Lugano, 2008.

[Hut05] M. Hutter. Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability. Springer, Berlin, 2005.

See http://www.hutter1.net/ai/introref.htm for more.