Selected Topics in Artificial Intelligence. Lectures presented at WWSI in October 2011
1 Selected Topics in Artificial Intelligence Lectures presented at WWSI in October 2011
2 What is AI? Goal: Teaching machines to act intelligently Origins: Maths, logic, computer science, philosophy, psychology, cognitive science, biology,... History: Understanding intelligence is one of the oldest questions Turing introduced AI notions in his seminal 1950 work "AI" coined by John McCarthy at Dartmouth in 1956
3 Course Overview AI Fundamentals 1. Characterizations of AI 2. Intelligent Agents 3. Search in Problem Solving 4. Knowledge Representation 5. Game Playing Course materials:
4 Course Overview Cont. Machine Learning ("Every man has died, so we all will die") 10. Overview of Machine Learning 11. Decision Tree Learning (optional) 12. Two Layer Artificial Neural Networks 13. Multi-Layer Artificial Neural Networks Evolutionary Algorithms ("Breed your own populations and programs") 16. Genetic Algorithms 17. Genetic Programming
5 Schedule of classes [Timetable originally in Polish; dates and most times are garbled in this transcription. Columns: day of week, date, class type (wykład = lecture, konwersatorium = seminar), groups (MZ301 - MZ...), number of hours, duration, official bells, actual lecture hours. Sessions ran on Fridays, Saturdays, Sundays, a Tuesday and a Wednesday; RAZEM (total): 36 hours]
6 Characterizations of AI Part 1 6
7 Overview of Characterizations 1.1 Long term goals What do we want to achieve with AI? 1.2 Inspirations How do we get machines to act intelligently? 1.3 Methodology employed Hack away, or build theories? 1.4 General tasks to achieve Reason, learn, discover, compete, communicate, ... Then: fine grained characterizations
8 1.1.1 Long Term Goals 8 Produce intelligent behaviour in machines: Why use computers at all? They can do things better than us Big calculations quickly and reliably We do intelligent things So get computers to do intelligent things Probabilistic Puzzle Would a computer program get this wrong?
9 Probabilistic Puzzle You're shown three doors and told: "Behind one is the big cash prize, behind the others is nothing. Please choose a door." After you choose a door, the host, who knows where the prize is, opens another door, behind which there is nothing. He then asks: "OK, do you want to change your mind and choose the other door?" Should you choose the other door?
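This is the Monty Hall problem; switching wins two thirds of the time. A quick simulation (an illustrative sketch, not from the lecture) confirms it:

```python
import random

def monty_hall(switch, trials=100_000):
    """Estimate the win rate for sticking with the first door vs. switching."""
    wins = 0
    for _ in range(trials):
        prize = random.randrange(3)
        choice = random.randrange(3)
        # The host opens a door that is neither the chosen door nor the prize.
        opened = next(d for d in range(3) if d != choice and d != prize)
        if switch:
            choice = next(d for d in range(3) if d != choice and d != opened)
        wins += (choice == prize)
    return wins / trials

print(f"stick:  {monty_hall(switch=False):.3f}")   # about 0.333
print(f"switch: {monty_hall(switch=True):.3f}")    # about 0.667
```

The intuition: your first choice is right only 1/3 of the time, and in the other 2/3 of cases the host's forced reveal leaves the prize behind the remaining door.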
10 1.1.2 Long Term Goals 10 Understand human intelligence in society: Aid to philosophy, psychology, cognitive science Big question: what is intelligence? Smaller questions: language, attention, emotion Example: The ELIZA program Helped to study Rogerian psychotherapy Computation vs. intelligence How does society affect intelligence AI is used to look into social behaviour
11 The ELIZA Program ELIZA was one of the earliest AI programs It implemented some notions of Rogerian psychotherapy ELIZA worked by linguistic translation, turning statements such as "I'm feeling ill" into responses such as "Are you really feeling ill?" or "I'm sorry you are feeling ill" It was remarkably successful, and did fool some people into believing they were conversing with a human psychotherapist.
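The translation style can be sketched with pattern/response rules; the rules below are illustrative inventions, not Weizenbaum's originals:

```python
import re

# ELIZA-style rules: a regex pattern and a response template that
# reflects the user's own words back at them.
RULES = [
    (r"i'?m (?:feeling )?(.*)",  "Are you really feeling {0}?"),
    (r"i (?:want|need) (.*)",    "Why do you want {0}?"),
    (r".*\b(mother|father)\b.*", "Tell me more about your {0}."),
]

def respond(statement):
    s = statement.lower().rstrip(".!?")
    for pattern, template in RULES:
        m = re.fullmatch(pattern, s)
        if m:
            return template.format(*m.groups())
    return "Please go on."          # generic fallback when nothing matches

print(respond("I'm feeling ill"))   # Are you really feeling ill?
```

The real program had a far richer rule set and keyword ranking, but the core trick, surface pattern matching with no understanding, is exactly this shallow.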
12 1.1.3 Long Term Goals 12 Give birth to new life forms: Oldest question of all Meaning of life One approach: model life in silicon Create artificial life forms (ALife) Evolutionary algorithms (If it worked for life on planet earth ) Hope life will be an emergent property Can tame this for more utilitarian needs
13 1.1.4 Long Term Goals Add to scientific knowledge: Often ignored that AI produces big scientific questions Investigate intelligence, life, information Example: complexity of algorithms P = NP? Another example: What concepts can be learned by certain algorithms (computational learning theory) 13
14 1.2 Inspirations for AI 14 Major question: How are we going to get a machine to act intelligently to perform complex tasks? Use what we have available: Logic Introspection Brains Evolution Society
15 1.2.1 Inspirations for AI Logic: Studied intensively within mathematics Gives a handle on how to reason intelligently Example: automated reasoning Proving theorems using deduction Advantage of logic: We can be very precise (formal) about our programs Disadvantage of logic: Theoretically possible doesn't mean practically achievable Not necessarily applicable to modelling human intelligence
16 1.2.2 Inspirations for AI Introspection: Humans are intelligent, aren't they? Heuristics to improve performance Rules of thumb derived from perceived human behaviour Expert systems Implement the ways (rules) of the experts Example: MYCIN (blood disease diagnosis) Performed better than junior doctors Introspection can be misleading
17 1.2.3 Inspirations for AI 17 Brains: Our brains and senses are what give us intelligence Neurologists tell us that our brain consists of: Networks of billions of neurons Build artificial neural networks In hardware and software (mostly software now) Build neural structures Interactions of layers of neural networks
18 1.2.4 Inspirations for AI Evolution Our brains evolved through natural selection So, simulate the evolutionary process Simulate genes, mutation, inheritance, fitness, etc. Genetic algorithms and genetic programming Used in machine learning (induction) Used in Artificial Life simulation 18
19 1.2.5 Inspirations for AI 19 Evolution on Earth: We evolved in a dynamic environment Moving around and avoiding objects More intelligent than playing chess AI should be embedded in robotics Sensors (vision, etc.), locomotion, planning Hope that intelligent behaviour emerges Behaviour based robotics Start with insect like behaviour
20 1.2.6 Inspirations for AI 20 Society: Humans interact to achieve tasks requiring intelligence Can draw on group/crowd psychology Software should therefore Cooperate and compete to achieve tasks Multi-agent systems Split tasks into sub-tasks Autonomous agents interact to achieve goals
21 1.2.7 Inspirations for AI Computer science Computers and operating systems got very fast Allows us to write intelligent programs In bad ways: using brute force Doing massive searches Rather than reasoning intelligently Example: computer chess Some people say that this isn't AI Drew McDermott disagrees
22 The Brains in Bahrain In 2002 Deep Fritz took on Vladimir Kramnik, the world (human) number one, in a competition which the organizers nicknamed "the Brains in Bahrain" The match was drawn 4-4 and Kramnik gained a lot of respect for the program To quote the organizers: "Well, the last time an opponent escaped from Kramnik with a 21-move draw with the black pieces, it was Garry Kasparov!"
23 1.3 Methodologies Neat approach Ground programs in mathematical rigour Use logic and possibly prove things about programs Scruffy approach Write programs and test empirically See which methods work computationally "Smart casual": use both approaches See AI as an empirical science and technology Needs theoretical development and testing
24 1.4 General Tasks 24 AI is often presented as A set of problem solving techniques Most tasks can be shoe-horned into a problem specification Some problems attacked with AI techniques: Getting a program to reason rationally Getting a program to learn and discover Getting a program to compete Getting a program to communicate Getting a program to exhibit signs of life Getting a robot to move about in the real world
25 1.5 Generic Techniques 25 Automated Reasoning Resolution, proof planning, Davis-Putnam, CSPs Machine Learning Neural nets, ILP, decision tree learning Natural language processing N-grams, parsing, grammar learning Robotics Planning, edge detection, cell decomposition Evolutionary approaches Crossover, mutation, selection
26 1.6 Representation/Languages AI catchphrase representation, representation, representation Some general schemes Predicate logic, higher order logic Frames, production rules Semantic networks, neural nets, Bayesian nets Some AI languages developed Prolog, LISP, ML (Perl, C++, Java, etc. also very much used) 26
27 1.7 Application Areas Applications which AI has been used for: Art, astronomy, bioinformatics, engineering, Finance, fraud detection, law, mathematics, Military, music, story writing, telecommunications Transportation, tutoring, video games, web search House chores, and many more AI takes as much as it gives to domains AI is not a slave to other applications It benefits from input from other domains 27
28 The EQP Theorem Prover Solved a tricky mathematical problem known as the Robbins Algebra problem in 1996 The problem had remained unresolved since the 1930s, even though various mathematicians had attempted to solve the problem. After a run lasting around 8 days, a fairly succinct solution was found. 28
29 1.8 Final Products 29 Some AI programs/robots are well developed Example software: Otter, Prover9 (theorem provers, successors to EQP) Progol (machine learning) LambdaClam (proof planner) C4.5 (decision tree learner) HR (automated theory formation) Example hardware: Rodney Brooks vacuum cleaner (Roomba) Museum tour guide
30 Search in Problem Solving Part 3 30
31 Problem Solving Agents 31 Looking to satisfy some goal Want environment to be in particular state Have a number of possible actions An action changes environment What sequence of actions reaches the goal? Many possible sequences Agent must search through sequences
32 Examples of Search Problems 32 Chess Each turn, search moves for win Route finding Search for best routes Theorem proving Search chains of reasoning for proof Machine learning Search through concepts for one which achieves target categorisation
33 Search Terminology States: places the search can visit Search space: the set of possible states Search path Sequence of states the agent actually visits Solution or a Goal State A state which solves the given problem o Either known or has a checkable property May be more than one Strategy How to choose the next state in the path at any given state 33
34 Specifying a Search Problem Initial state Where the search starts Operators Function taking one state to another state How the agent moves around search space Goal test How the agent knows if solution state found Search strategies apply operators to chosen states 34
35 Example: Chess Initial state The standard starting position (pictured on the slide) Operators Moving pieces Goal test Checkmate o Can the king move without being taken?
36 Example: Route Finding Initial state City journey starts in Operators Driving from city to city Goal test Is current location the destination city? [Slide shows a map of Liverpool, Leeds, Manchester, Nottingham, Birmingham and London]
37 General Search Considerations 1. Artifact or Path? Interested in the solution only, or the path which got there? Route finding Known destination, must find the route (path) Anagram puzzle Doesn't matter how you find the word Only the word itself (artifact) is important Machine learning Usually only the concept (artifact) is important Theorem proving The proof is a sequence (path) of reasoning steps
38 General Search Considerations 2. Completeness 38 Task may require one, many or all solutions E.g. how many different ways to get from A to B? Complete search space contains all solutions Exhaustive search explores entire space (if finite) Complete search strategy will find a solution if one exists Pruning rules out certain operators in certain states Space still complete if no solutions pruned Strategy still complete if not all solutions pruned
39 General Search Considerations 3. Soundness A sound search contains only correct solutions An unsound search contains incorrect solutions Caused by unsound operators or goal check Dangers Produce incorrect solution to a problem In particular, find solutions to problems with no solutions o find a route to an unreachable destination o prove a theorem which is actually false 39
40 General Search Considerations 4. Time & Space Tradeoffs Fast programs can be written But they often use up too much memory Memory efficient programs can be written But they are often slow Different search strategies have different memory/speed tradeoffs 40
41 General Search Considerations 5. Additional Information 41 Given initial state, operators and goal test Can you give the agent additional info? Uninformed (Brute Force or Blind) search strategies Have no additional information Informed (Heuristic) search strategies Uses problem specific information Heuristic measure (guess how far from goal)
42 Graph and Agenda Analogies Graph Analogy States are nodes in graph, operators are edges Expanding a node adds edges to new states Strategy chooses which node to expand next Agenda Analogy New states are put onto an agenda (a list) Top of the agenda is explored next o Apply operators to generate new states Strategy chooses where to put new states on agenda 42
43 Example Search Problem A genetics professor Wants to name her new baby boy Using only the letters D,N & A Search through possible strings (states) D,DN,DNNA,NA,AND,DNAN, etc. 3 operators: add D, N or A onto end of string Initial state is an empty string Goal test Look up state in a book of boys names, e.g. DAN 43
44 Uninformed Search Strategies Breadth-first search Depth-first search Iterative deepening search (IDS) Bidirectional search Uniform-cost search 44
45 Breadth-First Search Every time a new state is reached New states are put on the bottom of the agenda When state NA is reached New states NAD, NAN, NAA added to bottom These get explored later (possibly much later) Graph analogy Each node of depth d is fully expanded before any node of depth d+1 is looked at 45
46 Breadth-First Search 46 Branching rate Average number of edges coming from a node (3 above) Uniform Search Every node has same number of branches (as above)
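Breadth-first search over the naming problem's agenda can be sketched as follows (the "book of names" is a hypothetical stand-in):

```python
from collections import deque

NAMES = {"DAN", "DANA"}   # hypothetical stand-in for the book of boys' names

def breadth_first(goal_test, letters="DNA", limit=5):
    agenda = deque([""])                       # the agenda: a FIFO queue of states
    while agenda:
        state = agenda.popleft()               # explore the top of the agenda
        if goal_test(state):
            return state
        if len(state) < limit:                 # a depth limit keeps the space finite
            for letter in letters:             # operators: add D, N or A to the end
                agenda.append(state + letter)  # new states go on the bottom
    return None

print(breadth_first(lambda s: s in NAMES))     # DAN (a shortest name is found first)
```

Putting new states on the bottom is exactly what makes this breadth-first: every string of length d is checked before any of length d+1.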
47 Depth-First Search As in breadth-first search But new states are put at the top of the agenda Graph analogy Expand deepest and leftmost node next But search can go on indefinitely down one path D, DD, DDD, DDDD, DDDDD, ... One solution: impose a depth limit on the search Sometimes the limit is not required Branches end naturally (i.e. cannot be expanded)
48 Depth-First Search (Depth Limit 4)
49 Depth vs. Breadth First Search Suppose branching rate is b Breadth-first Complete (guaranteed to find a solution) Requires a lot of memory o At depth d needs to remember up to b^(d-1) states Depth-first Not complete because of indefinite paths or depth limit But is memory efficient o Only needs to remember up to b*d states
50 Iterative Deepening Search (IDS) 50 Idea: do repeated depth first searches Increasing the depth limit by one every time DFS to depth 1, DFS to depth 2, etc. Completely re-do the previous search each time Most DFS effort is in expanding last line of the tree e.g. to depth five, branching rate of 10 o DFS: 111,111 states, IDS: 123,456 states o Repetition of only 11% Combines best of BFS and DFS Complete and memory efficient But slower than either
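The figures on this slide can be reproduced by counting states (a sketch that counts the root and starts the iterative deepening at depth limit 0):

```python
def dfs_nodes(b, depth):
    """States in a full tree of branching rate b, depths 0..depth (root included)."""
    return sum(b**d for d in range(depth + 1))

def ids_nodes(b, depth):
    """States expanded by iterative deepening: one complete DFS per limit 0..depth."""
    return sum(dfs_nodes(b, limit) for limit in range(depth + 1))

b, d = 10, 5
print(dfs_nodes(b, d))                                  # 111111
print(ids_nodes(b, d))                                  # 123456
print(round(ids_nodes(b, d) / dfs_nodes(b, d) - 1, 2))  # 0.11, i.e. 11% repetition
```

The repetition is cheap because a tree with branching rate 10 has about 90% of its states in the deepest level, which IDS expands only once.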
51 Bidirectional Search If you know the solution state Work forwards and backwards Look to meet in the middle Only need to go to half depth Difficulties Do you really know the solution? Is it unique? Must be able to reverse operators Record all paths to check they meet Memory intensive [Slide shows the city map: Liverpool, Leeds, Manchester, Nottingham, Birmingham, Peterborough, London]
52 Action and Path Costs Action cost Particular value associated with an action Examples Distance in route planning Power consumption in circuit board construction Path cost Sum of all the action costs in the path If action cost = 1 (always), then path cost = path length 52
53 Uniform-Cost Search Breadth-first search Guaranteed to find the shortest path to a solution Not necessarily the least costly path Uniform path cost search Choose to expand node with the least path cost Guaranteed to find a solution with least cost If we know that path cost increases with path length This method is optimal and complete But can be very slow 53
54 Informed (Heuristic) Search Strategies Greedy search A* search IDA* search Hill climbing Simulated annealing 54
55 Best-First Search Evaluation function f gives cost for each state Choose state with smallest value ( the best ) Agenda: f decides where new states are put Graph: f decides which node to expand next Many different strategies depending on f For uniform-cost search f = path cost Informed search strategy defines f based on heuristic function 55
56 Heuristic Functions Estimate of path cost h From a state to the nearest solution h(state) >= 0 h(solution) = 0 Strategies can use this information Example: straight line distance As the crow flies in route finding Where does h come from? Maths, introspection, inspection or programs (e.g. ABSOLVER) [Slide shows the city map annotated with straight line distances; the figures 75 and 120 appear by Nottingham and Peterborough]
57 Greedy Search 57 Always take the biggest bite f(state) = h(state) Choose smallest estimated cost to solution Ignores the path cost Blind alley effect: early estimates very misleading One solution: delay the use of greedy search Not guaranteed to find optimal solution Remember we are estimating the path cost
58 Shortest Path from A to F (Greedy) [Slide shows a worked greedy search on a graph of nodes A-F, with a table of candidate paths (AC, AB, ...) and their heuristic values h; most figures are garbled in this transcription]
59 A* Search 59 Path cost is g and heuristic function is h f(state) = g(state) + h(state) Choose smallest overall path cost (known + estimate) Combines uniform-cost and greedy search Can prove that A* is complete and optimal But only if h is admissible, i.e. underestimates the true path cost from state to solution
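A minimal A* sketch (the graph and costs below are illustrative, not the slide's map; with h = 0 it reduces to uniform-cost search):

```python
import heapq

def a_star(start, goal, neighbours, h):
    """A* search: always expand the state with smallest f = g + h.
    `neighbours(s)` yields (next_state, action_cost) pairs; `h` must never
    overestimate the true remaining cost (admissible) for optimality."""
    frontier = [(h(start), 0, start, [start])]   # (f, g, state, path)
    best_g = {start: 0}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path, g
        for nxt, cost in neighbours(state):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):   # found a cheaper way there
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None, float("inf")

# Made-up graph: A-B-F is short but costly, A-C-D-F is longer but cheaper.
edges = {"A": [("B", 8), ("C", 5)], "B": [("F", 15)], "C": [("D", 6)],
         "D": [("F", 4)], "F": []}
path, cost = a_star("A", "F", lambda s: edges[s], h=lambda s: 0)
print(path, cost)   # ['A', 'C', 'D', 'F'] 15
```

Recording the best known g for each state is what keeps the frontier from filling with dominated paths.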
60 A* Example: Route Finding First states to try: B and C f(s) = path cost from A to s + straight line distance from s to F, i.e. the sum of the solid and dotted distances on the slide f(B) = 23, and f(C) is larger (the figure is garbled in this transcription) Hence we expand B
61 Shortest Path from A to F (A* Search) [Slide shows the worked A* example as a table: for each node, the path taken, path cost g, heuristic h, and cost f = g + h; the figures are garbled in this transcription]
62 IDA* Search Problem with A* search You have to record all the nodes In case you have to back up from a dead-end A* searches often run out of memory, not time Use the same iterative deepening trick as IDS But iterate over f(state) rather than depth Define contours: f < 100, f < 200, f < 300 etc. Complete & optimal as A*, but less memory 62
63 IDA* Search: Contours Find all nodes Where f(n) < 100 Ignore f(n) >= 100 Find all nodes Where f(n) < 200 Ignore f(n) >= 200 And so on 63
64 Hill Climbing & Gradient Descent For artifact-only problems (don't care about the path) Depends on some evaluation e(state), e.g. elevation Hill climbing tries to maximise score e Gradient descent tries to minimise cost e (the same strategy!) Randomly choose a state Only choose actions which improve e If cannot improve e, then perform a random restart o Choose another random state to restart the search from Only ever have to store one state (the present one) Can't have cycles as e always improves
65 Example: 8 Queens Place 8 queens on a board So no queen can take another Gradient descent search Throw queens on randomly e = number of pairs which can attack each other Move a queen out of another's way o Decrease the evaluation function If this can't be done o Throw queens on randomly again
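The strategy above can be sketched directly (an illustrative implementation: take the best single-queen move at each step, and restart randomly when stuck):

```python
import random

def conflicts(queens):
    """Number of attacking pairs; queens[c] = row of the queen in column c."""
    n = len(queens)
    return sum(1 for a in range(n) for b in range(a + 1, n)
               if queens[a] == queens[b]                      # same row
               or abs(queens[a] - queens[b]) == b - a)        # same diagonal

def gradient_descent(n=8):
    queens = [random.randrange(n) for _ in range(n)]          # throw queens on randomly
    while True:
        e = conflicts(queens)
        if e == 0:
            return queens
        # Try every single-queen move and keep the best strict improvement.
        best, best_e = None, e
        for col in range(n):
            for row in range(n):
                trial = queens[:]
                trial[col] = row
                te = conflicts(trial)
                if te < best_e:
                    best, best_e = trial, te
        if best is None:                                      # stuck: random restart
            queens = [random.randrange(n) for _ in range(n)]
        else:
            queens = best

print(gradient_descent())   # a conflict-free placement, as column -> row list
```

One column per queen guarantees no two share a column, so only rows and diagonals need checking.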
66 Simulated Annealing Hill climbing can get stuck at local maxima/minima [slide shows a curve with labelled points] o C is a local max, G is the global max o E is a local min, A is the global min Search must go the wrong way to proceed! Simulated annealing Pick a random action If the action improves e then go with it If not, choose it with a probability based on how bad it is o Can go the wrong way o Effectively rules out really bad moves
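A generic sketch (the 1-D function below is my own toy example, not from the slides): the acceptance rule exp(-delta/t), with a temperature t that falls over time, is what lets the search occasionally go the wrong way while effectively ruling out very bad moves:

```python
import math
import random

def simulated_annealing(state, cost, neighbour, t0=10.0, cooling=0.999, steps=20000):
    """Minimise `cost`: always accept improvements; accept a worsening move
    with probability exp(-delta / t), which shrinks as t cools."""
    t = t0
    best = state
    for _ in range(steps):
        nxt = neighbour(state)
        delta = cost(nxt) - cost(state)
        if delta <= 0 or random.random() < math.exp(-delta / t):
            state = nxt
            if cost(state) < cost(best):
                best = state                  # remember the best state seen
        t *= cooling                          # cool the temperature
    return best

# Toy problem: a bumpy 1-D cost with local minima away from the global one.
f = lambda x: (x - 3) ** 2 + 2 * math.sin(5 * x)
x = simulated_annealing(0.0, f, lambda x: x + random.uniform(-0.5, 0.5))
print(round(x, 2), round(f(x), 2))
```

Early on, t is large and almost any move is accepted (exploration); by the end, t is tiny and the search behaves like pure gradient descent (exploitation).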
67 Search Strategies Uninformed (Blind) Breadth-first search Depth-first search Iterative deepening (IDS) Bidirectional search Uniform-cost search Informed (Heuristic) Greedy search A* search IDA* search Hill climbing Simulated annealing 67
68 Intelligent Agents Part 2 68
69 Agents An agent is anything that can be viewed as perceiving its environment through sensors and acting upon the environment through effectors This definition includes: Robots, humans, programs Of particular interest: rational (intelligent) agents 69
70 Examples of Agents Humans/Animals: sensors are senses; effectors are body parts Programs: sensors are keyboard, mouse, dataset; effectors are monitor, speakers, files Robots: sensors are cameras, pads; effectors are motors, limbs
71 Rational Agents A rational (intelligent) agent is one that does the right thing Need to be able to assess agent s performance Should be independent of internal measures Ask yourself: has the agent acted rationally? Not just dependent on how well it does at a task First consideration: evaluation of rationality 71
72 Thought Experiment: Al Capone Convicted for tax evasion Were the police acting rationally? We must assess an agent's rationality in terms of: Task it is meant to undertake (Convict the guilty / remove criminals) Experience from the world (Capone guilty, no evidence) Its knowledge of the world (Cannot convict for murder) Actions available to it (Convict for tax, try for murder) Possible to conclude Police were acting rationally
73 Autonomy in Agents The autonomy of an agent is the extent to which its behaviour is not dictated by its owner Extremes No autonomy no freedom to respond to the perceived environment Complete autonomy can act without any constraints Example: baby learning to crawl Ideal: design agents to have some autonomy Possibly to become more autonomous in time
74 Running Example The RHINO Robot - Museum Tour Guide 74 Museum guide in Bonn Two tasks to perform Guided tour around exhibits Provide info on each exhibit Very successful 18.6 kilometres 47 hours 50% attendance increase 1 tiny mistake (no injuries)
75 Internal Structure Second Set of Considerations: Architecture and Program Knowledge of the Environment Reflexes Goals Utility Functions 75
76 Architecture and Program Program Method of turning environmental input into actions Architecture Hardware/software (OS etc.) on which agent s program runs RHINO s architecture: Sensors (infrared, sonar, laser) Processors (3 onboard, 3 more by wireless Ethernet) RHINO s program: Low level: probabilistic reasoning, vision, High level: problem solving, planning (first order logic) 76
77 Knowledge of Environment Knowledge of Environment (World) Different from sensory information from environment World knowledge can be (pre-)programmed in Can also be updated/inferred from sensory information Choice of actions informed by knowledge of... Current state of the world Previous states of the world How its actions change the world Example: Chess agent World knowledge is the board state (all the pieces) Sensory information is the opponent's move Its moves also change the board state
78 RHINO's Environment Knowledge Programmed knowledge Layout of the Museum o Doors, exhibits, restricted areas Sensed knowledge People and objects (chairs) moving Effect of actions on the World Nothing moved by RHINO explicitly But, people followed it around (moving people)
79 Reflexes Action on the world In response only to a sensor input Not in response to world knowledge Humans flinching, blinking Chess openings, endings Lookup table (not a good idea in general) o An astronomical number of entries would be required for the entire game RHINO: no reflexes? Dangerous, because people get everywhere
80 Goals 80 Always need to think hard about What the goal of an agent is Does agent have internal knowledge about goal? Obviously not the goal itself, but some properties Goal based agents Uses knowledge about a goal to guide its actions E.g., Search, planning RHINO Goal: get from one exhibit to another Knowledge about the goal: its whereabouts Need this to guide its actions (movements)
81 Utility Functions Knowledge of a goal may be difficult to pin down For example, checkmate in chess But some agents have localised measures Utility functions measure value of world states Choose action which best improves utility (rational!) In search, this is Best First RHINO: various utilities to guide search for route Main one: distance from the target exhibit Density of people along path 81
82 Additional Features Third set of considerations: Accessibility Determinism Episodes Dynamic/Static Discrete/Continuous 82
83 Accessibility of Environment 83 Is everything an agent requires to choose its actions available to it via sensors? If so, the environment is fully accessible If not, parts of the environment are inaccessible Agent must make informed guesses about world RHINO: Invisible objects which couldn t be sensed Including glass cases and bars at particular heights Software adapted to take this into account
84 Determinism in the Environment Does the change in world state Depend only on current state and agent s action? Non-deterministic environments Have aspects beyond the control of the agent Utility functions have to guess at changes in world Robot in a maze: deterministic Whatever it does, the maze remains the same RHINO: non-deterministic People moved chairs to block its path 84
85 Episodic Environments Is the choice of current action Dependent on previous actions? If not, then the environment is episodic In non-episodic environments: Agent has to plan ahead: RHINO: o Current choice will affect future actions Short term goal is episodic o Getting to an exhibit does not depend on how it got to current one Long term goal is non-episodic o Tour guide, so cannot return to an exhibit on a tour 85
86 Static or Dynamic Environments Static environments don t change While the agent is deliberating over what to do Dynamic environments do change So agent should/could consult the world when choosing actions Alternatively: anticipate the change during deliberation Alternatively: make decision very fast RHINO: Fast decision making (planning route) But people are very quick on their feet 86
87 Discrete or Continuous Environments Nature of sensor readings / choices of action Sweep through a range of values (continuous) Limited to a distinct, clearly defined set (discrete) Maths in programs altered by type of data Chess: discrete RHINO: continuous Visual data can be considered continuous Choice of actions (directions) also continuous 87
88 RHINO s Solution to Environmental Problems Museum environment: Inaccessible Non-episodic Non-deterministic Dynamic Continuous RHINO constantly updates plan as it moves Solves these problems very well Necessary design given the environment 88
89 Summary 89 Think about these in design of agents: How to test whether agent is acting rationally Usual systems engineering stuff Autonomous Rational Agent Internal structure of agent Specifics of the environment
90 Knowledge Representation Part 4 90
91 Representation AI agents deal with knowledge (data) Facts (believed & observed knowledge) Procedures (how-to knowledge) Meaning (relate & define knowledge) Right representation is crucial Early realisation in AI Wrong choice can lead to project failure Active research area 91
92 Choosing a Representation For certain problem solving techniques Best representation already known Often a requirement of the technique Or a requirement of the programming language (e.g. Prolog) Examples First order theorem proving first order logic Inductive logic programming logic programs Neural networks learning neural networks Some general representation schemes Suitable for many different (and new) AI applications 92
93 Some General Representations 1. Logical Representations 2. Production Rules 3. Semantic Networks 4. Conceptual graphs 5. Frames 93
94 What is a Logic? 94 A language with concrete rules No ambiguity in representation (may be other errors!) Allows unambiguous communication and processing Very unlike natural languages e.g. English Many ways to translate between languages A statement can be represented in different logics And perhaps differently in the same logic Expressiveness of a logic How much can we say in this language? Not to be confused with logical reasoning Logics are languages, reasoning is a process (may use logic)
95 Syntax and Semantics Syntax Rules for constructing legal sentences in the logic Which symbols we can use (English: letters, punctuation) How we are allowed to combine symbols Semantics How we interpret (read) sentences in the logic Assigns a meaning to each sentence Example: All lecturers are seven feet tall A valid sentence (syntax) And we can understand the meaning (semantics) This sentence happens to be false (there is a counterexample) 95
96 Propositional Logic Syntax Propositions, e.g. it is raining Connectives: and, or, not, implies, iff (equivalent) Brackets, T (true) and F (false) Semantics (Classical aka. Boolean) Define how connectives affect truth o P and Q is true if and only if P is true and Q is true Use truth tables to work out the truth of statements 96
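Working out the truth of statements with truth tables can be sketched in a few lines (an illustrative helper, not course code):

```python
from itertools import product

def truth_table(formula, variables):
    """Evaluate a propositional formula under every truth assignment."""
    rows = []
    for values in product([True, False], repeat=len(variables)):
        env = dict(zip(variables, values))          # one row of the table
        rows.append((env, formula(**env)))
    return rows

# "P implies Q" is classically equivalent to (not P) or Q
implies = lambda P, Q: (not P) or Q
for env, result in truth_table(implies, ["P", "Q"]):
    print(env, result)    # false only when P is true and Q is false
```

The same helper works for any connective expressible with Python's `and`, `or` and `not`, which mirror the classical (Boolean) semantics directly.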
97 Predicate Logic 97 Propositional logic combines atoms An atom contains no propositional connectives Have no structure (today_is_raining, john_likes_apples) Predicates allow us to talk about objects Properties: is_raining(today) Relations: likes(john, apples) True or false In predicate logic each atom is a predicate e.g. first order logic, higher-order logic
98 First Order Logic More expressive logic than propositional Used in this course (Lecture 6 on representation in FOL) Constants are objects: john, apples Predicates are properties and relations: likes(john, apples) Functions transform objects: likes(john, fruit_of(apple_tree)) Variables represent any object: likes(X, apples) Quantifiers qualify values of variables True for all objects (Universal): ∀X. likes(X, apples) Exists at least one object (Existential): ∃X. likes(X, apples)
99 Example: FOL Sentence Every rose has a thorn For all X, if (X is a rose) then there exists Y such that o (X has Y) and (Y is a thorn) i.e. ∀X. (rose(X) → ∃Y. (has(X, Y) ∧ thorn(Y)))
100 Example: FOL Sentence On Mondays and Wednesdays I go to John's house for dinner e.g. ∀X. ((monday(X) ∨ wednesday(X)) → dinner_at_johns(X)) Note the change from "and" to "or" Translation is problematic
101 Higher Order Logic More expressive than first order Functions and predicates are also objects Described by predicates: binary(addition) Transformed by functions: differentiate(square) Can quantify over both E.g. define red functions as having zero at 17 Much harder to reason with 101
102 Logic is a Good Representation Fairly easy to do the translation when possible Branches of mathematics devoted to it It enables us to do logical reasoning Tools and techniques come for free Basis for programming languages Prolog uses logic programs (a subset of FOL) λProlog is based on higher order logic Often requires non-standard semantics
103 Other Representations Logic representations have restrictions and can be hard to work with Many AI researchers have been searching for better representations Production rules Semantic networks Conceptual graphs Frames 103 Yet they all can be translated into logic...
104 Production Rules Rule set of <condition,action> pairs if condition then action Match-resolve-act cycle Match: Agent checks which rule s condition holds Resolve: o Multiple production rules may hold at once (conflict set) o Agent must choose rule from set (conflict resolution) Act: If so, rule fires and the action is carried out Working memory: rule can write knowledge to working memory knowledge may match and fire other rules 104
105 Production Rules Example IF (at bus stop AND bus arrives) THEN action(get on the bus) IF (on bus AND not paid AND have card) THEN action(pay with card) AND add(paid) IF (on bus AND paid AND empty seat) THEN sit down Conditions and actions must be clearly defined 105
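The match-resolve-act cycle on the bus rules can be sketched as a toy interpreter (an illustrative sketch: "not paid"-style conditions are approximated by skipping rules whose additions already hold, and conflict resolution is simply "first rule wins"):

```python
# Working memory is a set of facts; each rule is (conditions, additions, action).
rules = [
    ({"at bus stop", "bus arrives"},   {"on bus"}, "get on the bus"),
    ({"on bus", "have card"},          {"paid"},   "pay with card"),
    ({"on bus", "paid", "empty seat"}, {"seated"}, "sit down"),
]

def run(memory):
    actions = []
    while True:
        # Match: find rules whose conditions all hold; skipping rules whose
        # additions are already in memory stands in for "not paid" conditions.
        conflict_set = [(cond, add, act) for cond, add, act in rules
                        if cond <= memory and not add <= memory]
        if not conflict_set:
            return actions                   # no rule can fire: cycle terminates
        cond, add, act = conflict_set[0]     # Resolve: first matching rule wins
        memory |= add                        # Act: write new facts to memory...
        actions.append(act)                  # ...and carry out the action

print(run({"at bus stop", "bus arrives", "have card", "empty seat"}))
# ['get on the bus', 'pay with card', 'sit down']
```

Note how firing one rule writes facts ("on bus", "paid") that make later rules match, which is the working-memory chaining described above.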
106 Graphical Representation Humans draw diagrams all the time, e.g. Causal relationships And relationships between ideas 106
107 Graphical Representation: Semantic Networks Graphs are easy to store in a computer 107
108 Semantic Networks To be of any use must impose a formalism Formalism imposes restricted syntax 108
109 Semantic Networks Graphical representation (a graph) Links indicate subset, member, relation,... Equivalent to logical statements (usually FOL) Easier to understand than FOL Specialised SN reasoning algorithms can be faster Example: natural language understanding Sentences with same meaning have same graphs E.g. Conceptual Dependency Theory (Schank) 109
110 Conceptual Graphs Semantic network where each graph represents a single proposition Concept nodes can be Concrete (visualisable) such as restaurant, my dog Spot Abstract (not easily visualisable) such as anger Edges do not have labels Instead, conceptual relation nodes Easy to represent relations between multiple objects 110
111 Frame Representations 111 Semantic networks where nodes have structure Frame with a number of slots (age, height,...) Each slot stores specific item of information When agent faces a new situation Slots can be filled in (value may be another frame) Filling in may trigger actions May trigger retrieval of other frames Inheritance of properties between frames Very similar to objects in OOP
112 112 Example: Frame Representation
113 Flexibility in Frames Slots in a frame can contain Information for choosing a frame in a situation Relationships between this and other frames Procedures to carry out after various slots filled Default information to use where input is missing Blank slots: left blank unless required for a task Other frames, which gives a hierarchy Can also be expressed in first order logic 113
114 Representation & Logic AI wanted non-logical representations Production rules Semantic networks o Conceptual graphs, frames But all can be expressed in first order logic! Best of both worlds Logical reading ensures representation well-defined Representations specialised for applications Can make reasoning easier, more intuitive 114
115 Game Playing Part 5 115
116 Two Player Games Competitive rather than cooperative One player loses, one player wins Zero sum game One player wins what the other one loses See game theory in mathematics Getting an agent to play a game Boils down to how it plays each move Express this as a search problem o Cannot backtrack once a move has been made (episodic) 116
117 Basis of Game Playing: Search for Best Move [Diagram: from the initial board state, a search for move 1 leads via opponent moves to board states 2 and 3; a search for move 3 continues from these, via opponent moves, to board states 4 and 5] 117
118 Lookahead Search If I played this move Then they might play that move o Then I could do that move And they would probably do that move Or they might play that move o Then I could do that move And they would play that move o Or I could play that move And they would do that move If I played this move 118
119 Lookahead Search (best moves) If I played this move Then their best move would be o Then my best move would be Then their best move would be Or another good move for them is o Then my best move would be Etc. 119
120 Minimax Search 120 Like children sharing a cake Underlying assumption Opponent acts rationally Each player moves in such a way as to Maximise their final winnings, minimise their losses i.e., play the best move at the time Method: Calculate the guaranteed final scores for each move o Assuming the opponent will try to minimise that score Choose move that maximises this guaranteed score
121 Example Trivial Game Deal four playing cards out, face up Player 1 chooses one, player 2 chooses one Player 1 chooses another, player 2 chooses another And the winner is... Add the cards up The player with the highest even score wins 121
122 For Trivial Games Draw the entire search space Put the scores associated with each final board state at the ends of the paths (tree leaves) Move the scores up from the ends of the paths to the starts of the paths Whenever there is a choice use minimax assumption This guarantees the scores you can get Choose the path with the best score at the top Take the first move on this path as the next move 122
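The procedure above can be sketched for the four-card game. This is a sketch: the scoring rule used here (player 1's even total minus player 2's even total, where an odd total counts as zero) is one reasonable reading of "highest even score wins", and the dealt cards are made up; it also assumes the four cards are distinct.

```python
# Minimax over the whole search space of the four-card game (a sketch).

def even_score(hand):
    total = sum(hand)
    return total if total % 2 == 0 else 0   # odd totals are worth nothing

def minimax(cards, p1_hand, p2_hand, p1_to_move):
    """Return (guaranteed final score for player 1, best card to take now)."""
    if not cards:                            # tree leaf: game over, score it
        return even_score(p1_hand) - even_score(p2_hand), None
    best_score, best_card = None, None
    for card in cards:
        rest = [c for c in cards if c != card]   # assumes distinct card values
        if p1_to_move:                           # player 1 maximises
            score, _ = minimax(rest, p1_hand + [card], p2_hand, False)
        else:                                    # player 2 minimises
            score, _ = minimax(rest, p1_hand, p2_hand + [card], True)
        better = (best_score is None or
                  (score > best_score if p1_to_move else score < best_score))
        if better:
            best_score, best_card = score, card
    return best_score, best_card

score, move = minimax([3, 5, 7, 8], [], [], True)
# With this deal, player 1's guaranteed score is 10, achieved by taking the 7
```

Scores are computed at the leaves and moved up the tree, with the minimax assumption applied at every choice point, exactly as the slide describes.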
123 Entire Search Space Computing Minimal Guaranteed Scores for Player 1 123
124 Computing Minimal Guaranteed Scores for Player 1 Moving scores from bottom to top 124
125 Computing Minimal Guaranteed Scores for Player 1 Moving score when there s a choice 125
126 After Computing Minimal Guaranteed Scores (for Player 1), He Tries to Maximize them while Player 2 is Trying to Minimize them Choosing the best move
127 For Real Games 127 Search space is too large So we cannot draw (search) the entire space For example: chess has branching factor of ~35 Suppose our agent searches 10,000 board states per second and has a time limit of 150 seconds = 2.5 min. o So it can search up to 1,500,000 positions per move This amounts to only four ply look ahead o Because 35^4 = 1,500,625 Five ply search would require 1 hr. and 27 min.! Average humans can look ahead six to eight ply
128 Cutoff Search Must use a heuristic search Use an evaluation function Estimate the guaranteed score from a board state Draw search space to a certain depth Depth chosen to limit the time taken Put the estimated values at the end of paths Propagate them to the top as before 128
129 Evaluation Functions Must be able to differentiate between Good and bad board states Exact values not important The function should return the true score for goal states Example in chess Weighted linear function Weights (Polish piece names: pawn = pionek, bishop = laufer, rook = wieża) o Pawn=1, knight=bishop=3, rook=5, queen=9,
130 Example Chess Score 130 Black has: 5 pawns, 1 bishop, 2 rooks Score = 1*(5) + 3*(1) + 5*(2) = 5 + 3 + 10 = 18 White has: 5 pawns, 1 rook Score = 1*(5) + 5*(1) = 5 + 5 = 10 Overall scores for this board state: black = 18 - 10 = 8 white = 10 - 18 = -8
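The weighted linear function can be sketched as follows. The piece weights are those from the slide; the dictionary-based description of each side's material is an illustrative assumption.

```python
# Weighted linear material evaluation, as described on the slide.
PIECE_VALUES = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def material_score(pieces):
    """pieces: dict mapping piece name -> count for one side."""
    return sum(PIECE_VALUES[p] * n for p, n in pieces.items())

black = {"pawn": 5, "bishop": 1, "rook": 2}   # the board state from the slide
white = {"pawn": 5, "rook": 1}

black_eval = material_score(black) - material_score(white)   # 18 - 10 = 8
white_eval = -black_eval                                     # -8
```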
131 Evaluation Function for our Game Evaluation after Player 1's move of 5 Count zero if it's odd, take the number if it's even Evaluation function here would choose 10 But this would be disastrous for Player 2 131
132 Problems with Evaluation Functions 132 Horizon problem Agent cannot see far enough into search space o Potentially disastrous board position after seemingly good one Possible solution Reduce the number of initial moves to look at o Allows you to look further into the search space Non-quiescent search Exhibits big swings in the evaluation function E.g., when taking pieces in chess Solution: advance search past non-quiescent part
133 Pruning Want to visit as many board states as possible Want to avoid whole branches (prune them) o Because they can t possibly lead to a good score Example: having your queen taken in chess o (Queen sacrifices often very good tactic, though) Alpha-beta pruning Can be used for entire search or cutoff search Recognize that a branch cannot produce better score o Than a node you have already evaluated 133
134 Example of Alpha-Beta Pruning player 1 player 2 Depth first search a good idea here 134
135 Alpha-Beta Pruning for Player 1 1. Given a node N which can be chosen by player one, then if there is another node, X, along any path, such that (a) X can be chosen by player two, (b) X is on a higher level than N and (c) X has been shown to guarantee a worse score for player one than N, then the parent of N can be pruned. 2. Given a node N which can be chosen by player two, then if there is a node X along any path such that (a) player one can choose X, (b) X is on a higher level than N and (c) X has been shown to guarantee a better score for player one than N, then the parent of N can be pruned. 135
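Alpha-beta can be sketched generically over a game tree written as nested lists, where leaves are scores for player 1 (the maximiser). The tree used here is an illustration, not the one from the slide's diagram.

```python
# Alpha-beta pruning over a nested-list game tree (illustrative sketch).

def alphabeta(node, maximising, alpha=float("-inf"), beta=float("inf")):
    if not isinstance(node, list):      # leaf: a score for player 1
        return node
    if maximising:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:           # opponent would never allow this branch
                break                   # prune the remaining children
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

tree = [[3, 5], [2, 9], [0, 1]]         # player 1 to move, then player 2
best = alphabeta(tree, True)            # best = 3
```

In the second subtree the leaf 9 is never examined: once player 2 can force a 2 there, the branch cannot beat the 3 already guaranteed, which is exactly the pruning condition stated in rule 1 above. Depth-first search makes this possible, since whole subtrees are evaluated before their siblings.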
136 Games with Chance Many more interesting games Have an element of chance Brought in by throwing a die, tossing a coin Example: backgammon See Gerry Tesauro's TD-Gammon program In these cases We can no longer calculate guaranteed scores We can only calculate expected scores o Using probability to guide us 136
137 Games Played by Computer 137 Games played perfectly: Connect four, noughts & crosses (tic-tac-toe) Best move pre-calculated for each board state o Small number of possible board states Games played well: Chess, draughts (checkers), backgammon Scrabble, tetris (using ANNs) Games played badly: Go, bridge, soccer
138 Philosophical Questions 138 Question. Is computer chess playing less sophisticated than human play? In science, simple and effective techniques are valued Minimax cutoff search is simple and effective But this is seen by some as stupid and non-AI Drew McDermott: "Saying Deep Blue doesn't really think about chess is like saying an airplane doesn't really fly because it doesn't flap its wings"
139 Machine Learning Overview Part
140 Inductive Reasoning 140 Learning in humans consists of (at least): memorisation, comprehension, learning from examples Learning from examples Square numbers: 1, 4, 9,16 1 = 1 * 1; 4 = 2 * 2; 9 = 3 * 3; 16 = 4 * 4; What is next in the series? We can learn this by example quite easily Machine learning is largely dominated by Learning from examples Inductive reasoning Induce a pattern (hypothesis) from a set of examples o This is an unsound procedure (unlike deduction)
141 Machine Learning Tasks 141 Categorisation and Prediction Learn why certain objects are categorised a certain way E.g., why are dogs, cats and humans mammals, but trout, mackerel and tuna are fish? o Learn attributes of members of each category from background information, in this case: skin covering, eggs, Learn how to predict how to categorise unseen objects E.g., given examples of financial stocks and a categorisation of them into safe and unsafe stocks o Learn how to predict whether a new stock will be safe
142 Potential for Machine Learning Agents can learn these from examples: which chemicals are toxic (biochemistry) which patients have a disease (medicine) which substructures proteins have (bioinformatics) what the grammar of a language is (natural language) which stocks and shares are about to drop (finance) which vehicles are tanks (military) which style a composition belongs to (music) 142
143 Performing Machine Learning Specify your problem as a learning task Choose the representation scheme Choose the learning method Apply the learning method Assess the results and the method 143
144 Constituents of Learning Problems 1. The example set 2. The background concepts 3. The background axioms 4. Errors in the data 144
145 Problem constituents: 1. The Example Set 145 Learning from examples Express as a concept learning problem Whereby the concept solves the categorisation problem Usually need to supply pairs (E, C) Where E is an example, C is a category Positives: (E,C) where C is the correct category for E Negatives: (E,C) where C is an incorrect category for E Techniques which don't need negatives Can learn from positives only Questions about examples: How many does the technique need to perform the task? Do we need both positive and negative examples?
146 Example: Positives and Negatives Problem: learn reasons for animal taxonomy Into mammals, fish, reptile and bird Positives: (cat=mammal); (dog=mammal); (trout=fish); (eagle=bird); (crocodile=reptile); Negatives: (condor=fish); (mouse=bird); (trout=mammal); (platypus=bird); (human=reptile) 146
147 Problem Constituents: 2. Background Concepts 147 Concepts which describe the examples (Some of) which will be found in the solution to the problem Some concepts are required to specify examples Example: pixel data for handwriting recognition (later) Cannot say what the example is without this Some concepts are attributes of examples (functions) number_of_legs(human) = 2; covering(trout) = scales Some concepts specify binary categorisations: is_homeothermic(human); lays_eggs(trout); Questions about background concepts Which will be most useful in the solution? o Which can be discarded without worry? Which are binary, which are functions?
148 Problem Constituents: 3. Background Axioms 148 Similar to axioms in automated reasoning Specify relationships between pairs of background concepts Example: has_legs(x) = 4 → covering(x) = (hair or scales) Can be used in the search mechanism to speed up the search Questions about background axioms: Are they correct? Are they useful for the search or superfluous?
149 Problem Constituents: 4. Errors in the Data In real world examples errors are many and varied: Incorrect categorisations: E.g., (platypus=bird) given as a positive example Missing data E.g., no skin covering attribute for falcon Incorrect background information E.g., is_homeothermic(lizard) Repeated data E.g., two different values for the same function and input covering(platypus)=feathers & covering(platypus)=fur 149
150 Example (Toy) Problem: Michalski Train Spotting 150 Question: Why are the left-hand (LH) trains going eastwards? What are the positives/negatives? What are the background concepts? What is the solution? Toy problem (IQ test problem) No errors & a single perfect solution
151 Another Example: Handwriting Recognition Positive: This is a letter S: Negative: This is a letter Z: Background concepts: Pixel information Categorisations: (Matrix, Letter) pairs Both positive & negative Task Correctly categorise an unseen example into 1 of 26 categories 151
152 Constituents of Methods 1. The representation scheme 2. The search method 3. The method for choosing from rival solutions 152
153 Method Constituents: 1. Representation Must choose how to represent the solution 153 Very important decision Three assessment methods for solutions Predictive accuracy (how good it is at the task) o Can we use black box methods (accurate but incomprehensible)? Comprehensibility (how well we understand it) o May trade off some accuracy for comprehensibility Utility (problem-specific measures of worth) o Might override both accuracy and comprehensibility o Example: drug design (must be able to synthesise them)
154 Example Representations Inductive logic programs Decision trees Neural networks Genetic populations Hidden Markov Models Bayesian Networks Support Vector Machines 154
155 Method Constituents: 2. Search 155 Some techniques don't really search Example: neural networks Other techniques do perform search Example: inductive logic programming Can specify search as before Search states, initial states, operators, goal test Important consideration Top-down or bottom-up: general to specific or specific to general search Both have their advantages
156 Method constituents 3. Choosing a hypothesis Some learning techniques return one solution Others produce many solutions May differ in accuracy, comprehensibility & utility Question: how to choose just one from the rivals? Need to do this in order to (i) give the users just one answer (ii) assess the effectiveness of the technique Usual answer: Occam's razor All else being equal, choose the simplest solution When everything is equal May have to resort to choosing randomly 156
157 Assessing Hypotheses Given a hypothesis H False positives An example which is categorised as positive by H but in reality it was a negative example False negatives An example which is categorised as negative by H but in reality it was a positive example Sometimes we don't mind FPs as much as FNs Example: medical diagnosis o FN is someone predicted to be well who actually has the disease But what if the treatment has severe side effects? 157
158 Illustrative Example Positives: Apple, Orange, Lemon, Melon, Strawberry Negatives: Banana, Passionfruit, Plum, Coconut, Apricot 158
159 Predictive Accuracy of Examples Hypothesis one: Positives are citrus fruits Scores 7 out of 10 for predictive accuracy Hypothesis two: Positives contain the letter l Also scores 7 out of 10 for predictive accuracy Hypothesis three: Positives are either o Apple, Orange, Lemon, Melon or Strawberry Scores 10 out of 10 for predictive accuracy. Hoorah! 159
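These accuracy scores can be checked in a few lines of Python. A hypothesis here is a predicate saying "this fruit is positive", and accuracy counts how many of the ten labelled fruits it gets right; the particular citrus set is an illustrative assumption.

```python
# Scoring the three rival hypotheses from the slide.
positives = ["Apple", "Orange", "Lemon", "Melon", "Strawberry"]
negatives = ["Banana", "Passionfruit", "Plum", "Coconut", "Apricot"]

CITRUS = {"Orange", "Lemon", "Lime", "Grapefruit"}   # illustrative set

hyp_one = lambda fruit: fruit in CITRUS              # "positives are citrus"
hyp_two = lambda fruit: "l" in fruit.lower()         # "positives contain an l"
hyp_three = lambda fruit: fruit in positives         # memorises the training set

def accuracy(hyp):
    """Count correct predictions over all ten training fruits."""
    correct = sum(1 for f in positives if hyp(f))
    correct += sum(1 for f in negatives if not hyp(f))
    return correct

# accuracy(hyp_one) -> 7, accuracy(hyp_two) -> 7, accuracy(hyp_three) -> 10
```

Hypothesis three scores a perfect 10 on the training data, which is exactly why the "real test" on the next slide, the unseen example Lime, is needed.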
160 The Real Test Is Lime a positive or a negative? Intended answer was that all positives contain the letter e and therefore Lime is a positive So, hypotheses one and two get this right It is a citrus fruit and it does have an l in it But hypothesis three got this wrong even though it seemed the best at predicting 160
161 Training and Test sets Standard technique for evaluating learning methods Split the data into two sets: Training set: used to learn the method Test set: used to test the accuracy of the learned hypothesis on unseen examples We are most interested in the performance of the learned concept on the test set 161
162 Methodologies for splitting data 162 Leave one out method For small datasets (<30 approx) Randomly choose one example to put in the test set Lime was left out in our example Repeatedly choose single examples to leave out o Remember to perform the learning task every time Hold back method For large datasets (thousands) Randomly choose a large set to put into the test set
163 Cross Validation Method n-fold cross validation: Split the (entire) example set into n equal partitions Use each partition in turn as the test set Repeatedly perform the learning task and work out the predictive accuracy over the test set Average the predictive accuracy over all partitions o Gives you an indication of how the method will perform in real tests when a genuinely new example is presented 10-fold cross validation is very common n-fold cross validation with n equal to the number of examples is the same as leave-one-out 163
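An n-fold cross-validation harness can be sketched as follows. The majority-class learner and the toy dataset are stand-ins for a real learning method, purely to make the sketch runnable.

```python
# n-fold cross validation (a sketch; `learn` and `accuracy` stand in for
# any learning method and its evaluation function).
import random

def cross_validate(examples, learn, accuracy, n=10, seed=0):
    data = examples[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::n] for i in range(n)]          # n roughly equal partitions
    scores = []
    for i in range(n):
        test = folds[i]                             # each partition in turn
        train = [e for j, f in enumerate(folds) if j != i for e in f]
        hypothesis = learn(train)                   # re-learn every time
        scores.append(accuracy(hypothesis, test))
    return sum(scores) / n                          # average over all folds

def majority_learner(train):
    """Toy stand-in learner: always predict the most common training label."""
    labels = [label for _, label in train]
    majority = max(set(labels), key=labels.count)
    return lambda example: majority

def acc(hyp, test):
    return sum(hyp(x) == label for x, label in test) / len(test)

data = [(i, "pos") for i in range(8)] + [(i, "neg") for i in range(2)]
mean_acc = cross_validate(data, majority_learner, acc, n=5)   # 0.8 here
```

With 8 positives and 2 negatives, the majority learner always predicts "pos", so the average test accuracy comes out at 0.8 however the shuffle falls.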
164 Overfitting We say that hypothesis three was overfitting It is memorising examples rather than generalising them Formal definition: A learning method is overfitting if it finds a hypothesis H such that: o There is another hypothesis H' where H scores better on the training set than H' o But H' scores better on the test set than H In our example, H = hyp 3, H' = hyp 2 164
165 Decision Tree Learning (Overview Only) Part
166 What to do this Weekend? If my parents are visiting We'll go to the cinema If not Then, if it's sunny I'll play tennis But if it's windy and I'm rich, I'll go shopping If it's windy and I'm poor, I'll go to the cinema If it's rainy, I'll stay in 166
167 Written as a Decision Tree Root of tree Leaves 167
168 Using the Decision Tree Assume: no parents on a sunny day 168
169 From Decision Trees to Logic Decision trees can be written as formulae in first order logic Read from the root to every leaf If this and this and this, then do this In our example: If no_parents and sunny_day, then play_tennis no_parents sunny_day play_tennis 169
170 Decision Tree Learning Overview Decision trees can be seen as rules for performing a categorisation, e.g., what kind of weekend will this be? Remember that we're learning from examples We need examples put into categories We also need attributes for the examples Attributes describe examples (background knowledge) Each attribute takes only a finite set of values 170
171 The ID3 Algorithm (Not Covered) The major question in decision tree learning Which nodes to put in which positions Including the root node and the leaf nodes ID3 uses a measure called Information Gain Based on a notion of entropy or Impurity in the data Used to choose which node to put in next Node with the highest information gain is chosen When there are no choices, a leaf node is put on 171
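The entropy and information-gain measures that ID3 uses can be sketched as follows. The tiny weekend dataset below is made up to match the running example; real attribute sets would be larger.

```python
# Entropy and information gain, as used by ID3 to pick the next node.
from math import log2
from collections import Counter

def entropy(labels):
    """Impurity of a list of category labels, in bits."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(examples, attribute):
    """examples: list of (attribute_dict, category) pairs."""
    base = entropy([cat for _, cat in examples])
    remainder = 0.0
    values = {attrs[attribute] for attrs, _ in examples}
    for v in values:   # weighted entropy of each attribute-value subset
        subset = [cat for attrs, cat in examples if attrs[attribute] == v]
        remainder += len(subset) / len(examples) * entropy(subset)
    return base - remainder

weekend = [
    ({"parents": "yes", "weather": "sunny"}, "cinema"),
    ({"parents": "no",  "weather": "sunny"}, "tennis"),
    ({"parents": "no",  "weather": "rainy"}, "stay in"),
    ({"parents": "yes", "weather": "rainy"}, "cinema"),
]
gain_parents = information_gain(weekend, "parents")   # 1.0 bit here
```

On this toy data, splitting on "parents" gains more information than splitting on "weather", so ID3 would place "parents" nearer the root, matching the tree in the earlier slides.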
172 Appropriate Problems for Decision Tree learning From Tom Mitchell s book: Background concepts describe examples in terms of attribute-value pairs, values are always finite in number Concept to be learned (target function) has discrete values Disjunctive descriptions might be required in the answer Decision tree algorithms are fairly robust to errors 172
173 Two Layer Artificial Neural Networks Part
174 Non Symbolic Representations 174 Decision trees can be easily read A disjunction of conjunctions (logic) We call this a symbolic representation Non-symbolic representations More numerical in nature, more difficult to read Artificial Neural Networks (ANNs) A Non-symbolic representation scheme They embed a giant mathematical function o To take inputs and compute an output which is interpreted as a categorisation Often shortened to Neural Networks o Don't confuse them with real neural networks (in your head)
175 Function Learning 175 Map categorisation learning to numerical problem Each category given a number Or a range of real valued numbers Function learning examples Input = 1,2,3,4 Output = 1,4,9,16 Here the concept to learn is squaring integers Input = [1,2,3], [2,3,4], [3,4,5], [4,5,6] Output = 1, 5, 11, 19 Here the concept is: [a,b,c] -> a*c - b o The calculation is more complicated than in the first example Neural networks: Calculation is much more complicated in general But it is still just a numerical calculation
176 Categorising Vehicles Input to function: pixel data from vehicle images Output: numbers: 1 for a car; 2 for a bus; 3 for a tank [Four example input images, with outputs 3, 2, 1 and 1] 176
177 So, what functions can we use? 177 Biological motivation: The brain does categorisation tasks like this easily The brain is made up of networks of neurons Naturally occurring neural networks Each neuron is connected to many others o Input to one neuron is the output from many others o Neuron fires if a weighted sum S of inputs > threshold Artificial neural networks Similar hierarchy with neurons firing Don't take the analogy too far o Human brains: 100,000,000,000 neurons o ANNs: < 1,000 usually o ANNs are a gross simplification of real neural networks
178 General Idea [Diagram: numbers input at the input layer propagate through hidden layers to the output layer; each value is calculated using all the input unit values; the category with the largest output value (Cat A here) is chosen] 178
179 Representation of Information If ANNs can correctly identify vehicles they then possess some notion of car, bus, etc. The categorisation is produced by the units (nodes) In practice: Each unit does the same calculation but it is based on the weighted sum of inputs to the unit So, the weights in the weighted sum are where the information is really stored We draw weights on to the ANN diagrams (see later) Black Box representation: Useful knowledge about the learned concept is difficult to extract 179
180 ANN learning problem 180 Given a categorisation to learn and training examples (both represented numerically) with the correct categorisation for each example Produce a neural network which provides the correct output for new, unseen examples Boils down to: (a) choosing the correct network architecture (b) choosing (the same) function for each unit (c) training (adjusting) the weights between units to work correctly
181 Special Cases In principle, can have many hidden layers In practice, usually only one or two This lecture: Look at ANNs with no hidden layer Perceptrons: the simplest two layer ANNs Next lecture: Look at ANNs with one hidden layer Multi-layer ANNs 181
182 Perceptrons Multiple input nodes Single output node Takes a weighted sum of the inputs, call this S Unit function calculates the output value Useful to study because We can use perceptrons to build larger networks Perceptrons have limited abilities We will look at concepts they can t learn later 182
183 Unit Functions Linear function Simply output the weighted sum S Threshold function Output low values until the weighted sum gets over a threshold Then output high values Step function Output +1 if S > Threshold T Output -1 otherwise Sigmoid function (smooth step function) Similar to step function but smooth (differentiable - next lecture) 183
184 Example Perceptron Categorisation of 2x2 pixel black & white images Into bright and dark Representation of this rule: If it contains 2, 3 or 4 white pixels, it is bright If it contains 0 or 1 white pixels, it is dark Perceptron architecture: Four input units, one for each pixel One output unit: +1 for white, -1 for dark 184
185 Example Perceptron 185 Example calculation (weights of 0.25 on each input): x 1 =-1, x 2 =1, x 3 =1, x 4 =-1 S = 0.25*(-1) + 0.25*(1) + 0.25*(1) + 0.25*(-1) = 0 0 > -0.1, so the output from the ANN is +1 So the image is categorised as bright
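The calculation can be reproduced in code. This is a sketch assuming equal weights of 0.25 on all four pixel inputs (read off the slide's figure, and consistent with S = 0); the threshold of -0.1 is given in the slide.

```python
# Forward pass of the bright/dark perceptron (weights assumed to be 0.25 each).

def perceptron(inputs, weights, threshold):
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s > threshold else -1        # step unit function

pixels = [-1, 1, 1, -1]                      # two white (+1), two black (-1)
output = perceptron(pixels, [0.25, 0.25, 0.25, 0.25], threshold=-0.1)
# output = +1: S = 0 exceeds the threshold -0.1, so the image is "bright"
```

With two white pixels the weighted sum lands exactly on the bright/dark boundary that the threshold of -0.1 is designed to separate.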
186 Learning in Perceptrons Need to learn Both the weights between input and output units And the value for the threshold Make calculations easier by Thinking of the threshold as a weight from a special input unit where the output from the unit is always 1 Exactly the same result But we only have to worry about learning weights 186
187 New Representation for Perceptrons Special Input Unit Always produces 1 Threshold function has become this 187
188 Learning Algorithm 188 Weights are initially set randomly For each training example E Calculate the observed output from the ANN, o(e) If the target output t(e) is different from o(e) then o Tweak all the weights so that o(e) gets closer to t(e) o Tweaking is done by the perceptron training rule (next slide) This routine is done for every example E Don't necessarily stop when all examples used Repeat the cycle (an epoch ) again Until the ANN produces the correct (or good enough) output for all the examples in the training set
189 Perceptron Training Rule When t(e) is different from o(e) Add on Δ i to weight w i Where Δ i = η * ( t(e) - o(e) ) * x i Do this for every weight in the network Interpretation: (t(e) - o(e)) will either be +2 or -2 [t(e) and o(e) cannot have the same sign] So we can think of the addition of Δ i as the adjustment of the weight in a direction which will improve the network's performance with respect to E Multiplication by x i adjusts it more if the input is bigger 189
190 The Learning Rate η η is called the learning rate Usually set to something small (e.g., 0.1) To control the movement of the weights Not to move too far for one example Which may over-compensate for another example If a large movement is actually necessary for the weights to correctly categorise E it will occur over time with multiple epochs 190
191 Worked Example Return to the bright and dark example Use a learning rate of η = 0.1 Suppose we have set random weights: 191
192 Worked Example Use this training example, E, to update weights Here, x 1 = -1, x 2 = 1, x 3 = 1, x 4 = -1 as before Propagate this information through the network S = (-0.5 * 1) + (0.7 * -1) + (-0.2 * +1) + (0.1 * +1) + (0.9 * -1) = -2.2 Hence the network outputs o(e) = -1 But it should have been bright = +1, so t(e) = +1 192
193 Calculating the Error Values Δ 0 = η * ( t(e) - o(e) ) * x 0 = 0.1 * (1 - (-1)) * (1) = 0.1 * (2) = 0.2 Δ 1 = η * ( t(e) - o(e) ) * x 1 = 0.1 * (1 - (-1)) * (-1) = 0.1 * (-2) = -0.2 Δ 2 = η * ( t(e) - o(e) ) * x 2 = 0.1 * (1 - (-1)) * (1) = 0.1 * (2) = 0.2 Δ 3 = η * ( t(e) - o(e) ) * x 3 = 0.1 * (1 - (-1)) * (1) = 0.1 * (2) = 0.2 Δ 4 = η * ( t(e) - o(e) ) * x 4 = 0.1 * (1 - (-1)) * (-1) = 0.1 * (-2) = -0.2 193
194 Calculating the New Weights w 0 = -0.5 + Δ 0 = -0.5 + 0.2 = -0.3 w 1 = 0.7 + Δ 1 = 0.7 - 0.2 = 0.5 w 2 = -0.2 + Δ 2 = -0.2 + 0.2 = 0 w 3 = 0.1 + Δ 3 = 0.1 + 0.2 = 0.3 w 4 = 0.9 + Δ 4 = 0.9 - 0.2 = 0.7 194
195 Newly Adjusted Perceptron 195 Calculate the output for the example, E, again: S = (-0.3 * 1) + (0.5 * -1) + (0 * +1) + (0.3 * +1) + (0.7 * -1) = -1.2 Still gets the wrong categorisation But the value is closer to zero (from -2.2 to -1.2) In a few epochs time, this example will be correctly categorised
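The whole update step can be written as code, with the threshold folded in as weight w0 on a special input that is always 1 (so the step function compares against 0). The weights and inputs are those of the worked example above.

```python
# One application of the perceptron training rule (the worked example).

def step(s):
    return 1 if s > 0 else -1          # threshold already folded into w0

def train_step(weights, x, target, eta=0.1):
    """Return updated weights and the pre-update weighted sum."""
    s = sum(w * xi for w, xi in zip(weights, x))
    observed = step(s)
    if observed != target:             # only tweak when the output is wrong
        weights = [w + eta * (target - observed) * xi
                   for w, xi in zip(weights, x)]
    return weights, s

weights = [-0.5, 0.7, -0.2, 0.1, 0.9]  # w0 is the threshold-as-weight
x = [1, -1, 1, 1, -1]                  # leading 1 feeds the special input unit
weights, s_before = train_step(weights, x, target=1)
# s_before = -2.2; weights become [-0.3, 0.5, 0.0, 0.3, 0.7]
```

Running the same example through the adjusted weights gives S = -1.2: still the wrong categorisation, but closer to zero, just as the slide says.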
196 Learning Abilities of Perceptrons 196 Perceptrons represent a very simple network Computational learning theory Study of which concepts can and can't be learned by particular learning techniques (representation, method) Minsky and Papert's influential book Showed the limitations of perceptrons Cannot learn some simple boolean functions Caused a winter of research for ANNs in AI o People thought it represented a fundamental limitation o But perceptrons are the simplest network o ANNs were revived by neuroscientists, etc.
197 Boolean Functions Take in two inputs (-1 or +1) Produce one output (-1 or +1) In other contexts, use 0 and 1 Example: AND function Produces +1 only if both inputs are +1 Example: OR function Produces +1 if either one of the inputs is +1 Example: XOR function Produces +1 if exactly one of the inputs is +1 197
198 Boolean Functions as Perceptrons Problem: XOR function cannot be represented as a perceptron because it is not linearly separable 198
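AND and OR can each be written as a single perceptron; the particular weights and thresholds below are one of many choices that work, and are assumptions for this sketch. No such weights exist for XOR, since it is not linearly separable.

```python
# AND and OR as single perceptrons over inputs in {-1, +1}.

def perceptron(x1, x2, w1, w2, threshold):
    return 1 if w1 * x1 + w2 * x2 > threshold else -1

def AND(x1, x2):
    # fires only when both inputs are +1 (weighted sum 1.0 > 0.5)
    return perceptron(x1, x2, 0.5, 0.5, threshold=0.5)

def OR(x1, x2):
    # fires when at least one input is +1 (weighted sum >= 0 > -0.5)
    return perceptron(x1, x2, 0.5, 0.5, threshold=-0.5)

pairs = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
and_outputs = [AND(a, b) for a, b in pairs]   # [-1, -1, -1, 1]
or_outputs = [OR(a, b) for a, b in pairs]     # [-1, 1, 1, 1]
```

Geometrically, each threshold places the separating line of the next slide: for AND only the corner (+1, +1) lies above it, for OR all corners except (-1, -1) do, and no single line isolates the two XOR corners.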
199 Linearly Separable Boolean Functions 199 Linearly separable: Can use a line (dotted) to separate +1 and -1 Think of the line as representing the threshold Angle of line determined by two weights in perceptron Y-axis crossing determined by threshold
200 Multi-Layer Artificial Neural Networks Part
201 Multi-Layer Networks Built from Perceptron Units Perceptrons not able to learn certain concepts Can only learn linearly separable functions But they can be the basis for larger structures Which can learn more sophisticated concepts Say that the networks have perceptron units 201
202 Problem With Perceptron Units 202 The learning rule relies on differential calculus Finding minima by differentiating, etc. Step functions aren't differentiable They are not continuous at the threshold Alternative threshold function sought Must be smooth or differentiable Must be similar to step function, i.e., exhibit a threshold so that units can fire or not fire Sigmoid units used for backpropagation There are other alternatives that are often used
203 Sigmoid Units Take in weighted sum of inputs, S, and output: σ(S) = 1/(1 + e^-S) Advantages: Looks very similar to the step function Is differentiable Derivative easily expressible in terms of σ itself: σ'(S) = σ(S)(1 - σ(S)) 203
204 Example ANN with Sigmoid Units Feed forward network Feed inputs in on the left, propagate numbers forward Suppose we have this ANN with weights set arbitrarily 204
205 Propagation of Example Suppose input to ANN is 10, 30, 20 First calculate weighted sums to hidden layer: S H1 = (0.2*10) + (-0.1*30) + (0.4*20) = 2 - 3 + 8 = 7 S H2 = (0.7*10) + (-1.2*30) + (1.2*20) = 7 - 36 + 24 = -5 Next calculate the output from the hidden layer: Using: σ(S) = 1/(1 + e^-S) σ(S H1 ) = 1/(1 + e^-7) = 1/(1 + 0.0009) = 0.999 σ(S H2 ) = 1/(1 + e^5) = 1/(1 + 148.4) = 0.0067 So, H1 has fired, H2 has not 205
206 Propagation of Example Next calculate the weighted sums into the output layer: S O1 = (1.1 * 0.999) + (0.1 * 0.0067) = 1.0996 S O2 = (3.1 * 0.999) + (1.17 * 0.0067) = 3.1047 Finally, calculate the output from the ANN σ(S O1 ) = 1/(1 + e^-1.0996) = 1/(1 + 0.333) = 0.750 σ(S O2 ) = 1/(1 + e^-3.1047) = 1/(1 + 0.0448) = 0.957 Output from O2 > output from O1 So, the ANN predicts category associated with O2 206
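The forward propagation can be checked in code, with the weights taken from the example network's calculations.

```python
# Forward pass through the 3-2-2 sigmoid network of the example.
from math import exp

def sigmoid(s):
    return 1 / (1 + exp(-s))

def forward(inputs, hidden_weights, output_weights):
    """Propagate inputs through one hidden layer to the output layer."""
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)))
              for ws in hidden_weights]
    outputs = [sigmoid(sum(w * h for w, h in zip(ws, hidden)))
               for ws in output_weights]
    return hidden, outputs

hidden, outputs = forward(
    [10, 30, 20],
    hidden_weights=[[0.2, -0.1, 0.4], [0.7, -1.2, 1.2]],   # into H1, H2
    output_weights=[[1.1, 0.1], [3.1, 1.17]],              # into O1, O2
)
# hidden ≈ [0.999, 0.0067]; outputs ≈ [0.750, 0.957] -> predict category O2
```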
207 Backpropagation Learning Algorithm Same task as in perceptrons Learn a multi-layer ANN to correctly categorize examples We'll concentrate on ANNs with one hidden layer Overview of the routine Fix architecture and sigmoid units within architecture, i.e., number of units in hidden layer; the way the input units represent example; the way the output units categorize examples Randomly assign weights to the whole network using small values (between -0.5 and 0.5) Use each example in the set to retrain the weights Have multiple epochs (iterations through training set) until some termination condition is met (not necessarily 100% accurate) 207
208 Weight Training Overview 208 Use notation w ij to specify weight between unit i and unit j Perform calculation with respect to example E Going to calculate the value Δ ij for each w ij and add Δ ij on to w ij Do this by calculating error terms for each unit The error term for output units is found and then this information is used to calculate the error terms for the hidden units The error is propagated back through the ANN
209 Propagate E through the Network Feed E through the network (as in example above) Record the target and observed values for example E i.e., determine weighted sum from hidden units Let t i (E) be the target values for output unit i Let o i (E) be the observed value for output unit i Note that for categorisation learning tasks, Each t i (E) will be 0, except for a single t j (E), which will be 1 But o i (E) will be a real valued number between 0 and 1 Also record the outputs from the hidden units Let h i (E) be the output from hidden unit i 209
210 Error terms for each unit The Error Term for output unit k is calculated as: δ Ok = o k (E)(1 - o k (E))(t k (E) - o k (E)) The Error Term for hidden unit k is: δ Hk = h k (E)(1 - h k (E)) * Σ j w kj δ Oj In English: For hidden unit h, add together all the errors for the output units, multiplied by the appropriate weight. Then multiply this sum by h k (E)(1 - h k (E)) 210
211 Final Calculations Choose a learning rate, η (e.g., η = 0.1) For each weight w ij between input unit i and hidden unit j, calculate: Δ ij = η δ Hj x i where x i is the input to input unit i for E For each weight w ij between hidden unit i and output unit j, calculate: Δ ij = η δ Oj h i (E) where h i (E) is the output from hidden unit i for E Finally, add each Δ ij on to w ij
212 Worked Backpropagation Example Start with the previous ANN 212 We will retrain the weights In the light of example E = (10,30,20) Stipulate that E should have been categorised as O1 Will use a learning rate of η = 0.1
213 Previous Calculations Need the calculations from when we propagated E through the ANN: t 1 (E) = 1 and t 2 (E) = 0 [from categorisation] o 1 (E) = 0.750 and o 2 (E) = 0.957
214 Error Values for Output Units t 1 (E) = 1 and t 2 (E) = 0 [from categorisation] o 1 (E) = 0.750 and o 2 (E) = 0.957. So: δ O1 = o 1 (E)(1 - o 1 (E))(t 1 (E) - o 1 (E)) = 0.750 * 0.250 * 0.250 = 0.0469 δ O2 = o 2 (E)(1 - o 2 (E))(t 2 (E) - o 2 (E)) = 0.957 * 0.043 * (-0.957) = -0.0394
215 Error Values for Hidden Units δ O1 = 0.0469 and δ O2 = -0.0394 h 1 (E) = 0.999 and h 2 (E) = 0.067 So, for H1, we add together: (w 11 *δ O1 ) + (w 12 *δ O2 ) = (1.1*0.0469) + (3.1*-0.0394) = -0.0706 And multiply by h 1 (E)(1-h 1 (E)) to give us: -0.0706 * (0.999 * (1-0.999)) = -0.0000705 = δ H1 For H2, we add together: (w 21 *δ O1 ) + (w 22 *δ O2 ) = (0.1*0.0469) + (1.17*-0.0394) = -0.0414 And multiply by h 2 (E)(1-h 2 (E)) to give us: -0.0414 * (0.067 * (1-0.067)) = -0.00259 = δ H2
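A short Python check of the error-term calculations, using the observed outputs, targets, and weights from the worked example:

```python
def output_delta(t, o):
    """Error term for an output unit: o(1-o)(t-o)."""
    return o * (1 - o) * (t - o)

def hidden_delta(h, ws, output_deltas):
    """Error term for a hidden unit: h(1-h) * sum over output units of w*delta."""
    return h * (1 - h) * sum(w * d for w, d in zip(ws, output_deltas))

# Worked-example values from the slides
t = [1, 0]           # E should be the category associated with O1
o = [0.750, 0.957]   # observed outputs
h = [0.999, 0.067]   # hidden-unit outputs
w_h1 = [1.1, 3.1]    # weights from H1 to O1, O2
w_h2 = [0.1, 1.17]   # weights from H2 to O1, O2

d_out = [output_delta(ti, oi) for ti, oi in zip(t, o)]
d_h1 = hidden_delta(h[0], w_h1, d_out)
d_h2 = hidden_delta(h[1], w_h2, d_out)

eta = 0.1
# Example weight change, between H1 and O1: eta * delta_O1 * h1(E)
delta_w_h1_o1 = eta * d_out[0] * h[0]

print(d_out)        # approx [0.0469, -0.0394]
print(d_h1, d_h2)   # approx -0.0000705 and -0.00259
```

The tiny δ H1 reflects that h 1 (E) is very close to 1, so h(1-h) is almost zero; this is the sigmoid saturating.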
216 Calculation of Weight Changes For weights between the input and hidden layer 216
217 Calculation of Weight Changes For weights between hidden and output layer 217 Weight changes are not very large Small differences in weights can make big differences in calculations But it might be a good idea to increase η
218 Calculation of Network Error Could calculate network error as the proportion of mis-categorised examples But there are multiple output units, with numerical output So we use a more sophisticated measure: Error = Σ E Σ i (t i (E) - o i (E)) 2 Not as complicated as it looks Square the difference between target and observed o Squaring ensures we get a positive number Add up all the squared differences For every output unit and every example in the training set
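The sum-of-squared-differences measure described above is a one-liner in Python (some formulations also halve the sum; the slides' wording suggests the plain sum):

```python
def network_error(targets, outputs):
    """Sum the squared difference between target and observed values,
    over every output unit and every example in the training set."""
    return sum(
        (t - o) ** 2
        for t_vec, o_vec in zip(targets, outputs)
        for t, o in zip(t_vec, o_vec)
    )

# One training example, two output units (values from the worked example)
err = network_error([[1, 0]], [[0.750, 0.957]])
print(err)  # 0.25^2 + 0.957^2, approx 0.9783
```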
219 Suitable Problems for ANNs Examples and target categorisation Can be expressed as real values o ANNs are just fancy numerical functions Predictive accuracy is more important than understanding what the machine has learned o Black box non-symbolic approach, not easy to digest Slow training times are OK Can take hours and days to train networks Execution of learned function must be quick Learned networks can categorise very quickly o Very useful in time-critical situations (is that a tank, car or bus?) Errors: ANNs are fairly robust to noise in the data
220 Genetic Algorithms Part
221 Mimicking Darwinian evolution How do we get an agent to act intelligently? One answer: Humans have evolved a brain which enables us to act intelligently Idea: Mimic Darwinian evolution to evolve intelligent agents Intelligence is a specific instance of evolution Evolving a species to survive in a niche Via adaptation of species through survival of the fittest More general idea: Evolve programs to solve problems 221
222 Overview of Genetic Algorithms (GAs) Represent parameters as bit strings (everything in computers is represented in binary) Start with a population (fairly large set) of possible solutions, known as individuals Combine possible solutions by swapping material Choose the best solutions and kill off the worst solutions This generates a new set of possible solutions Requires a notion of fitness of the individual Based on an evaluation function Dependent on properties of the solution
223 223 The Canonical Genetic Algorithm
224 The Initial Population 224 Represent solutions to your problem as bit strings of length L Choose an initial population Generate length L strings of 1s & 0s randomly Strings are sometimes called chromosomes Letters in the string are called genes We call the bit-strings individuals
225 The Selection Process We are going to combine (mate) pairs of members of the current population to produce offspring We need to choose which pairs to mate Use an evaluation function, g(c), which depends on desirable properties of solutions Use g(c) to define a fitness function 225
226 Calculating Fitness Use a fitness function to assign a probability of each individual being used in the mating process for the next generation Usually we use: fitness(c) = g(c) / A, where A = average of g(c) over entire population 226 Other constructions are possible
227 Using the Fitness Function Use fitness to produce an intermediate population IP from which mating will occur Integer part of the fitness value represents the number of copies of individuals guaranteed to go in IP Fractional part of the fitness value represents the probability that an extra copy will go into IP Use a probabilistic function (roll dice) to determine whether the extra copies are added to the IP 227
228 Examples of Selection Example #1 Suppose the average of g over the population is 17 And for a particular individual c 0 , g(c 0 ) = 25 Fitness(c 0 ) = 25/17 = 1.47 Then 1 copy of c 0 is guaranteed to go into IP And the probability of another copy going in is 0.47 Example #2 Suppose g(c 1 ) = 14 No copy of c 1 is guaranteed to go into IP The probability of one copy going in is 14/17 = 0.82
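The integer/fractional selection scheme above can be sketched in Python. The population and evaluation function g (count of 1s) are toy examples, not from the slides:

```python
import random

def intermediate_population(population, g):
    """Build the intermediate population IP: fitness(c) = g(c) / average g.
    The integer part gives guaranteed copies; the fractional part is the
    probability of one extra copy (the 'roll dice' step)."""
    avg = sum(g(c) for c in population) / len(population)
    ip = []
    for c in population:
        fitness = g(c) / avg
        whole = int(fitness)
        frac = fitness - whole
        ip.extend([c] * whole)       # guaranteed copies
        if random.random() < frac:   # probabilistic extra copy
            ip.append(c)
    return ip

# Toy example: g counts the 1s in the bit string
pop = ["1111", "1100", "0001", "0000"]
ip = intermediate_population(pop, lambda c: c.count("1"))
print(ip)  # "1111" appears at least twice (fitness 4/1.75 = 2.29)
```

Note that "0000" (fitness 0) can never enter the IP, while "0001" enters only with probability 1/1.75.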
229 The Mating Process Pairs from the Intermediate Population are chosen randomly Could choose the same individual multiple times! They are combined with a probability p c Their offspring are produced by recombination Offspring go into the next generation The process is repeated until The next generation contains the required number of individuals for the next population Often this is taken as the current population size which keeps the population size constant 229
230 Analogy with Natural Evolution Analogy holds: Some very fit individuals may be unlucky, and: o not find a partner o not be able to reproduce with the partner they find Some less fit individuals may be lucky and find a partner and produce hundreds of offspring! Analogy is only partial: There is no notion of sexes Individuals can mate with themselves 230
231 The Recombination Process The population will only evolve to be better if best parts of the best individuals are combined Crossover techniques are used They take material (genes) from each individual And add it to offspring individuals Mutation techniques are used Which randomly alter the offspring Good for getting out of local maxima 231
232 One-point crossover Randomly choose a single point in both individuals Both have the same point Split into LHS 1 +RHS 1 and LHS 2 +RHS 2 Generate two offspring from the combinations Offspring 1: LHS 1 +RHS 2 Offspring 2: LHS 2 +RHS 1 Example: (X, Y, a, b are all ones and zeros)
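A sketch of one-point crossover in Python, using strings of characters in place of bit strings:

```python
import random

def one_point_crossover(p1, p2):
    """Choose one point (the same in both parents), split each into
    LHS + RHS, and recombine: offspring are LHS1+RHS2 and LHS2+RHS1."""
    point = random.randint(1, len(p1) - 1)   # avoid empty halves
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

o1, o2 = one_point_crossover("XXXXXX", "YYYYYY")
print(o1, o2)  # e.g. XXYYYY YYXXXX
```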
233 Two-point Crossover Two points are chosen in the strings The material falling between the two points is swapped in the string for the two offspring Example: 233
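Two-point crossover, sketched the same way:

```python
import random

def two_point_crossover(p1, p2):
    """Choose two points and swap the material falling between them
    to produce the two offspring."""
    i, j = sorted(random.sample(range(len(p1) + 1), 2))
    return (p1[:i] + p2[i:j] + p1[j:],
            p2[:i] + p1[i:j] + p2[j:])

c1, c2 = two_point_crossover("AAAAAA", "BBBBBB")
print(c1, c2)  # e.g. AABBAA BBAABB
```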
234 Inversion Another possible operator to mix things up Takes the material from only one parent Randomly chooses two points Takes the segment in-between the points Reverses it in the offspring Example:
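Inversion, which works on a single parent, can be sketched as:

```python
import random

def inversion(parent):
    """Take the segment between two random points of one parent
    and reverse it in the offspring."""
    i, j = sorted(random.sample(range(len(parent) + 1), 2))
    return parent[:i] + parent[i:j][::-1] + parent[j:]

print(inversion("ABCDEF"))  # e.g. ADCBEF
```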
235 The Mutation Process Recombination keeps large strings the same Not just randomly jumbling up the bit-strings Produces a large range of possible solutions But might get stuck in a local maximum Random mutation Each bit in every offspring produced o Is flipped from 0 to 1 or vice versa o With a given probability (usually pretty small, < 1%) May help avoid local maxima o In natural evolution mutation is almost always harmful Could use a probability distribution To protect the best individuals in the population But is this good for avoiding local maxima?
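Random bit-flip mutation can be sketched as (the rate of 0.005 is an illustrative "pretty small" value):

```python
import random

def mutate(individual, rate=0.005):
    """Flip each bit of the offspring independently
    with a small probability (< 1%)."""
    return "".join(
        ("1" if bit == "0" else "0") if random.random() < rate else bit
        for bit in individual
    )

print(mutate("0000000000", rate=0.5))  # roughly half the bits flipped
```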
236 236 Schematic Overview of Producing the Next Generation
237 The Termination Check Termination may involve testing whether an individual solves the problem to a good enough standard Not necessarily having found a definitive answer Alternatively, termination may occur after a fixed time or after a fixed number of generations Note that the best individual in a population may not be as good as the best in a previous population So, your GA should: Record the best from each generation Output the best from all populations as the answer 237
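The record-the-best-from-each-generation advice can be folded into a GA main loop. The toy fitness and generation functions below are illustrative only:

```python
def run_ga(population, g, next_generation, max_generations=100, good_enough=None):
    """GA main loop. Records the best individual from every generation and
    returns the best over all populations, since a later generation's best
    may be worse than an earlier one's."""
    best = max(population, key=g)
    for _ in range(max_generations):
        population = next_generation(population)
        champion = max(population, key=g)
        if g(champion) > g(best):
            best = champion
        if good_enough is not None and g(best) >= good_enough:
            break  # solved to a good enough standard; stop early
    return best

# Toy run: "individuals" are numbers, fitness is the number itself,
# and each generation just increments everyone.
result = run_ga([0, 1, 2], g=lambda x: x,
                next_generation=lambda pop: [x + 1 for x in pop],
                max_generations=5)
print(result)  # 7: best of the final generation [5, 6, 7]
```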
238 Problem I: Representing Solutions 238 Difficulty 1: Encoding solutions as bit strings in the first place ASCII strings require 8 bits (=1 byte) per letter Solutions can become very long o So that evolution takes many cycles to converge Difficulty 2: Avoiding redundancy Many bit strings produced by recombination and mutation will not represent possible solutions at all o How do we evaluate such possibilities (obviously bad???) Situation is better when solutions are continuous o Over a set of real-valued numbers or integers Situation is worse when there is a finite number of solutions
239 Problem II: Evaluation Functions Fitness function is not to be confused with the evaluation function Evaluation function must, if possible: Return a real-valued number Be quick to calculate (this will be done many, many times) Distinguish between different individuals Problem when populations have evolved All individuals score highly with respect to fitness So we may have to do some fixing to spread them out Otherwise they all have roughly equal chance of reproducing and evolution will have stopped 239
240 Analogy with Genetics Analogies are often used, with analogy to both genetic and species evolution They are rarely exact and sometimes misleading

GA             Genetics     Species
Population     DNA          Population
Bit string     Chromosome   Organism
Bit            Gene         Genome???
Selection      ???          Survival of the fittest
Recombination  Crossover    Inheritance
Mutation       Mutation     Mutation
241 Application to Transportation System Design Taken from the ACM Student Magazine Undergraduate project of Ricardo Hoar & Joanne Penner Vehicles on a road system modelled as individual agents Want to increase traffic flow Uses a GA approach to evolve solutions to: The problem of timing traffic lights Optimal solutions only known for very simple road systems Traffic light switching times are real-valued numbers over a continuous space 241
242 Transport Evaluation Function Traffic flow is increased if: The average time a car is waiting is decreased Alternatively, the total waiting time is decreased They used this evaluation function: 242 W i is the total waiting time for car i and d i is the total driving time for car i.
243 Results Reduction in waiting times for a simple road: Solutions similar to (human) established solutions 243
244 Results Reduction in waiting times for a more complicated road system (not too complicated, though) Roughly 50% decrease in waiting times 244
245 Using Genetic Algorithms If you can answer four questions: What is the fitness [evaluation] function? How is an individual represented? How are individuals selected? How do individuals reproduce? Then why not try GAs for your problem? Similar to neural networks (might not be the best way to proceed, but it is quick and easy to get going) Try a quick implementation before investing more time thinking about another approach. Transport app worked very well (undergrad level) 245
246 Genetic Programming Part
247 Automatic Programming Getting software to write software Brilliant idea, but turned out to be very difficult Intuitively, should be easier than other tasks o But programming requires cunning and guile o And is just as difficult to automate as other human tasks The automatic programming community Didn't deliver on early promises So people began to avoid this area o Automated Programming became words of warning
248 But Lots of Techniques Many techniques involve The automation of programming For example, decision tree learning o The output is a decision tree program, which must be run given input information in order for it to make decisions Similarly for Artificial Neural Networks Genetic programming people are more open They quite clearly state that they are involved in automatic programming projects Perhaps they will achieve this Holy Grail of AI 248
249 Genetic Programming Overview The task is to produce a program which completes a particular task or solves a particular problem Done by search inspired by evolutionary methods Initial population is produced Individuals are program representations Individuals are selected Programs are translated, compiled and executed Users evaluate their performance New programs are bred From single individuals (mutation) From pairs of parent individuals (crossover) 249
250 Comparison with Genetic Algorithms Evolutionary processes very similar Choosing of individuals to mate sometimes simpler Fitness functions are problem specific Specifying termination is very similar Big difference is the representation GAs have a fixed representation for which parameters are learned GPs really do evolve programs, which can grow in structure, complexity and size 250
251 Questions to be Answered How is the programming task specified? How are the ingredients chosen/specified? How do we represent programs? In a way that enables reproduction, evaluation, etc. How are individuals chosen for reproduction? How are offspring produced? How are termination conditions specified? We will start by looking at the representation of programs, as this is required to answer many questions
252 Representing Programs Programs are written as lines of code Want to combine parent programs into offspring As with GAs, it's a good idea not to randomly jumble up the lines of code Could think of each line as a bit And perform one- and two-point crossover as for GAs However, that leads to programs which don't even compile We at least need our offspring to be compilable!
253 A Tree Representation 253 Idea: represent programs as trees Each node is either a function or a terminal Functions Examples: add, multiply, square root, sine, cosine Problem specific ones: getpixel(), goleft() Terminals (leaves) Constants and variables, e.g., 10, 17, X, Y The nodes below a function node are the inputs to that function The node itself represents the output from the function as input to the parent node The nodes at the ends of branches are terminals only
254 An Example if (root(x+y) < 10) then return x else return x / y 254
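One way to sketch this tree representation is with nested tuples in Python, functions at internal nodes and terminals at the leaves. The encoding and evaluator below are illustrative, not from the lecture:

```python
import math

# Encodes the example: if root(x + y) < 10 then return x else return x / y
tree = ("if", ("<", ("sqrt", ("+", "x", "y")), 10), "x", ("/", "x", "y"))

FUNCTIONS = {
    "+": lambda a, b: a + b,
    "/": lambda a, b: a / b,
    "<": lambda a, b: a < b,
    "sqrt": math.sqrt,
}

def evaluate(node, env):
    """Evaluate a program tree: each function node's children are its inputs,
    and its value is the output passed up to the parent node."""
    if isinstance(node, str):          # variable terminal, e.g. "x"
        return env[node]
    if not isinstance(node, tuple):    # constant terminal, e.g. 10
        return node
    op, *args = node
    if op == "if":                     # only evaluate the chosen branch
        cond, then_branch, else_branch = args
        return evaluate(then_branch if evaluate(cond, env) else else_branch, env)
    return FUNCTIONS[op](*(evaluate(a, env) for a in args))

print(evaluate(tree, {"x": 16, "y": 9}))  # sqrt(25) = 5 < 10, so returns x = 16
print(evaluate(tree, {"x": 96, "y": 4}))  # sqrt(100) is not < 10, so x/y = 24.0
```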
255 Expressivity 255 We will look at simple programs only Called result producing branches These form part of larger programs (as subroutines) They can be seen as functions o Taking a set of inputs and producing a single output value o E.g., X and Y were inputs in the previous function, which returned either X or X/Y Programs which are in the search space Dependent on function set, terminal set And the programmatic functions, e.g., if-then-else, for-loop And the limitations of the nodes o E.g., a node for if X less than Y, then (could be more general)
256 More General Representation 256 < can be replaced by any boolean function Beware: more expressive means bigger space which means that convergence to a solution could take forever
257 Specifying the Program Task Remember we are trying to evolve a program rather than write one ourselves Need to specify what the program should do Embedded into an evaluation function so that the selection of the fittest can drive evolution Not just what is good, but what should happen Problem specific, a difficult part of the task 257
258 Specifying the Problem (Method #1) Simply give a set of input/output pairs Evaluation function counts the number of inputs for which the program correctly generates the output Example: Discovery of Understandable Math Formulas Using Genetic Programming Task to develop a function for highest common factor so that the program can be understood Input was triples of the form (A, B, hcf(A,B)) o For instance (3,6,3), (15,20,5), etc.
259 Specifying the Problem (Method #2) Take the artifacts produced by the evolved programs and test them using a (possibly complicated) evaluation function E.g., project by Penousal Machado et al. Adding colour to greyscale images Many fitness functions tried, including this one:
260 Specifying the Function and Terminal Sets GP engine needs to know Which ingredients are allowed in the programs Function set All the computational units in the graph E.g., *, +, /, -, etc. Terminal set All the variables and constants that can appear Often highly problem specific E.g., robotics: travel_in_direction(), left, right, etc. E.g., image manipulation: sine, cosine, getpixel(), etc.
261 Specifying Control Parameters Very important: size of the population Usually kept (roughly) constant throughout More individuals means greater diversity but a longer time to make changes Also important: maximum length of programs GPs don't have a fixed structure like GAs So the programs could balloon o Often a criticism of GPs is that the programs are huge and incomprehensible Having a cap on the program length can improve comprehensibility and improve search
262 Termination Conditions 262 Very similar to GAs Can wait a certain amount of time Or for a certain number of generations Best individual seen in any generation is taken Some implementations allow the user to monitor and stop the search if it reaches a plateau Can also wait until a good individual is found i.e., until one scores high enough with respect to the evaluation function
263 Evolving New Populations 263 Important aspects of generating new population: How the initial population is produced How individuals are selected to produce offspring How new individuals (offspring) are generated Genetic operators Reproduction Mutation Crossover Architecture altering
264 Seeding an Initial Population 264 First population is produced randomly Functions and terminals are added to graph and then terminals are turned into functions Random choice 1: what to put at the top Random choice 2: which node to expand Which terminal to change into a function Care taken to keep inputs the same type Random choice 3: when to stop altering This produces many programs of varied shapes and sizes
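A sketch of random tree growth under these random choices; the function set, terminal set, and stopping probability below are made-up illustrative values:

```python
import random

FUNCTION_SET = {"+": 2, "*": 2, "-": 2}   # function name -> number of inputs
TERMINAL_SET = ["x", "y", 1, 2, 3]

def random_tree(max_depth):
    """Grow a random program: choose a function node while depth remains,
    otherwise close the branch with a terminal. The 0.3 models the random
    choice of when to stop altering; the depth cap bounds program size."""
    if max_depth == 0 or random.random() < 0.3:
        return random.choice(TERMINAL_SET)
    f = random.choice(list(FUNCTION_SET))          # what to put at this node
    return (f,) + tuple(random_tree(max_depth - 1)
                        for _ in range(FUNCTION_SET[f]))

print(random_tree(3))  # e.g. ('+', ('*', 'x', 2), 'y')
```

Running this repeatedly produces the many programs of varied shapes and sizes the slide describes.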
Choosing Individuals To Produce Offspring from 265 Reproduction is used, and mating is not always pairwise (sometimes single individuals reproduce) Selection based on the evaluation function Individuals go into an intermediate population again Only those in the IP will produce offspring Using the evaluation function probabilistically A probability is assigned to each individual o Then a dice is thrown for it o E.g., 0.8: generate random number between 0 and 1 o If it is 0.8 or less, the individual goes into the IP
266 Other Possibilities for the Selection Procedure 266 Tournament selection: Pairs of individuals are chosen and the fittest of the two goes through to IP Meant to simulate competition in nature If two fairly unfit individuals are chosen, one of them will win Selection by rank Evaluation function used to order the individuals Only the top ones go into the IP Means the least fit die off (could be a bad thing)
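Tournament selection of size two can be sketched as (the bit-string population and 1-counting fitness are toy examples):

```python
import random

def tournament_select(population, fitness):
    """Pick a random pair; the fitter of the two goes through to the IP.
    If two fairly unfit individuals are drawn, one of them still wins."""
    a, b = random.choice(population), random.choice(population)
    return a if fitness(a) >= fitness(b) else b

pop = ["1111", "1100", "0001", "0000"]
ip = [tournament_select(pop, lambda c: c.count("1")) for _ in range(len(pop))]
print(ip)  # four winners, biased towards the fitter strings
```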
267 Genetic Operators for Single Individuals Reproduction A copy of the individual is taken and passed on without alteration Means old generation members can survive Hence we can completely kill off the previous generation Mutation A node is chosen at random and removed 267 To be replaced by a randomly generated subtree generated in same way as the initial population Sometimes constraints are added: o Functions only replaced by functions o Terminals only replaced by terminals
268 268 Example of Mutation
269 Crossover Operation for Two Parent Individuals Parents chosen randomly from the IP Two parents may be the same individual A node on one parent is chosen randomly A node on the other parent is chosen randomly The subtrees below the nodes are swapped These subtrees are called the crossover fragments If parents are the same individual Must make sure the two nodes on tree are different Otherwise two copies of the individual are passed on 269
270 270 Example of Crossover
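Subtree crossover on a nested-tuple program representation might look like this; the path-based helpers are illustrative, not from the lecture:

```python
import random

def nodes(tree, path=()):
    """Yield the path to every node of a nested-tuple program tree
    (index 0 of a tuple is the function name, 1.. are its children)."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from nodes(child, path + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, subtree):
    if not path:
        return subtree
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], subtree),) + tree[i + 1:]

def crossover(p1, p2):
    """Choose a random node in each parent and swap the subtrees below
    them (the crossover fragments)."""
    path1 = random.choice(list(nodes(p1)))
    path2 = random.choice(list(nodes(p2)))
    frag1, frag2 = get(p1, path1), get(p2, path2)
    return replace(p1, path1, frag2), replace(p2, path2, frag1)

p1 = ("+", "x", ("*", "y", 2))
p2 = ("-", 3, "x")
c1, c2 = crossover(p1, p2)
print(c1)
print(c2)
```

Because whole subtrees are swapped, the offspring are always well-formed trees, unlike naive line-level crossover on source code.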
271 Application Domain #1: Evolving Electronic Circuits 271 John Koza Stanford Professor and CEO of Genetic Programming Inc. Guru of Genetic Programming, very successful Made GP more mainstream by his success Particularly fruitful area: Producing designs for circuit diagrams, e.g., antennas, amplifiers, etc. Functions mimic how transistors, resistors, etc. work Has led to many new inventions
272 Quote from There are now 36 instances where genetic programming has automatically produced a result that is competitive with human performance, including 15 instances where genetic programming has created an entity that either infringes or duplicates the functionality of a previously patented 20th-century invention, 6 instances where genetic programming has done the same with respect to a 21st-century invention, and 2 instances where genetic programming has created a patentable new invention. 272
273 Application Domain #2: Evolutionary Art 273 Programs are evolved to manipulate images Initial seed generated randomly using Mathematical operators, *, root, +, sine, cosine, etc. And pixel operators, getpixels(), setpixelrgb(), etc. The user acts as the fitness function They choose the images they like These are used to generate more offspring And the user is shown the best of the generation o Assessed in terms of closeness to the chosen ones o (e.g., similar colour distributions, complexity, etc.,) Complicated and beautiful images result from this process, and people are becoming very interested E.g., an ad-campaign for Absolut Vodka
274 The Nevar Evolutionary Art Program The goal of the project is to produce an automated artist Includes the greyscale colouring technique discussed previously Implemented and maintained by Penousal Machado at Coimbra University in Portugal Has produced many images which have won prizes
275 275 Image from Nevar #1
276 276 Image from Nevar #2
1 Levels of Analysis and ACT-R LaLoCo, Fall 2013 Adrian Brasoveanu, Karl DeVries [based on slides by Sharon Goldwater & Frank Keller] 2 David Marr: levels of analysis Background Levels of Analysis John
The Trip Scheduling Problem
The Trip Scheduling Problem Claudia Archetti Department of Quantitative Methods, University of Brescia Contrada Santa Chiara 50, 25122 Brescia, Italy Martin Savelsbergh School of Industrial and Systems
Sanjeev Kumar. contribute
RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 [email protected] 1. Introduction The field of data mining and knowledgee discovery is emerging as a
The Turing Test! and What Computer Science Offers to Cognitive Science "
The Turing Test and What Computer Science Offers to Cognitive Science " Profs. Rob Rupert and Mike Eisenberg T/R 11-12:15 Muenzinger D430 http://l3d.cs.colorado.edu/~ctg/classes/cogsci12/ The Imitation
Game Theory 1. Introduction
Game Theory 1. Introduction Dmitry Potapov CERN What is Game Theory? Game theory is about interactions among agents that are self-interested I ll use agent and player synonymously Self-interested: Each
Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence
Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network? - Perceptron learners - Multi-layer networks What is a Support
Artificial Intelligence BEG471CO
Artificial Intelligence BEG471CO Year IV Semester: I Teaching Schedule Examination Scheme Hours/Week Theory Tutorial Practical Internal Assessment Final Total 3 1 3/2 Theory Practical * Theory** Practical
Attribution. Modified from Stuart Russell s slides (Berkeley) Parts of the slides are inspired by Dan Klein s lecture material for CS 188 (Berkeley)
Machine Learning 1 Attribution Modified from Stuart Russell s slides (Berkeley) Parts of the slides are inspired by Dan Klein s lecture material for CS 188 (Berkeley) 2 Outline Inductive learning Decision
6.080/6.089 GITCS Feb 12, 2008. Lecture 3
6.8/6.89 GITCS Feb 2, 28 Lecturer: Scott Aaronson Lecture 3 Scribe: Adam Rogal Administrivia. Scribe notes The purpose of scribe notes is to transcribe our lectures. Although I have formal notes of my
CS104: Data Structures and Object-Oriented Design (Fall 2013) October 24, 2013: Priority Queues Scribes: CS 104 Teaching Team
CS104: Data Structures and Object-Oriented Design (Fall 2013) October 24, 2013: Priority Queues Scribes: CS 104 Teaching Team Lecture Summary In this lecture, we learned about the ADT Priority Queue. A
Smart Graphics: Methoden 3 Suche, Constraints
Smart Graphics: Methoden 3 Suche, Constraints Vorlesung Smart Graphics LMU München Medieninformatik Butz/Boring Smart Graphics SS2007 Methoden: Suche 2 Folie 1 Themen heute Suchverfahren Hillclimbing Simulated
Conditional Probability, Independence and Bayes Theorem Class 3, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom
Conditional Probability, Independence and Bayes Theorem Class 3, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1 Learning Goals 1. Know the definitions of conditional probability and independence
Gerry Hobbs, Department of Statistics, West Virginia University
Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit
Lecture 6. Artificial Neural Networks
Lecture 6 Artificial Neural Networks 1 1 Artificial Neural Networks In this note we provide an overview of the key concepts that have led to the emergence of Artificial Neural Networks as a major paradigm
So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)
Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we
P vs NP problem in the field anthropology
Research Article P vs NP problem in the field anthropology Michael.A. Popov, Oxford, UK Email [email protected] Keywords P =?NP - complexity anthropology - M -decision - quantum -like game - game-theoretical
Game Playing in the Real World. Next time: Knowledge Representation Reading: Chapter 7.1-7.3
Game Playing in the Real World Next time: Knowledge Representation Reading: Chapter 7.1-7.3 1 What matters? Speed? Knowledge? Intelligence? (And what counts as intelligence?) Human vs. machine characteristics
(Refer Slide Time: 2:03)
Control Engineering Prof. Madan Gopal Department of Electrical Engineering Indian Institute of Technology, Delhi Lecture - 11 Models of Industrial Control Devices and Systems (Contd.) Last time we were
How To Program A Computer
Object Oriented Software Design Introduction Giuseppe Lipari http://retis.sssup.it/~lipari Scuola Superiore Sant Anna Pisa September 23, 2010 G. Lipari (Scuola Superiore Sant Anna) OOSD September 23, 2010
Decision Making under Uncertainty
6.825 Techniques in Artificial Intelligence Decision Making under Uncertainty How to make one decision in the face of uncertainty Lecture 19 1 In the next two lectures, we ll look at the question of how
Security visualisation
Security visualisation This thesis provides a guideline of how to generate a visual representation of a given dataset and use visualisation in the evaluation of known security vulnerabilities by Marco
Part III: Machine Learning. CS 188: Artificial Intelligence. Machine Learning This Set of Slides. Parameter Estimation. Estimation: Smoothing
CS 188: Artificial Intelligence Lecture 20: Dynamic Bayes Nets, Naïve Bayes Pieter Abbeel UC Berkeley Slides adapted from Dan Klein. Part III: Machine Learning Up until now: how to reason in a model and
Supervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
Theorem (informal statement): There are no extendible methods in David Chalmers s sense unless P = NP.
Theorem (informal statement): There are no extendible methods in David Chalmers s sense unless P = NP. Explication: In his paper, The Singularity: A philosophical analysis, David Chalmers defines an extendible
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
Mathematics for Computer Science/Software Engineering. Notes for the course MSM1F3 Dr. R. A. Wilson
Mathematics for Computer Science/Software Engineering Notes for the course MSM1F3 Dr. R. A. Wilson October 1996 Chapter 1 Logic Lecture no. 1. We introduce the concept of a proposition, which is a statement
Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 13. Random Variables: Distribution and Expectation
CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 3 Random Variables: Distribution and Expectation Random Variables Question: The homeworks of 20 students are collected
A Client-Server Interactive Tool for Integrated Artificial Intelligence Curriculum
A Client-Server Interactive Tool for Integrated Artificial Intelligence Curriculum Diane J. Cook and Lawrence B. Holder Department of Computer Science and Engineering Box 19015 University of Texas at Arlington
IAI : Knowledge Representation
IAI : Knowledge Representation John A. Bullinaria, 2005 1. What is Knowledge? 2. What is a Knowledge Representation? 3. Requirements of a Knowledge Representation 4. Practical Aspects of Good Representations
Course: Model, Learning, and Inference: Lecture 5
Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 [email protected] Abstract Probability distributions on structured representation.
Chess Algorithms Theory and Practice. Rune Djurhuus Chess Grandmaster [email protected] / [email protected] October 3, 2012
Chess Algorithms Theory and Practice Rune Djurhuus Chess Grandmaster [email protected] / [email protected] October 3, 2012 1 Content Complexity of a chess game History of computer chess Search trees
Chapter 2: Intelligent Agents
Chapter 2: Intelligent Agents Outline Last class, introduced AI and rational agent Today s class, focus on intelligent agents Agent and environments Nature of environments influences agent design Basic
About the Author. The Role of Artificial Intelligence in Software Engineering. Brief History of AI. Introduction 2/27/2013
About the Author The Role of Artificial Intelligence in Software Engineering By: Mark Harman Presented by: Jacob Lear Mark Harman is a Professor of Software Engineering at University College London Director
The Math. P (x) = 5! = 1 2 3 4 5 = 120.
The Math Suppose there are n experiments, and the probability that someone gets the right answer on any given experiment is p. So in the first example above, n = 5 and p = 0.2. Let X be the number of correct
BPM: Chess vs. Checkers
BPM: Chess vs. Checkers Jonathon Struthers Introducing the Games Business relies upon IT systems to perform many of its tasks. While many times systems don t really do what the business wants them to do,
Brain-in-a-bag: creating an artificial brain
Activity 2 Brain-in-a-bag: creating an artificial brain Age group successfully used with: Abilities assumed: Time: Size of group: 8 adult answering general questions, 20-30 minutes as lecture format, 1
cs171 HW 1 - Solutions
1. (Exercise 2.3 from RN) For each of the following assertions, say whether it is true or false and support your answer with examples or counterexamples where appropriate. (a) An agent that senses only
Appendices master s degree programme Artificial Intelligence 2014-2015
Appendices master s degree programme Artificial Intelligence 2014-2015 Appendix I Teaching outcomes of the degree programme (art. 1.3) 1. The master demonstrates knowledge, understanding and the ability
Automated Theorem Proving - summary of lecture 1
Automated Theorem Proving - summary of lecture 1 1 Introduction Automated Theorem Proving (ATP) deals with the development of computer programs that show that some statement is a logical consequence of
Chapter 21: The Discounted Utility Model
Chapter 21: The Discounted Utility Model 21.1: Introduction This is an important chapter in that it introduces, and explores the implications of, an empirically relevant utility function representing intertemporal
Lecture 1: Introduction to Reinforcement Learning
Lecture 1: Introduction to Reinforcement Learning David Silver Outline 1 Admin 2 About Reinforcement Learning 3 The Reinforcement Learning Problem 4 Inside An RL Agent 5 Problems within Reinforcement Learning
NECESSARY AND SUFFICIENT CONDITIONS
Michael Lacewing Personal identity: Physical and psychological continuity theories A FIRST DISTINCTION In order to understand what is at issue in personal identity, it is important to distinguish between
Machine Learning CS 6830. Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science [email protected]
Machine Learning CS 6830 Razvan C. Bunescu School of Electrical Engineering and Computer Science [email protected] What is Learning? Merriam-Webster: learn = to acquire knowledge, understanding, or skill
8. Machine Learning Applied Artificial Intelligence
8. Machine Learning Applied Artificial Intelligence Prof. Dr. Bernhard Humm Faculty of Computer Science Hochschule Darmstadt University of Applied Sciences 1 Retrospective Natural Language Processing Name
Discrete Math in Computer Science Homework 7 Solutions (Max Points: 80)
Discrete Math in Computer Science Homework 7 Solutions (Max Points: 80) CS 30, Winter 2016 by Prasad Jayanti 1. (10 points) Here is the famous Monty Hall Puzzle. Suppose you are on a game show, and you
Chapter 6: Episode discovery process
Chapter 6: Episode discovery process Algorithmic Methods of Data Mining, Fall 2005, Chapter 6: Episode discovery process 1 6. Episode discovery process The knowledge discovery process KDD process of analyzing
Software Modeling and Verification
Software Modeling and Verification Alessandro Aldini DiSBeF - Sezione STI University of Urbino Carlo Bo Italy 3-4 February 2015 Algorithmic verification Correctness problem Is the software/hardware system
Regular Languages and Finite Automata
Regular Languages and Finite Automata 1 Introduction Hing Leung Department of Computer Science New Mexico State University Sep 16, 2010 In 1943, McCulloch and Pitts [4] published a pioneering work on a
CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing
CS Master Level Courses and Areas The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate
Problems often have a certain amount of uncertainty, possibly due to: Incompleteness of information about the environment,
Uncertainty Problems often have a certain amount of uncertainty, possibly due to: Incompleteness of information about the environment, E.g., loss of sensory information such as vision Incorrectness in
Graph Theory Problems and Solutions
raph Theory Problems and Solutions Tom Davis [email protected] http://www.geometer.org/mathcircles November, 005 Problems. Prove that the sum of the degrees of the vertices of any finite graph is
SPECIFICATION BY EXAMPLE. Gojko Adzic. How successful teams deliver the right software. MANNING Shelter Island
SPECIFICATION BY EXAMPLE How successful teams deliver the right software Gojko Adzic MANNING Shelter Island Brief Contents 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Preface xiii Acknowledgments xxii
Circuits and Boolean Expressions
Circuits and Boolean Expressions Provided by TryEngineering - Lesson Focus Boolean logic is essential to understanding computer architecture. It is also useful in program construction and Artificial Intelligence.
