Introduction to Reinforcement Learning

Similar documents
Self-Improving Supply Chains

Intelligent Flexible Automation

What is Artificial Intelligence?

A Sarsa based Autonomous Stock Trading Agent

Cloud Based Distributed Databases: The Future Ahead

Innovations in Big Data Analytics (Technical Insights)

THE MOST SOCIAL EVENTS FOR BRAND SPONSORS

Beyond Reward: The Problem of Knowledge and Data

Game Playing in the Real World. Next time: Knowledge Representation Reading: Chapter

Market Segmentation & Local Host Development

BEST PRACTICES RESEARCH INSERT COMPANY LOGO HERE

The Mathematics 11 Competency Test Percent Increase or Decrease

Technology Intelligence: A Powerful Tool for Competitive Advantage

Hyper-connectivity and Artificial Intelligence

INTRODUCTION: INTERESTING TIMES FOR WORKFLOW TECHNOLOGY

WHITEPAPER. A Data Analytics Plan: Do you have one? Five factors to consider on your analytics journey.

The Origin of Life. The Origin of Life. Reconstructing the history of life: What features define living systems?

Artificial Intelligence in Retail Site Selection

PART III. OPS-based wide area networks

CSC384 Intro to Artificial Intelligence

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: Vol. 1, Issue 6, October Big Data and Hadoop

agucacaaacgcu agugcuaguuua uaugcagucuua

Goal Setting. Your role as the coach is to develop and maintain an effective coaching plan with the client. You are there to

Chess Algorithms Theory and Practice. Rune Djurhuus Chess Grandmaster / October 3, 2012

National Quali cations SPECIMEN ONLY

Music Mood Classification

COMP 250 Fall 2012 lecture 2 binary representations Sept. 11, 2012

SHARPCLOUD SECURITY STATEMENT

2015 Top 10 Information and Communication Technologies (Technical Insights)

Replication Study Guide

Content. CEO Statement 3. Solving for the Impossible 4. The Challenge 5. Optimizing Results is Complicated 6. Daisy s Theory of Retail TM 7

Embracing Microsoft Vista for Enhanced Network Security

There are a number of factors that increase the risk of performance problems in complex computer and software systems, such as e-commerce systems.

Pennsylvania Lottery Profit Report. As Required by Act 53 of 2008

Regular Expressions and Automata using Haskell

The Cyber Threat Profiler

Search Based Applications

Big Data, Physics, and the Industrial Internet! How Modeling & Analytics are Making the World Work Better."

TRADING WITH THE GUPPY MULTIPLE MOVING AVERAGE

White paper The future of Service Desks - vision

Chess Algorithms Theory and Practice. Rune Djurhuus Chess Grandmaster / September 23, 2014

Doubles Strategy by Robert Bj Jacobucci

Feature Factory: A Crowd Sourced Approach to Variable Discovery From Linked Data

Solution Showcase 2016

TD-Gammon, A Self-Teaching Backgammon Program, Achieves Master-Level Play

RelaViz Graph Visualization of Learned Relations Between Entities. Joel Ferstay - joelaf@cs.ubc.ca

The Evolution of Instagram Brand Adoption Study May, 2013

BrainMaster Macromedia Flash Player

Monte-Carlo Tree Search (MCTS) for Computer Go

CHAPTER 15: IS ARTIFICIAL INTELLIGENCE REAL?

Johan Hallberg Research Manager / Industry Analyst IDC Nordic Services & Sourcing Digital Transformation Global CIO Agenda

KEY KNOWLEDGE MANAGEMENT TECHNOLOGIES IN THE INTELLIGENCE ENTERPRISE

Game Changer The Impact of Cognitive Technology on Business and Financial Reporting. May 23, 2016

Bachelor Degree in Informatics Engineering Master courses

Network-Wide Capacity Planning with Route Analytics

Grade Level Expectations for the Sunshine State Standards

Introduction to Learning & Decision Trees

Notes from With Winning in Mind by Lanny Bassham Mental Management System

Software AG Fast Big Data Solutions. Come la gestione realtime dei dati abilita nuovi scenari di business per le Banche

Explore the Art of the Possible Discover how your company can create new business value through a co-innovation partnership with SAP

Typical programme structures for MSc programmes in the School of Computing Science

Higher Education Enrollment Marketing

How To Play Go Lesson 1: Introduction To Go

Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料

Course 395: Machine Learning

Section 1.3 P 1 = 1 2. = P n = 1 P 3 = Continuing in this fashion, it should seem reasonable that, for any n = 1, 2, 3,..., =

UNDERSTAND YOUR CLIENTS BETTER WITH DATA How Data-Driven Decision Making Improves the Way Advisors Do Business

CONTENTS. Introduction 3. IoT- the next evolution of the internet..3. IoT today and its importance..4. Emerging opportunities of IoT 5

Algorithms and optimization for search engine marketing

Traffiliate White Paper

Future Proofing the Data Center: A New Architecture for Innovation and Investment Protection Built on Carrier-Class Routers

CSE 517A MACHINE LEARNING INTRODUCTION

How to Learn Good Cue Orders: When Social Learning Benefits Simple Heuristics

The Social Cognitive perspective and Albert Bandura

THE PROFESSIONAL PASSPORT JOBS NETWORK

Visionet IT Modernization Empowering Change

Application of Artificial Intelligence for a Computerized Maintenance Scheduling System

The purposes of this experiment are to test Faraday's Law qualitatively and to test Lenz's Law.

Meeting the challenges of today s oil and gas exploration and production industry.

Machine Learning: Overview

Our Engine allows you to connect and sync your data. In a flash, connect the data that matters and is relevant to you, simply and easily.

The Birth of the Universe Newcomer Academy High School Visualization One

siemens.com/scada SIMATIC SCADA Systems Efficient to a new level Answers for industry.

large-scale machine learning revisited Léon Bottou Microsoft Research (NYC)

Appendix A: Science Practices for AP Physics 1 and 2

Jitter Transfer Functions in Minutes

Software Engineering 4C03

2015 Global Performance Monitoring for Network Operators New Product Innovation Award

Biology Chapter 5 Test

A Procedure for Classifying New Respondents into Existing Segments Using Maximum Difference Scaling

Masters in Information Technology

Network-Wide Change Management Visibility with Route Analytics

Everything you need to know about flash storage performance

Protecting Resilience through Social Media Data - A White Paper

Dan French Founder & CEO, Consider Solutions

MTU s National Research Rankings, FY2001

IBM Deep Computing Visualization Offering

Time Series Forecasting Techniques

Future Trends in Technology.. Educational response

Enhance Production in 6 Steps Using Preventive Maintenance

Transcription:

Introduction to Reinforcement Learning Rich Sutton Reinforcement Learning and Artificial Intelligence Laboratory Department of Computing Science University of Alberta, Canada R L & A I

Part 1: Why?

The coming of artificial intelligence When people finally come to understand the principles of intelligence what it is and how it works well enough to design and create beings as intelligent as ourselves A fundamental goal for science, engineering, the humanities, for all mankind It will change the way we work and play, our sense of self, life, and death, the goals we set for ourselves and for our societies But it is also of significance beyond our species, beyond history It will lead to new beings and new ways of being, things inevitably much more powerful than our current selves

Milestones in the development of life on Earth year 14Bya 4.5Bya Milestone Big bang formation of the earth and solar system The Age of Replicators The Age of Design 3.7Bya 1.1Bya 1Mya origin of life on earth (formation of first replicators) DNA and RNA sexual reproduction multi-cellular organisms nervous systems Self-replicated things humans most prominent culture language agriculture, metal tools written language industrial revolution 100Kya 10Kya 5Kya 200ya technology 70ya computers nanotechnology? artificial intelligence super-intelligence Designed things most prominent

When will super-human AI come? Might it come in our lifetimes? Should we include its possibility in our research and career plans?

the computational mega-trend effective computation per $ increases exponentially, with a doubling time of 18-24 months this trend has held for the last sixty years and will continue for the foreseeable future

Logarithmic+Plot Logarithmic+Plot Logarithmic+Plot Logarithmic+Plot 16

The computational mega-trend Moore s law figure by Hans Moravec, 1998

The possibility of AI is near We are nearing an important milestone in the history of life on earth, the point at which we can construct machines with the potential for exhibiting an intelligence comparable to ours. -- David Waltz, 1988 (recent president of AAAI) Should occur in 2030 for $1000 We don t yet have the needed AI software (designs, ideas) But the hardware will be a tremendous economic spur to development of the ideas...perhaps at nearly the same time

What s an AI researcher to do? The prize understanding mind is great And may be within reach How then can we do something relevant to acquiring the prize? What strategy should we follow? What subproblem should we address? which issues should we consider, and which should we ignore?

Implications for AI Diminishes the importance of special-case solutions Amplifies the importance of general smart force solutions Data and computation becomes more important than leveraging human understanding witness the demise of classical AI and the rise of machine learning, search, and statistics Scalability of an algorithm the ability to leverage more computation becomes crucial

Do computational costs matter? As computation becomes vastly cheaper (but not free), do computation costs become less important? No. Quite the opposite Computational costs become more, not less, important because demand is essentially infinite performance will depend on how efficiently computation is used; other influences will decline in relative import

We have seen this story before In chess we thought human ideas were key, but it turned out (deep Blue 1997) that big, efficient, heuristic search was key In computer Go we thought human ideas were key, but it turned out (MCTS 2006 ) that big, sample-based search was key In natural language processing we thought that human-written rules were key, but it turned out (~1988) that statistical machine learning and big data were key In visual object recognition we thought human ideas were key, but it turned out (deep learning 2012 ) that big data sets, many parameters, and long training was key

Steady, exponential improvement (since MCTS, 2005) in the strength of the best computer Go programs Master Beginner 7 dan 6 dan 5 dan 4 dan 3 dan 2 dan 1 dan 1 kyu 2 kyu Progress In Computer Go Monte-Carlo Search Strength of sample-based search programs MoGo Zen CrazyStone Zen 3 kyu MoGo 4 kyu 5 kyu 6 kyu 7 kyu 8 kyu Strength of conventional search programs Handtalk Indigo Indigo CrazyStone Traditional Search 9 kyu 10 kyu 11 kyu 12 kyu 13 kyu Projecting the trend suggests a computer will be world champion by 2025 14 kyu 15 kyu 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 Year 2011 2014 / 45

Two diverging approaches to AI Strong automation (computation & data rule) Weak methods (general) The AI gradually constructs an understanding of the world from data; this understanding may be opaque to the researcher Success is defined in a self-verifiable way, for example, as predicting and controlling the data Engineering/design (human understanding rules) Strong methods (specific) The researcher gradually understands the possible worlds, and makes the AI behave appropriately in each Success may be defined in terms unavailable to the AI, for example, as labelling objects in the same way a person would

A new view of the AI problem and its implications for (and against) solution methods Minds are real-time information processors interacting with a firehose of data from a complex and arbitrary world we must find scalable and general methods, to learn arbitrary stuff (no domain knowledge, no taking advantage of structure) We have immense computational resources, but it s never enough; the complexity of the world is always vastly greater we seek computationally frugal methods for finding approximate solutions (optimality is a distraction; relying on it is untenable) We have immense data, but not labeled examples we must be able to learn from unsupervised interaction with the world, a.k.a. self-labelling (no human labels, not even from the web)

the mind s first responsibility is real-time sensorimotor information processing Perception, action, & anticipation as fast and reactive as possible

AI slowly, tectonically, shifting toward scalability Increasing desire for: Generality Approximation Massive, efficient computation Learning without labels

The longest trend in AI There have always been two general approaches/directions Design. Use our intuition about how our intelligence works to engineer AIs that work similarly; leverage our human design abilities Meta-design. Design only general principles and general algorithms; leverage computation and data to determine the rest My reading of AI history is that the former has always been more appealing in the short run, but the latter more successful in the long run

Reinforcement learning (RL) and temporal-difference learning (TDL) are consilient with the new view RL is learning to control data TDL is learning to predict data Both are weak (general) methods Both proceed without human input or understanding Both are computationally cheap and thus potentially computationally massive