# Temporal Difference Learning in the Tetris Game


Hans Pirnay, Slava Arabagi

February 6

## 1 Introduction

Learning to play Tetris has been a common challenge in past machine learning competitions. Most entrants have achieved considerable success by defining a heuristic cost function over board configurations and optimizing each move against it. This approach dates back to the 1990s, when it was shown to successfully eliminate a couple hundred lines. We decided to depart completely from hand-crafted heuristics and implement a "know-nothing" approach, excluding the human factor, and therefore human error and bias, from teaching the computer to play efficiently. Instead, we combine value iteration based on the Temporal Difference (TD) methodology with a neural network, so that the computer learns to play Tetris with minimal human interaction.

## 2 Theory

### 2.1 Temporal Difference Basis

The theory is largely based on the general TD methodology and its implementation for the game of Backgammon [1]. At each move we define an experience tuple $\langle s, a, r, s' \rangle$ consisting of the current state $s$ (the configuration of the playing field), the action $a$ taken in that state, the reward $r$ obtained for the move, and the next state $s'$. The function $V(s)$, the value function of state $s$, is defined as the discounted sum of all rewards attained from that state onward if the optimal strategy is followed. Intuitively, the value function measures how strategically favorable it is for the game to reach that state. Note that we are not trying to minimize or maximize $V(s)$ but merely to estimate it: the game itself possesses an intrinsic true value function $V_{\mathrm{true}}(s)$, and the goal of the learning algorithm is to converge to that function. Convergence is driven by the state update step.
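To make the discounted-value definition concrete, here is a small Python sketch (the authors' implementation was in Java; the function name and numbers are illustrative only) that computes the discounted sum of a reward sequence:

```python
# Illustrative sketch: the value of a state is the discounted sum of future
# rewards, V(s) = r_0 + g*r_1 + g^2*r_2 + ...
def discounted_return(rewards, gamma=0.9):
    """Discounted sum of a reward sequence; later rewards count for less."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Example: +1 per move for three moves, then a +5 line clear:
# 1 + 0.9*1 + 0.81*1 + 0.729*5 = 6.355
print(discounted_return([1, 1, 1, 5]))
```

The backward recursion `total = r + gamma * total` is the same structure the TD update exploits one step at a time.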
Suppose the playing field is in some configuration and a Tetris piece is generated for the next move. For each possible placement $a$ of the new piece, the value function $V(s')$ is evaluated on the field configuration that results from that action. The optimal move is then chosen as $\mathrm{argmax}_a\,(r + V(s'))$; denote the value of the state it leads to by $V(s^*)$.
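The move-selection step above can be sketched in a few lines of Python (hypothetical names; `successors` and the table-lookup `V` stand in for the piece-placement enumeration and the neural network described below):

```python
# Greedy move selection: evaluate r + gamma * V(s') for every legal placement
# of the new piece and pick the argmax.
def best_action(successors, V, gamma=0.9):
    """successors: dict mapping action -> (reward, next_state)."""
    return max(successors,
               key=lambda a: successors[a][0] + gamma * V(successors[a][1]))

# Toy example with three placements and a table-lookup value function.
succ = {"left": (1, "s1"), "mid": (1, "s2"), "right": (5, "s3")}
V = {"s1": 2.0, "s2": 8.0, "s3": 1.0}.get
print(best_action(succ, V))  # "mid": 1 + 0.9*8.0 = 8.2 beats 5 + 0.9*1.0 = 5.9
```

Note how a large immediate reward ("right") can still lose to a placement that leads to a more valuable state.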

Once the best next state has been chosen, the value function of the current state, $V(s)$, is updated based on the Sutton TD formula [2]:

$$V(s) \leftarrow V(s) + \alpha\,\bigl(r + \gamma V(s') - V(s)\bigr), \qquad (1)$$

where the expression $r + \gamma V(s')$ represents the estimated value function of the next state and $\alpha$ is the learning rate. The parameter $\gamma < 1$ expresses that there is uncertainty in the evaluation of the next state, so the algorithm shouldn't fully trust it. The reward is defined to be high when a row is eliminated and negative when the game is lost. The value function is only updated for the most optimal next action. In this manner, the estimated $V(s)$ should increase for states that can lead to an eliminated line and decrease sharply for actions that result in a loss.

### 2.2 Mapping States to Values via a Neural Network

The value function's role of mapping states to a number representing their progress toward eliminating a row is played by a neural network, similar to the one implemented in [1]. The details of the network are presented in the sections below. The general algorithm of eqn. (1) is implemented in the network as follows. At each move of the game, after the optimal state value $V(s^*)$ has been calculated, the network weights are updated by

$$w_{m+1} - w_m = \eta\,\delta_m \sum_{k=0}^{m} \lambda^{m-k}\, \nabla_w \mathrm{net}_k, \qquad (2)$$

where $\eta$ is the network learning rate, $\delta_m$ is the temporal-difference error of eqn. (1), and $\nabla_w \mathrm{net}_m$ is the gradient of the network output with respect to the neuron weights at move $m$. This step is a modified back-propagation routine for training neural networks. The modification lies in the direction of the gradient step: instead of considering only the gradient of the latest move, $\nabla_w \mathrm{net}_m$, the algorithm keeps track of all previous descent directions and, scaling them by $\lambda^{m-k}$, incorporates them into the descent/ascent direction. This combination of descent directions is accomplished by a recursive trace $e$ of the form

$$e_{m+1} = \lambda\, e_m + \nabla_w \mathrm{net}_m. \qquad (3)$$

In the above equation, $\lambda$ acts as the effective momentum of the computation.
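A minimal sketch of one such update, assuming a linear value function $V(s) = w \cdot s$ instead of the paper's neural network (the trace logic of eqns. (1)-(3) is the same; all names are hypothetical):

```python
import numpy as np

# One TD(lambda)-style step for a linear value function V(s) = w . s.
def td_lambda_step(w, e, s, r, s_next, alpha=0.01, gamma=0.9, lam=0.6):
    grad = s                                      # gradient of w.s w.r.t. w
    e = lam * e + grad                            # recursive trace, eqn (3)
    delta = r + gamma * w.dot(s_next) - w.dot(s)  # TD error, eqn (1)
    w = w + alpha * delta * e                     # weight step along the trace
    return w, e

w, e = np.zeros(3), np.zeros(3)
s, s_next = np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0])
w, e = td_lambda_step(w, e, s, r=5.0, s_next=s_next)
print(w)  # weights move toward the TD target along the visited features
```

With zero initial weights the TD error is simply the reward, so the step raises the weights of exactly the features active in $s$.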
The trace $e_m$ is maintained together with the weights $w_m$ of each neuron and updated recursively at every learning step. The network is updated after every move and hence trained at the same rate as the game is played. The back-propagation step is performed for one iteration only, whenever a new input-output pair is provided. In general, the described approach relies heavily on the ability of the neural network to represent the value function. Because the network only learns from the states $s$ that the game actually reaches, converging to $V_{\mathrm{true}}(s)$ requires the algorithm to visit every state at least a few times, which in turn requires the game to be played many thousands, or even millions, of times. Since we wanted as little human intrusion in the game as possible, no initial strategies were preset into the policy chooser; instead, a randomization routine was implemented to ensure that the algorithm explored different moves in the early stages of training. Even with this added exploration breadth, we expected a few million games to be needed before the value function converges for each state.

## 3 Implementation

Figure 1: Configuration of the neural network used for game learning.

In our implementation we used the Java Tetris simulator provided by Eric Whitman.

### 3.1 Implementation of the Neural Network

Since the freely available implementations of neural networks are designed for batch learning, we implemented our own feed-forward network. The input to the network is the raw encoding of the state: 200 squares that can be either on (1) or off (0). The network, portrayed in Fig. 1, maps this input to a single real number as output, representing the value function $V(s)$. Since our input and output structure is very similar to that of TD-Gammon (200 input neurons for Tetris compared to 198 in TD-Gammon), we used 80 hidden units in a single hidden layer, as in [1]. The next piece was not included in the state information. The neurons are fully connected to both their preceding and succeeding layers. The output functions of the input- and output-layer neurons are identity mappings. The hidden layer implements the sigmoid function:

$$s(x) = \frac{1}{1 + \exp(-x)}. \qquad (4)$$

The network is trained by the on-line modified back-propagation routine after each move, as described above.
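The forward pass of this 200-80-1 architecture can be sketched as follows (a Python illustration of the shape only; the authors' network was hand-written in Java, and the random weights here are placeholders, not trained values):

```python
import numpy as np

# 200 binary inputs (the board), 80 sigmoid hidden units, one linear output.
rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, size=(80, 200))  # input -> hidden weights
b1 = np.zeros(80)                          # hidden biases
W2 = rng.normal(0.0, 0.1, size=80)         # hidden -> output weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # eqn (4)

def value(board):
    """board: flat 0/1 array of length 200 -> scalar value estimate V(s)."""
    hidden = sigmoid(W1 @ board + b1)
    return W2 @ hidden                     # identity output unit

print(value(np.zeros(200)))  # value of the empty board under random weights
```

The single linear output unit keeps $V(s)$ unbounded, which matters because the rewards accumulated in a long game can exceed the $(0,1)$ range of a sigmoid output.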

### 3.2 Modeling the Reward

The reward was modeled as shown in Table 1. The reward for each deleted line is an obvious choice, since lines are the main goal of the game. The reward for each piece set was introduced to get around a very common problem in the beginning stages of every game, when pieces are placed practically by chance. The game would pile all pieces in the leftmost column until the very top, where the cost of game-over forced it to place pieces into the remaining empty space, thereby delaying losing and eliminating rows. As a result, the first favorable states the algorithm experienced were ones with many holes at the bottom, built up to the very top. The reward for each piece set was thus meant to encourage a more even build-up of lines right from the start, increasing the overall probability of line elimination.

| Event | Reward | Reason |
|---|---|---|
| each move | +1 | to learn from the start |
| each deleted line | +5 | the ultimate goal |
| game over | -10 | an additional penalty |

Table 1: Rewards.

### 3.3 Exploration

Exploration is a key ingredient of unsupervised learning. On the other hand, building the value function is based on the assumption that the best possible moves are made. The two goals of approximating the value function correctly and exploring the state space therefore have to be weighed against each other. This is done by sampling the next move according to the quality assigned to it by the value function. Define $S$ as

$$S(l) \equiv \sum_{k=0}^{l} \Bigl[ V(s(a_k)) - \min_{r \in A} V(s(a_r)) \Bigr], \qquad (5)$$

where $s(a_k)$ is the state reached after taking action $a_k$, and $A$ is the set of all possible actions. The exploration rule is then: for a random number $R \in [0, 1]$, select action $j$ if

$$\frac{S(j-1)}{S(n)} \le R < \frac{S(j)}{S(n)}. \qquad (6)$$

Intuitively, the move is sampled from the list of all possible moves according to its improvement over the (perceived) worst possible move.
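A sketch of this sampling rule in Python (hypothetical names; `values` holds $V(s(a_k))$ for each candidate action), assuming the cumulative-sum reading of eqns. (5)-(6):

```python
import random

# Sample an action index with probability proportional to its value
# improvement over the worst candidate move, per eqns (5)-(6).
def sample_action(values, rng=random):
    worst = min(values)
    gains = [v - worst for v in values]   # eqn (5) summands
    total = sum(gains)                    # S(n)
    if total == 0:                        # all moves equally good: uniform
        return rng.randrange(len(values))
    r = rng.random() * total              # R scaled by S(n)
    cum = 0.0
    for j, g in enumerate(gains):
        cum += g                          # running S(j)
        if r < cum:                       # eqn (6): S(j-1) <= R < S(j)
            return j
    return len(values) - 1
```

One consequence of this rule is that the worst move has zero probability of being chosen, while all others remain reachable, which is exactly the balance between exploitation and exploration described above.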
## 4 Results

The results achieved with this algorithm were somewhat unfavorable. We do, however, still believe strongly in the know-nothing philosophy, and discuss the computational and modeling-related issues we encountered below.

### 4.1 Numerical Results

The results stated here were achieved using the parameters listed in Table 2.

| Parameter | Value | Use |
|---|---|---|
| $\eta$ | 0.01 | step size / learning rate |
| $\lambda$ | 0.6 | TD parameter |
| $\gamma$ | 0.9 | discount factor for future states |

Table 2: Parameters used in the algorithm.

Table 3 shows the results for our network after a selected number of training games. $N_G$ is the number of games, $N_L$ is the average number of lines removed per game, and $N_E$ is the number of times a sub-optimal move is chosen for exploration purposes, divided by the overall number of moves. The last quantity, which reflects the $S$ function (eqn. 5), is very interesting because it provides insight into the change of the value function: its decreasing value implies that the function steadily grows more confident about the best move to make.

| $N_G$ [1/1000] | $N_L$ | $N_E$ |
|---|---|---|

Table 3: Results achieved by the playing algorithm.

### 4.2 Computational Problems

The major problem, caused by the slow implementation of the algorithm in Java and the natural complexity of the Tetris game, was presenting as many training examples to the game as necessary. While TD-Gammon needed 4.5 million games to converge, we were only able to play 700,000.

### 4.3 Modeling Errors and Complexity Discussion

Our modeling errors are mostly bound to our commitment to follow the very successful approach of TD-Gammon. Indeed, at first glance, the similarities (Table 4) seem very striking.

| Property | TD-Gammon | TD-Tetris |
|---|---|---|
| Bare state descriptors | 198 | 200 |
| Output | 1 | 1 |
| Random branching each move | 6 | 7 |
| Problem dimensionality | | |

Table 4: Comparison of TD-Gammon and TD-Tetris.

There are, however, some differences that make the application of the TD algorithm to Tetris

more difficult than to Backgammon. For one, there is the obvious problem of Tetris's much higher dimensionality compared to Backgammon. The following numbers illustrate this. The original TD-Gammon needed 4.5 million games for the neural network to converge. A normal game among humans takes about 50 to 60 moves, but TD-Gammon's initial games took much longer due to its lack of experience; from these numbers one can estimate roughly 100 moves per game. Because TD-Gammon plays against itself during training, the neural network learns on a total of 900 million half-moves. Suppose now the most favorable case, that the number of training examples or TD updates needed scales with $\log_{10}$ of the number of possible states. This implies that Tetris would need 30 times as many updates as TD-Gammon, amounting to $900{,}000{,}000 \times 30 = 27{,}000{,}000{,}000$ updates. Given that Tetris updates at every move and assuming 270 moves per game (which is far more than is reached in the beginning stages), Tetris would need 100,000,000 training games to achieve the same success level as TD-Gammon. Considering that we only managed to get through 700,000 games, our algorithm has not even completed one percent of the necessary training, and should thus not be judged by an unfair comparison.

## References

[1] Taken from:

[2] Richard S. Sutton, Learning to predict by the methods of temporal differences, Machine Learning, 3:9-44, 1988.


### A Sarsa based Autonomous Stock Trading Agent

A Sarsa based Autonomous Stock Trading Agent Achal Augustine The University of Texas at Austin Department of Computer Science Austin, TX 78712 USA achal@cs.utexas.edu Abstract This paper describes an autonomous

### t := maxγ ν subject to ν {0,1,2,...} and f(x c +γ ν d) f(x c )+cγ ν f (x c ;d).

1. Line Search Methods Let f : R n R be given and suppose that x c is our current best estimate of a solution to P min x R nf(x). A standard method for improving the estimate x c is to choose a direction

### Lecture 1 Newton s method.

Lecture 1 Newton s method. 1 Square roots 2 Newton s method. The guts of the method. A vector version. Implementation. The existence theorem. Basins of attraction. The Babylonian algorithm for finding

### Using Markov Decision Processes to Solve a Portfolio Allocation Problem

Using Markov Decision Processes to Solve a Portfolio Allocation Problem Daniel Bookstaber April 26, 2005 Contents 1 Introduction 3 2 Defining the Model 4 2.1 The Stochastic Model for a Single Asset.........................

### Errata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page

Errata for ASM Exam C/4 Study Manual (Sixteenth Edition) Sorted by Page 1 Errata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page Practice exam 1:9, 1:22, 1:29, 9:5, and 10:8

### Introduction to Convex Optimization for Machine Learning

Introduction to Convex Optimization for Machine Learning John Duchi University of California, Berkeley Practical Machine Learning, Fall 2009 Duchi (UC Berkeley) Convex Optimization for Machine Learning

### Notes on Complexity Theory Last updated: August, 2011. Lecture 1

Notes on Complexity Theory Last updated: August, 2011 Jonathan Katz Lecture 1 1 Turing Machines I assume that most students have encountered Turing machines before. (Students who have not may want to look

### Chapter 6. Cuboids. and. vol(conv(p ))

Chapter 6 Cuboids We have already seen that we can efficiently find the bounding box Q(P ) and an arbitrarily good approximation to the smallest enclosing ball B(P ) of a set P R d. Unfortunately, both

### The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method

The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method Robert M. Freund February, 004 004 Massachusetts Institute of Technology. 1 1 The Algorithm The problem

### Fundamentals of Operations Research. Prof. G. Srinivasan. Department of Management Studies. Indian Institute of Technology, Madras. Lecture No.

Fundamentals of Operations Research Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras Lecture No. # 22 Inventory Models - Discount Models, Constrained Inventory

### Branch-and-Price Approach to the Vehicle Routing Problem with Time Windows

TECHNISCHE UNIVERSITEIT EINDHOVEN Branch-and-Price Approach to the Vehicle Routing Problem with Time Windows Lloyd A. Fasting May 2014 Supervisors: dr. M. Firat dr.ir. M.A.A. Boon J. van Twist MSc. Contents

### Machine Learning: Multi Layer Perceptrons

Machine Learning: Multi Layer Perceptrons Prof. Dr. Martin Riedmiller Albert-Ludwigs-University Freiburg AG Maschinelles Lernen Machine Learning: Multi Layer Perceptrons p.1/61 Outline multi layer perceptrons

### Cracking the Sudoku: A Deterministic Approach

Cracking the Sudoku: A Deterministic Approach David Martin Erica Cross Matt Alexander Youngstown State University Center for Undergraduate Research in Mathematics Abstract The model begins with the formulation

### Reinforcement Learning

Reinforcement Learning LU 2 - Markov Decision Problems and Dynamic Programming Dr. Martin Lauer AG Maschinelles Lernen und Natürlichsprachliche Systeme Albert-Ludwigs-Universität Freiburg martin.lauer@kit.edu

### Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

### Neural Networks and Learning Systems

Neural Networks and Learning Systems Exercise Collection, Class 9 March 2010 x 1 x 2 x N w 11 3 W 11 h 3 2 3 h N w NN h 1 W NN y Neural Networks and Learning Systems Exercise Collection c Medical Informatics,

### Introduction to Logistic Regression

OpenStax-CNX module: m42090 1 Introduction to Logistic Regression Dan Calderon This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 Abstract Gives introduction