Judging a Movie by its Poster using Deep Learning


Brett Kuprel (kuprel@stanford.edu)

Abstract

It is often the case that a human can determine the genre of a movie by looking at its poster. This task is not trivial for computers. A recent advance in machine learning called deep learning allows algorithms to learn important features from large datasets. Rather than analyzing an image pixel by pixel, for example, higher-level features can be used for classification. In this project I attempted to train a neural network of stacked autoencoders to predict a movie's genre given an image of its poster. My hypothesis is that a good algorithm can correctly guess the genre based on the movie poster at least half the time.

1. Introduction

A neuron is a computational unit that takes as input $x \in \mathbb{R}^n$ and outputs an activation $h(x) = f(w^T x + b)$, where $f$ is a sigmoidal function. A neural network is a network of these neurons. See the example in figure 1, from the UFLDL tutorial (Ng et al., 2010).

Figure 1. Top: a single neuron. Bottom: a neural network (specifically a feedforward network).

1.1. Forward Propagation

A neural network performs a computation on an input $x$ by forward propagation. Let $a^{(l)} \in \mathbb{R}^{n_l}$ be the vector of activations (i.e. outputs) of the $n_l$ neurons in layer $l$, and let $W^{(l)}$ be the matrix of weight vectors $w$ for layer $l$. We have the following recursion relationship:

    $a^{(l+1)} = f(W^{(l)} a^{(l)} + b^{(l)})$    (1)

To determine the final hypothesis $h$ given an input $x$, iteratively apply this recursion, starting with $a^{(0)} = x$.

1.2. Autoencoders

An autoencoder is a neural network that takes as input $x \in [0,1]^n$, maps it to a latent representation $y \in [0,1]^{n'}$, and finally outputs $z \in [0,1]^n$, a reconstructed version of $x$ (see figure 2). If the input is interpreted as a bit vector, the reconstruction error can be measured by the cross entropy

    $J(x, z) = -\sum_k \left( x_k \log z_k + (1 - x_k) \log(1 - z_k) \right)$    (2)

When $n' < n$, the latent layer $y$ can be thought of as a lossy compression of $x$. The compression does not generalize to arbitrary $x$, but this is usually acceptable because many datasets lie on lower-dimensional manifolds. Natural images, for instance, are a very small subset of all possible images.

Figure 2. An autoencoder.

2. Methods

2.1. Model

Let each movie poster be a vector $x^{(i)} \in \mathbb{R}^n$, where $n$ is the number of pixels in the image. Each movie belongs to at most 3 genres. I express this as a boolean vector $y^{(i)} \in \{0,1\}^{|G|}$, where $G$ is the set of genres and $y^{(i)}_j = 1$ if movie $i$ belongs to genre $j$, 0 otherwise.
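Since the prediction below relies on forward propagation, here is a minimal NumPy sketch of the recursion in Eq. (1). The layer sizes, random weights, and the choice of the logistic sigmoid for $f$ are illustrative assumptions, not this project's actual configuration.

```python
import numpy as np

def sigmoid(v):
    # Logistic sigmoid, one common choice for the activation f
    return 1.0 / (1.0 + np.exp(-v))

def forward_propagate(x, weights, biases):
    """Apply Eq. (1) repeatedly: a^(l+1) = f(W^(l) a^(l) + b^(l)),
    starting from a^(0) = x, and return the final hypothesis h."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# Illustrative layer sizes (not the paper's architecture): 4 -> 3 -> 2
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 4)), rng.standard_normal((2, 3))]
biases = [np.zeros(3), np.zeros(2)]

x = rng.random(4)                       # a toy input vector
h = forward_propagate(x, weights, biases)
print(h)                                # final activations, one per output neuron
```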

The algorithm produces a single genre prediction $\hat{y} \in G$. Define the prediction as the argmax of the conditional probability distribution,

    $\hat{y}^{(i)} = \arg\max_j P(Y = j \mid x^{(i)})$    (3)

and the conditional probability table (CPT) as the softmax of the final hypothesis layer of the network,

    $P(Y = j \mid x^{(i)}) = \frac{\exp(h_j)}{\sum_k \exp(h_k)}$    (4)

where $h \in \mathbb{R}^{|G|}$ is found by forward propagation of $x^{(i)}$ through the network. The goal is to minimize the prediction error rate. Define an error to occur when the predicted genre for some movie $i$ is not in the set of genres that movie $i$ belongs to:

    $\%\,\text{Error} = \frac{1}{|D|} \sum_{i \in D} \left( 1 - y^{(i)}[\hat{y}^{(i)}] \right)$    (5)

This error rate is not differentiable in the model parameters $W$ and $b$ because of the argmax used to find $\hat{y}$. To train the network, I instead minimize the negative log likelihood

    $J(W, b) = -\sum \log P(Y \mid X)[Y]$    (6)

where the $i$th rows of $X$ and $Y$ are $x^{(i)}$ and $y^{(i)}$, and the sum runs over the entries of the CPT matrix selected by indexing with the boolean matrix $Y$. Notice that this is still a differentiable function of the parameters $W$ and $b$.

2.2. Learning Parameters

Backpropagation is a gradient-based method for training the weights of a neural network. It uses gradient descent to update the parameters:

    $W^{(l)}_{ij} \leftarrow W^{(l)}_{ij} - \alpha \frac{\partial J}{\partial W^{(l)}_{ij}}, \qquad b^{(l)}_i \leftarrow b^{(l)}_i - \alpha \frac{\partial J}{\partial b^{(l)}_i}$    (7)

The gradients are often messy to derive by hand and do not provide much insight into the problem. I used a Python package called Theano that calculates these gradients and applies the updates to the model parameters $W$ and $b$ behind the scenes.

2.3. Stacking Autoencoders

Before applying backpropagation, it helps if $W$ and $b$ are initialized to something reasonable. A known problem with training deep neural networks is the diffusion of gradients: when backpropagation is run from scratch, only the nodes close to the final layer are updated properly. A greedy method for initializing $W$ and $b$ is to stack autoencoders. The idea is simple: train an autoencoder on a set of data $X$, use the learned feature representation as the input to another autoencoder, and repeat until the network has as many layers as desired. For this project I used 3 latent layers (aside from the input data and hypothesis layers). This process results in a reasonable initialization of the weights $W$ and $b$. It also allows unlabeled data to be used effectively for feature learning; of the movie poster images I had, fewer than 1/6 had genre labels.
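As a rough illustration of this greedy layer-wise scheme, the sketch below pretrains a stack of autoencoders in NumPy using the cross-entropy of Eq. (2). The layer sizes, learning rate, and plain batch gradient descent are assumptions made for illustration; the project itself trained with Theano (see section 2.6).

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=50, seed=0):
    """Train one autoencoder on the rows of X (values in [0, 1]) by
    batch gradient descent on the cross-entropy of Eq. (2).
    Returns (W1, b1) so the code sigmoid(X @ W1.T + b1) can feed
    the next autoencoder in the stack."""
    rng = np.random.default_rng(seed)
    n, m = X.shape[1], X.shape[0]
    W1 = rng.standard_normal((n_hidden, n)) * 0.1   # encoder weights
    W2 = rng.standard_normal((n, n_hidden)) * 0.1   # decoder weights
    b1, b2 = np.zeros(n_hidden), np.zeros(n)
    for _ in range(epochs):
        Y = sigmoid(X @ W1.T + b1)          # latent representation
        Z = sigmoid(Y @ W2.T + b2)          # reconstruction
        dZ = (Z - X) / m                    # gradient at the output pre-activation
        dY = (dZ @ W2) * Y * (1 - Y)        # backpropagated to the hidden layer
        W2 -= lr * dZ.T @ Y; b2 -= lr * dZ.sum(axis=0)
        W1 -= lr * dY.T @ X; b1 -= lr * dY.sum(axis=0)
    return W1, b1

# Greedy stacking: each layer's code becomes the next layer's input.
X = np.random.default_rng(1).random((200, 64))      # toy "images"
stack, layer_input = [], X
for n_hidden in (32, 16, 8):                        # illustrative sizes
    W, b = train_autoencoder(layer_input, n_hidden)
    stack.append((W, b))
    layer_input = sigmoid(layer_input @ W.T + b)
# `stack` now provides initial weights for the first layers of the deep network.
```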

2.4. Getting the Data

IMDB provides a link that redirects to a random popular movie: http://www.imdb.com/random/title. Using this link, I can obtain the genre, rating, and poster for N movies as shown in Algorithm 1. I used the BeautifulSoup package in Python to scrape the HTML.

Algorithm 1: Scrape IMDB
    Input: desired number of movies N
    Output: dictionary M where a movie key m points to genre g, rating r, and poster p
    M <- {}
    U <- http://www.imdb.com/random/title
    while |M| < N do
        m <- getMovieTitle(U)
        if m not in M then
            p <- getMoviePoster(U)
            g <- getMovieGenre(U)
            r <- getMovieRating(U)
            M[m] <- {genre: g, rating: r, poster: p}
        end if
    end while

This algorithm exhausts IMDB's random-popular-movie function at a little under 1,000 movies; at that point it visits close to 100 pages before seeing a movie that has not already been scraped. Another website, Movie Poster DB, also has a random-movie link and claims to host over 100 thousand movie posters. While these posters have no ratings or genre labels, they can still be used for feature learning. I wrote a similar script for that site and was able to scrape 5,000 posters in just a few hours.

Table 1. Genre counts for movies in the IMDB dataset. One movie can belong to multiple genres.

    Genre        Count
    Drama          365
    Comedy         247
    Action         234
    Adventure      178
    Crime          170
    Thriller       135
    Sci-Fi         102
    Fantasy         90
    Romance         89
    Mystery         79
    Horror          54
    Animation       53
    Family          49
    Biography       30
    History         23
    Documentary     18
    War             16
    Sport           11
    Western          4
    Musical          3

2.5. Preparing Data

Let M be the dictionary returned by Algorithm 1. I split the data into a training set D_train, a validation set D_valid, and a test set D_test with sizes 80%, 10%, and 10%.

One frustration I ran into while scraping posters was that there is no standard image shape. To apply most machine learning algorithms, each data point should have the same set of features (i.e. the same image size). The PIL package in Python provides a function, PIL.Image.resize(new_size), that converts an image of any size to any other size. After playing around with different image sizes, I decided on 100 x 100; the change in aspect ratio does not affect the posters as much as I expected. I also had to decide what to do with color. Color does not seem to add enough information to warrant tripling the number of features (or reducing the number of pixels per image to 1/3), so I used the luminosity function

    gray = 0.299 * red + 0.587 * green + 0.114 * blue

to convert each image to grayscale. See figure 3 for the preprocessed IMDB dataset.

Figure 3. Movie posters from IMDB, standardized to 100x100 pixels and converted to grayscale.

2.6. Implementing a Neural Net

I used a Python package called Theano (Bergstra et al., 2010) designed for deep learning. It simplifies running algorithms on the GPU: the same code written for the CPU will work on the GPU (as long as floats are used). Among other things, it uses lazy evaluation and performs symbolic differentiation. I found an example of using stacked autoencoders to classify the MNIST handwritten digits dataset and used it as starter code to build my movie poster classifier.

Figure 4. Training speedup using the GPU.
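To give a sense of the symbolic style Theano enables, here is a minimal sketch of a single training step for the final softmax layer: the cost of Eq. (6) is written symbolically, T.grad derives the gradients, and theano.function compiles the update of Eq. (7). The sizes (a 300-dimensional input code, 20 genres), the learning rate, and the one-example-at-a-time update are illustrative assumptions, not this project's actual code.

```python
import numpy as np
import theano
import theano.tensor as T

# Shared variables hold parameters; Theano places them on the GPU when enabled.
W = theano.shared(np.zeros((20, 300), dtype=theano.config.floatX), name='W')
b = theano.shared(np.zeros(20, dtype=theano.config.floatX), name='b')

x = T.vector('x')    # e.g. the top autoencoder's 300-dimensional code
y = T.iscalar('y')   # index of one true genre label

h = T.dot(W, x) + b                          # final hypothesis layer
p = T.nnet.softmax(h.dimshuffle('x', 0))[0]  # Eq. (4)
nll = -T.log(p[y])                           # one term of Eq. (6)

gW, gb = T.grad(nll, [W, b])                 # symbolic gradients, no hand derivation
alpha = 0.01
train_step = theano.function(
    inputs=[x, y], outputs=nll,
    updates=[(W, W - alpha * gW), (b, b - alpha * gb)])  # Eq. (7)
```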

3. Results

I scraped a total of 5,800 images, each 100 by 100 pixels in grayscale; 800 of them are shown in figure 3. 5,000 of the images have no genre labels. The remaining 800 have genre labels distributed as shown in table 1, and each movie has between 1 and 3 genre labels.

Training a 3-layer architecture (layers of 1000, 500, and 300 nodes) topped with a multiclass logistic regression layer results in a validation set error rate of 47% and a test set error rate of 49.5%. This means that, given a movie's poster, the algorithm correctly predicts one of its genres, out of 20 possible genres, about 50% of the time. Note that guessing drama every time would predict a genre correctly 45.6% of the time. A plot of the negative log likelihood over the number of iterations through the training set is shown in figure 5.

Figure 5. Negative log likelihood during training.

The images that most highly activate the neurons in the 3rd layer are shown in figure 6. Notice that a few of them look like faces. The first- and second-layer features were less exciting, so I did not include them. Also, my code is split across many files, so I decided to omit it from this report. Please email me if you want any part or all of it.

Figure 6. Learned features in the 3rd hidden layer.

4. Conclusion

It was difficult to implement a deep neural network for the first time in one week. That said, I think my neural network suffered from the curse of dimensionality: my images were 100 by 100 pixels, for a total of 10,000 variables per training example, and I only had 5,000 training examples. A smarter method might be to cut each poster into patches and then classify using a voting scheme among the patches of a poster. Another method could be to simply use lower-resolution movie posters.

References

Bergstra, James, Breuleux, Olivier, Bastien, Frédéric, Lamblin, Pascal, Pascanu, Razvan, Desjardins, Guillaume, Turian, Joseph, Warde-Farley, David, and Bengio, Yoshua. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), June 2010. Oral presentation.

Ng, Andrew, Ngiam, Jiquan, Foo, Chuan Y., Mai, Yifan, and Suen, Caroline. UFLDL tutorial, 2010. URL http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial.