Efficient online learning of a non-negative sparse autoencoder
Andre Lemme, R. Felix Reinhart and Jochen J. Steil
Research Institute for Cognition and Robotics (CoR-Lab), Bielefeld University
{alemme, freinhar, jsteil}@cor-lab.uni-bielefeld.de

In: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN). Bruges (Belgium), April 2010, d-side publi.

Abstract. We introduce an efficient online learning mechanism for non-negative sparse coding in autoencoder neural networks. In this paper we compare the novel method to the batch algorithm non-negative matrix factorization with and without sparseness constraint. We show that the efficient autoencoder yields better sparseness and lower reconstruction errors than the batch algorithms on the MNIST benchmark dataset.

1 Introduction

Unsupervised learning techniques that can find filters for relevant parts in images are particularly important for visual object classification [1]. Lee proposed non-negative matrix factorization (NMF) in [2] to produce a non-negative encoding of input images by combining a linear matrix factorization approach with non-negativity constraints. Hoyer introduced an additional sparseness constraint into the factorization process in order to increase the sparseness of the produced encoding [3, 4]. Sparse codes enhance the probability of linear separability, a crucial feature for object classification [5]. The main drawbacks of matrix factorization approaches are their high computational cost and the fact that the costly factorization has to be redone each time new images are perceived. This also holds for the sort-of-online NMF introduced in [6]. NMF is therefore not suited for problems that require real-time processing and life-long learning. A non-negative, online version of PCA was introduced recently [7]. This approach addresses the problems of non-negativity and computational efficiency; however, PCA is intrinsically a non-sparse method.

We propose a modified autoencoder model that encodes input images in a non-negative and sparse network state. We use logistic activation functions in the hidden layer in order to map neural activities to strictly positive outputs. Sparseness of the representation is achieved by an unsupervised self-adaptation rule, which adapts the non-linear activation functions based on the principle of intrinsic plasticity [8]. Non-negativity of the connection weights is enforced by a novel, asymmetric regularization approach that punishes negative weights more than positive ones. Both online learning rules are combined, use only local information, and are very efficient. We compare the reconstruction performance as well as the generated codes of the introduced model to the non-negative offline algorithms NMF and non-negative matrix factorization with sparseness constraints (NMFSC) on the MNIST benchmark data set [9]. The efficient online implementation outperforms the offline algorithms with respect to reconstruction errors and to the sparseness of both filters and code vectors.
Fig. 1: Autoencoder network: (a) original input image x, (b) network with tied weight matrices W and W^T and hidden state h, (c) reconstructed image x̂.

2 Non-negative sparse autoencoder network (NNSAE)

We modify an autoencoder network in order to obtain non-negative and sparse encodings with only positive network weights from non-negative input data. The network architecture is shown in Fig. 1. We denote the input image by x ∈ ℝ^n and the network reconstruction by x̂ = W^T f(Wx), where W ∈ ℝ^(m×n) is the weight matrix and f(·) are parameterized activation functions

    f_i(g_i) = (1 + exp(−a_i g_i − b_i))^(−1) ∈ [0, 1]

with slopes a_i and biases b_i that are applied component-wise to the neural activities g = Wx (g ∈ ℝ^m). Non-negative code vectors are already assured by the logistic activation functions f_i(g_i).

Learning to reconstruct: Learning of the autoencoder is based on minimizing the reconstruction error E = (1/N) Σ_{i=1}^{N} ‖x^i − x̂^i‖^2 on the training set. Note that the same matrix W is used for encoding and decoding, i.e. the autoencoder has tied weights [10, 11], which reduces the number of parameters to be adapted. Learning then boils down to a simple error-correction rule that is applied for every training example x:

    w̃_ij = w_ij + η̃ (x_i − x̂_i) h_j,                    (1)
    Δw_ij = η̃ (x_i − x̂_i) h_j + d(w̃_ij),                 (2)

where w_ij is the connection from hidden neuron j to output neuron i and h_j is the activation of hidden neuron j. The decay function d(w̃_ij) is discussed in the next section. We use an adaptive learning rate η̃ = η (‖h‖^2 + ε)^(−1) that scales the gradient step size η by the inverse network activity and is similar to backpropagation-decorrelation learning for reservoir networks [12].
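For concreteness, the following Python sketch implements one online update step of the tied-weight autoencoder according to eqs. (1)-(2), assuming a NumPy environment. The function name nnsae_step, the default values of eta and eps (taken from Section 3), and the decay argument are illustrative choices, not the authors' reference implementation.

    import numpy as np

    def nnsae_step(W, a, b, x, eta=0.01, eps=0.002, decay=None):
        """One online NNSAE update for a single non-negative input x.

        W : (m, n) tied weight matrix, a, b : (m,) slopes and biases of the
        logistic activations, x : (n,) input vector.
        """
        g = W @ x                                # neural activities g = Wx
        h = 1.0 / (1.0 + np.exp(-a * g - b))     # logistic hidden state in [0, 1]
        x_hat = W.T @ h                          # reconstruction with tied weights
        err = x - x_hat
        eta_t = eta / (h @ h + eps)              # adaptive learning rate, scaled by inverse activity
        W_tilde = W + eta_t * np.outer(h, err)   # candidate weights, eq. (1)
        if decay is not None:                    # full update including decay term, eq. (2)
            W_tilde = W_tilde + decay(W_tilde)
        return W_tilde, h, x_hat

The decay argument is a hook for the asymmetric regularization of the next section; passing decay=None reproduces the plain error-correction rule.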
Asymmetric decay enforces non-negative network parameters: In order to enforce non-negative weights, we introduce an asymmetric regularization approach. Typically, a Gaussian prior over the network weights is assumed, which is expressed by a quadratic cost term λ‖W‖^2 added to the error function E. Under gradient descent, this Gaussian prior results in the well-known decay term d(w_ij) = −λ w_ij. We instead assume a virtually deformed Gaussian prior that is skewed with respect to the sign of the newly computed weight w̃_ij (eq. 1). Gradient descent then yields the following asymmetric, piecewise linear decay function:

    d(w̃_ij) = −α w̃_ij   if w̃_ij < 0
    d(w̃_ij) = −β w̃_ij   otherwise                           (3)

The main advantage of (3) is that the decay behavior is parameterized by the sign of the weight: it is possible to allow a certain degree of negative weights (0 ≤ α < 1) or to prohibit negative weights completely (α = 1). For α = β the decay function (3) falls back to the case of a symmetric prior.

Adaptation of non-linearities achieves sparse codes: In the context of the non-negative autoencoder, it is crucial to adapt the activation functions to the strictly non-negative input distribution caused by the non-negative weight and data matrices. We therefore apply an efficient, unsupervised and local learning scheme, known as intrinsic plasticity (IP) [8], to the parameterized activation functions f_i(g_i). In addition, IP enhances the sparseness of the input representation in the hidden layer [13]; for more details about IP see [8, 13]. We use a small learning rate η_IP throughout the experiments and initialize the parameters of the activation functions to a = 1 and b = −3 in order to accelerate convergence. IP provides a parameter for the desired mean network activation, which we set to μ = 0.2, corresponding to a sparse network state. In addition, a constant decay term is added to the IP bias update Δb_IP to further increase the sparseness of the encoding. A compact sketch of both adaptation steps is given below.
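A minimal Python sketch of the two adaptation mechanisms follows. The asymmetric_decay function is a direct transcription of eq. (3) and plugs into the decay hook of the update sketch above; the IP step follows Triesch's gradient rule for logistic units with an exponential target distribution [8]. The function names and the numerical value of eta_ip are assumptions, since the paper only states that a small IP learning rate is used.

    import numpy as np

    def asymmetric_decay(W_tilde, alpha=1.0, beta=0.0):
        """Asymmetric, piecewise linear decay of eq. (3): candidate weights
        below zero are punished with rate alpha, non-negative ones with beta."""
        return np.where(W_tilde < 0.0, -alpha * W_tilde, -beta * W_tilde)

    def ip_update(a, b, g, h, eta_ip=0.001, mu=0.2):
        """Intrinsic plasticity step for the slopes a and biases b of the
        logistic activation functions, driving the mean activation towards mu
        (Triesch's gradient rule for an exponential target distribution)."""
        db = eta_ip * (1.0 - (2.0 + 1.0 / mu) * h + (h ** 2) / mu)
        da = eta_ip / a + db * g
        return a + da, b + db

With alpha = 1 the decay cancels negative candidate weights completely, which is the setting used in the experiments below.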
3 Encoding and decoding of handwritten digits

We analyze the behavior of the non-negative sparse autoencoder (NNSAE) and compare the new NNSAE approach to the batch algorithms NMF and NMFSC on a benchmark data set.

Dataset and network training: The MNIST data set is commonly used to benchmark image reconstruction and classification methods [14]. For the following experiments the MNIST-basic set [9] is used, which comprises training and test images of handwritten digits from 0 to 9 in a centered and normalized pixel format. We subdivide the training set into samples for learning and use the remaining 2000 examples as validation set. Some example digits from the test set are shown in the top row of Fig. 2. We use an autoencoder with 784 input neurons and 100 hidden neurons to encode and decode the input images. The weight matrix is initialized randomly according to a uniform distribution in [0.95, 1.05]. We set η = 0.01 and ε = 0.002 in the learning rule (1)-(2). Each autoencoder is trained for 100 epochs, where the whole training set is presented to the model in each epoch. To account for the random network initialization, all results are averaged over 10 trials.

Impact of asymmetric decay: We first investigate the impact of the asymmetric decay function (3). The following quantities are calculated to assess the model properties in dependence on α:

(a) The pixel-wise mean square error of the reconstructions

    MSE = 1/(MN) Σ_{i=1}^{N} Σ_{j=1}^{M} (x_j^i − x̂_j^i)^2,        (4)

where M is the dimension of the data and N the number of samples.

(b) The average sparseness of the code matrix H and of the filters W, where the sparseness of a vector v ∈ ℝ^n is estimated by

    s(v) = (√n − (Σ_{i=1}^{n} |v_i|) / √(Σ_{i=1}^{n} v_i^2)) / (√n − 1).     (5)

This measure was introduced by Hoyer [4] and rates the energy distribution of a vector v with values between zero and one, where sparse vectors v are mapped to values close to one.

Tab. 1: MSE with standard deviation on the test set and sparseness of the encodings H and network weights W depending on the asymmetric decay rate α ∈ {0, 0.1, 0.2, 0.3, 1.0}.

Tab. 1 shows the MSE and the average sparseness of the hidden states as well as of the connection weights depending on α, while β = 0. The results for the case without decay (α = 0) show that a certain amount of negative weights seems to be useful in terms of the reconstruction error. However, if we commit to non-negativity, it does not matter whether there is just a small set of negative entries (α = 0.1) or only positive entries (α = 1) in the weight matrix. This indicates that already a rather moderate non-negativity constraint causes the weight dynamics and the final solution to change completely. We therefore set α = 1 in the following experiments and focus on strictly non-negative filters w_i ∈ ℝ^n. A sketch of the sparseness measure (5) used throughout the comparison is given below.
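A small Python helper for Hoyer's sparseness measure (5), as it is used here to rate codes and filters; the handling of an all-zero vector is an assumption, since eq. (5) is undefined in that case.

    import numpy as np

    def hoyer_sparseness(v):
        """Hoyer's sparseness measure, eq. (5): 0 for a vector with all-equal
        entries and close to 1 for a vector with a single non-zero entry."""
        v = np.asarray(v, dtype=float).ravel()
        n = v.size
        l1 = np.abs(v).sum()
        l2 = np.sqrt((v ** 2).sum())
        if l2 == 0.0:
            return 0.0                  # convention chosen here for the all-zero vector
        return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1.0)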
4 Online NNSAE versus offline NMF and NMFSC

The batch algorithms NMF and NMFSC solve the equation X = WH, starting from the data matrix X and initial filters W and code matrix H. Iterative updates of the code and weight matrix are then conducted in order to reduce the reconstruction error. To make the comparison with the NNSAE fair, we iterate the factorization algorithms 100 times on the training set. We further set the sparseness parameters provided by NMFSC for the weight and code matrix to s(W) = 0.9 and s(H) = 0.35, according to the values obtained for the NNSAE. For more details about NMF and NMFSC see [2, 4]. To evaluate the generalization of NMF and NMFSC on a test set, we keep the trained filters W fixed and update only the code matrix H iteratively, as in the training; a minimal sketch of this evaluation step is given at the end of this section.

Tab. 2: Mean square reconstruction errors for the training, validation and test set as well as sparseness of the code and weight matrices for NMF, NMFSC and NNSAE. The mean test-set errors are 0.04 for NMF, 0.109 for NMFSC and 0.015 for the NNSAE.

Fig. 2: 15 images from the test set (top row) with reconstructions (2nd row), code histograms (3rd row) and some filters (4th row) produced by NMFSC, as well as reconstructions (5th row), code histograms (6th row) and some filters (7th row) produced by the NNSAE.

Comparison of reconstruction errors and sparseness: Tab. 2 reviews the performance of NMF/NMFSC and compares it to the NNSAE. The NNSAE outperforms NMF/NMFSC with respect to the MSE on the test set, the sparseness of the code matrix H and the sparseness of the weight matrix W. It is surprising that even NMFSC does not reach the sparse codes achieved by the NNSAE, although the constraint is set to the same value (s(H) = 0.35). Fig. 2 shows the corresponding code histograms of NMFSC and the NNSAE for some examples. The sparseness of the filters w_i does not differ significantly between the methods, which seems to be an effect of the non-negativity constraint itself. The 4th and the last row of Fig. 2 show some filters of NMFSC and of the NNSAE, respectively. The filters of both methods cover only blob-like areas, i.e. they are local feature detectors, which corresponds to a sparse representation.
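As an illustration of the evaluation procedure for the matrix factorization baselines, the sketch below re-estimates only the code matrix H for new data while the trained filters W stay fixed, using the standard multiplicative NMF update for the Euclidean cost. The additional sparseness projection of NMFSC is omitted here, and the iteration count and the small constant eps are illustrative.

    import numpy as np

    def encode_with_fixed_filters(X, W, n_iter=100, eps=1e-9):
        """Estimate a non-negative code matrix H for data X with fixed
        filters W such that X is approximated by W @ H (plain NMF,
        Euclidean cost). Only H is updated; NMFSC would additionally
        project H to a target sparseness after each iteration."""
        n_components, n_samples = W.shape[1], X.shape[1]
        rng = np.random.default_rng(0)
        H = rng.random((n_components, n_samples))   # random non-negative initialization
        for _ in range(n_iter):
            H *= (W.T @ X) / (W.T @ W @ H + eps)    # multiplicative update keeps H non-negative
        return H

In contrast, the NNSAE encodes a novel input with a single forward pass h = f(Wx), which is the efficiency argument made in the conclusion.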
5 Discussion and conclusion

We present an autoencoder for efficient online learning of sparse and non-negative encodings. To this end, we combine a mechanism that enhances the sparseness of the encodings with an asymmetric decay function that creates non-negative weights. The online trained autoencoder outperforms the offline techniques NMF/NMFSC when reconstructing images from the test set, underlining its generalization capabilities. Additionally, the autoencoder produces sparser codes than the offline methods while generating similar filters at the same time. Besides the life-long learning capability, the main advantage of the new approach is the efficient encoding of novel inputs: the state of the network simply has to be updated with the new input, whereas the matrix factorization approaches require a full optimization step on the new data. It is of particular interest to use the NNSAE in a stacked autoencoder network for pattern recognition in the future: does the non-negative representation improve classification performance, and how is fine-tuning of the hierarchy, e.g. by backpropagation learning, affected by the asymmetric decay function? Also, the interplay of IP and synaptic plasticity in the context of the NNSAE has to be investigated further.

References

[1] M. Spratling. Learning Image Components for Object Recognition. Machine Learning, 2006.
[2] D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401:788-791, 1999.
[3] P. O. Hoyer. Non-negative sparse coding. In Neural Networks for Signal Processing XII, 2002.
[4] P. O. Hoyer. Non-negative Matrix Factorization with Sparseness Constraints. Journal of Machine Learning Research, 5:1457-1469, 2004.
[5] Y.-L. Boureau and Y. LeCun. Sparse Feature Learning for Deep Belief Networks. In NIPS, pp. 1-8, 2007.
[6] B. Cao et al. Detect and Track Latent Factors with Online Nonnegative Matrix Factorization. In Proc. IJCAI, 2007.
[7] M. D. Plumbley and E. Oja. A nonnegative PCA algorithm for independent component analysis. IEEE Transactions on Neural Networks, 15(1):66-76, 2004.
[8] J. Triesch. A Gradient Rule for the Plasticity of a Neuron's Intrinsic Excitability. Neural Computation, pp. 65-70, 2005.
[9] Variations of the MNIST database.
[10] G. E. Hinton and S. Osindero. A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554, 2006.
[11] P. Vincent, H. Larochelle and Y. Bengio. Extracting and composing robust features with denoising autoencoders. In Proc. ICML, 2008.
[12] J. J. Steil. Backpropagation-Decorrelation: online recurrent learning with O(N) complexity. In Proc. IJCNN, 2004.
[13] J. Triesch. Synergies Between Intrinsic and Synaptic Plasticity Mechanisms. Neural Computation, 19:885-909, 2007.
[14] Y. LeCun and C. Cortes. The MNIST database of handwritten digits.