Various applications of restricted Boltzmann machines for bad quality training data



Similar documents
Neural Networks for Machine Learning. Lecture 13a The ups and downs of backpropagation

Forecasting Trade Direction and Size of Future Contracts Using Deep Belief Network

Introduction to Deep Learning Variational Inference, Mean Field Theory

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski

Deep learning applications and challenges in big data analytics

Simple and efficient online algorithms for real world applications

Visualizing Higher-Layer Features of a Deep Network

A Practical Guide to Training Restricted Boltzmann Machines

Introduction to Machine Learning CMU-10701

BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking. Aayush Agrawal

SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs

Reinforcement Learning with Factored States and Actions

MapReduce Approach to Collective Classification for Networks

Chapter 4: Artificial Neural Networks

Programming Tools based on Big Data and Conditional Random Fields

Classification algorithm in Data mining: An Overview

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Machine Learning over Big Data

A Stochastic 3MG Algorithm with Application to 2D Filter Identification

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University

Data Mining Techniques Chapter 7: Artificial Neural Networks

NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE. Venu Govindaraju

Web Mining using Artificial Ant Colonies : A Survey

Simplified Machine Learning for CUDA. Umar

Data Mining Practical Machine Learning Tools and Techniques

Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm

Globally Optimal Crowdsourcing Quality Management

Online Semi-Supervised Learning

Supporting Online Material for

Cell Phone based Activity Detection using Markov Logic Network

Intrusion Detection via Machine Learning for SCADA System Protection

Introduction to Learning & Decision Trees

Tracking Groups of Pedestrians in Video Sequences

Active Learning SVM for Blogs recommendation

HUAWEI Advanced Data Science with Spark Streaming. Albert Bifet

called Restricted Boltzmann Machines for Collaborative Filtering

Generative versus discriminative training of RBMs for classification of fmri images

Topic models for Sentiment analysis: A Literature Survey

arxiv: v2 [cs.lg] 9 Apr 2014

Interactive Machine Learning. Maria-Florina Balcan

Sense Making in an IOT World: Sensor Data Analysis with Deep Learning

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

ClusterOSS: a new undersampling method for imbalanced learning

Semi-Supervised Support Vector Machines and Application to Spam Filtering

Average rate of change of y = f(x) with respect to x as x changes from a to a + h:

Learning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu

Survey of the Mathematics of Big Data

Deep Learning for Multivariate Financial Time Series. Gilberto Batres-Estrada

The Basics of Graphical Models

Gibbs Sampling and Online Learning Introduction

LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu

Big Data and Transactional Databases Exploding Data Volume is Creating New Stresses on Traditional Transactional Databases

Modern (Computational) Approaches to Big Data Analytics. CSC 576 Computer Science, University of Rochester Instructor: Ji Liu

Journée Thématique Big Data 13/03/2015

Course: Model, Learning, and Inference: Lecture 5

Linear Threshold Units

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence

Online Algorithms: Learning & Optimization with No Regret.

Introduction to Markov Chain Monte Carlo

Pixels Description of scene contents. Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT) Banksy, 2006

A Fast Learning Algorithm for Deep Belief Nets

Unit 22 Big Data analytics

Making Sense of the Mayhem: Machine Learning and March Madness

1 Maximum likelihood estimation

UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS

Gaussian Processes to Speed up Hamiltonian Monte Carlo

Collective Behavior Prediction in Social Media. Lei Tang Data Mining & Machine Learning Group Arizona State University

Statistical Machine Learning from Data

Using Random Forest to Learn Imbalanced Data

A Survey on Web Mining From Web Server Log

Distributed forests for MapReduce-based machine learning

Statistical machine learning, high dimension and big data

MACHINE LEARNING BRETT WUJEK, SAS INSTITUTE INC.

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics

CSE 473: Artificial Intelligence Autumn 2010

Big Data: Image & Video Analytics

How To Identify A Churner

6.2.8 Neural networks for data mining

Product Guide. Sawmill Analytics, Swindon SN4 9LZ UK tel:

A binary search tree is a binary tree with a special property called the BST-property, which is given as follows:

TIETS34 Seminar: Data Mining on Biometric identification

Improving Decision Making and Managing Knowledge

System Behavior Analysis by Machine Learning

Random Forest Based Imbalanced Data Cleaning and Classification

Statistical Challenges with Big Data in Management Science

Programming Exercise 3: Multi-class Classification and Neural Networks

Big Data & Analytics: Your concise guide (note the irony) Wednesday 27th November 2013

Recurrent Neural Networks

International Journal of Innovative Research in Computer and Communication Engineering

Exploring Big Data in Social Networks

E-commerce Transaction Anomaly Classification

Transcription:

Wrocław University of Technology Various applications of restricted Boltzmann machines for bad quality training data Maciej Zięba Wroclaw University of Technology 20.06.2014

Motivation Big data - 7 dimensions1 Volume: size of data. Velocity: speed, displacement of data. Variety: diversity of data. Viscosity: measures the resistance to flow in the volume of data. Virality: measures how fast data is distributed unique and shared between nodes in a network (e.g. the Internet). Veracity: trust and quality of the data. Value: what is the added value that Big Data should bring? 1 According to ATOS company 2/7

Motivation Big data - 7 dimensions1 Volume: size of data. Velocity: speed, displacement of data. Variety: diversity of data. Viscosity: measures the resistance to flow in the volume of data. Virality: measures how fast data is distributed unique and shared between nodes in a network (e.g. the Internet). Veracity: trust and quality of the data. Value: what is the added value that Big Data should bring? 1 According to ATOS company 2/7

Veracity of Data Typical problems with data - training context Imbalanced data problem. One class dominates another in the training data. Noisy labels problem. Some of the examples in training data contain incorrectly assigned labels. Missing values issue. Values of some features are unknown. Example of imbalanced data Unstructured data. The data is represented in unprocessed form: images, videos, documents, XML structures. Semi-supervised data. Some portion of training data is unlabelled. 3/7

Veracity of Data Typical problems with data - training context Imbalanced data problem. One class dominates another in the training data. Noisy labels problem. Some of the examples in training data contain incorrectly assigned labels. Missing values issue. Values of some features are unknown. Example of imbalanced data Unstructured data. The data is represented in unprocessed form: images, videos, documents, XML structures. Semi-supervised data. Some portion of training data is unlabelled. 3/7

e of experts, we take a convex com why PoE models might work better tha (if the expert has a value which is Methods Restricted Boltzmann Machines (RBM) with an arbitrary graph structure. The sh gh the model is actually symmetric and d n machine with a bipartite structure. No RBM is a bipartie Markov Random Field with visible and hidden units. The joint distribution of visible and hidden units is the Gibbs distribution: p(x, h θ) = 1 Z exp( E(x, h θ) ) For binary visible x {0, 1} D and hidden units h {0, 1} M th energy function is as follows: E(x, h θ) = x Wh b x c h, Because of no visible to visible, or hidden to hidden connection we have: p(x i = 1 h, W, b) = sigm ( W i h + b i ) p(h j = 1 x, W, c) = sigm ( (W j ) x + c j ) (b) V H 4/7

Methods RBM for imbalanced data Train the model on examples from minority class by application of MLL (scaled): 1 N log ( p(x N n=1 θ) ) = 1 N N log ( p(x n, h θ) ) n=1 Generate artificial examples X M m=1 using Synthetic Oversampling TEchnique (SMOTE). For each of the newly created example x m apply Gibbs sampling: h m p(h x m, θ) x m p(x h m, θ) h Label newly created example x m and store in training data. 5/7

Methods RBM for imbalanced data - example SMOTE procedure: A B 6/7

Methods RBM for imbalanced data - example SMOTE procedure: A Generating artificial examples on MNIST data: EXAMPLE 1 EXAMPLE 2 B SMOTE SMOTE RBM 6/7

RBM for other raw data issues Problem of missing values. RBM is trained for each of the classes separately. Gibbs sampling is applied to uncover unknown values. RBM models are iteratively updated while new training example is completed. Problem of noisy labels. RBM is trained for each of the classes separately. Each of the trained models is used as an oracle to detect uncorrected labelled data. Reconstruction error is used to determine unlabelled examples. Problem of unstructured data. RBM is used as domain-independent feature extractor that transforms raw data into hidden units. 7/7