# Evaluation of Machine Learning Techniques for Green Energy Prediction

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 arxiv: v1 [cs.lg] 14 Jun 2014 Evaluation of Machine Learning Techniques for Green Energy Prediction 1 Objective Ankur Sahai University of Mainz, Germany We evaluate Machine Learning techniques for Green energy (wind, solar and biomass) prediction based on weather forecasts. Weather is constituted by multiple attributes: temperature, cloud cover, wind speed/ direction which are discrete random variables. One of our objectives is to predict the weather based on the previous weather data. Additionally we are interested in finding correlation (dependencies in order to reduce the dimensionality of the data set) between these variables, predicting missing data predict deviations in weather forecasts (for job scheduling within the green control center), finding clusters within the data (constituted by closely related variables e.g. PCA that can be used to remove redundant variables), classification, finding (non-linear using SVMs) regression models, training artificial neural networks based on the historical data so that they can be used for prediction in the future. 2 Machine learning techniques We analyze the following machine learning techniques: Bayesian inference Neural Networks

2 Support Vector Machines Clustering technique: PCA 2.1 Bayesian learning Bayesian learning is based on using previous information to predict the future. It is based on the Bayes? rule p(x/y)=p(x Y)/P(Y) (1) = P(Y/X) P(X)/P(Y) (2) = p(x/y) P(X) (3) = p(x/y) (1/P(Y)) (4) If we consider the set of all the events that are correlated to X (i.e. that X depends upon) say Y 1,, Y n, then from (1) we get: X= (X Y 1 ) (X Y n ) (5) = p(x)= p(x Y 1 )+ + p(x Y n ) (6) p(x)= p(x/y 1 ) p(y 1 )+ + p(x/y n ) p(y n ) (7) n p(x)= p(x/y i ) p(y i ) (8) Bayesian networks use a directed acyclic graph for the different random variables (represented using nodes) and their conditional dependencies (represented using edges) and uses inference algorithm to make prediction. Different graphical techniques like belief propagation or expectation maximization can be used to spread the information within the Bayesian network. Expectation maximization algorithms are used to model scenarios, which consist of latent variables, and consist of two steps. The structure of a dynamic Bayesian network changes with time. One simple example is a Hidden Markov Model. Markov model is a graph where each node corresponds to a state and has an associated transition probability to the neighboring nodes (which measure the probability with which the state transition will happen to the neighboring nodes). In the Hidden Markov Model some states are hidden as in the transition probability for these nodes (how the system behaves when in the state corresponding to these nodes) is not known. 2 i=1

3 Belief propagation includes algorithm like sum-product message passing. This technique is based on using the information stored in the neighboring nodes along with conditional probability to pass message within the network. It can also be represented using a Linear programming formulation. If the input variable follows a distribution such as Gaussian, it becomes much easier to use Bayesian inference. One of the disadvantages is that it is computationally hard although it is conceptually simple. This technique can also be used to recover the missing parameters/ variables based on the previous value (prior distribution) and correlation with other parameters. 2.2 Clustering techniques These techniques are used to identify closely related data points in any given data set. K-means clustering is a commonly studied example where the data points have to be grouped into K clusters such that the mean distance between the clusters is minimized. These techniques can be used to identify patterns in a weather plot say in a temperature vs. cloud cover plot where, region with higher density of points can be identified (which can be used to infer when the temperature is of certain value then it is more likely that the cloud cover is of certain value). Principal Component Analysis (PCA) is one such technique that is based on generating linearly uncorrelated variable (principal components) from a data set of multiple correlated variables. 2.3 Binary Decision Trees This can be used for scheduling the jobs based on deviation between the predicted and actual weather conditions. One can calculate different schedules based upon the different weather forecast/ prediction and store it in a binary decision tree. Later on depending on the deviation from the actual weather data one can use the rules stored in the nodes of the decision tree to select one of the schedules. 2.4 Support Vector Machines SVM is a supervised machine learning technique, which is used for binary classification of the input into different sets after training the model with previous data. It can 3

4 work in multiple dimensions represented by different variables. SVM works by building a model which is used for classification. It is based on a kernel method which is used for classification/regression, reducing dimensionality or clustering. Further they can also be used for nonlinearizing the regression and avoiding over/ underfitting. This is important because it is not possible to find a linear fit for many of the observed data in real world. In a simple linear regression techniques such as minimizing least square error the decision to select a regression function for the input data is based on fitting a line that minimizes the total square error for the entire data-set. However, for a data set that is very unevenly distributed the linear regression may be not be the best suited technique. Kernel techniques also make it easier to apply linear methods in higher dimensions. Another advantage of Kernel methods is that they allow comparison of two objects consisting of multiple attributes. Another perspective to understanding SVMs is to view it as a Minimization (optimization) problem where the goal is to minimize the average distance between the two clusters and the hyperplane which is equivalent to identifying a hyperplane which has the maximum distance from the nearest data point from both the clusters. LibSVM and SVMLight are two software libraries that can be used for SVM. 2.5 Artificial Neural Networks This technique is based on the simulation of a biological neural network, which consists of artificial neurons. By training the artificial neurons over a period of time (with historical data) one can predict future output. It is based on studying how the neurons connect (represented by connection weights) and pass information to each other for different inputs. Neurons collect the input from neighboring neurons and based on the magnitude of this input decide whether to fire or not. There can be multiple layers in a ANN e.g. input, hidden and output layer. Additionally we suggest the following techniques: Gibbs sampling, Ridge regression, Mean field value, RBF kernel. 4

5 3 Conclusion Bayesian inference can be used to predict the values of missing variables. Clustering techniques can be used to identify similarities between points in a data set in addition to removing redundancy. Binary decision tree can be used for dynamic scheduling (based on rules stored on nodes) within the green control center. SVMs and ANNs have to be analyzed further on if they can be useful for weather prediction. References [1] Adrian E. Raftery, Tilmann Gneiting, Fadoua Balabdaoui and Michael Polakowski, Using Bayesian Model Averaging to Calibrate Forecast Ensembles, Monthly Weather Review. Vol. 133, no. 5, pp May [2] Antonio S. Cofino, Rafael Cano, Carmen Sordo, Jose M. Guttirez, Bayesian networks for probabilistic weather prediction, In Proceedings of the 15th European Conference on Artificial Intelligence, ECAI 2002, p [3] Burrows, William R., Colin Price, Laurence J. Wilson, 2005: Warm season lightning probability prediction for Canada and the northern United States. Wea. Forecasting, 20, [4] Thomas A.Henzinger, Anmol V. Singh, Vasu Singh, Thomas Wies, Damien Zufferey Proceedings of the 3rd USENIX conference on Hot topics in cloud computing, 1-1 [5] Yulia Gel, Adrian E. Raftery, Tilmann Gneiting, Claudia Tebaldi, Doug Nychka, William Briggs, Mark S. Roulston and Veronica J. Berrocal, Calibrated Probabilistic Mesoscale Weather Field Forecasting: The Geostatistical Output Perturbation Method, Journal of the American Statistical Association, Vol. 99, No. 467 (Sep., 2004), pp [6] McIntyre, M. E., NUMERICAL WEATHER PREDICTION: A vision of the future, Weather, 43, 8, Blackwell Publishing Ltd , p ,

6 [7] Inigo Goiri, Ryan Beauchea, Kien Le, Thu D. Nguyen, Md. E. Haque, Jordi Guitart, Jordi Torres, Ricardo Bianchini, GreenSlot: scheduling energy consumption in green datacenters, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis [8] Inigo Goiri, Kien Le, Thu D. Nguyen, Jordi Guitart, Jordi Torres, and Ricardo Bianchini, GreenHadoop: Leveraging Green Energy in Data- Processing Frameworks, European Conference on Computer Systems (EuroSys 2012) Bern, Switzerland, April 10-13, 2012 [9] Navin Sharma, Pranshu Sharma, David Irwin, and Prashant Shenoy, Predicting Solar Generation from Weather Forecasts Using Machine Learning, Proceedings of the Second IEEE International Conference on Smart Grid Communications (SmartGridComm), Brussels, Belgium, October,

### BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376

Course Director: Dr. Kayvan Najarian (DCM&B, kayvan@umich.edu) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.

### Leveraging Renewable Energy in Data Centers: Present and Future

Leveraging Renewable Energy in Data Centers: Present and Future Ricardo Bianchini Department of Computer Science Collaborators: Josep L. Berral, Inigo Goiri, Jordi Guitart, Md. Haque, William Katsak, Kien

### Introduction to machine learning and pattern recognition Lecture 1 Coryn Bailer-Jones

Introduction to machine learning and pattern recognition Lecture 1 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 What is machine learning? Data description and interpretation

### Machine learning for algo trading

Machine learning for algo trading An introduction for nonmathematicians Dr. Aly Kassam Overview High level introduction to machine learning A machine learning bestiary What has all this got to do with

### An Introduction to Data Mining

An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

### Predicting Solar Generation from Weather Forecasts Using Machine Learning

Predicting Solar Generation from Weather Forecasts Using Machine Learning Navin Sharma, Pranshu Sharma, David Irwin, and Prashant Shenoy Department of Computer Science University of Massachusetts Amherst

### Machine Learning for Data Science (CS4786) Lecture 1

Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 10:10 to 11:25 AM Hollister B14 Instructors : Lillian Lee and Karthik Sridharan ROUGH DETAILS ABOUT THE COURSE Diagnostic assignment 0 is out:

### Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011

Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning

### Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Models vs. Patterns Models A model is a high level, global description of a

### CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

### Scalable Developments for Big Data Analytics in Remote Sensing

Scalable Developments for Big Data Analytics in Remote Sensing Federated Systems and Data Division Research Group High Productivity Data Processing Dr.-Ing. Morris Riedel et al. Research Group Leader,

### Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com

Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian

### Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence

Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network? - Perceptron learners - Multi-layer networks What is a Support

### Neural Networks and Support Vector Machines

INF5390 - Kunstig intelligens Neural Networks and Support Vector Machines Roar Fjellheim INF5390-13 Neural Networks and SVM 1 Outline Neural networks Perceptrons Neural networks Support vector machines

### HT2015: SC4 Statistical Data Mining and Machine Learning

HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric

### Predict Influencers in the Social Network

Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons

### An Introduction to Statistical Machine Learning - Overview -

An Introduction to Statistical Machine Learning - Overview - Samy Bengio bengio@idiap.ch Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP) CP 592, rue du Simplon 4 1920 Martigny, Switzerland

### Classification algorithm in Data mining: An Overview

Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department

### Predicting Flight Delays

Predicting Flight Delays Dieterich Lawson jdlawson@stanford.edu William Castillo will.castillo@stanford.edu Introduction Every year approximately 20% of airline flights are delayed or cancelled, costing

### 8. Machine Learning Applied Artificial Intelligence

8. Machine Learning Applied Artificial Intelligence Prof. Dr. Bernhard Humm Faculty of Computer Science Hochschule Darmstadt University of Applied Sciences 1 Retrospective Natural Language Processing Name

### Clustering and Data Mining in R

Clustering and Data Mining in R Workshop Supplement Thomas Girke December 10, 2011 Introduction Data Preprocessing Data Transformations Distance Methods Cluster Linkage Hierarchical Clustering Approaches

### Customer Data Mining and Visualization by Generative Topographic Mapping Methods

Customer Data Mining and Visualization by Generative Topographic Mapping Methods Jinsan Yang and Byoung-Tak Zhang Artificial Intelligence Lab (SCAI) School of Computer Science and Engineering Seoul National

### Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!

### Intrusion Detection via Machine Learning for SCADA System Protection

Intrusion Detection via Machine Learning for SCADA System Protection S.L.P. Yasakethu Department of Computing, University of Surrey, Guildford, GU2 7XH, UK. s.l.yasakethu@surrey.ac.uk J. Jiang Department

### Review of some concepts in predictive modeling

Review of some concepts in predictive modeling Brigham and Women s Hospital Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support A disjoint list of topics? Naïve Bayes

### An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

### MACHINE LEARNING AN INTRODUCTION

AN INTRODUCTION JOSEFIN ROSÉN, SENIOR ANALYTICAL EXPERT, SAS INSTITUTE JOSEFIN.ROSEN@SAS.COM TWITTER: @ROSENJOSEFIN AGENDA What is machine learning? When, where and how is machine learning used? Exemple

### Impact of workload and renewable prediction on the value of geographical workload management. Arizona State University

Impact of workload and renewable prediction on the value of geographical workload management Zahra Abbasi, Madhurima Pore, and Sandeep Gupta Arizona State University Funded in parts by NSF CNS grants and

### DATA ANALYTICS USING R

DATA ANALYTICS USING R Duration: 90 Hours Intended audience and scope: The course is targeted at fresh engineers, practicing engineers and scientists who are interested in learning and understanding data

### NEURAL NETWORKS A Comprehensive Foundation

NEURAL NETWORKS A Comprehensive Foundation Second Edition Simon Haykin McMaster University Hamilton, Ontario, Canada Prentice Hall Prentice Hall Upper Saddle River; New Jersey 07458 Preface xii Acknowledgments

### CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

Lecture 1 Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x-5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

### Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j What is Kiva? An organization that allows people to lend small amounts of money via the Internet

### A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant

### Drug Store Sales Prediction

Drug Store Sales Prediction Chenghao Wang, Yang Li Abstract - In this paper we tried to apply machine learning algorithm into a real world problem drug store sales forecasting. Given store information,

### Forecasting and Hedging in the Foreign Exchange Markets

Christian Ullrich Forecasting and Hedging in the Foreign Exchange Markets 4u Springer Contents Part I Introduction 1 Motivation 3 2 Analytical Outlook 7 2.1 Foreign Exchange Market Predictability 7 2.2

### A Survey of Kernel Clustering Methods

A Survey of Kernel Clustering Methods Maurizio Filippone, Francesco Camastra, Francesco Masulli and Stefano Rovetta Presented by: Kedar Grama Outline Unsupervised Learning and Clustering Types of clustering

### AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.

AUTOMATION OF ENERGY DEMAND FORECASTING by Sanzad Siddique, B.S. A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree

### MS1b Statistical Data Mining

MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to

### Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

### C19 Machine Learning

C9 Machine Learning 8 Lectures Hilary Term 25 2 Tutorial Sheets A. Zisserman Overview: Supervised classification perceptron, support vector machine, loss functions, kernels, random forests, neural networks

### International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

### An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

### MACHINE LEARNING IN HIGH ENERGY PHYSICS

MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!

### A Simple Introduction to Support Vector Machines

A Simple Introduction to Support Vector Machines Martin Law Lecture for CSE 802 Department of Computer Science and Engineering Michigan State University Outline A brief history of SVM Large-margin linear

### ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications

### A TUTORIAL. BY: Negin Yousefpour PhD Student Civil Engineering Department TEXAS A&M UNIVERSITY

ARTIFICIAL NEURAL NETWORKS: A TUTORIAL BY: Negin Yousefpour PhD Student Civil Engineering Department TEXAS A&M UNIVERSITY Contents Introduction Origin Of Neural Network Biological Neural Networks ANN Overview

10-601 Machine Learning http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html Course data All up-to-date info is on the course web page: http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html

### Statistics for BIG data

Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before

### Machine Learning with MATLAB David Willingham Application Engineer

Machine Learning with MATLAB David Willingham Application Engineer 2014 The MathWorks, Inc. 1 Goals Overview of machine learning Machine learning models & techniques available in MATLAB Streamlining the

### Machine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague. http://ida.felk.cvut.

Machine Learning and Data Analysis overview Jiří Kléma Department of Cybernetics, Czech Technical University in Prague http://ida.felk.cvut.cz psyllabus Lecture Lecturer Content 1. J. Kléma Introduction,

### 8/29/2015. Data Mining and Machine Learning. Erich Seamon University of Idaho

Data Mining and Machine Learning Erich Seamon University of Idaho www.webpages.uidaho.edu/erichs erichs@uidaho.edu 1 I am NOMAD 2 1 Data Mining and Machine Learning Outline Outlining the data mining and

### Final Exam, Spring 2007

10-701 Final Exam, Spring 2007 1. Personal info: Name: Andrew account: E-mail address: 2. There should be 16 numbered pages in this exam (including this cover sheet). 3. You can use any material you brought:

### Introduction to Machine Learning

Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Instructor: Erik Sudderth Graduate TAs: Dae Il Kim & Ben Swanson Head Undergraduate TA: William Allen Undergraduate TAs: Soravit

### Data Mining Applications for Smart Grid in Japan

1 Paper ID: 14TD0194 Data Mining Applications for Smart Grid in Japan Hiroyuki Mori Dept. of Network Design Meiji University Nakano-ku, Tokyo 171-0042 Japan hmori@isc.meiji.ac.jp 2 OUTLINE 1. Objective

### Machine Learning. CUNY Graduate Center, Spring 2013. Professor Liang Huang. huang@cs.qc.cuny.edu

Machine Learning CUNY Graduate Center, Spring 2013 Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning Logistics Lectures M 9:30-11:30 am Room 4419 Personnel

### Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines

### Lecture 11: Graphical Models for Inference

Lecture 11: Graphical Models for Inference So far we have seen two graphical models that are used for inference - the Bayesian network and the Join tree. These two both represent the same joint probability

### Role of Neural network in data mining

Role of Neural network in data mining Chitranjanjit kaur Associate Prof Guru Nanak College, Sukhchainana Phagwara,(GNDU) Punjab, India Pooja kapoor Associate Prof Swami Sarvanand Group Of Institutes Dinanagar(PTU)

### Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center

Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center 1 Outline Part I - Applications Motivation and Introduction Patient similarity application Part II

### Neural Networks. Neural network is a network or circuit of neurons. Neurons can be. Biological neurons Artificial neurons

Neural Networks Neural network is a network or circuit of neurons Neurons can be Biological neurons Artificial neurons Biological neurons Building block of the brain Human brain contains over 10 billion

### EECS 445: Introduction to Machine Learning Winter 2015

Instructor: Prof. Jenna Wiens Office: 3609 BBB wiensj@umich.edu EECS 445: Introduction to Machine Learning Winter 2015 Graduate Student Instructor: Srayan Datta Office: 3349 North Quad (**office hours

### SOFTWARE EFFORT ESTIMATION USING RADIAL BASIS FUNCTION NEURAL NETWORKS Ana Maria Bautista, Angel Castellanos, Tomas San Feliu

International Journal Information Theories and Applications, Vol. 21, Number 4, 2014 319 SOFTWARE EFFORT ESTIMATION USING RADIAL BASIS FUNCTION NEURAL NETWORKS Ana Maria Bautista, Angel Castellanos, Tomas

### Chapter 12 Discovering New Knowledge Data Mining

Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to

### Data Mining Techniques for Prognosis in Pancreatic Cancer

Data Mining Techniques for Prognosis in Pancreatic Cancer by Stuart Floyd A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUE In partial fulfillment of the requirements for the Degree

### Statistical Models in Data Mining

Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of

### CLASSIFICATION AND CLUSTERING. Anveshi Charuvaka

CLASSIFICATION AND CLUSTERING Anveshi Charuvaka Learning from Data Classification Regression Clustering Anomaly Detection Contrast Set Mining Classification: Definition Given a collection of records (training

### Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

### Predictive Dynamix Inc

Predictive Modeling Technology Predictive modeling is concerned with analyzing patterns and trends in historical and operational data in order to transform data into actionable decisions. This is accomplished

### The Basics of Graphical Models

The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures

### Employer Health Insurance Premium Prediction Elliott Lui

Employer Health Insurance Premium Prediction Elliott Lui 1 Introduction The US spends 15.2% of its GDP on health care, more than any other country, and the cost of health insurance is rising faster than

### Bayesian networks - Time-series models - Apache Spark & Scala

Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly

### Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

### Chapter 7. Diagnosis and Prognosis of Breast Cancer using Histopathological Data

Chapter 7 Diagnosis and Prognosis of Breast Cancer using Histopathological Data In the previous chapter, a method for classification of mammograms using wavelet analysis and adaptive neuro-fuzzy inference

### Learning outcomes. Knowledge and understanding. Competence and skills

Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges

### Probabilistic Graphical Models Homework 1: Due January 29, 2014 at 4 pm

Probabilistic Graphical Models 10-708 Homework 1: Due January 29, 2014 at 4 pm Directions. This homework assignment covers the material presented in Lectures 1-3. You must complete all four problems to

### Keywords Sales Forecasting, ES, MA, Adaptive Neuro Fuzzy Inference System, ANN, Linear Regression.

A Business Intelligence Technique for Forecasting the Automobile Sales using Adaptive Intelligent Systems (ANFIS and ANN) Alekh Dwivedi Maheshwari Niranjan Kalicharan Sahu Department of Information Technology

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

### A Content based Spam Filtering Using Optical Back Propagation Technique

A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT

### Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier

International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing

### Big Data Analytics and Optimization

Big Data Analytics and Optimization C e r t i f i c a t e P r o g r a m i n E n g i n e e r i n g E x c e l l e n c e e.edu.in http://www.insof LIST OF COURSES Essential Business Skills for a Data Scientist...

### EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin

### Joseph Twagilimana, University of Louisville, Louisville, KY

ST14 Comparing Time series, Generalized Linear Models and Artificial Neural Network Models for Transactional Data analysis Joseph Twagilimana, University of Louisville, Louisville, KY ABSTRACT The aim

### DATA MINING TECHNIQUES AND APPLICATIONS

DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,

### The Artificial Prediction Market

The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory

### Classification of Bad Accounts in Credit Card Industry

Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition

### Supervised and unsupervised learning - 1

Chapter 3 Supervised and unsupervised learning - 1 3.1 Introduction The science of learning plays a key role in the field of statistics, data mining, artificial intelligence, intersecting with areas in

### MAXIMIZING RETURN ON DIRECT MARKETING CAMPAIGNS

MAXIMIZING RETURN ON DIRET MARKETING AMPAIGNS IN OMMERIAL BANKING S 229 Project: Final Report Oleksandra Onosova INTRODUTION Recent innovations in cloud computing and unified communications have made a

### Model-based Synthesis. Tony O Hagan

Model-based Synthesis Tony O Hagan Stochastic models Synthesising evidence through a statistical model 2 Evidence Synthesis (Session 3), Helsinki, 28/10/11 Graphical modelling The kinds of models that

### Learning. CS461 Artificial Intelligence Pinar Duygulu. Bilkent University, Spring 2007. Slides are mostly adapted from AIMA and MIT Open Courseware

1 Learning CS 461 Artificial Intelligence Pinar Duygulu Bilkent University, Slides are mostly adapted from AIMA and MIT Open Courseware 2 Learning What is learning? 3 Induction David Hume Bertrand Russell

### Multi-Class and Structured Classification

Multi-Class and Structured Classification [slides prises du cours cs294-10 UC Berkeley (2006 / 2009)] [ p y( )] http://www.cs.berkeley.edu/~jordan/courses/294-fall09 Basic Classification in ML Input Output

### Data, Measurements, Features

Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are

### Final Project Report

CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes

### New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction

Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.

### Machine Learning I Week 14: Sequence Learning Introduction

Machine Learning I Week 14: Sequence Learning Introduction Alex Graves Technische Universität München 29. January 2009 Literature Pattern Recognition and Machine Learning Chapter 13: Sequential Data Christopher

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

### Introduction to Machine Learning

Introduction to Machine Learning Prof. Alexander Ihler Prof. Max Welling icamp Tutorial July 22 What is machine learning? The ability of a machine to improve its performance based on previous results:

### Novelty Detection in image recognition using IRF Neural Networks properties

Novelty Detection in image recognition using IRF Neural Networks properties Philippe Smagghe, Jean-Luc Buessler, Jean-Philippe Urban Université de Haute-Alsace MIPS 4, rue des Frères Lumière, 68093 Mulhouse,

### Learning. Artificial Intelligence. Learning. Types of Learning. Inductive Learning Method. Inductive Learning. Learning.

Learning Learning is essential for unknown environments, i.e., when designer lacks omniscience Artificial Intelligence Learning Chapter 8 Learning is useful as a system construction method, i.e., expose