4. GPCRs PREDICTION USING GREY INCIDENCE DEGREE MEASURE AND PRINCIPAL COMPONENT ANALYSIS


The GPCR sequences are made up of amino acid polypeptide chains, which can also be called subunits. The number and arrangement of the subunits forming a GPCR sequence is called its quaternary structure. There are different types of quaternary structure in GPCRs, such as monomer, dimer, trimer, tetramer and pentamer. Some biological processes are directly affected by quaternary structure. For example, monomers form sodium channels (Chen, Alcayaga, Suarez-Isla, O'Rourke, Tomaselli, & Marban, 2002), homo-tetramers form potassium channels (Doyle, et al., 1998), homo-pentamers form phospholamban channels (Oxenoid & Chou, 2005), (Oxenoid, Rice, & Chou, 2007) and hetero-pentamers form the α7 nicotinic acetylcholine receptor (Chou, 2004). Some transitions occur only in tetramers, dimers bind some ligands and tetramers form some ion channels.

In this method, we have again classified GPCRs into three levels, as in chapter 3. We have hybridized three feature extraction approaches, i.e. split amino acid composition (SAAC), pseudo amino acid (PseAA) composition and fast Fourier transform (FFT). We have employed two physicochemical properties in PseAA, i.e. electronic and bulk, which are already explained in chapter 3. All of these feature extraction strategies are explained in chapter 2. The number of features taken from PseAA is 62, from SAAC 60 and from FFT 256, so the total number of features is 378. Because the number of features after hybridization is so high, we have applied principal component analysis (PCA) to reduce the features and avoid the curse of dimensionality. After applying PCA, the size of the feature vector is reduced to 180.

For classification we have used the nearest neighbor algorithm. We have computed the nearest neighbors of a test sequence in two ways, i.e. with the grey incidence degree measure and with the Euclidean distance measure. The grey incidence degree measure performs better than the Euclidean distance. We have trained and tested our methods on D8354 and compared them with other methods on the datasets D167 and D566. An overview of the chapter is shown in Figure 4-1.
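The feature hybridization described above amounts to concatenating the three feature blocks into one vector. The following is a minimal sketch of that step only, assuming each extractor returns a one-dimensional array; the function name `hybridize` and the random placeholder values are hypothetical and stand in for real PseAA, SAAC and FFT features.

```python
import numpy as np

# Feature counts stated in the text. The arrays built below are random
# placeholders, NOT real PseAA/SAAC/FFT features.
N_PSEAA, N_SAAC, N_FFT = 62, 60, 256


def hybridize(pseaa, saac, fft):
    """Concatenate the three per-sequence feature blocks into one vector."""
    pseaa, saac, fft = (np.asarray(a, dtype=float) for a in (pseaa, saac, fft))
    if (pseaa.shape, saac.shape, fft.shape) != ((N_PSEAA,), (N_SAAC,), (N_FFT,)):
        raise ValueError("unexpected feature block size")
    return np.concatenate([pseaa, saac, fft])


rng = np.random.default_rng(0)
hybrid = hybridize(rng.random(N_PSEAA), rng.random(N_SAAC), rng.random(N_FFT))
print(hybrid.shape)  # 62 + 60 + 256 features per sequence
```

The resulting 378-dimensional vector is what PCA later reduces to 180 dimensions.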

Figure 4-1: Overview of chapter

4.1. GREY INCIDENCE DEGREE MEASURE

Deng introduced grey theory in 1982 to analyze the uncertainty of a system (Deng, 1982). This theory is applicable to problems in which information is fuzzy or uncertain. The grey incidence degree (GID) measure is one of the major components of this theory (Liu, Fang, & Lin, 2005). The classification of GPCRs is also a fuzzy problem: some GPCR sequences can be put into one class based on some properties, but they can also be put into another class because of other properties.

The grey relational coefficient for the k-th feature of the test sequence and the i-th training sequence is

$$\gamma\left(T_t^k, T_i^k\right) = \frac{\Delta_{\min} + \xi\,\Delta_{\max}}{\Delta_{t,i}^k + \xi\,\Delta_{\max}} \tag{4.1}$$

where $T = \{T_1, T_2, \ldots, T_n\}$ are the numeric forms of the n training sequences and $T_t$ is the test sequence. $\gamma(T_t^k, T_i^k)$ is the grey relational coefficient, with

$$\Delta_{\min} = \min_j \min_k \left|T_t^k - T_j^k\right|, \qquad \Delta_{\max} = \max_j \max_k \left|T_t^k - T_j^k\right|, \qquad \Delta_{t,i}^k = \left|T_t^k - T_i^k\right| \tag{4.2}$$

Here $j = 1, 2, \ldots, n$ are the indices of the training sequences, $k = 1, 2, \ldots, 180$ are the indices of the features of a GPCR sequence and $\xi$ is the distinguishing coefficient. The value of the distinguishing coefficient is between 0 and 1.
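The grey relational coefficient defined above can be sketched in NumPy as follows. This is an illustrative reading of Equations 4.1 and 4.2, not the thesis code: the function name, the array layout (one training sequence per row), the default `xi=0.5` and the handling of the degenerate all-identical case are assumptions.

```python
import numpy as np


def grey_relational_coefficients(test, train, xi=0.5):
    """Return an (n, d) array of grey relational coefficients.

    test  : (d,) feature vector of the test sequence
    train : (n, d) feature vectors of the n training sequences
    xi    : distinguishing coefficient (assumed default, between 0 and 1)
    """
    test = np.asarray(test, dtype=float)
    train = np.asarray(train, dtype=float)
    delta = np.abs(train - test)               # |T_t^k - T_i^k| for all i, k
    d_min, d_max = delta.min(), delta.max()    # taken over all sequences and features
    if d_max == 0.0:
        # Every training vector equals the test vector (assumed convention).
        return np.ones_like(delta)
    return (d_min + xi * d_max) / (delta + xi * d_max)


# Tiny hypothetical example with 2 training sequences and 3 features.
train = np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
test = np.array([1.0, 2.0, 4.0])
gamma = grey_relational_coefficients(test, train)
print(gamma.round(3))
```

Each coefficient lies in (0, 1], and a feature that matches the test sequence more closely gets a value nearer to 1.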

The grey incidence degree O of the test sequence with a training sequence is a weighted sum of the grey relational coefficients $\gamma$ and is given by the following equation.

$$O\left(T_t, T_i\right) = \sum_{k=1}^{180} w_k\,\gamma\left(T_t^k, T_i^k\right) \tag{4.3}$$

where $w_k$ is the weight associated with each feature. We have given equal weight to each feature and taken the value of $\xi$ equal to 0.5, as in existing work (Tsai, Liou, & Jiang, 2005), (Xiao, Wang, & Chou, 2009). The grey incidence degree $O(T_t, T_i)$ is the correlation between the test sequence $T_t$ and the training sequence $T_i$. The training sequence closest to the test sequence has a higher grey incidence degree than the other training sequences and hence can annotate the test sequence with its class. In this method, we have employed GID in the nearest neighbor algorithm to compute the neighbors of a test sequence, which in turn helps to annotate the test sequence.

4.2. PRINCIPAL COMPONENT ANALYSIS

Principal component analysis (PCA) is a useful technique in pattern classification and machine learning for analyzing patterns in high-dimensional data and for highlighting the differences and similarities in the data. It transforms high-dimensional data into a much lower dimension without significant loss of information. PCA is used in many different fields, from neuroscience to computer graphics, because it is a non-parametric method for extracting relevant information from confusing data sets. The mathematical description of PCA is summarized below; the details are explained in (Howard, 2000).

Suppose we have multi-dimensional data. We first compute the mean across each dimension and subtract it from each value of that dimension, so that the data has zero mean. Then we calculate the covariance matrix of the zero-mean data. The covariance matrix shows the relation between different dimensions in high-dimensional data. Covariance is always measured between two dimensions, so it applies only to data with two or more dimensions. The covariance matrix is an N x N matrix, where N is the number of dimensions of the data. The covariance of one dimension with itself is equal to the variance of that dimension.

$$\mathrm{COV}(X, Y) = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{n - 1} \tag{4.4}$$
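As a rough illustration of PCA as used in this section, the NumPy sketch below centers the data, builds the covariance matrix of Equation 4.4, and then performs the eigen-decomposition and projection that complete the procedure. The function name `pca_reduce`, the row-per-sample layout and the toy data are assumptions made for the example; it is not the thesis implementation.

```python
import numpy as np


def pca_reduce(data, n_components):
    """Project (n_samples, n_dims) data onto its top principal components.

    Returns the reduced data and all eigenvalues sorted in descending order.
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)                 # zero mean per dimension
    cov = centered.T @ centered / (data.shape[0] - 1)   # Eq. 4.4 for every pair
    eigvals, eigvecs = np.linalg.eigh(cov)              # eigh: symmetric input
    order = np.argsort(eigvals)[::-1]                   # largest variance first
    components = eigvecs[:, order[:n_components]]       # (n_dims, n_components)
    return centered @ components, eigvals[order]


rng = np.random.default_rng(0)
toy = rng.normal(size=(50, 6))        # toy data, not GPCR features
reduced, variances = pca_reduce(toy, 2)
print(reduced.shape)
```

In the setting of this chapter the same steps would take a 378-dimensional hybrid feature vector down to 180 dimensions; the variance of each retained component equals its eigenvalue, which is why eigenvectors with small eigenvalues can be dropped.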

In Equation 4.4, COV(X, Y) is the covariance between the X and Y dimensions, $\bar{X}$ is the mean of the X dimension, $\bar{Y}$ is the mean of the Y dimension and n is the number of data points. Next, we compute the eigenvalues and eigenvectors of the covariance matrix and sort the eigenvectors according to their eigenvalues. We then ignore some of the less important eigenvectors to reduce the dimensionality of the data. Finally, we multiply the transpose of the matrix of chosen eigenvectors by the mean-adjusted high-dimensional data and use the result as features for the classification algorithm.

We have named the GID based method GPCR-GID (Rehman & Khan, 2011). The overview of GPCR-GID is shown in Figure 4-2.

Figure 4-2: Overview of GPCR-GID

4.3. RESULTS AND DISCUSSION

As explained at the start of this chapter, we have trained and tested our methods on D8354. The GPCRs in this dataset are classified into three levels, i.e. family, sub family and sub-sub family levels. In this proposed method, we have used only the accuracy measure for performance assessment. The following sections give the details of the results.

Family level classification

GPCRs are classified into five families. The percentage accuracy of the GID based method is 97.82% and that of the Euclidean distance based method is 97.44%.

Sub family level classification

The five families of GPCRs are further classified into 40 sub families at this level. The percentage accuracy of the GID based method is 81.55% and that of the Euclidean distance based method is 80.97%.

Sub-sub family level classification

The 40 sub families of GPCRs are further classified into 108 sub-sub families at this level. The percentage accuracy of the GID based method is 73.32% and that of the Euclidean distance based method is 72.66%. The performance of both methods is also shown in Figure 4-3.

Figure 4-3: Performance of GID and Euclidean distance methods

Figure 4-3 shows that GPCR-GID outperforms the Euclidean distance based method at all three levels (by 0.38, 0.58 and 0.66 percentage points, respectively). Hence, we have compared GPCR-GID with other existing methods.
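The comparison between the two neighbor measures can be sketched as a 1-nearest-neighbor classifier that picks either the training sequence with the highest grey incidence degree or the one with the smallest Euclidean distance. This is a minimal sketch under stated assumptions: the function names, the equal-weight normalisation $w_k = 1/d$, the use of a single neighbor and the toy two-class data are all hypothetical and do not reproduce the D8354 experiments.

```python
import numpy as np


def gid(test, train, xi=0.5):
    """Grey incidence degree of one test vector with each training vector,
    using equal feature weights (w_k = 1/d, an assumed normalisation)."""
    delta = np.abs(np.asarray(train, float) - np.asarray(test, float))
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0.0:
        return np.ones(delta.shape[0])
    gamma = (d_min + xi * d_max) / (delta + xi * d_max)
    return gamma.mean(axis=1)


def predict_1nn(test_x, train_x, train_y, measure="gid"):
    """Label each test vector with the class of its nearest training vector."""
    train_x = np.asarray(train_x, float)
    train_y = np.asarray(train_y)
    labels = []
    for x in np.asarray(test_x, float):
        if measure == "gid":
            best = np.argmax(gid(x, train_x))            # highest degree wins
        else:
            dist = np.linalg.norm(train_x - x, axis=1)
            best = np.argmin(dist)                       # smallest distance wins
        labels.append(train_y[best])
    return np.array(labels)


def accuracy(y_true, y_pred):
    """Percentage of correctly predicted labels."""
    return 100.0 * np.mean(np.asarray(y_true) == np.asarray(y_pred))


# Toy, well-separated two-class data (hypothetical, not a GPCR dataset).
train_x = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
train_y = np.array(["A", "A", "B", "B"])
test_x = np.array([[0.1, 0.1], [5.1, 5.0]])
test_y = np.array(["A", "B"])

for m in ("gid", "euclidean"):
    print(m, accuracy(test_y, predict_1nn(test_x, train_x, train_y, m)))
```

On such easy data both measures agree; the differences reported in this section only appear on real, overlapping classes.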

Comparison with other methods

We have trained our method on the D8354 dataset and compared it with other methods using D8354. We have also compared our method with existing methods using the D167 and D566 datasets, which are already explained in chapter 2. The comparison details are as follows.

Comparison with selective top down approach

In the selective top down approach, GPCRs are hierarchically classified into 3 levels (Davies, Secker, Freitas, Mendao, Timmis, & Flower, 2007). The authors of the selective top down method assessed its performance using the accuracy measure, so we have compared our accuracy with theirs, as shown in Figure 4-4.

Figure 4-4: Comparison with selective top down approach

At family level, the best percentage accuracy achieved by the selective top down approach is 95.87%, while the accuracy achieved by GPCR-GID is 97.82%. At sub family level, the best accuracy achieved by the selective top down approach is 80.77%, while the accuracy achieved by GPCR-GID is 81.55%. The selective top down approach has achieved 69.98% accuracy at sub-sub family level, while the accuracy achieved by GPCR-GID is 73.32%. At all three levels, GPCR-GID is more accurate than the selective top down approach (by 1.95, 0.78 and 3.34 percentage points, respectively), which strengthens the case for GPCR-GID.

Comparison with other existing methods on D167 and D566 datasets

There are 6 existing methods with which we have compared GPCR-GID on the D167 dataset, i.e. (Elrod & Chou, 2002), (Huang, Cai, Ji, & Li, 2004), (Bhasin & Raghava, 2005), (Gao & Wang, 2006), (Gao, Wu, Ma, Lu, & He, 2008) and PCA-GPCR (Peng, Yang, & Chen, 2010). Again, we have used the accuracy measure for comparison. This comparison is shown in Figure 4-5, in which GPCR-GID is more accurate than all 6 methods.

Figure 4-5: Comparison on D167

There are 2 methods with which we have compared GPCR-GID on D566. One is PCA-GPCR (Peng, Yang, & Chen, 2010) and the other is by Chou (Chou & Elrod, 2002). The percentage accuracy achieved by PCA-GPCR is 97.88% and by (Chou & Elrod, 2002) is 92.05%, whereas the accuracy achieved by GPCR-GID is 97.96%.

Figure 4-6: Comparison on D566

Figure 4-6 shows that GPCR-GID is more accurate than PCA-GPCR and Chou's method (Chou & Elrod, 2002). There are several reasons for this improvement in the performance of GPCR-GID. One reason is the hybridization of spatial domain and transform domain features and the application of PCA for feature reduction. Secondly, the GID measure based method can efficiently discriminate classes by computing the quaternary structure of GPCRs numerically.

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents

More information

Data Clustering. Dec 2nd, 2013 Kyrylo Bessonov

Data Clustering. Dec 2nd, 2013 Kyrylo Bessonov Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

More information

Algorithm and computational complexity of Insulin

Algorithm and computational complexity of Insulin Algorithm and computational complexity Insulin Lutvo Kurić Bosnia and Herzegovina, Novi Travnik, Kalinska 7 Abstract:This paper discusses cyberinformation studies the amino acid composition insulin, in

More information

Face Recognition using Principle Component Analysis

Face Recognition using Principle Component Analysis Face Recognition using Principle Component Analysis Kyungnam Kim Department of Computer Science University of Maryland, College Park MD 20742, USA Summary This is the summary of the basic idea about PCA

More information

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

More information

Improved Fuzzy C-means Clustering Algorithm Based on Cluster Density

Improved Fuzzy C-means Clustering Algorithm Based on Cluster Density Journal of Computational Information Systems 8: 2 (2012) 727 737 Available at http://www.jofcis.com Improved Fuzzy C-means Clustering Algorithm Based on Cluster Density Xiaojun LOU, Junying LI, Haitao

More information

Introduction to Principal Components and FactorAnalysis

Introduction to Principal Components and FactorAnalysis Introduction to Principal Components and FactorAnalysis Multivariate Analysis often starts out with data involving a substantial number of correlated variables. Principal Component Analysis (PCA) is a

More information

PCA to Eigenfaces. CS 510 Lecture #16 March 23 th A 9 dimensional PCA example

PCA to Eigenfaces. CS 510 Lecture #16 March 23 th A 9 dimensional PCA example PCA to Eigenfaces CS 510 Lecture #16 March 23 th 2015 A 9 dimensional PCA example is dark around the edges and bright in the middle. is light with dark vertical bars. is light with dark horizontal bars.

More information

Adaptive Face Recognition System from Myanmar NRC Card

Adaptive Face Recognition System from Myanmar NRC Card Adaptive Face Recognition System from Myanmar NRC Card Ei Phyo Wai University of Computer Studies, Yangon, Myanmar Myint Myint Sein University of Computer Studies, Yangon, Myanmar ABSTRACT Biometrics is

More information

AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.

AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S. AUTOMATION OF ENERGY DEMAND FORECASTING by Sanzad Siddique, B.S. A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree

More information

STATISTICS AND DATA ANALYSIS IN GEOLOGY, 3rd ed. Clarificationof zonationprocedure described onpp. 238-239

STATISTICS AND DATA ANALYSIS IN GEOLOGY, 3rd ed. Clarificationof zonationprocedure described onpp. 238-239 STATISTICS AND DATA ANALYSIS IN GEOLOGY, 3rd ed. by John C. Davis Clarificationof zonationprocedure described onpp. 38-39 Because the notation used in this section (Eqs. 4.8 through 4.84) is inconsistent

More information

PCA, Clustering and Classification. By H. Bjørn Nielsen strongly inspired by Agnieszka S. Juncker

PCA, Clustering and Classification. By H. Bjørn Nielsen strongly inspired by Agnieszka S. Juncker PCA, Clustering and Classification By H. Bjørn Nielsen strongly inspired by Agnieszka S. Juncker Motivation: Multidimensional data Pat1 Pat2 Pat3 Pat4 Pat5 Pat6 Pat7 Pat8 Pat9 209619_at 7758 4705 5342

More information

Classifiers & Classification

Classifiers & Classification Classifiers & Classification Forsyth & Ponce Computer Vision A Modern Approach chapter 22 Pattern Classification Duda, Hart and Stork School of Computer Science & Statistics Trinity College Dublin Dublin

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis ERS70D George Fernandez INTRODUCTION Analysis of multivariate data plays a key role in data analysis. Multivariate data consists of many different attributes or variables recorded

More information

Object Recognition and Template Matching

Object Recognition and Template Matching Object Recognition and Template Matching Template Matching A template is a small image (sub-image) The goal is to find occurrences of this template in a larger image That is, you want to find matches of

More information

15.062 Data Mining: Algorithms and Applications Matrix Math Review

15.062 Data Mining: Algorithms and Applications Matrix Math Review .6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

More information

Palmprint as a Biometric Identifier

Palmprint as a Biometric Identifier Palmprint as a Biometric Identifier 1 Kasturika B. Ray, 2 Rachita Misra 1 Orissa Engineering College, Nabojyoti Vihar, Bhubaneswar, Orissa, India 2 Dept. Of IT, CV Raman College of Engineering, Bhubaneswar,

More information

Volume 2, Issue 9, September 2014 International Journal of Advance Research in Computer Science and Management Studies

Volume 2, Issue 9, September 2014 International Journal of Advance Research in Computer Science and Management Studies Volume 2, Issue 9, September 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com

More information

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with

More information

U.P.B. Sci. Bull., Series C, Vol. 77, Iss. 1, 2015 ISSN 2286 3540

U.P.B. Sci. Bull., Series C, Vol. 77, Iss. 1, 2015 ISSN 2286 3540 U.P.B. Sci. Bull., Series C, Vol. 77, Iss. 1, 2015 ISSN 2286 3540 ENTERPRISE FINANCIAL DISTRESS PREDICTION BASED ON BACKWARD PROPAGATION NEURAL NETWORK: AN EMPIRICAL STUDY ON THE CHINESE LISTED EQUIPMENT

More information

Environmental Remote Sensing GEOG 2021

Environmental Remote Sensing GEOG 2021 Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class

More information

Face Recognition using SIFT Features

Face Recognition using SIFT Features Face Recognition using SIFT Features Mohamed Aly CNS186 Term Project Winter 2006 Abstract Face recognition has many important practical applications, like surveillance and access control.

More information

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?

More information

Class-specific Sparse Coding for Learning of Object Representations

Class-specific Sparse Coding for Learning of Object Representations Class-specific Sparse Coding for Learning of Object Representations Stephan Hasler, Heiko Wersing, and Edgar Körner Honda Research Institute Europe GmbH Carl-Legien-Str. 30, 63073 Offenbach am Main, Germany

More information

Biometric Authentication using Online Signatures

Biometric Authentication using Online Signatures Biometric Authentication using Online Signatures Alisher Kholmatov and Berrin Yanikoglu alisher@su.sabanciuniv.edu, berrin@sabanciuniv.edu http://fens.sabanciuniv.edu Sabanci University, Tuzla, Istanbul,

More information

Clustering UE 141 Spring 2013

Clustering UE 141 Spring 2013 Clustering UE 141 Spring 013 Jing Gao SUNY Buffalo 1 Definition of Clustering Finding groups of obects such that the obects in a group will be similar (or related) to one another and different from (or

More information

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step

More information

Clustering and Data Mining in R

Clustering and Data Mining in R Clustering and Data Mining in R Workshop Supplement Thomas Girke December 10, 2011 Introduction Data Preprocessing Data Transformations Distance Methods Cluster Linkage Hierarchical Clustering Approaches

More information

Denial of Service Attack Detection Using Multivariate Correlation Information and Support Vector Machine Classification

Denial of Service Attack Detection Using Multivariate Correlation Information and Support Vector Machine Classification International Journal of Computer Sciences and Engineering Open Access Research Paper Volume-4, Issue-3 E-ISSN: 2347-2693 Denial of Service Attack Detection Using Multivariate Correlation Information and

More information

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

More information

DATA ANALYTICS USING R

DATA ANALYTICS USING R DATA ANALYTICS USING R Duration: 90 Hours Intended audience and scope: The course is targeted at fresh engineers, practicing engineers and scientists who are interested in learning and understanding data

More information

Machine Learning Logistic Regression

Machine Learning Logistic Regression Machine Learning Logistic Regression Jeff Howbert Introduction to Machine Learning Winter 2012 1 Logistic regression Name is somewhat misleading. Really a technique for classification, not regression.

More information

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j What is Kiva? An organization that allows people to lend small amounts of money via the Internet

More information

Study on Human Performance Reliability in Green Construction Engineering

Study on Human Performance Reliability in Green Construction Engineering Study on Human Performance Reliability in Green Construction Engineering Xiaoping Bai a, Cheng Qian b School of management, Xi an University of Architecture and Technology, Xi an 710055, China a xxpp8899@126.com,

More information

A Survey on Outlier Detection Techniques for Credit Card Fraud Detection

A Survey on Outlier Detection Techniques for Credit Card Fraud Detection IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 2, Ver. VI (Mar-Apr. 2014), PP 44-48 A Survey on Outlier Detection Techniques for Credit Card Fraud

More information

Final Project Report

Final Project Report CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes

More information

Mathematical Model Based Total Security System with Qualitative and Quantitative Data of Human

Mathematical Model Based Total Security System with Qualitative and Quantitative Data of Human Int Jr of Mathematics Sciences & Applications Vol3, No1, January-June 2013 Copyright Mind Reader Publications ISSN No: 2230-9888 wwwjournalshubcom Mathematical Model Based Total Security System with Qualitative

More information

Classification algorithm in Data mining: An Overview

Classification algorithm in Data mining: An Overview Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department

More information

Name: Date: Adding Zero. Addition. Worksheet A

Name: Date: Adding Zero. Addition. Worksheet A A DIVISION OF + + + + + Adding Zero + + + + + + + + + + + + + + + Addition Worksheet A + + + + + Adding Zero + + + + + + + + + + + + + + + Addition Worksheet B + + + + + Adding Zero + + + + + + + + + +

More information

Principal components analysis

Principal components analysis CS229 Lecture notes Andrew Ng Part XI Principal components analysis In our discussion of factor analysis, we gave a way to model data x R n as approximately lying in some k-dimension subspace, where k

More information

Principal Components Analysis (PCA)

Principal Components Analysis (PCA) Principal Components Analysis (PCA) Janette Walde janette.walde@uibk.ac.at Department of Statistics University of Innsbruck Outline I Introduction Idea of PCA Principle of the Method Decomposing an Association

More information

Linear Threshold Units

Linear Threshold Units Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

More information

THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS

THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS O.U. Sezerman 1, R. Islamaj 2, E. Alpaydin 2 1 Laborotory of Computational Biology, Sabancı University, Istanbul, Turkey. 2 Computer Engineering

More information

Data, Measurements, Features

Data, Measurements, Features Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are

More information

Statistical Models in Data Mining

Statistical Models in Data Mining Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of

More information

Alignment and Preprocessing for Data Analysis

Alignment and Preprocessing for Data Analysis Alignment and Preprocessing for Data Analysis Preprocessing tools for chromatography Basics of alignment GC FID (D) data and issues PCA F Ratios GC MS (D) data and issues PCA F Ratios PARAFAC Piecewise

More information

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian

More information

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical

More information

Optimal PID Controller Design for AVR System

Optimal PID Controller Design for AVR System Tamkang Journal of Science and Engineering, Vol. 2, No. 3, pp. 259 270 (2009) 259 Optimal PID Controller Design for AVR System Ching-Chang Wong*, Shih-An Li and Hou-Yi Wang Department of Electrical Engineering,

More information

Price Prediction of Share Market using Artificial Neural Network (ANN)

Price Prediction of Share Market using Artificial Neural Network (ANN) Prediction of Share Market using Artificial Neural Network (ANN) Zabir Haider Khan Department of CSE, SUST, Sylhet, Bangladesh Tasnim Sharmin Alin Department of CSE, SUST, Sylhet, Bangladesh Md. Akter

More information

Introduction to machine learning and pattern recognition Lecture 1 Coryn Bailer-Jones

Introduction to machine learning and pattern recognition Lecture 1 Coryn Bailer-Jones Introduction to machine learning and pattern recognition Lecture 1 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 What is machine learning? Data description and interpretation

More information

Classification of Household Devices by Electricity Usage Profiles

Classification of Household Devices by Electricity Usage Profiles Classification of Household Devices by Electricity Usage Profiles Jason Lines 1, Anthony Bagnall 1, Patrick Caiger-Smith 2, and Simon Anderson 2 1 School of Computing Sciences University of East Anglia

More information

Performance Analysis of Data Mining Techniques for Improving the Accuracy of Wind Power Forecast Combination

Performance Analysis of Data Mining Techniques for Improving the Accuracy of Wind Power Forecast Combination Performance Analysis of Data Mining Techniques for Improving the Accuracy of Wind Power Forecast Combination Ceyda Er Koksoy 1, Mehmet Baris Ozkan 1, Dilek Küçük 1 Abdullah Bestil 1, Sena Sonmez 1, Serkan

More information

T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577

T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari : 245577 T-61.3050 : Email Classification as Spam or Ham using Naive Bayes Classifier Santosh Tirunagari : 245577 January 20, 2011 Abstract This term project gives a solution how to classify an email as spam or

More information

Statistical Machine Learning from Data

Statistical Machine Learning from Data Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Gaussian Mixture Models Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique

More information

Design call center management system of e-commerce based on BP neural network and multifractal

Design call center management system of e-commerce based on BP neural network and multifractal Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 2014, 6(6):951-956 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 Design call center management system of e-commerce

More information

Statistical Machine Learning
