4. GPCRs PREDICTION USING GREY INCIDENCE DEGREE MEASURE AND PRINCIPAL COMPONENT ANALYSIS
- Rodger Gardner
- 8 years ago
GPCR sequences are made up of amino acid polypeptide chains, which can also be called subunits. The number and arrangement of the subunits forming a GPCR sequence is called its quaternary structure. There are different types of quaternary structure in GPCRs, such as monomer, dimer, trimer, tetramer and pentamer. Some biological processes are directly affected by quaternary structure. For example, monomers form sodium channels (Chen, Alcayaga, Suarez-Isla, O'Rourke, Tomaselli, & Marban, 2002), homo-tetramers form potassium channels (Doyle, et al., 1998), homo-pentamers form phospholamban channels (Oxenoid & Chou, 2005), (Oxenoid, Rice, & Chou, 2007) and hetero-pentamers form the α7 nicotinic acetylcholine receptor (Chou, 2004). Some transitions occur only in tetramers, dimers bind some ligands and tetramers form some ion channels.

In this method, we have again classified GPCRs at three levels, as in chapter 3. We have hybridized three feature extraction approaches, i.e. split amino acid composition (SAAC), pseudo amino acid (PseAA) composition and fast Fourier transform (FFT). In PseAA we have employed two physicochemical properties, i.e. electronic and bulk, which are already explained in chapter 3. All of these feature extraction strategies are explained in chapter 2. The number of features taken from PseAA is 62, from SAAC 60 and from FFT 256, so the hybrid feature vector has 62 + 60 + 256 = 378 features. Because the number of features after hybridization is high, we have applied principal component analysis (PCA) to reduce the features and avoid the curse of dimensionality. After applying PCA, the size of the feature vector is reduced to 180.

For classification we have used the nearest neighbor algorithm. We have computed the nearest neighbors of a test sequence in two ways, i.e. with the grey incidence degree measure and with the Euclidean distance measure. The grey incidence degree measure performs better than Euclidean distance. We have trained and tested our methods on D8354 and compared them with other methods on the datasets D167 and D566. An overview of the chapter is shown in Figure 4-1.
Figure 4-1: Overview of chapter

4.1. GREY INCIDENCE DEGREE MEASURE

Deng introduced grey theory in 1982 to analyze the uncertainty of a system (Deng, 1982). The theory is applicable to problems in which information is fuzzy or uncertain. The grey incidence degree (GID) measure is one of the major components of this theory (Liu, Fang, & Lin, 2005). The classification of GPCRs is also a fuzzy problem: a GPCR sequence can be put into one class based on some properties, but it can also be put into another class because of other properties.

Let

$$T = \{T_1, T_2, \ldots, T_n\} \tag{4.1}$$

be the numeric forms of the n training sequences and let $T_t$ be the test sequence. The grey relational coefficient between the k-th feature of the test sequence and the k-th feature of the i-th training sequence is

$$\gamma(T_t^k, T_i^k) = \frac{\min_j \min_k |T_t^k - T_j^k| + \xi \max_j \max_k |T_t^k - T_j^k|}{|T_t^k - T_i^k| + \xi \max_j \max_k |T_t^k - T_j^k|} \tag{4.2}$$

where $j = 1, 2, \ldots, n$ are the indices of the training sequences, $k = 1, 2, \ldots, 180$ are the indices of the features of a GPCR sequence and $\xi$ is the distinguishing coefficient. The value of the distinguishing coefficient is between 0 and 1.
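The grey relational coefficient of Eq. 4.2 can be illustrated with a minimal Python sketch. The function name `grey_relational_coefficients`, the use of NumPy and the default value of `xi` are assumptions made for illustration only; they are not taken from the original implementation. The sketch computes the absolute differences between the test vector and every training vector, takes their global minimum and maximum, and applies Eq. 4.2 element-wise.

```python
import numpy as np


def grey_relational_coefficients(test, train, xi=0.5):
    """Return an (n, d) array holding the Eq. 4.2 coefficient for every
    training vector i and feature k (hypothetical helper, illustrative only)."""
    test = np.asarray(test, dtype=float)
    train = np.asarray(train, dtype=float)
    diff = np.abs(train - test)   # |T_t^k - T_i^k| for all i and k
    d_min = diff.min()            # minimum over all training vectors and features
    d_max = diff.max()            # maximum over all training vectors and features
    if d_max == 0.0:              # every training vector equals the test vector
        return np.ones_like(diff)
    return (d_min + xi * d_max) / (diff + xi * d_max)


# Toy example: the first training vector is identical to the test vector.
coeffs = grey_relational_coefficients([1.0, 2.0], [[1.0, 2.0], [2.0, 4.0]])
print(coeffs)
```

A feature on which the training vector matches the test vector exactly receives the largest coefficient, and the coefficient shrinks as the absolute difference grows.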
The grey incidence degree O of the test sequence with a training sequence is a weighted sum of the grey relational coefficients and is given by the following equation.

$$O(T_t, T_i) = \sum_{k=1}^{180} w_k \, \gamma(T_t^k, T_i^k) \tag{4.3}$$

where $w_k$ is the weight associated with the k-th feature and $\gamma(T_t^k, T_i^k)$ is the grey relational coefficient of Eq. 4.2. We have given equal weight to each feature and taken the value of ξ equal to 0.5, as in existing work (Tsai, Liou, & Jiang, 2005), (Xiao, Wang, & Chou, 2009). The grey incidence degree $O(T_t, T_i)$ measures the correlation between the test sequence $T_t$ and the training sequence $T_i$. The training sequence closest to the test sequence has a higher grey incidence degree than the other training sequences and hence can annotate the test sequence with its class. In this method, we have employed GID in the nearest neighbor algorithm to compute the neighbors of a test sequence, which in turn help to annotate the test sequence.

4.2. PRINCIPAL COMPONENT ANALYSIS

Principal component analysis (PCA) is a useful technique in pattern classification and machine learning for analyzing patterns in high dimensional data and for highlighting the differences and similarities in the data. It transforms high dimensional data into a lower dimensional representation with little loss of significant information. PCA is used in many different fields, from neuroscience to computer graphics, because it is a non-parametric method for extracting relevant information from confusing data sets. The mathematical description of PCA is summarized below; the details are explained in (Howard, 2000).

Suppose we have multi-dimensional data. We first compute the mean of each dimension and subtract it from every value of that dimension, so that the data have zero mean. Then we calculate the covariance matrix of the zero-mean data. The covariance matrix shows the relation between the different dimensions of high dimensional data. Covariance is always measured between two dimensions, so it requires data with at least two dimensions.
The covariance matrix is an N x N matrix, where N is the number of dimensions of the data. The covariance of a dimension with itself is equal to the variance of that dimension.

$$COV(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} \tag{4.4}$$
where COV(X, Y) is the covariance between the X and Y dimensions, $\bar{X}$ is the mean of the X dimension, $\bar{Y}$ is the mean of the Y dimension and n is the number of data points. Next, we compute the eigenvalues and eigenvectors of the covariance matrix and sort the eigenvectors according to their eigenvalues. We then discard some of the less important eigenvectors, i.e. those with the smallest eigenvalues, to reduce the dimensionality of the data. Finally, we multiply the transpose of the matrix of chosen eigenvectors by the mean-adjusted high dimensional data and use the result as features for the classification algorithm.

We have named the GID based method GPCR-GID (Rehman & Khan, 2011). The overview of GPCR-GID is shown in Figure 4-2.

Figure 4-2: Overview of GPCR-GID

4.3. RESULTS AND DISCUSSION

As explained at the start of this chapter, we have trained and tested our methods on D8354. The GPCRs in this dataset are classified at three levels, i.e. family, sub family and sub-sub family levels. In this proposed method, we have used only the accuracy measure for performance assessment. The following sections give the details of the results.

Family level classification

GPCRs are classified into five families. The percentage accuracy of the GID based method is 97.82%, while the Euclidean distance based method has achieved 97.44%.

Sub family level classification

The five families of GPCRs are further classified into 40 sub families at this level. The percentage accuracy of the GID based method is 81.55% and that of the Euclidean distance based method is 80.97%.

Sub-sub family level classification

The 40 sub families of GPCRs are further classified into 108 sub-sub families at this level. The percentage accuracy of the GID based method is 73.32% and that of the Euclidean distance based method is 72.66%. The performance of both methods is also shown in Figure 4-3.

Figure 4-3: Performance of GID and Euclidean distance methods

Figure 4-3 shows that GPCR-GID performs better than the Euclidean distance based method at all three levels. Hence, we have compared GPCR-GID with other existing methods.
Comparison with other methods

We have trained our method on the D8354 dataset and compared it with other methods using D8354. We have also compared our method with existing methods using the D167 and D566 datasets, which are already explained in chapter 2. The comparison details are as follows.

Comparison with selective top down approach

In the selective top down approach, GPCRs are hierarchically classified at 3 levels (Davies, Secker, Freitas, Mendao, Timmis, & Flower, 2007). The performance of the selective top down method was assessed using the accuracy measure, so we have compared our accuracy with theirs, as shown in Figure 4-4.

Figure 4-4: Comparison with selective top down approach

At family level, the best percentage accuracy achieved by the selective top down approach is 95.87%, while the accuracy achieved by GPCR-GID is 97.82%. At sub family level, the best accuracy achieved by the selective top down approach is 80.77%, while the accuracy achieved by GPCR-GID is 81.55%. The selective top down approach has achieved 69.98% accuracy at sub-sub family level, while the accuracy achieved by GPCR-GID is 73.32%. At all three levels, GPCR-GID is more accurate than the selective top down approach (by 1.95, 0.78 and 3.34 percentage points respectively), which supports the value of GPCR-GID.
Comparison with other existing methods on D167 and D566 datasets

There are 6 existing methods with which we have compared GPCR-GID on the D167 dataset, i.e. (Elrod & Chou, 2002), (Huang, Cai, Ji, & Li, 2004), (Bhasin & Raghava, 2005), (Gao & Wang, 2006), (Gao, Wu, Ma, Lu, & He, 2008) and PCA-GPCR (Peng, Yang, & Chen, 2010). Again, we have used the accuracy measure for the comparison. The comparison is shown in Figure 4-5, in which GPCR-GID is more accurate than all 6 methods.

Figure 4-5: Comparison on D167

There are 2 methods with which we have compared GPCR-GID on D566. One is PCA-GPCR (Peng, Yang, & Chen, 2010) and the other is by Chou (Chou & Elrod, 2002). The percentage accuracy achieved by PCA-GPCR is 97.88% and that of (Chou & Elrod, 2002) is 92.05%, whereas the accuracy achieved by GPCR-GID is 97.96%.
Figure 4-6: Comparison on D566

Figure 4-6 shows the superiority of GPCR-GID over PCA-GPCR and Chou's method (Chou & Elrod, 2002). This improvement in the performance of GPCR-GID has several reasons. First, the method hybridizes spatial domain and transformed domain features and applies PCA for feature reduction. Secondly, the GID measure based method can efficiently discriminate classes by computing the quaternary structure of a GPCR numerically.
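The two ingredients credited above, PCA feature reduction and nearest neighbor search with the grey incidence degree, can be sketched together in Python. This is a minimal illustration under stated assumptions, not the original GPCR-GID implementation: the function names, the use of NumPy, the synthetic two-class data and the choice of equal weights w_k = 1/d (so that the weighted sum becomes a mean) are all hypothetical. In the chapter, the input would be the 378-dimensional hybrid feature vectors reduced to 180 components.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the feature means and the top eigenvectors of the covariance matrix."""
    mean = X.mean(axis=0)
    # Covariance of the zero-mean data, using the (n - 1) denominator of Eq. 4.4.
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest eigenvalues
    return mean, eigvecs[:, order]


def pca_transform(X, mean, components):
    """Project mean-adjusted data onto the retained eigenvectors."""
    return (X - mean) @ components


def grey_incidence_degree(test, train, xi=0.5):
    """Grey incidence degree (Eq. 4.3) of one test vector with every training vector."""
    diff = np.abs(train - test)
    d_min, d_max = diff.min(), diff.max()
    if d_max == 0.0:                 # every training vector equals the test vector
        return np.ones(train.shape[0])
    gamma = (d_min + xi * d_max) / (diff + xi * d_max)  # Eq. 4.2
    return gamma.mean(axis=1)        # assumption: equal weights w_k = 1/d


def nearest_neighbor_label(test, train, labels, measure="gid"):
    """Label of the nearest training vector under GID or Euclidean distance."""
    if measure == "gid":
        best = np.argmax(grey_incidence_degree(test, train))    # highest degree wins
    else:
        best = np.argmin(np.linalg.norm(train - test, axis=1))  # smallest distance wins
    return labels[best]


# Synthetic stand-in for real feature vectors (hypothetical data, two classes).
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.0, 0.1, (10, 6)), rng.normal(5.0, 0.1, (10, 6))])
y_train = np.array([0] * 10 + [1] * 10)
x_test = rng.normal(5.0, 0.1, 6)     # drawn near the second class

mean, components = pca_fit(X_train, n_components=3)
Z_train = pca_transform(X_train, mean, components)
z_test = pca_transform(x_test, mean, components)
print(nearest_neighbor_label(z_test, Z_train, y_train, "gid"),
      nearest_neighbor_label(z_test, Z_train, y_train, "euclid"))
```

The only difference between the two classifiers is the neighbor criterion: the GID variant selects the training vector with the highest incidence degree, while the Euclidean variant selects the one with the smallest distance. Both operate on the same PCA-reduced features, which mirrors the comparison reported in Figure 4-3.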
More information1 Introduction to Matrices
1 Introduction to Matrices In this section, important definitions and results from matrix algebra that are useful in regression analysis are introduced. While all statements below regarding the columns
More informationData Mining Analysis of HIV-1 Protease Crystal Structures
Data Mining Analysis of HIV-1 Protease Crystal Structures Gene M. Ko, A. Srinivas Reddy, Sunil Kumar, and Rajni Garg AP0907 09 Data Mining Analysis of HIV-1 Protease Crystal Structures Gene M. Ko 1, A.
More informationJoint models for classification and comparison of mortality in different countries.
Joint models for classification and comparison of mortality in different countries. Viani D. Biatat 1 and Iain D. Currie 1 1 Department of Actuarial Mathematics and Statistics, and the Maxwell Institute
More informationResearch on Credibility Measurement Method of Data In Big Data
Journal of Information Hiding and Multimedia Signal Processing c 2016 ISSN 2073-4212 Ubiquitous International Volume 7, Number 4, July 2016 Research on Credibility Measurement Method of Data In Big Data
More informationClustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca
Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?
More informationActive Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
More informationChapter 07: Instruction Level Parallelism VLIW, Vector, Array and Multithreaded Processors. Lesson 05: Array Processors
Chapter 07: Instruction Level Parallelism VLIW, Vector, Array and Multithreaded Processors Lesson 05: Array Processors Objective To learn how the array processes in multiple pipelines 2 Array Processor
More informationPersonalized Hierarchical Clustering
Personalized Hierarchical Clustering Korinna Bade, Andreas Nürnberger Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, D-39106 Magdeburg, Germany {kbade,nuernb}@iws.cs.uni-magdeburg.de
More informationKnowledge Discovery and Data Mining. Structured vs. Non-Structured Data
Knowledge Discovery and Data Mining Unit # 2 1 Structured vs. Non-Structured Data Most business databases contain structured data consisting of well-defined fields with numeric or alphanumeric values.
More informationMetric Multidimensional Scaling (MDS): Analyzing Distance Matrices
Metric Multidimensional Scaling (MDS): Analyzing Distance Matrices Hervé Abdi 1 1 Overview Metric multidimensional scaling (MDS) transforms a distance matrix into a set of coordinates such that the (Euclidean)
More information5. Linear Regression
5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4
More informationTOWARD BIG DATA ANALYSIS WORKSHOP
TOWARD BIG DATA ANALYSIS WORKSHOP 邁 向 巨 量 資 料 分 析 研 討 會 摘 要 集 2015.06.05-06 巨 量 資 料 之 矩 陣 視 覺 化 陳 君 厚 中 央 研 究 院 統 計 科 學 研 究 所 摘 要 視 覺 化 (Visualization) 與 探 索 式 資 料 分 析 (Exploratory Data Analysis, EDA)
More informationEfficient Attendance Management: A Face Recognition Approach
Efficient Attendance Management: A Face Recognition Approach Badal J. Deshmukh, Sudhir M. Kharad Abstract Taking student attendance in a classroom has always been a tedious task faultfinders. It is completely
More informationComponent Ordering in Independent Component Analysis Based on Data Power
Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals
More informationA Content based Spam Filtering Using Optical Back Propagation Technique
A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT
More information