Numerical Methods for Bioinformatics


1 Numerical Methods for Bioinformatics: Biclustering (Academic Year 2008/2009)

2 Outline
- Motivation
- What is Biclustering?
- Why Biclustering and not just Clustering?
- Bicluster Types
- Algorithms

3 Motivations
Gene expression matrices have been extensively analyzed using clustering along one of two dimensions:
- the gene dimension, i.e., analysis of the expression patterns of genes by comparing rows in the matrix;
- the condition dimension, i.e., analysis of the expression patterns of samples by comparing columns in the matrix.

4 Motivations
Analysis via clustering makes several a priori assumptions that may not be adequate in all circumstances:
- Clustering can be applied to either genes or samples, implicitly directing the analysis to a particular aspect of the system under study (e.g., groups of patients or groups of co-regulated genes).
- Clustering algorithms usually seek a disjoint cover of the set of elements, requiring that no gene or sample belongs to more than one cluster.

5 Motivations
The results of applying standard clustering techniques to genes are limited by the existence of a number of experimental conditions in which the activity of genes is uncorrelated. Many activation patterns are common to a group of genes only under specific experimental conditions. Discovering such local expression patterns may be the key to uncovering many genetic pathways that are not apparent otherwise. It is therefore highly desirable to move beyond the clustering paradigm and develop approaches capable of discovering local patterns in microarray data.

6 What is Biclustering?
BICLUSTER: a submatrix spanned by a set of genes (rows) and a set of samples (columns).
Given a gene expression matrix, it is possible to characterize the biological phenomena it embodies by a collection of biclusters, each representing a different type of joint behavior of a set of genes in a corresponding set of samples.

7 What is Biclustering?

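8 What is Biclustering?
Given the data matrix A = (X, Y), with set of rows X and set of columns Y, let I ⊆ X be a subset of rows and J ⊆ Y a subset of columns. Then:
- (I, Y) is a subset of rows that exhibit similar behavior across the set of all columns, i.e., a cluster of rows;
- (X, J) is a subset of columns that exhibit similar behavior across the set of all rows, i.e., a cluster of columns;
- (I, J) is a subset of rows together with a subset of columns, i.e., a bicluster (the submatrix $A_{IJ}$).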

9 What is Biclustering?
Biclustering goals:
- Find a set of significant biclusters in a matrix: identify submatrices (subsets of rows and subsets of columns) with interesting properties.
- Perform simultaneous clustering on the row and column dimensions of the gene expression matrix instead of clustering the rows and columns separately.
In gene expression data analysis: identify subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated activities for every condition.

10 Why Biclustering and not just Clustering?
Clustering (global models):
- Can be applied to either the rows or the columns of the data matrix, separately.
- Produces either clusters of rows (subgroups of rows) or clusters of columns (subgroups of columns).
Biclustering (local models):
- Performs simultaneous clustering of both rows and columns of the data matrix.
- Produces biclusters (subgroups of rows together with subgroups of columns).

11 Why Biclustering and not just Clustering?
Unlike clustering, biclustering identifies groups of genes that show similar activity patterns under a specific subset of the experimental conditions. Biclustering is the key technique to use when:
- only a small set of the genes participates in a cellular process of interest;
- an interesting cellular process is active only in a subset of the conditions;
- a single gene may participate in multiple pathways that may or may not be co-active under all conditions.

12 Biclustering vs. Clustering
[Figure: genes A to M; a clustering result groups genes A, B, C, D, K, L, while the bicluster groups genes {A, B, C, D, E, F} over conditions {1, 2, 3, 5, 7, 10}.]
Similarity does not exist over all attributes. Solution: cluster both rows and columns simultaneously (biclustering).

13 Biclustering characteristics
Biclustering algorithms should identify groups of genes and conditions, obeying the following rules:
- A cluster of genes should be defined with respect to only a subset of the conditions.
- A cluster of conditions should be defined with respect to only a subset of the genes.
- The clusters need not be exclusive and/or exhaustive: there are no a priori constraints on the organization of biclusters, so a gene or condition should be able to belong to more than one bicluster or to no bicluster at all.
The lack of structural constraints on biclustering solutions allows greater freedom, but consequently makes the methods more vulnerable to overfitting. Biclustering algorithms must therefore guarantee that the output biclusters are meaningful, usually through an accompanying statistical model or a heuristic scoring method that defines which of the many possible submatrices represent a significant biological behavior.

14 Biclustering: clinical application
In clinical applications, gene expression analysis is done on tissues taken from patients with a medical condition. Using such assays, biologists have identified molecular fingerprints that can help in the classification and diagnosis of the patient status and guide treatment protocols.
The focus is to identify profiles of expression over a subset of the genes that can be associated with clinical conditions and treatment outcomes, where, ideally, the sets of samples are identical in all respects except the subtype or the stage of the disease.
However, a patient may be part of more than one clinical group: e.g., they may suffer from syndrome A, have genetic background B and be exposed to environment C. Biclustering analysis is thus highly appropriate for identifying and distinguishing the biological factors affecting the patients, along with the corresponding gene subsets.

15 Biclustering: functional genomics application
Goal: understand the functions of each of the genes operating in a biological system.
The rationale is that genes with similar expression patterns are likely to be regulated by the same factors and therefore may share function. By collecting expression profiles from many different biological conditions and identifying joint patterns of gene expression among them, researchers have characterized transcriptional programs and assigned putative functions to thousands of genes.
Since genes have multiple functions, and since transcriptional programs are often based on combinatorial regulation, biclustering is highly appropriate for these applications as well.
An important aspect of gene expression data is their high noise level: biclustering algorithms should be robust enough to cope with significant levels of noise.

16 Bicluster Types
An interesting criterion for evaluating a biclustering algorithm is the type of biclusters the algorithm is able to find. We identified four major classes of biclusters:
1. Biclusters with constant values.
2. Biclusters with constant values on rows or columns.
3. Biclusters with coherent values.
4. Biclusters with coherent evolutions.

17 Bicluster Types
Depending on the specific properties of each problem:
- one or more of these different types of biclusters are generally considered interesting;
- a different type of merit function should be used to evaluate the quality of the biclusters identified.
The choice of the merit function is strongly related to the characteristics of the biclusters each algorithm aims at finding.

18 Biclusters with constant values
The simplest biclustering algorithms identify subsets of rows and subsets of columns with constant values. A perfect constant bicluster is a submatrix (I, J) in which all values are equal:
$a_{ij} = \mu \quad \forall i \in I, \forall j \in J$
Example:
a  a  a  a
a  a  a  a
a  a  a  a
a  a  a  a
The merit function used to compute and evaluate constant biclusters is, in general, the variance or some metric based on it.
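As an illustration of the variance-based merit function, here is a minimal sketch in plain Python (no libraries). The names `submatrix` and `constant_score` are hypothetical helpers, not part of any package; the score is the population variance of the values in $A_{IJ}$, so a perfect constant bicluster scores exactly 0.

```python
def submatrix(A, rows, cols):
    """Return the submatrix A_IJ for row indices `rows` (I) and column indices `cols` (J)."""
    return [[A[i][j] for j in cols] for i in rows]


def constant_score(A, rows, cols):
    """Population variance of all values in A_IJ (0.0 for a perfect constant bicluster)."""
    values = [v for row in submatrix(A, rows, cols) for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


A = [
    [5.0, 5.0, 1.0, 5.0],
    [5.0, 5.0, 9.0, 5.0],
    [2.0, 7.0, 3.0, 4.0],
]

print(constant_score(A, [0, 1], [0, 1, 3]))     # 0.0: every selected cell equals 5
print(constant_score(A, [0, 1, 2], [0, 1, 3]))  # positive: row 2 breaks the constant pattern
```

Rows and columns are passed as index lists, so the selected rows and columns need not be contiguous in the original matrix.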

19 Biclusters with constant values on rows
A perfect bicluster with constant rows is a submatrix (I, J) in which all values can be obtained using one of the following expressions:
- additive: $a_{ij} = \mu + \alpha_i$
- multiplicative: $a_{ij} = \mu \times \alpha_i$
where µ is the typical value within the bicluster and $\alpha_i$ is the adjustment for row $i \in I$.
Additive example:
a    a    a    a
a+i  a+i  a+i  a+i
a+j  a+j  a+j  a+j
a+k  a+k  a+k  a+k
Multiplicative example:
a    a    a    a
a×i  a×i  a×i  a×i
a×j  a×j  a×j  a×j
a×k  a×k  a×k  a×k
A bicluster with constant values in the rows identifies a subset of genes with similar expression values across a subset of conditions, allowing the expression levels to differ from gene to gene.

20 Biclusters with constant values on columns
A perfect bicluster with constant columns is a submatrix (I, J) in which all values can be obtained using one of the following expressions:
- additive: $a_{ij} = \mu + \beta_j$
- multiplicative: $a_{ij} = \mu \times \beta_j$
where µ is the typical value within the bicluster and $\beta_j$ is the adjustment for column $j \in J$.
Additive example:
a  a+i  a+j  a+k
a  a+i  a+j  a+k
a  a+i  a+j  a+k
a  a+i  a+j  a+k
Multiplicative example:
a  a×i  a×j  a×k
a  a×i  a×j  a×k
a  a×i  a×j  a×k
a  a×i  a×j  a×k
A bicluster with constant values in the columns identifies a subset of conditions within which a subset of genes presents similar expression values, assuming that the expression values may differ from condition to condition.

21 Biclusters with constant values on rows or columns
The straightforward approach to identifying these non-constant biclusters is to normalize the rows or the columns of the data matrix using the row mean and the column mean, respectively. By doing this, biclusters with constant rows/columns are transformed into constant biclusters before the biclustering algorithm is applied.
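The normalization idea can be checked on small examples. This is a sketch under a simplifying assumption: it centers the candidate submatrix itself rather than the whole data matrix, and it covers only the additive models (for the multiplicative variants one would divide by the mean, or take logarithms, instead of subtracting). The names `row_center`, `col_center` and `variance` are hypothetical.

```python
def row_center(B):
    """Subtract each row's mean: removes mu + alpha_i from a_ij = mu + alpha_i."""
    centered = []
    for row in B:
        mean = sum(row) / len(row)
        centered.append([v - mean for v in row])
    return centered


def col_center(B):
    """Subtract each column's mean: removes mu + beta_j from a_ij = mu + beta_j."""
    n = len(B)
    means = [sum(B[i][j] for i in range(n)) / n for j in range(len(B[0]))]
    return [[v - means[j] for j, v in enumerate(row)] for row in B]


def variance(B):
    """Population variance of all entries (the constant-bicluster merit function)."""
    values = [v for row in B for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Perfect constant-row bicluster: mu = 1, alpha = (0, 2, 5)
rows_const = [[1.0, 1.0, 1.0], [3.0, 3.0, 3.0], [6.0, 6.0, 6.0]]
# Perfect constant-column bicluster: mu = 1, beta = (0, 3)
cols_const = [[1.0, 4.0], [1.0, 4.0], [1.0, 4.0]]

print(variance(rows_const) > 0)          # True: not a constant bicluster as is
print(variance(row_center(rows_const)))  # 0.0 after row normalization
print(variance(col_center(cols_const)))  # 0.0 after column normalization
```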

22 Biclusters with coherent values
A perfect bicluster with coherent values is defined as a subset of rows and a subset of columns whose values are predicted using the following expression:
ADDITIVE MODEL: $a_{ij} = \mu + \alpha_i + \beta_j$
where µ is the typical value within the bicluster, $\alpha_i$ is the adjustment for row $i \in I$ and $\beta_j$ is the adjustment for column $j \in J$.
Example:
a    b    c    d
a+i  b+i  c+i  d+i
a+j  b+j  c+j  d+j
a+k  b+k  c+k  d+k

23 Biclusters with coherent values
MULTIPLICATIVE MODEL: $a_{ij} = \mu \times \alpha_i \times \beta_j$
where µ is the typical value within the bicluster, $\alpha_i$ is the adjustment for row $i \in I$ and $\beta_j$ is the adjustment for column $j \in J$.
Example:
a    b    c    d
a×i  b×i  c×i  d×i
a×j  b×j  c×j  d×j
a×k  b×k  c×k  d×k
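The two coherent models are closely related: taking logarithms turns $\mu \times \alpha_i \times \beta_j$ into $\log\mu + \log\alpha_i + \log\beta_j$, i.e., an additive bicluster (assuming all values are positive). The sketch below tests additivity with the quantity $a_{ij} - a_{iJ} - a_{Ij} + a_{IJ}$, which is the residue used later in the Cheng and Church slides; all function names are hypothetical.

```python
import math


def additive_bicluster(mu, alpha, beta):
    """a_ij = mu + alpha_i + beta_j"""
    return [[mu + a + b for b in beta] for a in alpha]


def multiplicative_bicluster(mu, alpha, beta):
    """a_ij = mu * alpha_i * beta_j"""
    return [[mu * a * b for b in beta] for a in alpha]


def max_abs_residue(B):
    """Largest |a_ij - a_iJ - a_Ij + a_IJ|; zero (up to rounding) iff B fits the additive model."""
    n, m = len(B), len(B[0])
    row_means = [sum(r) / m for r in B]
    col_means = [sum(B[i][j] for i in range(n)) / n for j in range(m)]
    total = sum(row_means) / n
    return max(abs(B[i][j] - row_means[i] - col_means[j] + total)
               for i in range(n) for j in range(m))


add = additive_bicluster(1.0, [0.0, 2.0, 5.0], [0.0, 1.0, 3.0, 4.0])
mul = multiplicative_bicluster(1.0, [1.0, 2.0, 5.0], [1.0, 3.0, 4.0, 7.0])
log_mul = [[math.log(v) for v in row] for row in mul]

print(max_abs_residue(add) < 1e-9)      # True
print(max_abs_residue(mul) < 1e-9)      # False: not additive on the raw scale
print(max_abs_residue(log_mul) < 1e-9)  # True: the log turns products into sums
```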

24 Types of Biclusters: examples
[Figure panels: constant values; constant values on rows; constant values on columns; coherent values, additive model; coherent values, multiplicative model.]

25 General additive models
For every element $a_{ij}$, the general additive model represents a sum of models: each model represents the contribution of the bicluster $B_k$ to the value of $a_{ij}$ in case row i and column j both belong to $B_k$. The general additive model is defined as follows:
$a_{ij} = \sum_{k=0}^{K} \theta_{ijk} \rho_{ik} \kappa_{jk}$
where K is the number of biclusters, and the terms $\rho_{ik}$ and $\kappa_{jk}$ are binary values that represent memberships:
- $\rho_{ik}$ is the membership of row i in the bicluster k;
- $\kappa_{jk}$ is the membership of column j in the bicluster k.

26 General additive models
The value of $\theta_{ijk}$ specifies the contribution of each bicluster k and can be one of the following expressions, representing different types of biclusters:
- $\mu_k$: constant biclusters;
- $\mu_k + \alpha_{ik}$ or $\mu_k + \beta_{jk}$: biclusters with constant rows or constant columns;
- $\mu_k + \alpha_{ik} + \beta_{jk}$: biclusters with coherent values (additive model).
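To make the role of the memberships concrete, the following sketch builds a matrix from the general additive model. Each bicluster is described by a plain dict with hypothetical keys `rows` and `cols` (the sets where $\rho_{ik} = 1$ and $\kappa_{jk} = 1$), `mu`, and optional `alpha`/`beta` adjustments. Treating k = 0 as a background layer covering the whole matrix follows the plaid model convention and is an assumption here.

```python
def general_additive(n_rows, n_cols, biclusters):
    """Build a_ij = sum_k theta_ijk * rho_ik * kappa_jk.

    Only cells with rho_ik = kappa_jk = 1 receive bicluster k's contribution
    theta_ijk = mu_k + alpha_ik + beta_jk (missing adjustments count as 0).
    """
    A = [[0.0] * n_cols for _ in range(n_rows)]
    for bc in biclusters:
        alpha = bc.get("alpha", {})
        beta = bc.get("beta", {})
        for i in bc["rows"]:        # rho_ik = 1 only for these rows
            for j in bc["cols"]:    # kappa_jk = 1 only for these columns
                A[i][j] += bc["mu"] + alpha.get(i, 0.0) + beta.get(j, 0.0)
    return A


layers = [
    {"rows": range(3), "cols": range(4), "mu": 1.0},            # k = 0: background
    {"rows": [0, 1], "cols": [1, 2], "mu": 2.0, "alpha": {1: 3.0}},  # constant rows
    {"rows": [1, 2], "cols": [2, 3], "mu": 5.0},                # constant, overlaps k = 1
]
for row in general_additive(3, 4, layers):
    print(row)
# [1.0, 3.0, 3.0, 1.0]
# [1.0, 6.0, 11.0, 6.0]
# [1.0, 1.0, 6.0, 6.0]
```

Cell (1, 2) belongs to both biclusters, so it receives both contributions (1 + 5 + 5 = 11): overlapping biclusters are allowed by construction.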

27 General additive models
[Figure panels: constant values; constant rows; constant columns; coherent values.]

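28 General multiplicative models
Similarly, we can also think of a general multiplicative model, in which the contributions of the biclusters are multiplied rather than summed:
$a_{ij} = \prod_{k=0}^{K} \theta_{ijk}^{\rho_{ik} \kappa_{jk}}$
where K is the number of biclusters, and the terms $\rho_{ik}$ and $\kappa_{jk}$ are binary values that represent memberships:
- $\rho_{ik}$ is the membership of row i in the bicluster k;
- $\kappa_{jk}$ is the membership of column j in the bicluster k.
Using the memberships as exponents means that a bicluster not containing element (i, j) contributes a neutral factor of 1.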

29 General multiplicative models
The value of $\theta_{ijk}$ specifies the contribution of each bicluster k and can be one of the following expressions, representing different types of biclusters:
- $\mu_k$: constant biclusters;
- $\mu_k \times \alpha_{ik}$ or $\mu_k \times \beta_{jk}$: biclusters with constant rows or constant columns;
- $\mu_k \times \alpha_{ik} \times \beta_{jk}$: biclusters with coherent values (multiplicative model).

30 General multiplicative models
[Figure panels: constant values; constant rows; constant columns; coherent values.]

31 BICLUSTERING ALGORITHMS

32 Algorithms
Different objectives:
- identify one bicluster;
- identify a given number of biclusters.
Different approaches:
- discover one bicluster at a time;
- discover one set of biclusters at a time;
- discover all biclusters at the same time (simultaneous bicluster identification).

33 Algorithms
Iterative Row and Column Clustering Combination:
- apply clustering algorithms to the rows and columns of the data matrix, separately;
- combine the two cluster arrangements using some sort of iterative procedure.
Divide and Conquer:
- break the problem into several subproblems that are similar to the original problem but smaller in size;
- solve the subproblems recursively;
- combine the intermediate solutions to create a solution to the original problem.
These algorithms usually break the matrix into submatrices (biclusters) based on a certain criterion and then continue the biclustering process on the new submatrices.

34 Algorithms
Greedy Iterative Search (e.g., the Cheng & Church algorithm):
- make a locally optimal choice in the hope that this choice will lead to a globally good solution;
- usually perform greedy row/column addition/removal.
Exhaustive Bicluster Enumeration:
- the best biclusters are identified using an exhaustive enumeration of all possible biclusters existing in the data, in exponential time.

35 Overview of the Biclustering Algorithms
Method | Published | Cluster model | Goal
Cheng & Church | ISMB 2000 | Background + row effect + column effect | Minimize mean squared residue of biclusters
Getz et al. (CTWC) | PNAS 2000 | Depending on plugin clustering algorithm | Depending on plugin clustering algorithm
Lazzeroni & Owen (Plaid Models) | Bioinformatics | Background + row effect + column effect | Minimize modeling error
Ben-Dor et al. (OPSM) | RECOMB 2002 | All genes have the same order of expression values | Minimize the p-values of biclusters
Tanay et al. (SAMBA) | Bioinformatics 2002 | Maximum bounded bipartite subgraph | Minimize the p-values of biclusters
Yang et al. (FLOC) | BIBE 2003 | Background + row effect + column effect | Minimize mean squared residue of biclusters
Kluger et al. (Spectral) | Genome Res. | Background × row effect × column effect | Finding checkerboard structures
Taken from Kevin Yip, 2003

36 Overview of the Biclustering Algorithms
Method | Allow overlap? | Bicluster discovery | Complexity | Testing data
Cheng & Church | Yes (rare in reality) | One at a time | O(MN) or O(M log N) | Yeast, lymphoma
Getz et al. (CTWC) | Yes | One set at a time | Exponential | Leukemia, colon cancer
Lazzeroni & Owen (Plaid Models) | Yes | One at a time | Polynomial | Food (961 × 6), forex (276 × 18), yeast
Ben-Dor et al. (OPSM) | Yes | All at the same time | O(NM^3 l) | Breast tumor
Tanay et al. (SAMBA) | Yes | All at the same time | O((N 2^(d+1)) log (r+1)/r (rd)) | Lymphoma, yeast
Yang et al. (FLOC) | Yes | All at the same time | O((N+M)^2 kp) | Yeast
Kluger et al. (Spectral) | No | All at the same time | Polynomial | Lymphoma (1 rel., 1 abs.), leukemia, breast cell line, CNS embryonal tumor

37 Cheng and Church's Algorithm
Cheng and Church were the first to introduce biclustering to gene expression analysis. Their algorithmic framework represents the biclustering problem as an optimization problem: it defines a score for each candidate bicluster and develops heuristics to solve the constrained optimization problem defined by this score function. The constraints force the uniformity of the matrix, and the procedure gives preference to larger submatrices. Cheng and Church implicitly assume that (gene, condition) pairs in a good bicluster have a constant expression level, plus possibly additive row- and column-specific effects.
Reference: Y. Cheng and G. M. Church, "Biclustering of Expression Data", ISMB 2000.

38 Cheng and Church's Algorithm
Model: a bicluster is represented by a submatrix A of the whole expression matrix (the involved rows and columns need not be contiguous in the original matrix). Each entry $a_{ij}$ in the bicluster is the sum of:
1. the background level;
2. the row (gene) effect;
3. the column (condition) effect.
A dataset contains a number of biclusters, which are not necessarily disjoint.

39 Cheng and Church's Algorithm: residue
In the bicluster $A_{IJ}$, the residue score of element $a_{ij}$ is given by:
$R(a_{ij}) = a_{ij} - a_{iJ} - a_{Ij} + a_{IJ}$
where:
- $a_{iJ} = \frac{1}{|J|} \sum_{j \in J} a_{ij}$ is the mean of row i;
- $a_{Ij} = \frac{1}{|I|} \sum_{i \in I} a_{ij}$ is the mean of column j;
- $a_{IJ} = \frac{1}{|I||J|} \sum_{i \in I, j \in J} a_{ij}$ is the mean of $A_{IJ}$.
Biological meaning: the genes have the same (amount of) response to the conditions.
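The residue definition translates directly into code. This is a minimal sketch in which `residues` is a hypothetical helper that returns $R(a_{ij})$ for every cell of the bicluster given by row indices I and column indices J; a row that is another row shifted by a constant (same response to every condition) yields zero residues.

```python
def residues(A, rows, cols):
    """Map (i, j) -> R(a_ij) = a_ij - a_iJ - a_Ij + a_IJ for the bicluster A_IJ."""
    row_mean = {i: sum(A[i][j] for j in cols) / len(cols) for i in rows}  # a_iJ
    col_mean = {j: sum(A[i][j] for i in rows) / len(rows) for j in cols}  # a_Ij
    all_mean = sum(A[i][j] for i in rows for j in cols) / (len(rows) * len(cols))  # a_IJ
    return {(i, j): A[i][j] - row_mean[i] - col_mean[j] + all_mean
            for i in rows for j in cols}


A = [
    [1.0, 2.0, 4.0],
    [3.0, 4.0, 6.0],  # row 0 shifted by +2: same response to every condition
    [0.0, 9.0, 0.0],
]
coherent = residues(A, [0, 1], [0, 1, 2])
mixed = residues(A, [0, 1, 2], [0, 1, 2])
print(max(abs(r) for r in coherent.values()) < 1e-9)  # True
print(round(mixed[2, 1], 3))                          # 4.222
```

By construction the residues of each row and of each column sum to zero, which is a handy sanity check.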

40 Cheng and Church's Algorithm: mean squared residue
The mean squared residue of the bicluster is:
$H(I,J) = \frac{1}{|I||J|} \sum_{i \in I, j \in J} (a_{ij} - a_{iJ} - a_{Ij} + a_{IJ})^2 = \frac{1}{|I||J|} \sum_{i \in I, j \in J} R(a_{ij})^2$
Equivalently, it equals the mean row variance plus the mean column variance minus the variance of the set of all elements in the bicluster.
A submatrix $A_{IJ}$ is called a δ-bicluster if $H(I,J) \le \delta$ for some $\delta \ge 0$.
GOAL: find biclusters with low mean squared residue; in particular, large and maximal ones with scores below a certain threshold δ.
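A short sketch of the score and of the δ-bicluster test follows; the helper names are hypothetical. It also checks numerically the decomposition $H = \text{mean row variance} + \text{mean column variance} - \text{overall variance}$, which holds because each deviation $a_{ij} - a_{IJ}$ splits into a row effect, a column effect and the residue, and these three parts are orthogonal over the bicluster.

```python
def _means(A, rows, cols):
    n, m = len(rows), len(cols)
    row_mean = {i: sum(A[i][j] for j in cols) / m for i in rows}
    col_mean = {j: sum(A[i][j] for i in rows) / n for j in cols}
    all_mean = sum(row_mean.values()) / n
    return row_mean, col_mean, all_mean


def mean_squared_residue(A, rows, cols):
    """H(I,J) = 1/(|I||J|) * sum of R(a_ij)^2 over the bicluster."""
    rm, cm, am = _means(A, rows, cols)
    total = sum((A[i][j] - rm[i] - cm[j] + am) ** 2 for i in rows for j in cols)
    return total / (len(rows) * len(cols))


def is_delta_bicluster(A, rows, cols, delta):
    """A_IJ is a delta-bicluster when H(I,J) <= delta."""
    return mean_squared_residue(A, rows, cols) <= delta


def variance_decomposition(A, rows, cols):
    """Return (mean row variance, mean column variance, overall variance) of A_IJ."""
    rm, cm, am = _means(A, rows, cols)
    size = len(rows) * len(cols)
    mean_row_var = sum((A[i][j] - rm[i]) ** 2 for i in rows for j in cols) / size
    mean_col_var = sum((A[i][j] - cm[j]) ** 2 for i in rows for j in cols) / size
    overall_var = sum((A[i][j] - am) ** 2 for i in rows for j in cols) / size
    return mean_row_var, mean_col_var, overall_var


A = [
    [1.0, 2.0, 4.0],
    [3.0, 4.0, 6.0],
    [0.0, 9.0, 0.0],
]
rows, cols = [0, 1, 2], [0, 1, 2]
h = mean_squared_residue(A, rows, cols)
mrv, mcv, var = variance_decomposition(A, rows, cols)
print(abs(h - (mrv + mcv - var)) < 1e-9)               # True
print(is_delta_bicluster(A, [0, 1], [0, 1, 2], 1e-9))  # True: row 1 = row 0 + 2
```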

41 Cheng & Church algorithm
$H(I,J) = \frac{1}{|I||J|} \sum_{i \in I, j \in J} (a_{ij} - a_{iJ} - a_{Ij} + a_{IJ})^2 = \frac{1}{|I||J|} \sum_{i \in I, j \in J} R(a_{ij})^2$
A score of H(I,J) = 0 means that the gene expression levels fluctuate in unison: every element fits the additive model $a_{ij} = \mu + \alpha_i + \beta_j$ exactly, and a constant bicluster (all elements with a single value) is the simplest such case.
With a score H(I,J) > 0, it is always possible to remove a row or a column to lower the score, until the remaining bicluster has zero score (e.g., a single row or a single column).
The global H score indicates how well the data fit together within the matrix, i.e., whether it has some coherence or is random:
- a high H value signifies that the data are uncorrelated;
- a low H value means that there is correlation in the matrix.

42 Mean squared residue: example
If 5 was replaced with 3, the score would change to H(M2) = 2.06.
A matrix whose elements are generated randomly and uniformly in the range [a, b] has an expected score of $(b-a)^2/12$. With a = 1 and b = 12, this gives H(M3) = (12 - 1)^2/12 = 121/12 ≈ 10.08.
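The $(b-a)^2/12$ figure is the variance of the uniform distribution on [a, b]. For a finite n × m matrix of independent values with variance $\sigma^2$, the expected mean squared residue is $\sigma^2 (n-1)(m-1)/(nm)$, which approaches $\sigma^2$ as the matrix grows. The simulation below is a sketch that checks this on one random 200 × 200 matrix; the seed and matrix size are arbitrary choices, not taken from the slides.

```python
import random


def mean_squared_residue(A):
    """H(I,J) computed over the whole matrix A (all rows and columns)."""
    n, m = len(A), len(A[0])
    row_mean = [sum(r) / m for r in A]
    col_mean = [sum(A[i][j] for i in range(n)) / n for j in range(m)]
    all_mean = sum(row_mean) / n
    return sum((A[i][j] - row_mean[i] - col_mean[j] + all_mean) ** 2
               for i in range(n) for j in range(m)) / (n * m)


a, b = 1.0, 12.0
n = m = 200
rng = random.Random(0)  # fixed seed so the run is reproducible
A = [[rng.uniform(a, b) for _ in range(m)] for _ in range(n)]

limit = (b - a) ** 2 / 12                     # 121/12, about 10.08
finite = limit * (n - 1) * (m - 1) / (n * m)  # expectation for a 200 x 200 matrix, about 9.98
h = mean_squared_residue(A)
print(round(limit, 2), round(finite, 2), round(h, 2))
```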

43 Cheng & Church algorithm
Constraints:
- 1 × M and N × 1 matrices always give zero residue, so the goal is to find biclusters of maximum size whose residue is not more than a threshold δ (largest δ-biclusters).
- Constant matrices always give zero residue, so the average row variance is used to evaluate the interestingness of a bicluster. Biologically, a high row variance represents genes that have large changes in expression values over different conditions.

44 Cheng & Church algorithm
Objective function for heuristic methods (to minimize):
$H(I,J) = \frac{1}{|I||J|} \sum_{i \in I, j \in J} (a_{ij} - a_{iJ} - a_{Ij} + a_{IJ})^2 = \frac{1}{|I||J|} \sum_{i \in I, j \in J} R(a_{ij})^2$
The score is a sum of components from each row and each column, which suggests simple greedy algorithms that evaluate each row and column independently.

45 Cheng and Church's Algorithm
A greedy approach that rapidly converges to a maximal bicluster:
- in phase I, it removes rows/columns with a large contribution to the mean squared residue score (MSR);
- in phase II, it adds rows/columns that have a low contribution to the MSR, without exceeding δ;
- after a bicluster is identified, its values are randomized to prevent it from showing up again.

46 Cheng and Church's Algorithm
Given the threshold parameter δ, the algorithm runs in two phases.
FIRST PHASE: the algorithm removes rows and columns from the full matrix. At each step, where the current submatrix has row set I and column set J, the algorithm examines the set of possible moves:
- for each row it calculates $d(i) = \frac{1}{|J|} \sum_{j \in J} RS_{I,J}(i,j)$;
- for each column it calculates $e(j) = \frac{1}{|I|} \sum_{i \in I} RS_{I,J}(i,j)$;
where $RS_{I,J}(i,j) = R(a_{ij})^2$ is the squared residue of element (i, j) within the current submatrix.
It then selects the highest-scoring row or column and removes it from the current submatrix, as long as H(I,J) > δ. The idea is that rows/columns with a large contribution to the score can be removed with a guaranteed improvement (decrease) in the total mean squared residue score. A possible variation of this heuristic removes at each step all rows/columns whose contribution to the residue score is higher than some threshold.
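The first phase can be sketched as a single-node deletion loop. This is an illustrative, unoptimized version (it recomputes all means at each step), and the function names are hypothetical. Ties between the best row and the best column are broken in favor of the row, which is an arbitrary choice.

```python
def _means(A, rows, cols):
    row_mean = {i: sum(A[i][j] for j in cols) / len(cols) for i in rows}
    col_mean = {j: sum(A[i][j] for i in rows) / len(rows) for j in cols}
    all_mean = sum(row_mean.values()) / len(rows)
    return row_mean, col_mean, all_mean


def msr(A, rows, cols):
    """Mean squared residue H(I,J)."""
    rm, cm, am = _means(A, rows, cols)
    return sum((A[i][j] - rm[i] - cm[j] + am) ** 2
               for i in rows for j in cols) / (len(rows) * len(cols))


def single_node_deletion(A, delta):
    """Phase I sketch: start from the full matrix and repeatedly drop the row or
    column with the largest d(i) or e(j) while H(I,J) > delta."""
    rows, cols = list(range(len(A))), list(range(len(A[0])))
    while len(rows) > 1 and len(cols) > 1 and msr(A, rows, cols) > delta:
        rm, cm, am = _means(A, rows, cols)
        rs = {(i, j): (A[i][j] - rm[i] - cm[j] + am) ** 2 for i in rows for j in cols}
        d = {i: sum(rs[i, j] for j in cols) / len(cols) for i in rows}  # d(i)
        e = {j: sum(rs[i, j] for i in rows) / len(rows) for j in cols}  # e(j)
        worst_row = max(d, key=d.get)
        worst_col = max(e, key=e.get)
        if d[worst_row] >= e[worst_col]:
            rows.remove(worst_row)
        else:
            cols.remove(worst_col)
    return rows, cols


A = [
    [1, 2, 3, 10],
    [2, 3, 4, 0],
    [4, 5, 6, 8],   # rows 0-2 x columns 0-2 follow the additive model
    [9, 0, 7, 1],
]
print(single_node_deletion(A, delta=0.01))  # ([0, 1, 2], [0, 1, 2])
```

On this toy matrix the loop first drops column 3 (largest e(j)), then row 3, leaving the additive 3 × 3 block with zero residue.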

47 Cheng and Church's Algorithm
SECOND PHASE: the goal is to increase the matrix size without crossing the threshold δ. Rows and columns are added using the same scoring scheme, but this time looking for the lowest squared residue scores d(i) or e(j) at each move, and terminating when none of the possible moves increases the matrix size without crossing the threshold δ.
Upon convergence, the algorithm outputs a submatrix with low mean squared residue and locally maximal size.
To discover more than one bicluster, Cheng and Church suggested repeated application of the biclustering algorithm on modified matrices. The modification consists of randomizing the values in the cells of the previously discovered biclusters, which prevents their correlated signal from benefiting any other bicluster in the matrix. This has the obvious effect of precluding the identification of biclusters with significant overlaps.
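The second phase and the masking step can be sketched as follows. This version follows the description above: at each move it tries the outside rows/columns in order of increasing d(i) or e(j) and keeps the first one whose addition leaves H(I,J) ≤ δ. It is not a transcription of Cheng and Church's published pseudocode, and all names (including the `low`/`high` range for masking) are hypothetical.

```python
import random


def _means(A, rows, cols):
    row_mean = {i: sum(A[i][j] for j in cols) / len(cols) for i in rows}
    col_mean = {j: sum(A[i][j] for i in rows) / len(rows) for j in cols}
    all_mean = sum(row_mean.values()) / len(rows)
    return row_mean, col_mean, all_mean


def msr(A, rows, cols):
    rm, cm, am = _means(A, rows, cols)
    return sum((A[i][j] - rm[i] - cm[j] + am) ** 2
               for i in rows for j in cols) / (len(rows) * len(cols))


def node_addition(A, rows, cols, delta):
    """Phase II sketch: grow (rows, cols) while H(I,J) stays <= delta."""
    rows, cols = list(rows), list(cols)
    while True:
        rm, cm, am = _means(A, rows, cols)
        moves = []
        for i in range(len(A)):
            if i not in rows:  # d(i) of an outside row w.r.t. the current columns
                ri = sum(A[i][j] for j in cols) / len(cols)
                moves.append((sum((A[i][j] - ri - cm[j] + am) ** 2 for j in cols) / len(cols), "row", i))
        for j in range(len(A[0])):
            if j not in cols:  # e(j) of an outside column w.r.t. the current rows
                cj = sum(A[i][j] for i in rows) / len(rows)
                moves.append((sum((A[i][j] - rm[i] - cj + am) ** 2 for i in rows) / len(rows), "col", j))
        for _, kind, idx in sorted(moves):  # lowest score first
            new_rows = rows + [idx] if kind == "row" else rows
            new_cols = cols + [idx] if kind == "col" else cols
            if msr(A, new_rows, new_cols) <= delta:
                rows, cols = new_rows, new_cols
                break
        else:  # no move enlarges the bicluster without crossing delta
            return rows, cols


def mask(A, rows, cols, low, high, rng):
    """Return a copy of A whose bicluster cells are replaced by uniform random values."""
    B = [row[:] for row in A]
    for i in rows:
        for j in cols:
            B[i][j] = rng.uniform(low, high)
    return B


A = [
    [1.0, 2.0, 3.0, 10.0],
    [2.0, 3.0, 4.0, 0.0],
    [4.0, 5.0, 6.0, 8.0],
    [9.0, 0.0, 7.0, 1.0],
]
rows, cols = node_addition(A, [0, 1], [0, 1], delta=0.01)
print(rows, cols)  # [0, 1, 2] [0, 1, 2]
masked = mask(A, rows, cols, 0.0, 10.0, random.Random(1))
```

Masking with random values from the data range is what makes the next run unlikely to rediscover the same block, at the cost of hiding overlapping biclusters.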

48 Evolutionary biclustering
- Binary encoding for rows/columns.
- Fitness: mean squared residue, row variance, large volume; penalty (exponential).
- Typical genetic operators.
Reference: H. Banka and S. Mitra, "Evolutionary Biclustering of Gene Expressions", ACM Ubiquity, 7(42).

49 Genetic Algorithms: a brief introduction
The idea of the genetic algorithm (GA) was first introduced by John Holland in the early 1970s. It is an adaptive global search heuristic inspired by natural evolution and genetics, with a survival-of-the-fittest strategy: a stochastic, population-based search strategy that works on the biological mechanisms of natural selection, crossover and mutation.
GAs are executed iteratively on a set of coded solutions, called the population, with three basic operators: selection, crossover and mutation. To solve a problem, a GA starts with a set of encoded random solutions (i.e., chromosomes) and evolves a better set of solutions over generations (iterations) by applying the basic GA operators. Better solutions are determined from objective values (fitness functions) that determine the suitability of the solutions for reproduction. Hence better solutions are selected, whereas the bad ones are eliminated from the population at each generation.

50 Simple Genetic Algorithm
{
    initialize population;
    evaluate population;
    while (termination criteria not satisfied) {
        select parents for reproduction;
        perform recombination and mutation;
        evaluate population;
    }
}
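A runnable version of this loop is sketched below, under several assumptions not in the slides: the fitness is the toy OneMax objective (number of ones in a bit string), the termination criterion is a fixed number of generations, selection is a size-2 tournament, recombination is one-point crossover, mutation flips bits independently, and the best individual is copied unchanged into the next generation (elitism). All names are hypothetical.

```python
import random


def one_max(bits):
    """Toy fitness: number of ones (a stand-in for a real objective)."""
    return sum(bits)


def tournament(pop, rng, k=2):
    """Select the fittest of k randomly chosen individuals."""
    return max(rng.sample(pop, k), key=one_max)


def crossover(p1, p2, rng):
    """One-point crossover."""
    point = rng.randrange(1, len(p1))
    return p1[:point] + p2[point:]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def simple_ga(n_bits=20, pop_size=30, generations=40, rate=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]  # initialize
    best = max(pop, key=one_max)                                                 # evaluate
    history = [one_max(best)]
    for _ in range(generations):       # termination: fixed number of generations
        children = [best[:]]           # elitism: keep the current best individual
        while len(children) < pop_size:
            p1, p2 = tournament(pop, rng), tournament(pop, rng)      # select parents
            children.append(mutate(crossover(p1, p2, rng), rate, rng))  # recombine, mutate
        pop = children
        best = max(pop, key=one_max)   # evaluate
        history.append(one_max(best))
    return best, history


best, history = simple_ga()
print(history[0], history[-1])
```

Because of elitism, the best fitness in `history` can never decrease from one generation to the next.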

51 Evolutionary biclustering: Representation
Each encoded solution represents a bicluster. A bicluster is represented by a fixed-size binary string, called a chromosome or individual, made of a bit string for genes followed by another bit string for conditions. The chromosome corresponds to a solution of this optimal bicluster generation problem. A bit is set to one if the corresponding gene or condition is present in the bicluster, and reset to zero otherwise.
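The encoding is easy to state in code. In this minimal sketch, `encode` and `decode` are hypothetical helpers that convert between a bicluster, given as lists of gene and condition indices, and its chromosome of G + C bits (gene bits first, then condition bits).

```python
def encode(rows, cols, n_genes, n_conds):
    """Chromosome: n_genes gene bits followed by n_conds condition bits."""
    bits = [0] * (n_genes + n_conds)
    for i in rows:
        bits[i] = 1
    for j in cols:
        bits[n_genes + j] = 1
    return bits


def decode(bits, n_genes):
    """Recover (gene indices, condition indices) from a chromosome."""
    rows = [i for i, b in enumerate(bits[:n_genes]) if b]
    cols = [j for j, b in enumerate(bits[n_genes:]) if b]
    return rows, cols


chrom = encode([0, 2], [1], n_genes=4, n_conds=3)
print(chrom)             # [1, 0, 1, 0, 0, 1, 0]
print(decode(chrom, 4))  # ([0, 2], [1])
```

The number of ones in each part gives g and c, the gene and condition counts used in the fitness function.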

52 Evolutionary biclustering: fitness function
Goal: generate a maximal set of genes and conditions while maintaining the homogeneity of the biclusters (multi-objective optimization). The fitness function to maximize is defined in terms of:
- g and c, the number of ones in the gene and condition parts of the chromosome, i.e., the number of genes and conditions within the bicluster;
- G(g, c), the mean squared residue score of the bicluster;
- δ, the user-defined threshold for the maximum acceptable dissimilarity, i.e., mean squared residue score, of the bicluster;
- G and C, the total number of genes and conditions of the original gene expression array.

53 Evolutionary biclustering: Local search
Since the initial biclusters are generated randomly, some irrelevant genes and/or conditions may get included even though their expression values lie far apart in the feature space. An analogous situation may also arise during crossover and mutation in each generation. These genes and conditions, with dissimilar values, need to be eliminated deterministically. Furthermore, for good biclustering, some genes and/or conditions having similar expression values need to be incorporated as well.
The local search starts with a given bicluster and the initial gene expression array (G, C). Irrelevant genes or conditions whose mean squared residue is above a certain threshold are selectively eliminated, and genes or conditions whose residue is below it are added, according to a set of threshold conditions.

54 Evolutionary biclustering
Domination: with M objective functions, a solution x(1) is said to dominate another solution x(2) if both of the following conditions hold:
- x(1) is no worse than x(2) in all M objective functions;
- x(1) is strictly better than x(2) in at least one of the M objective functions.
Crowding distance: boundary solutions are assigned the highest value; every other solution i is assigned the average distance of the two solutions on either side of it, the (i+1)th and the (i-1)th, along each of the objectives.
Crowding selection: a solution i wins a tournament with another solution j if:
- solution i has a better rank, i.e., r_i < r_j; or
- both solutions are in the same front, i.e., r_i = r_j, but solution i is less densely located in the search space, i.e., d_i > d_j.
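These three ingredients can be sketched in a few lines. Assumptions: every objective is to be minimized, and the crowding distance uses the normalized-sum formulation common in NSGA-II implementations (the sum over objectives of the normalized gap between the two neighbors). Rescaling distances by a constant does not change which solution wins the crowded comparison. Function names are hypothetical.

```python
import math


def dominates(x, y):
    """True if x dominates y, assuming every objective is minimized."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def crowding_distance(front):
    """front: objective vectors of one non-dominated front.
    Boundary solutions get infinity; the others get the normalized sum of the
    gaps between their two neighbors along each objective."""
    n = len(front)
    dist = [0.0] * n
    if n == 0:
        return dist
    for m in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = math.inf
        if hi == lo:
            continue
        for k in range(1, n - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            dist[order[k]] += gap / (hi - lo)
    return dist


def crowded_wins(rank_i, dist_i, rank_j, dist_j):
    """Crowded tournament: better rank wins; on equal rank, larger distance wins."""
    return rank_i < rank_j or (rank_i == rank_j and dist_i > dist_j)


front = [(1, 5), (2, 3), (4, 2), (6, 1)]
d = crowding_distance(front)
print(d)  # boundary points are inf; interior points are about 1.35 and 1.3
print(crowded_wins(0, d[1], 0, d[2]))  # True: same front, solution 1 is less crowded
```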

55 Evolutionary biclustering: The algorithm
The main steps of the proposed algorithm, repeated over a specified number of generations, are:
1. Generate a random population of size P.
2. Delete or add multiple nodes (genes and conditions) in each individual of the population.
3. Calculate the multi-objective fitness functions f1 and f2.
4. Rank the population using the dominance criteria.
5. Calculate the crowding distance.
6. Perform selection using crowded tournament selection.
7. Perform crossover and mutation (as in a conventional GA) to generate an offspring population of size P.
8. Combine the parent and offspring populations.
9. Rank the mixed population using the dominance criteria and crowding distance, as above.
10. Replace the parent population by the best P members of the combined population.

56 Biclustering advantages
1. It automatically selects genes and conditions with more coherent measurements.
2. It groups items based on a similarity measure that depends on a context, which is best defined as a subset of the attributes. It discovers not only the grouping but the context as well, and to some extent these two become inseparable and exchangeable; this is a major difference between biclustering and clustering rows after clustering columns.
3. It allows rows and columns to be included in multiple biclusters, and thus allows one gene or one condition to be assigned to more than one functional category. This added flexibility correctly reflects the reality of gene functionality and of overlapping factors in tissue samples and experimental conditions.

57 Biclustering: observations
The algorithms presented demonstrate some of the approaches developed for the identification of bicluster patterns in large matrices, and in gene expression matrices in particular. The different methods can be classified:
a) by their model and scoring schemes;
b) by the type of algorithm used for detecting biclusters.

58 Biclustering: models and score
To ensure that the biclusters are statistically significant, each of the biclustering methods defines a scoring scheme to assess the quality of candidate biclusters, or a constraint that determines which submatrices represent significant bicluster behavior.
- Constraint-based methods search for gene (property) sets that define stable subsets of properties. Algorithms: the iterative signature algorithm, the coupled two-way clustering method and the spectral algorithm of Kluger et al.
- Scoring-based methods rely on a background model for the data. The basic model assumes that biclusters are essentially uniform submatrices and scores them according to their deviation from such uniform behavior. More elaborate models allow different distributions for each condition and gene, usually in a linear way. Algorithms: the Cheng-Church algorithm and the Plaid model.

59 Biclustering: algorithmic approaches
The algorithmic approaches for detecting biclusters in the data are greatly affected by the type of score/constraint model in use:
- several algorithms alternate between phases of gene-set and condition-set optimization (the iterative signature algorithm and the coupled two-way clustering algorithm);
- others use standard linear algebra or optimization algorithms to solve key subproblems (the Plaid model and the spectral algorithm);
- a heuristic hill-climbing algorithm is used in the Cheng-Church algorithm.

60 Research Opportunities
Many issues in biclustering algorithm design remain open and should be addressed by the scientific community:
- propose other bicluster models;
- based on the current models, propose new algorithms that improve bicluster quality (validated statistically or biologically) and/or time complexity;
- combine the strengths of multiple studies;
- investigate the effects of normalization on the models/algorithms;
- compare the different methods on other real datasets;
- make better use of domain knowledge.


More information

Genetic Algorithms commonly used selection, replacement, and variation operators Fernando Lobo University of Algarve

Genetic Algorithms commonly used selection, replacement, and variation operators Fernando Lobo University of Algarve Genetic Algorithms commonly used selection, replacement, and variation operators Fernando Lobo University of Algarve Outline Selection methods Replacement methods Variation operators Selection Methods

More information

Unsupervised learning: Clustering

Unsupervised learning: Clustering Unsupervised learning: Clustering Salissou Moutari Centre for Statistical Science and Operational Research CenSSOR 17 th September 2013 Unsupervised learning: Clustering 1/52 Outline 1 Introduction What

More information

Model-based Parameter Optimization of an Engine Control Unit using Genetic Algorithms

Model-based Parameter Optimization of an Engine Control Unit using Genetic Algorithms Symposium on Automotive/Avionics Avionics Systems Engineering (SAASE) 2009, UC San Diego Model-based Parameter Optimization of an Engine Control Unit using Genetic Algorithms Dipl.-Inform. Malte Lochau

More information

An evolutionary learning spam filter system

An evolutionary learning spam filter system An evolutionary learning spam filter system Catalin Stoean 1, Ruxandra Gorunescu 2, Mike Preuss 3, D. Dumitrescu 4 1 University of Craiova, Romania, catalin.stoean@inf.ucv.ro 2 University of Craiova, Romania,

More information

New Modifications of Selection Operator in Genetic Algorithms for the Traveling Salesman Problem

New Modifications of Selection Operator in Genetic Algorithms for the Traveling Salesman Problem New Modifications of Selection Operator in Genetic Algorithms for the Traveling Salesman Problem Radovic, Marija; and Milutinovic, Veljko Abstract One of the algorithms used for solving Traveling Salesman

More information

Original Article Efficient Genetic Algorithm on Linear Programming Problem for Fittest Chromosomes

Original Article Efficient Genetic Algorithm on Linear Programming Problem for Fittest Chromosomes International Archive of Applied Sciences and Technology Volume 3 [2] June 2012: 47-57 ISSN: 0976-4828 Society of Education, India Website: www.soeagra.com/iaast/iaast.htm Original Article Efficient Genetic

More information

D-optimal plans in observational studies

D-optimal plans in observational studies D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational

More information

5 INTEGER LINEAR PROGRAMMING (ILP) E. Amaldi Fondamenti di R.O. Politecnico di Milano 1

5 INTEGER LINEAR PROGRAMMING (ILP) E. Amaldi Fondamenti di R.O. Politecnico di Milano 1 5 INTEGER LINEAR PROGRAMMING (ILP) E. Amaldi Fondamenti di R.O. Politecnico di Milano 1 General Integer Linear Program: (ILP) min c T x Ax b x 0 integer Assumption: A, b integer The integrality condition

More information

How To Cluster

How To Cluster Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

More information

Protein Protein Interaction Networks

Protein Protein Interaction Networks Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

More information

Memory Allocation Technique for Segregated Free List Based on Genetic Algorithm

Memory Allocation Technique for Segregated Free List Based on Genetic Algorithm Journal of Al-Nahrain University Vol.15 (2), June, 2012, pp.161-168 Science Memory Allocation Technique for Segregated Free List Based on Genetic Algorithm Manal F. Younis Computer Department, College

More information

Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

Management of Software Projects with GAs

Management of Software Projects with GAs MIC05: The Sixth Metaheuristics International Conference 1152-1 Management of Software Projects with GAs Enrique Alba J. Francisco Chicano Departamento de Lenguajes y Ciencias de la Computación, Universidad

More information

Genetic Algorithms and Sudoku

Genetic Algorithms and Sudoku Genetic Algorithms and Sudoku Dr. John M. Weiss Department of Mathematics and Computer Science South Dakota School of Mines and Technology (SDSM&T) Rapid City, SD 57701-3995 john.weiss@sdsmt.edu MICS 2009

More information

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?

More information

How I won the Chess Ratings: Elo vs the rest of the world Competition

How I won the Chess Ratings: Elo vs the rest of the world Competition How I won the Chess Ratings: Elo vs the rest of the world Competition Yannis Sismanis November 2010 Abstract This article discusses in detail the rating system that won the kaggle competition Chess Ratings:

More information

Alpha Cut based Novel Selection for Genetic Algorithm

Alpha Cut based Novel Selection for Genetic Algorithm Alpha Cut based Novel for Genetic Algorithm Rakesh Kumar Professor Girdhar Gopal Research Scholar Rajesh Kumar Assistant Professor ABSTRACT Genetic algorithm (GA) has several genetic operators that can

More information

Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

More information

CSCI-8940: An Intelligent Decision Aid for Battlefield Communications Network Configuration

CSCI-8940: An Intelligent Decision Aid for Battlefield Communications Network Configuration CSCI-8940: An Intelligent Decision Aid for Battlefield Communications Network Configuration W.D. Potter Department of Computer Science & Artificial Intelligence Center University of Georgia Abstract The

More information

Nonlinear Model Predictive Control of Hammerstein and Wiener Models Using Genetic Algorithms

Nonlinear Model Predictive Control of Hammerstein and Wiener Models Using Genetic Algorithms Nonlinear Model Predictive Control of Hammerstein and Wiener Models Using Genetic Algorithms Al-Duwaish H. and Naeem, Wasif Electrical Engineering Department/King Fahd University of Petroleum and Minerals

More information

Volume 3, Issue 2, February 2015 International Journal of Advance Research in Computer Science and Management Studies

Volume 3, Issue 2, February 2015 International Journal of Advance Research in Computer Science and Management Studies Volume 3, Issue 2, February 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com

More information

A Robust Method for Solving Transcendental Equations

A Robust Method for Solving Transcendental Equations www.ijcsi.org 413 A Robust Method for Solving Transcendental Equations Md. Golam Moazzam, Amita Chakraborty and Md. Al-Amin Bhuiyan Department of Computer Science and Engineering, Jahangirnagar University,

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

An unsupervised fuzzy ensemble algorithmic scheme for gene expression data analysis

An unsupervised fuzzy ensemble algorithmic scheme for gene expression data analysis An unsupervised fuzzy ensemble algorithmic scheme for gene expression data analysis Roberto Avogadri 1, Giorgio Valentini 1 1 DSI, Dipartimento di Scienze dell Informazione, Università degli Studi di Milano,Via

More information

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

More information

Standardization and Its Effects on K-Means Clustering Algorithm

Standardization and Its Effects on K-Means Clustering Algorithm Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03

More information

Estimation of the COCOMO Model Parameters Using Genetic Algorithms for NASA Software Projects

Estimation of the COCOMO Model Parameters Using Genetic Algorithms for NASA Software Projects Journal of Computer Science 2 (2): 118-123, 2006 ISSN 1549-3636 2006 Science Publications Estimation of the COCOMO Model Parameters Using Genetic Algorithms for NASA Software Projects Alaa F. Sheta Computers

More information

Clustering & Visualization

Clustering & Visualization Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.

More information

A Survey of Evolutionary Algorithms for Data Mining and Knowledge Discovery

A Survey of Evolutionary Algorithms for Data Mining and Knowledge Discovery A Survey of Evolutionary Algorithms for Data Mining and Knowledge Discovery Alex A. Freitas Postgraduate Program in Computer Science, Pontificia Universidade Catolica do Parana Rua Imaculada Conceicao,

More information

ISSN: 2319-5967 ISO 9001:2008 Certified International Journal of Engineering Science and Innovative Technology (IJESIT) Volume 2, Issue 3, May 2013

ISSN: 2319-5967 ISO 9001:2008 Certified International Journal of Engineering Science and Innovative Technology (IJESIT) Volume 2, Issue 3, May 2013 Transistor Level Fault Finding in VLSI Circuits using Genetic Algorithm Lalit A. Patel, Sarman K. Hadia CSPIT, CHARUSAT, Changa., CSPIT, CHARUSAT, Changa Abstract This paper presents, genetic based algorithm

More information

Boolean Network Models

Boolean Network Models Boolean Network Models 2/5/03 History Kaufmann, 1970s Studied organization and dynamics properties of (N,k) Boolean Networks Found out that highly connected networks behave differently than lowly connected

More information

New binary representation in Genetic Algorithms for solving TSP by mapping permutations to a list of ordered numbers

New binary representation in Genetic Algorithms for solving TSP by mapping permutations to a list of ordered numbers Proceedings of the 5th WSEAS Int Conf on COMPUTATIONAL INTELLIGENCE, MAN-MACHINE SYSTEMS AND CYBERNETICS, Venice, Italy, November 0-, 006 363 New binary representation in Genetic Algorithms for solving

More information

Management Science Letters

Management Science Letters Management Science Letters 4 (2014) 905 912 Contents lists available at GrowingScience Management Science Letters homepage: www.growingscience.com/msl Measuring customer loyalty using an extended RFM and

More information

Lecture 10: Regression Trees

Lecture 10: Regression Trees Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,

More information

A Comparison of Genotype Representations to Acquire Stock Trading Strategy Using Genetic Algorithms

A Comparison of Genotype Representations to Acquire Stock Trading Strategy Using Genetic Algorithms 2009 International Conference on Adaptive and Intelligent Systems A Comparison of Genotype Representations to Acquire Stock Trading Strategy Using Genetic Algorithms Kazuhiro Matsui Dept. of Computer Science

More information

Practical Applications of Evolutionary Computation to Financial Engineering

Practical Applications of Evolutionary Computation to Financial Engineering Hitoshi Iba and Claus C. Aranha Practical Applications of Evolutionary Computation to Financial Engineering Robust Techniques for Forecasting, Trading and Hedging 4Q Springer Contents 1 Introduction to

More information

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with

More information

Cellular Automaton: The Roulette Wheel and the Landscape Effect

Cellular Automaton: The Roulette Wheel and the Landscape Effect Cellular Automaton: The Roulette Wheel and the Landscape Effect Ioan Hălălae Faculty of Engineering, Eftimie Murgu University, Traian Vuia Square 1-4, 385 Reşiţa, Romania Phone: +40 255 210227, Fax: +40

More information

Optimization of sampling strata with the SamplingStrata package

Optimization of sampling strata with the SamplingStrata package Optimization of sampling strata with the SamplingStrata package Package version 1.1 Giulio Barcaroli January 12, 2016 Abstract In stratified random sampling the problem of determining the optimal size

More information

A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II

A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II 182 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 6, NO. 2, APRIL 2002 A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II Kalyanmoy Deb, Associate Member, IEEE, Amrit Pratap, Sameer Agarwal,

More information

Solving Three-objective Optimization Problems Using Evolutionary Dynamic Weighted Aggregation: Results and Analysis

Solving Three-objective Optimization Problems Using Evolutionary Dynamic Weighted Aggregation: Results and Analysis Solving Three-objective Optimization Problems Using Evolutionary Dynamic Weighted Aggregation: Results and Analysis Abstract. In this paper, evolutionary dynamic weighted aggregation methods are generalized

More information

Why do statisticians "hate" us?

Why do statisticians hate us? Why do statisticians "hate" us? David Hand, Heikki Mannila, Padhraic Smyth "Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data

More information

Local outlier detection in data forensics: data mining approach to flag unusual schools

Local outlier detection in data forensics: data mining approach to flag unusual schools Local outlier detection in data forensics: data mining approach to flag unusual schools Mayuko Simon Data Recognition Corporation Paper presented at the 2012 Conference on Statistical Detection of Potential

More information

System Identification for Acoustic Comms.:

System Identification for Acoustic Comms.: System Identification for Acoustic Comms.: New Insights and Approaches for Tracking Sparse and Rapidly Fluctuating Channels Weichang Li and James Preisig Woods Hole Oceanographic Institution The demodulation

More information

A Service Revenue-oriented Task Scheduling Model of Cloud Computing

A Service Revenue-oriented Task Scheduling Model of Cloud Computing Journal of Information & Computational Science 10:10 (2013) 3153 3161 July 1, 2013 Available at http://www.joics.com A Service Revenue-oriented Task Scheduling Model of Cloud Computing Jianguang Deng a,b,,

More information

Keywords revenue management, yield management, genetic algorithm, airline reservation

Keywords revenue management, yield management, genetic algorithm, airline reservation Volume 4, Issue 1, January 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Revenue Management

More information

PREDA S4-classes. Francesco Ferrari October 13, 2015

PREDA S4-classes. Francesco Ferrari October 13, 2015 PREDA S4-classes Francesco Ferrari October 13, 2015 Abstract This document provides a description of custom S4 classes used to manage data structures for PREDA: an R package for Position RElated Data Analysis.

More information

A Non-Linear Schema Theorem for Genetic Algorithms

A Non-Linear Schema Theorem for Genetic Algorithms A Non-Linear Schema Theorem for Genetic Algorithms William A Greene Computer Science Department University of New Orleans New Orleans, LA 70148 bill@csunoedu 504-280-6755 Abstract We generalize Holland

More information

Mining Social-Network Graphs

Mining Social-Network Graphs 342 Chapter 10 Mining Social-Network Graphs There is much information to be gained by analyzing the large-scale data that is derived from social networks. The best-known example of a social network is

More information

Lab 4: 26 th March 2012. Exercise 1: Evolutionary algorithms

Lab 4: 26 th March 2012. Exercise 1: Evolutionary algorithms Lab 4: 26 th March 2012 Exercise 1: Evolutionary algorithms 1. Found a problem where EAs would certainly perform very poorly compared to alternative approaches. Explain why. Suppose that we want to find

More information

CHAPTER 3 SECURITY CONSTRAINED OPTIMAL SHORT-TERM HYDROTHERMAL SCHEDULING

CHAPTER 3 SECURITY CONSTRAINED OPTIMAL SHORT-TERM HYDROTHERMAL SCHEDULING 60 CHAPTER 3 SECURITY CONSTRAINED OPTIMAL SHORT-TERM HYDROTHERMAL SCHEDULING 3.1 INTRODUCTION Optimal short-term hydrothermal scheduling of power systems aims at determining optimal hydro and thermal generations

More information

Mathematical Models of Supervised Learning and their Application to Medical Diagnosis

Mathematical Models of Supervised Learning and their Application to Medical Diagnosis Genomic, Proteomic and Transcriptomic Lab High Performance Computing and Networking Institute National Research Council, Italy Mathematical Models of Supervised Learning and their Application to Medical

More information

Genetic Algorithms for Bridge Maintenance Scheduling. Master Thesis

Genetic Algorithms for Bridge Maintenance Scheduling. Master Thesis Genetic Algorithms for Bridge Maintenance Scheduling Yan ZHANG Master Thesis 1st Examiner: Prof. Dr. Hans-Joachim Bungartz 2nd Examiner: Prof. Dr. rer.nat. Ernst Rank Assistant Advisor: DIPL.-ING. Katharina

More information

Categorical Data Visualization and Clustering Using Subjective Factors

Categorical Data Visualization and Clustering Using Subjective Factors Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,

More information

Nimble Algorithms for Cloud Computing. Ravi Kannan, Santosh Vempala and David Woodruff

Nimble Algorithms for Cloud Computing. Ravi Kannan, Santosh Vempala and David Woodruff Nimble Algorithms for Cloud Computing Ravi Kannan, Santosh Vempala and David Woodruff Cloud computing Data is distributed arbitrarily on many servers Parallel algorithms: time Streaming algorithms: sublinear

More information

Comparative genomic hybridization Because arrays are more than just a tool for expression analysis

Comparative genomic hybridization Because arrays are more than just a tool for expression analysis Microarray Data Analysis Workshop MedVetNet Workshop, DTU 2008 Comparative genomic hybridization Because arrays are more than just a tool for expression analysis Carsten Friis ( with several slides from

More information

Approximation Algorithms

Approximation Algorithms Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms

More information

USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS

USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu ABSTRACT This

More information

ANALYTIC HIERARCHY PROCESS (AHP) TUTORIAL

ANALYTIC HIERARCHY PROCESS (AHP) TUTORIAL Kardi Teknomo ANALYTIC HIERARCHY PROCESS (AHP) TUTORIAL Revoledu.com Table of Contents Analytic Hierarchy Process (AHP) Tutorial... 1 Multi Criteria Decision Making... 1 Cross Tabulation... 2 Evaluation

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

Transportation Polytopes: a Twenty year Update

Transportation Polytopes: a Twenty year Update Transportation Polytopes: a Twenty year Update Jesús Antonio De Loera University of California, Davis Based on various papers joint with R. Hemmecke, E.Kim, F. Liu, U. Rothblum, F. Santos, S. Onn, R. Yoshida,

More information

Linear Codes. Chapter 3. 3.1 Basics

Linear Codes. Chapter 3. 3.1 Basics Chapter 3 Linear Codes In order to define codes that we can encode and decode efficiently, we add more structure to the codespace. We shall be mainly interested in linear codes. A linear code of length

More information

LESSON 3.5 WORKBOOK. How do cancer cells evolve? Workbook Lesson 3.5

LESSON 3.5 WORKBOOK. How do cancer cells evolve? Workbook Lesson 3.5 LESSON 3.5 WORKBOOK How do cancer cells evolve? In this unit we have learned how normal cells can be transformed so that they stop behaving as part of a tissue community and become unresponsive to regulation.

More information

Big Data & Scripting Part II Streaming Algorithms

Big Data & Scripting Part II Streaming Algorithms Big Data & Scripting Part II Streaming Algorithms 1, 2, a note on sampling and filtering sampling: (randomly) choose a representative subset filtering: given some criterion (e.g. membership in a set),

More information

Reliable classification of two-class cancer data using evolutionary algorithms

Reliable classification of two-class cancer data using evolutionary algorithms BioSystems 72 (23) 111 129 Reliable classification of two-class cancer data using evolutionary algorithms Kalyanmoy Deb, A. Raji Reddy Kanpur Genetic Algorithms Laboratory (KanGAL), Indian Institute of

More information

Selection Procedures for Module Discovery: Exploring Evolutionary Algorithms for Cognitive Science

Selection Procedures for Module Discovery: Exploring Evolutionary Algorithms for Cognitive Science Selection Procedures for Module Discovery: Exploring Evolutionary Algorithms for Cognitive Science Janet Wiles (j.wiles@csee.uq.edu.au) Ruth Schulz (ruth@csee.uq.edu.au) Scott Bolland (scottb@csee.uq.edu.au)

More information

Learning in Abstract Memory Schemes for Dynamic Optimization

Learning in Abstract Memory Schemes for Dynamic Optimization Fourth International Conference on Natural Computation Learning in Abstract Memory Schemes for Dynamic Optimization Hendrik Richter HTWK Leipzig, Fachbereich Elektrotechnik und Informationstechnik, Institut

More information

Integrating DNA Motif Discovery and Genome-Wide Expression Analysis. Erin M. Conlon

Integrating DNA Motif Discovery and Genome-Wide Expression Analysis. Erin M. Conlon Integrating DNA Motif Discovery and Genome-Wide Expression Analysis Department of Mathematics and Statistics University of Massachusetts Amherst Statistics in Functional Genomics Workshop Ascona, Switzerland

More information

A Study of Crossover Operators for Genetic Algorithm and Proposal of a New Crossover Operator to Solve Open Shop Scheduling Problem

A Study of Crossover Operators for Genetic Algorithm and Proposal of a New Crossover Operator to Solve Open Shop Scheduling Problem American Journal of Industrial and Business Management, 2016, 6, 774-789 Published Online June 2016 in SciRes. http://www.scirp.org/journal/ajibm http://dx.doi.org/10.4236/ajibm.2016.66071 A Study of Crossover

More information

A Genetic Algorithm Processor Based on Redundant Binary Numbers (GAPBRBN)

A Genetic Algorithm Processor Based on Redundant Binary Numbers (GAPBRBN) ISSN: 2278 1323 All Rights Reserved 2014 IJARCET 3910 A Genetic Algorithm Processor Based on Redundant Binary Numbers (GAPBRBN) Miss: KIRTI JOSHI Abstract A Genetic Algorithm (GA) is an intelligent search

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

Three Effective Top-Down Clustering Algorithms for Location Database Systems

Three Effective Top-Down Clustering Algorithms for Location Database Systems Three Effective Top-Down Clustering Algorithms for Location Database Systems Kwang-Jo Lee and Sung-Bong Yang Department of Computer Science, Yonsei University, Seoul, Republic of Korea {kjlee5435, yang}@cs.yonsei.ac.kr

More information

LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014

LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014 LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING ----Changsheng Liu 10-30-2014 Agenda Semi Supervised Learning Topics in Semi Supervised Learning Label Propagation Local and global consistency Graph

More information

Compact Representations and Approximations for Compuation in Games

Compact Representations and Approximations for Compuation in Games Compact Representations and Approximations for Compuation in Games Kevin Swersky April 23, 2008 Abstract Compact representations have recently been developed as a way of both encoding the strategic interactions

More information

14.10.2014. Overview. Swarms in nature. Fish, birds, ants, termites, Introduction to swarm intelligence principles Particle Swarm Optimization (PSO)

14.10.2014. Overview. Swarms in nature. Fish, birds, ants, termites, Introduction to swarm intelligence principles Particle Swarm Optimization (PSO) Overview Kyrre Glette kyrrehg@ifi INF3490 Swarm Intelligence Particle Swarm Optimization Introduction to swarm intelligence principles Particle Swarm Optimization (PSO) 3 Swarms in nature Fish, birds,

More information

Effect of Using Neural Networks in GA-Based School Timetabling

Effect of Using Neural Networks in GA-Based School Timetabling Effect of Using Neural Networks in GA-Based School Timetabling JANIS ZUTERS Department of Computer Science University of Latvia Raina bulv. 19, Riga, LV-1050 LATVIA janis.zuters@lu.lv Abstract: - The school

More information

Why? A central concept in Computer Science. Algorithms are ubiquitous.

Why? A central concept in Computer Science. Algorithms are ubiquitous. Analysis of Algorithms: A Brief Introduction Why? A central concept in Computer Science. Algorithms are ubiquitous. Using the Internet (sending email, transferring files, use of search engines, online

More information

Environmental Remote Sensing GEOG 2021

Environmental Remote Sensing GEOG 2021 Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class

More information

ENHANCED CONFIDENCE INTERPRETATIONS OF GP BASED ENSEMBLE MODELING RESULTS

ENHANCED CONFIDENCE INTERPRETATIONS OF GP BASED ENSEMBLE MODELING RESULTS ENHANCED CONFIDENCE INTERPRETATIONS OF GP BASED ENSEMBLE MODELING RESULTS Michael Affenzeller (a), Stephan M. Winkler (b), Stefan Forstenlechner (c), Gabriel Kronberger (d), Michael Kommenda (e), Stefan

More information

IE 680 Special Topics in Production Systems: Networks, Routing and Logistics*

IE 680 Special Topics in Production Systems: Networks, Routing and Logistics* IE 680 Special Topics in Production Systems: Networks, Routing and Logistics* Rakesh Nagi Department of Industrial Engineering University at Buffalo (SUNY) *Lecture notes from Network Flows by Ahuja, Magnanti

More information

Analysis of gene expression data. Ulf Leser and Philippe Thomas

Analysis of gene expression data. Ulf Leser and Philippe Thomas Analysis of gene expression data Ulf Leser and Philippe Thomas This Lecture Protein synthesis Microarray Idea Technologies Applications Problems Quality control Normalization Analysis next week! Ulf Leser:

More information

Dynamic Programming. Lecture 11. 11.1 Overview. 11.2 Introduction

Dynamic Programming. Lecture 11. 11.1 Overview. 11.2 Introduction Lecture 11 Dynamic Programming 11.1 Overview Dynamic Programming is a powerful technique that allows one to solve many different types of problems in time O(n 2 ) or O(n 3 ) for which a naive approach

More information

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

More information

High-Dimensional Data Visualization by PCA and LDA

High-Dimensional Data Visualization by PCA and LDA High-Dimensional Data Visualization by PCA and LDA Chaur-Chin Chen Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan Abbie Hsu Institute of Information Systems & Applications,

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information