type The annotations of the 62 samples with respect to the cancer types FL, CLL, DLBCL-A, DLBCL-G.

Size: px

Start display at page:

Download "type The annotations of the 62 samples with respect to the cancer types FL, CLL, DLBCL-A, DLBCL-G."

Willis Leonard
8 years ago
Views:

1 alizadeh Samle a from a lymhoma/leukemia gene exression study Samle a for the ISIS method Format x A 2000 x 62 gene exression a matrix of log-ratio values. 2,000 genes with the highest variance across the samles have been selected from the set of 4,026 genes considered by the original authors. tye The annotations of the 62 samles with resect to the cancer tyes FL, CLL, DLBCL-A, DLBCL-G. Source The a are from the lymhoma/leukemia study of A. Alizadeh et al., Nature 403: (2000), htt://llm.nih.gov/lymhoma/a/figure1/figure1.cdt gotomax Local search function Finds local maxima of the score function tscore by greedy hill climbing. gotomax(, slit, = 50,.offs = 0, min.slit.size = ceiling(0.1*ncol())) slit The gene exression a matrix. Rows corresond to genes, and columns corresond to samles. Either a binary vector encoding a biartition of the set of samles, or a binary matrix with every row encoding a biartition of the set of samles. The number of best discriminating genes selected for comuting the score of a biartition. 1

tye The annotations of the 62 samles with resect to the cancer tyes FL, CLL, DLBCL-A, DLBCL-G. Source The a are from the lymhoma/leukemia study of A. Alizadeh et al.

2 .offs min.slit.size The number of very best discriminating genes that are ignored when comuting the score of a artition. Thus only the genes with ranks.offs+1,..., are used to comute the score. The minimal size either class of a biartition must have to be acceted as a local maximum. A binary matrix where each column (excet the last one) reresents a samle (in the original order). The rows encode the found local maximum biartitions. The last entry in each row gives the score of the artition. slit <- re(0, ncol(alizadeh$x)) slit[which(alizadeh$tye=="fl")] <- 1 y <- gotomax(alizadeh$x, slit) isis Class discovery for microarray a Finds biartitions of a set of tissue samles that show clear searation regarding the exression of a subset of genes. isis(x, = 50,.offs = 0, max.nr.cand.slits = 500) x The gene exression a matrix. Rows corresond to genes, and columns corresond to samles. The values in x may be normalized logarithms of intensities or log-ratios. The number of best discriminating genes selected for comuting the score of a biartition..offs The number of very best discriminating genes that are ignored when comuting the score of a artition. Thus only the genes with ranks.offs+1,..., are used to comute the score. A value >0 may imrove the robustness of the score, sorting out ossibly surious artitions with very few genes that discriminate very well between the two grous. max.nr.cand.slits After the generation of candie biartitions, only the max.nr.cand.slits artitions with highest scores are further investigated. This is used in order to control the running time of the rogram. 2

The rows encode the found local maximum biartitions. The last entry in each row gives the score of the artition.

3 Details It may be useful to first discard genes from the a matrix that dislay consistently low variance across the samles or consistently low exression intensity. For the generation of candie biartitions, the a matrix is enlarged by adding average exression rofiles of all clusters of genes resulting from a centroid linkage hierarchical clustering based on the correlation coefficient as similarity measure. In order to reduce comutation time, similar candie artitions are merged through hierarchical clustering before alying the local search to find local maximum artitions (see gotomax ). A matrix where each column (excet the last one) reresents a samle (in the original order). The rows reresent the found biartitions with highest scores, coded as 0/1-vectors. The last entry in each row gives the score of the artition. Only artitions with each grou containing at least 10 Author(s) Anja von Heydebreck and Wolfgang Huber htt:// References Identifying Slits with Clear Searation: A New Class Discovery Method for Gene Exression Data. Anja von Heydebreck, Wolfgang Huber, Annemarie Poustka, and Martin Vingron, ISMB 2001, Bioinformatics 17, Sul. 1 (2001) cutime <- system.time( y <- isis(alizadeh$x) ) cat("time used:", cutime[1], "sec on CPU,", cutime[3], "sec on wallclock.\n") thresh Simulated quantiles of t-statistic for ordered normal random vectors. For samles of at most 150 i.i.d standard normal random variables, the simulated quantiles of the two-samle t-statistic comaring the a values below and above any cutoint is given. a(thresh) 3

correlation coefficient as similarity measure.

4 Format A 150 x 150 matrix with the row index indicating the samle size and the column index indicating the cutoint. tscore Score function for biartitions Comutes a score for a given biartition of the set of samles that measures how well the two classes are searated by the exression levels of a suitable subset of genes. tscore(, slit, = 50,.offs = 0) slit.offs The gene exression a matrix. Rows corresond to genes, and columns corresond to samles. The values in x may be normalized logarithms of intensities or log-ratios. Either a binary vector encoding a biartition of the set of samles, or a binary matrix with every row encoding a biartition of the set of samles. The number of best discriminating genes selected for comuting the score of a biartition. The number of very best discriminating genes that are ignored when comuting the score of a artition. Thus only the genes with ranks.offs+1,..., are used to comute the score. Details First, the genes with ranks.offs+1,..., according to the two-samle t-statistic are selected. Then, a discriminant axis for a naive Bayes classifier (ML classification rule for two normal distributions with identical, diagonal covariance matrix) based on these genes is comuted, and the score is comuted as the t-statistic for the coordinates of the samle vectors rojected onto this axis. A vector with the scores of the biartitions encoded in slit. slit <- re(0, ncol(alizadeh$x)) slit[which(alizadeh$tye=="fl")] <- 1 z <- tscore(alizadeh$x, slit) 4

genes. tscore(, slit, = 50,.offs = 0) slit.offs The gene exression a matrix. Rows corresond to genes, and columns corresond to samles.

5 ttesttwo two-samle t-statistic Comutes the two-samle t-statistic for each row of a a matrix, according to a biartition of the set of columns. ttesttwo(, slit) slit A numeric a matrix. A binary vector encoding a biartition of the set of columns of. A vector with the t-statistic values for each row of, according to the biartition. slit <- re(0, ncol(alizadeh$x)) slit[which(alizadeh$tye=="fl")] <- 1 t <- ttesttwo(alizadeh$x, slit) 5

A binary vector encoding a biartition of the set of columns of.

Exploratory data analysis for microarray data

Exploratory data analysis for microarray data Eploratory data analysis for microarray data Anja von Heydebreck Ma Planck Institute for Molecular Genetics, Dept. Computational Molecular Biology, Berlin, Germany heydebre@molgen.mpg.de Visualization