Package DCG. R topics documented: June 8, 2016. Type Package



Similar documents
Package retrosheet. April 13, 2015

Package LexisPlotR. R topics documented: April 4, Type Package

Package sendmailr. February 20, 2015

Package OECD. R topics documented: January 17, Type Package Title Search and Extract Data from the OECD Version 0.2.

Package uptimerobot. October 22, 2015

Package EstCRM. July 13, 2015

Package tagcloud. R topics documented: July 3, 2015

Package decompr. August 17, 2016

Package MDM. February 19, 2015

Package bigrf. February 19, 2015

Package syuzhet. February 22, 2015

Package ggrepel. February 8, 2016

Package searchconsoler

Package empiricalfdr.deseq2

Package optirum. December 31, 2015

Package erp.easy. September 26, 2015

Package changepoint. R topics documented: November 9, Type Package Title Methods for Changepoint Detection Version 2.

Package survpresmooth

Package jrvfinance. R topics documented: October 6, 2015

Package urltools. October 11, 2015

Package png. February 20, 2015

Package hoarder. June 30, 2015

Package missforest. February 20, 2015

Package neuralnet. February 20, 2015

Package AzureML. August 15, 2015

DATA ANALYSIS II. Matrix Algorithms

Package HHG. July 14, 2015

Package sjdbc. R topics documented: February 20, 2015

Package whoapi. R topics documented: June 26, Type Package Title A 'Whoapi' API Client Version Date Author Oliver Keyes

Package globe. August 29, 2016

Package dunn.test. January 6, 2016

Package packrat. R topics documented: March 28, Type Package

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Package dsmodellingclient

Package hazus. February 20, 2015

Package trimtrees. February 20, 2015

Practical Graph Mining with R. 5. Link Analysis

Package R4CDISC. September 5, 2015

Package SHELF. February 5, 2016

MULTIPLE-OBJECTIVE DECISION MAKING TECHNIQUE Analytical Hierarchy Process

Package hier.part. February 20, Index 11. Goodness of Fit Measures for a Regression Hierarchy

Package Rdsm. February 19, 2015

Package mcmcse. March 25, 2016

STATISTICS AND DATA ANALYSIS IN GEOLOGY, 3rd ed. Clarificationof zonationprocedure described onpp

Package MBA. February 19, Index 7. Canopy LIDAR data

Package TSfame. February 15, 2013

CBA Fractions Student Sheet 1

Package benford.analysis

Package polynom. R topics documented: June 24, Version 1.3-8

Package lmertest. July 16, 2015

Package plan. R topics documented: February 20, 2015

Package MixGHD. June 26, 2015

Package DiallelAnalysisR

Package TRADER. February 10, 2016

Package DSsim. September 25, 2015

Environmental Remote Sensing GEOG 2021

Scatter Plots with Error Bars

Package bizdays. March 13, 2015

Package wordcloud. R topics documented: February 20, 2015

MATH 140 Lab 4: Probability and the Standard Normal Distribution

EXPLORING SPATIAL PATTERNS IN YOUR DATA

Linear Algebra and TI 89

QGIS LAB SERIES GST 102: Spatial Analysis Lab 6: Vector Data Analysis - Network Analysis

1. Classification problems

Package brewdata. R topics documented: February 19, Type Package

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Package metafuse. November 7, 2015

Package SurvCorr. February 26, 2015

5: Magnitude 6: Convert to Polar 7: Convert to Rectangular

MHI3000 Big Data Analytics for Health Care Final Project Report

Package saccades. March 18, 2015

Package GenBinomApps

Getting Started with R and RStudio 1

Package pinnacle.api

Package hive. January 10, 2011

Package PACVB. R topics documented: July 10, Type Package

Package tvm. R topics documented: August 21, Type Package Title Time Value of Money Functions Version Author Juan Manuel Truppia

Editing Common Polygon Boundary in ArcGIS Desktop 9.x

Package fimport. February 19, 2015

Optimization of sampling strata with the SamplingStrata package

Package EasyHTMLReport

Package dggridr. August 29, 2016

BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts

Chapter 6: The Information Function 129. CHAPTER 7 Test Calibration

7 Time series analysis

Data Mining Lab 5: Introduction to Neural Networks

Package CoImp. February 19, 2015

Exploratory data analysis (Chapter 2) Fall 2011

Package hive. July 3, 2015

IBM SPSS Direct Marketing 23

Transcription:

Type Package Package DCG June 8, 2016 Title Data Cloud Geometry (DCG): Using Random Walks to Find Community Structure in Social Network Analysis Version 0.9.2 Date 2016-05-09 Depends R (>= 2.14.0) Data cloud geometry (DCG) applies random walks in finding community structures for social networks. License GPL (>= 2) Copyright Fushing Lab & McCowan Lab, University of California, Davis Imports stats LazyData TRUE Suggests testthat, knitr, rmarkdown, devtools, lattice, png RoxygenNote 5.0.1 VignetteBuilder knitr NeedsCompilation no Author Chen Chen [aut], Jian Jin [aut, cre], Brianne Beisner [aut], Brenda McCowan [aut, cph], Hsieh Fushing [aut, cph] Maintainer Jian Jin <jinjian.pku@gmail.com> Repository CRAN Date/Publication 2016-06-08 05:47:08 R topics documented: as.similaritymatrix..................................... 2 as.symmetricadjacencymatrix............................... 3 geteigenvaluelist...................................... 4 1

2 as.similaritymatrix getens............................................ 4 getenslist.......................................... 5 GetSim........................................... 6 monkeygrooming...................................... 7 plotclusters....................................... 7 plotmultieigenvalues.................................... 8 plotthecluster....................................... 10 temperaturesample..................................... 10 Index 12 as.similaritymatrix Convert a matrix to a similarity matrix. as.similaritymatrix convert an adjacency matrix to a similarity matrix. Convert a matrix to a similarity matrix. as.similaritymatrix convert an adjacency matrix to a similarity matrix. as.similaritymatrix(mat) mat a symmetric adjacency matrix a similarity matrix. Examples symmetricmatrix <- as.symmetricadjacencymatrix(monkeygrooming, weighted = TRUE, rule = "weak") similaritymatrix <- as.similaritymatrix(symmetricmatrix)

as.symmetricadjacencymatrix 3 as.symmetricadjacencymatrix convert to a symmetric adjacency matrix as.symmetricadjacencymatrix convert an edgelist or a raw matrix to a symmetric adjacency matrix. as.symmetricadjacencymatrix(data, weighted = FALSE, rule = "weak") Data weighted rule either a dataframe or a matrix, representing raw interactions using either an edgelist or a matrix. Frequency of interactions for each dyad can be represented either by multiple occurrences of the dyad for a 2-column edgelist, or by a third column specifying the frequency of the interaction for a 3-column edgelist. If the edgelist is a 3-column edgelist in which weight was specified by frequency, use weighted = TRUE. a character vector of length 1, being one of "weak", "strong", "upper", or "lower". Ways of symmetrizing the matrix. See details for more information. There are ways of symmetrizing a matrix. The "weak" rule symmetrize the matrix by building an edge between nodes [i, j] and [j, i] if there is an edge either from i to j OR from j to i. The "strong" rule symmetrize the matrix by building an edge between nodes [i, j] and [j, i] if there is an edge BOTH from i to j AND from j to i. The "upper" and the "lower" rule symmetrize the matrix by using the "upper" or the "lower" triangle respectively. Note, when using a 3-column edgelist (e.g. a weighted edgelist) to represent raw interactions, each dyad must be unique. If more than one rows are found with the same Initiator and recipient, sum of the frequencies will be taken to represent the freqency of interactions between this unique dyad. A warning message will prompt your attention to the accuracy of your raw data when duplicated dyads were found in a three-column edgelist. a named matrix with the [i,j]th entry equal to the number of times i grooms j. Examples symmetricmatrix <- as.symmetricadjacencymatrix(monkeygrooming, weighted = TRUE, rule = "weak")

4 getens geteigenvaluelist generate eigenvalues for all ensemble matrices geteigenvaluelist get eigenvalues from ensemble matrices generate eigenvalues for all ensemble matrices geteigenvaluelist get eigenvalues from ensemble matrices geteigenvaluelist(enslist) EnsList a list of ensemble matrices a list of eigenvalues for each of the ensemble matrix in the ensemble matrices list. getens generate ensemble matrix getens get ensemble matrix from given similarity matrix and temperature generate ensemble matrix getens get ensemble matrix from given similarity matrix and temperature getens(simmat, temperature, MaxIt = 1000, m = 5) simmat temperature MaxIt m a similarity matrix a numeric vector of length 1, indicating the temperature used to transform the similarity matrix to ensemble matrix number of iterations for regulated random walks maxiumnum number of time a node can be visited during random walks This function involves two steps. It first generate similarity matrices of different variances by taking the raw similarity matrix to the power of each temperature. Then it called the function EstClust to perform random walks in the network to identify clusters.

getenslist 5 a matrix. getenslist generating a list of ensemble matrices based on the similarity matrix and temperatures getenslist get ensemble matrices from given similarity matrix at all temperatures getenslist(simmat, temperatures, MaxIt = 1000, m = 5) simmat temperatures MaxIt m a similarity matrix temperatures selected number of iterations for regulated random walks maxiumnum number of time a node can be visited during random walks This step is crucial in finding community structure based on the similarity matrix of the social network. For each temperatures, the similarity matrix was taken to the power of temperature as saved as a new similarity matrix. This allows the random walk to explore the similarity matrix at various variations. Random walks are then performed in similarity matrices of various temperatures. In order to prevent random walks being stucked in a locale, the parameter m was set (to 5 by default) to remove a node after m times of visits of the node. An ensemble matrix is generated at each temperature in which values represent likelihood of two nodes being in the same community. a list of ensemble matrices References Fushing, H., & McAssey, M. P. (2010). Time, temperature, and data cloud geometry. Physical Review E, 82(6), 061110. Chen, C., & Fushing, H. (2012). Multiscale community geometry in a network and its application. Physical Review E, 86(4), 041120. Fushing, H., Wang, H., VanderWaal, K., McCowan, B., & Koehl, P. (2013). Multi-scale clustering by building a robust and self correcting ultrametric topology on data points. PloS one, 8(2), e56259.

6 GetSim Examples symmetricmatrix <- as.symmetricadjacencymatrix(monkeygrooming, weighted = TRUE, rule = "weak") Sim <- as.similaritymatrix(symmetricmatrix) temperatures <- temperaturesample(start = 0.01, end = 20, n = 20, method = 'random') ## Not run: # Note: It takes a while to run the getenslist example. Ens_list <- getenslist(sim, temperatures, MaxIt = 1000, m = 5) ## End(Not run) GetSim GetSim get similarity matrix from a distance matrix GetSim get similarity matrix from a distance matrix GetSim(D, T) D T A distance matrix Temperature. temperaturesample the similarity matrix is calculated at each temperature T. References Fushing, H., & McAssey, M. P. (2010). Time, temperature, and data cloud geometry. Physical Review E, 82(6), 061110. Chen, C., & Fushing, H. (2012). Multiscale community geometry in a network and its application. Physical Review E, 86(4), 041120. Fushing, H., Wang, H., VanderWaal, K., McCowan, B., & Koehl, P. (2013). Multi-scale clustering by building a robust and self correcting ultrametric topology on data points. PloS one, 8(2), e56259.

monkeygrooming 7 monkeygrooming Grooming network data A dataset containing grooming edgelist among monkeys. monkeygrooming Format A data frame with 1595 rows and 2 variables: Initiator Grooming Initiator ID Recipient Grooming Recipient ID Groom Grooming Frequency... plotclusters generate tree plots for each ensemble matrix plotclusters plot all cluster trees generate tree plots for each ensemble matrix plotclusters plot all cluster trees plotclusters(enslist, mfrow, mar = c(1, 1, 1, 1), line = -1.5, cex = 0.5,...) EnsList a list in which elements are ensemble matrices. mfrow A vector of the form c(nr, nc) passed to par. mar plotting parameters with useful defaults (par) line plotting parameters with useful defaults (par) cex plotting parameters with useful defaults (par)... further plotting parameters

8 plotmultieigenvalues plotclusters plots all cluster trees with each tree corresponding to each ensemble matrix in the list of ens_list. EnsList is the output from getenslist. mfrow determines the arrangement of multiple plots. It takes the form of c(nr, nc) with the first parameter being the number of rows and the second parameter being the number of columns. When deciding parameters for mfrow, one should take into considerations size of the plotting device and number of cluster plots. For example, there are 20 cluster plots, mfrow can be set to c(4, 5) or c(2, 10) depending on the size and shape of the plotting area. a graph containing all tree plots with each tree plot corresponding to the community structure from each of the ensemble matrix. References Fushing, H., & McAssey, M. P. (2010). Time, temperature, and data cloud geometry. Physical Review E, 82(6), 061110. Chen, C., & Fushing, H. (2012). Multiscale community geometry in a network and its application. Physical Review E, 86(4), 041120. Fushing, H., Wang, H., VanderWaal, K., McCowan, B., & Koehl, P. (2013). Multi-scale clustering by building a robust and self correcting ultrametric topology on data points. PloS one, 8(2), e56259. See Also getenslist Examples symmetricmatrix <- as.symmetricadjacencymatrix(monkeygrooming, weighted = TRUE, rule = "weak") Sim <- as.similaritymatrix(symmetricmatrix) temperatures <- temperaturesample(start = 0.01, end = 20, n = 20, method = 'random') ## Not run: # for illustration only. skip CRAN check because it ran forever. Ens_list <- getenslist(sim, temperatures, MaxIt = 1000, m = 5) ## End(Not run) plotclusters(enslist = Ens_list, mfrow = c(2, 10), mar = c(1, 1, 1, 1)) plotmultieigenvalues plot eigenvalues plotmultieigenvalues plot eigenvalues to determine number of communities by finding the elbow point plot eigenvalues plotmultieigenvalues plot eigenvalues to determine number of communities by finding the elbow point

plotmultieigenvalues 9 plotmultieigenvalues(ens_list, mfrow, mar = c(2, 2, 2, 2), line = -1.5, cex = 0.5,...) Ens_list a list in which elements are numeric vectors representing eigenvalues. mfrow A vector of the form c(nr, nc) passed to par. mar plotting parameters with useful defaults (par) line plotting parameters with useful defaults (par) cex plotting parameters with useful defaults (par)... further plotting parameters plotmultieigenvalues plot multiple eigenvalue plots. The dark blue colored dots indicate eigenvalue greater than 0. Each of the ensemble matrices is decomposed into eigenvalues which is used to determine appropriate number of communities. Plotting out eigenvalues allow us to see where the elbow point is. The curve starting from the elbow point flatten out. The number of points above (excluding) the elbow point indicates number of communities. mfrow determines the arrangement of multiple plots. It takes the form of c(nr, nc) with the first parameter being the number of rows and the second parameter being the number of columns. When deciding parameters for mfrow, one should take into considerations size of the plotting device and number of plots. For example, there are 20 plots, mfrow can be set to c(4, 5) or c(2, 10) depending on the size and shape of the plotting area. a pdf file in the working directory containing all eigenvalue plots References Fushing, H., & McAssey, M. P. (2010). Time, temperature, and data cloud geometry. Physical Review E, 82(6), 061110. Chen, C., & Fushing, H. (2012). Multiscale community geometry in a network and its application. Physical Review E, 86(4), 041120. Fushing, H., Wang, H., VanderWaal, K., McCowan, B., & Koehl, P. (2013). Multi-scale clustering by building a robust and self correcting ultrametric topology on data points. PloS one, 8(2), e56259. See Also plotclusters, getenslist

10 temperaturesample Examples symmetricmatrix <- as.symmetricadjacencymatrix(monkeygrooming, weighted = TRUE, rule = "weak") Sim <- as.similaritymatrix(symmetricmatrix) temperatures <- temperaturesample(start = 0.01, end = 20, n = 20, method = 'random') ## Not run: # for illustration only. skip CRAN check because it ran forever. Ens_list <- getenslist(sim, temperatures, MaxIt = 1000, m = 5) ## End(Not run) plotmultieigenvalues(ens_list = Ens_list, mfrow = c(10, 2), mar = c(1, 1, 1, 1)) plotthecluster generate tree plots for selected ensemble matrix plottrees plot one cluster tree generate tree plots for selected ensemble matrix plottrees plot one cluster tree plotthecluster(enslist, index,...) EnsList a list in which elements are ensemble matrices. index an integer. index of which ensemble matrix you want to plot.... plotting parameters passed to par. a tree plot temperaturesample generate temperatures temperaturesample generate tempatures based on either random or fixed intervals generate temperatures temperaturesample generate tempatures based on either random or fixed intervals

temperaturesample 11 temperaturesample(start = 0.01, end = 20, n = 20, method = "random") start end n a numeric vector of length 1, indicating the lowest temperature a numeric vector of length 1, indicating the highest temperature an integer between 10 to 30, indicating the number of temperatures (more explanations on what temperatures are). method a character vector indicating the method used in selecting temperatures. It should take either random or fixedinterval, case-sensitive. In using random walks to find community structure, each normalized similarity matrix is evaluated at different temperatures. This allows greater variations in the normalized similarity matrices. It is recommended to try out 20-30 temperatures to allow for a thorough exploration of the matrices. A range of temperatures which lead to stable community structures should be considered as reliable. The temperature in the middle of the range should be selected. a numeric vector of length n representing temperatures sampled. References Fushing, H., & McAssey, M. P. (2010). Time, temperature, and data cloud geometry. Physical Review E, 82(6), 061110. Chen, C., & Fushing, H. (2012). Multiscale community geometry in a network and its application. Physical Review E, 86(4), 041120. Fushing, H., Wang, H., VanderWaal, K., McCowan, B., & Koehl, P. (2013). Multi-scale clustering by building a robust and self correcting ultrametric topology on data points. PloS one, 8(2), e56259. See Also getenslist Examples symmetricmatrix <- as.symmetricadjacencymatrix(monkeygrooming, weighted = TRUE, rule = "weak") Sim <- as.similaritymatrix(symmetricmatrix) temperatures <- temperaturesample(start = 0.01, end = 20, n = 20, method = 'random')

Index Topic datasets monkeygrooming, 7 as.similaritymatrix, 2 as.symmetricadjacencymatrix, 3 geteigenvaluelist, 4 getens, 4 getenslist, 5, 8, 9, 11 GetSim, 6 monkeygrooming, 7 par, 7, 9, 10 plotclusters, 7, 9 plotmultieigenvalues, 8 plotthecluster, 10 temperaturesample, 6, 10 12