Package wordcloud. R topics documented: February 20, 2015



Similar documents
Package tagcloud. R topics documented: July 3, 2015

Text Mining with R Twitter Data Analysis 1

Package MBA. February 19, Index 7. Canopy LIDAR data

Data Mining with R. Text Mining. Hugh Murrell

Package treemap. February 15, 2013

Package plan. R topics documented: February 20, 2015

Package ChannelAttribution

Package fastghquad. R topics documented: February 19, 2015

Package survpresmooth

Package polynom. R topics documented: June 24, Version 1.3-8

How To Write A Blog Post In R

Package empiricalfdr.deseq2

Package bigdata. R topics documented: February 19, 2015

Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC

Package MDM. February 19, 2015

Package retrosheet. April 13, 2015

Package GSA. R topics documented: February 19, 2015

Package biganalytics

Package PACVB. R topics documented: July 10, Type Package

Analysis of System Performance IN2072 Chapter M Matlab Tutorial

Package smoothhr. November 9, 2015

a. First Drag Position from Measures to Dimensions b. Drag Position into Rows c. Drag Candidate into Columns

Package EstCRM. July 13, 2015

Package cgdsr. August 27, 2015

Package TSfame. February 15, 2013

Package neuralnet. February 20, 2015

Vectors VECTOR PRODUCT. Graham S McDonald. A Tutorial Module for learning about the vector product of two vectors. Table of contents Begin Tutorial

IEOR 4404 Homework #2 Intro OR: Deterministic Models February 14, 2011 Prof. Jay Sethuraman Page 1 of 5. Homework #2

Graphics in R. Biostatistics 615/815

Package missforest. February 20, 2015

Package SHELF. February 5, 2016

Package RIGHT. March 30, 2015

Package CoImp. February 19, 2015

Package DCG. R topics documented: June 8, Type Package

Package urltools. October 11, 2015

Portal Connector Fields and Widgets Technical Documentation

Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller

Each function call carries out a single task associated with drawing the graph.

9. Text & Documents. Visualizing and Searching Documents. Dr. Thorsten Büring, 20. Dezember 2007, Vorlesung Wintersemester 2007/08

Package ggrepel. February 8, 2016

Drawing a histogram using Excel

TIBCO Spotfire Network Analytics 1.1. User s Manual

Charts for SharePoint

Package hoarder. June 30, 2015

Blender Notes. Introduction to Digital Modelling and Animation in Design Blender Tutorial - week 9 The Game Engine

Package MixGHD. June 26, 2015

Package hier.part. February 20, Index 11. Goodness of Fit Measures for a Regression Hierarchy

Package treemap. October 14, 2015

KaleidaGraph Quick Start Guide

Visualizing Data. Contents. 1 Visualizing Data. Anthony Tanbakuchi Department of Mathematics Pima Community College. Introductory Statistics Lectures

Package cpm. July 28, 2015

Package globe. August 29, 2016

A. OPENING POINT CLOUDS. (Notepad++ Text editor) (Cloud Compare Point cloud and mesh editor) (MeshLab Point cloud and mesh editor)

SPINDLE ERROR MOVEMENTS MEASUREMENT ALGORITHM AND A NEW METHOD OF RESULTS ANALYSIS 1. INTRODUCTION

Package CRM. R topics documented: February 19, 2015

HW Set VI page 1 of 9 PHYSICS 1401 (1) homework solutions

Package optirum. December 31, 2015

R Graphics Cookbook. Chang O'REILLY. Winston. Tokyo. Beijing Cambridge. Farnham Koln Sebastopol

Create Charts in Excel

Package LexisPlotR. R topics documented: April 4, Type Package

Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering

Package GWmodel. February 19, 2015

Package R4CDISC. September 5, 2015

2 intervals-package. Index 33. Tools for working with points and intervals

Hands-On Data Science with R Text Mining. Graham.Williams@togaware.com. 5th November 2014 DRAFT

Visualization Quick Guide

PHYS 211 FINAL FALL 2004 Form A

Steven M. Ho!and. Department of Geology, University of Georgia, Athens, GA

Package Rdsm. February 19, 2015

Creating Interactive PDF Forms

Plotting: Customizing the Graph

Package SurvCorr. February 26, 2015

v w is orthogonal to both v and w. the three vectors v, w and v w form a right-handed set of vectors.

Orbital Mechanics. Angular Momentum

Package sendmailr. February 20, 2015

The process components and related data characteristics addressed in this document are:

Package png. February 20, 2015

DEA implementation and clustering analysis using the K-Means algorithm

Package erp.easy. September 26, 2015

Exploratory data analysis (Chapter 2) Fall 2011

Package changepoint. R topics documented: November 9, Type Package Title Methods for Changepoint Detection Version 2.

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

Package bigrf. February 19, 2015

Design of LDPC codes

Automated Optical Inspection is one of many manufacturing test methods common in the assembly of printed circuit boards. This list includes:

Introduction to basic Text Mining in R.

Getting Started with R and RStudio 1

Digital Photogrammetric System. Version USER MANUAL. Block adjustment

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification

The Moon. Nicola Loaring, SAAO

Transcription:

Package wordcloud February 20, 2015 Type Package Title Word Clouds Version 2.5 Date 2013-04-11 Author Ian Fellows Maintainer Ian Fellows <ian@fellstat.com> Pretty word clouds. License LGPL-2.1 LazyLoad yes Depends methods, RColorBrewer Imports slam, Rcpp (>= 0.9.4) Suggests tm (>= 0.6) URL http://blog.fellstat.com/?cat=11 http://www.fellstat.com http://research.cens.ucla.edu/ LinkingTo Rcpp NeedsCompilation yes Repository CRAN Date/Publication 2014-06-13 00:56:24 R topics documented: commonality.cloud..................................... 2 comparison.cloud...................................... 3 SOTU............................................ 4 textplot........................................... 4 wordcloud.......................................... 5 wordlayout......................................... 7 Index 9 1

2 commonality.cloud commonality.cloud Plot a commonality cloud Plot a cloud of words shared across documents commonality.cloud(term.matrix,comonality.measure=min,max.words=300,...) term.matrix A term frequency matrix whose rows represent words and whose columns represent documents. comonality.measure A function taking a vector of frequencies for a single term, and returning a common frequency max.words Maximum number of words to be plotted. least frequent terms dropped... Additional parameters to be passed to wordcloud. nothing if(require(tm)){ data(sotu) corp <- SOTU corp <- tm_map(corp, removepunctuation) corp <- tm_map(corp, content_transformer(tolower)) corp <- tm_map(corp, removenumbers) corp <- tm_map(corp, function(x)removewords(x,stopwords())) term.matrix <- TermDocumentMatrix(corp) term.matrix <- as.matrix(term.matrix) colnames(term.matrix) <- c("sotu 2010","SOTU 2011") comparison.cloud(term.matrix,max.words=40,random.order=false) commonality.cloud(term.matrix,max.words=40,random.order=false) }

comparison.cloud 3 comparison.cloud Plot a comparison cloud Plot a cloud comparing the frequencies of words across documents. comparison.cloud(term.matrix,scale=c(4,.5),max.words=300, random.order=false,rot.per=.1, colors=brewer.pal(ncol(term.matrix),"dark2"), use.r.layout=false,title.size=3,...) Details term.matrix scale max.words random.order rot.per colors use.r.layout title.size A term frequency matrix whose rows represent words and whose columns represent documents. A vector of length 2 indicating the range of the size of the words. Maximum number of words to be plotted. least frequent terms dropped plot words in random order. If false, they will be plotted in decreasing frequency proportion words with 90 degree rotation color words from least to most frequent if false, then c++ code is used for collision detection, otherwise R is used Size of document titles... Additional parameters to be passed to text (and strheight,strwidth). Let p i,j be the rate at which word i occurs in document j, and p j be the average across documents( i p i,j/ndocs). The size of each word is mapped to its maximum deviation ( max i (p i,j p j ) ), and its angular position is determined by the document where that maximum occurs. nothing if(require(tm)){ data(sotu) corp <- SOTU corp <- tm_map(corp, removepunctuation) corp <- tm_map(corp, content_transformer(tolower)) corp <- tm_map(corp, removenumbers)

4 textplot corp <- tm_map(corp, function(x)removewords(x,stopwords())) term.matrix <- TermDocumentMatrix(corp) term.matrix <- as.matrix(term.matrix) colnames(term.matrix) <- c("sotu 2010","SOTU 2011") comparison.cloud(term.matrix,max.words=40,random.order=false) commonality.cloud(term.matrix,max.words=40,random.order=false) } SOTU United States State of the Union Addresses (2010 and 2011) Transcripts of the state of the union speeches. saved as a tm Corpus. data(sotu) Author(s) Barack Obama textplot Text Plot An x y plot of non-overlapping text textplot(x, y, words, cex=1,new=true, show.lines=true,...) x y words cex new show.lines x coordinates y coordinates the text to plot font size should a new plot be created if true, then lines are plotted between x,y and the word, for those words not covering their x,y coordinates... Additional parameters to be passed to wordlayout and text.

wordcloud 5 nothing #calculate standardized MDS coordinates dat <- sweep(usarrests,2,colmeans(usarrests)) dat <- sweep(dat,2,sqrt(diag(var(dat))),"/") loc <- cmdscale(dist(dat)) #plot with no overlap textplot(loc[,1],loc[,2],rownames(loc)) #scale by urban population size textplot(loc[,1],loc[,2],rownames(loc),cex=usarrests$urbanpop/max(usarrests$urbanpop)) #x limits sets x bounds of plot, and forces all words to be in bounds textplot(loc[,1],loc[,2],rownames(loc),xlim=c(-3.5,3.5)) #compare to text (many states unreadable) plot(loc[,1],loc[,2],type="n") text(loc[,1],loc[,2],rownames(loc)) wordcloud Plot a word cloud Plot a word cloud wordcloud(words,freq,scale=c(4,.5),min.freq=3,max.words=inf, random.order=true, random.color=false, rot.per=.1, colors="black",ordered.colors=false,use.r.layout=false, fixed.asp=true,...) words freq scale min.freq max.words random.order the words their frequencies A vector of length 2 indicating the range of the size of the words. words with frequency below min.freq will not be plotted Maximum number of words to be plotted. least frequent terms dropped plot words in random order. If false, they will be plotted in decreasing frequency

6 wordcloud Details random.color rot.per colors choose colors randomly from the colors. If false, the color is chosen based on the frequency proportion words with 90 degree rotation color words from least to most frequent ordered.colors if true, then colors are assigned to words in order use.r.layout fixed.asp if false, then c++ code is used for collision detection, otherwise R is used if TRUE, the aspect ratio is fixed. Variable aspect ratio only supported if rot.per==0... Additional parameters to be passed to text (and strheight,strwidth). If freq is missing, then words can either be a character vector, or Corpus. If it is a vector and freq is missing, standard stop words will be removed prior to plotting. nothing See Also text wordcloud(c(letters, LETTERS, 0:9), seq(1, 1000, len = 62)) if(require(tm)){ ##### from character ##### wordcloud( "Many years ago the great British explorer George Mallory, who was to die on Mount Everest, was asked why did he want to climb it. He said, \"Because it is there.\" Well, space is there, and we re going to climb it, and the moon and the planets are there, and new hopes for knowledge and peace are there. And, therefore, as we set sail we ask God s blessing on the most hazardous and dangerous and greatest adventure on which man has ever embarked.",,random.order=false) ## Not run: data(crude) crude <- tm_map(crude, removepunctuation) crude <- tm_map(crude, function(x)removewords(x,stopwords())) ##### from corpus ##### wordcloud(crude)

wordlayout 7 ##### from frequency counts ##### tdm <- TermDocumentMatrix(crude) m <- as.matrix(tdm) v <- sort(rowsums(m),decreasing=true) d <- data.frame(word = names(v),freq=v) wordcloud(d$word,d$freq) #A bigger cloud with a minimum frequency of 2 wordcloud(d$word,d$freq,c(8,.3),2) #Now lets try it with frequent words plotted first wordcloud(d$word,d$freq,c(8,.5),2,,false,.1) ##### with colors ##### if(require(rcolorbrewer)){ pal <- brewer.pal(9,"bugn") pal <- pal[-(1:4)] wordcloud(d$word,d$freq,c(8,.3),2,,false,,.15,pal) pal <- brewer.pal(6,"dark2") pal <- pal[-(1)] wordcloud(d$word,d$freq,c(8,.3),2,,true,,.15,pal) #random colors wordcloud(d$word,d$freq,c(8,.3),2,,true,true,.15,pal) } ##### with font ##### wordcloud(d$word,d$freq,c(8,.3),2,,true,,.15,pal, vfont=c("gothic english","plain")) wordcloud(d$word,d$freq,c(8,.3),2,100,true,,.15,pal,vfont=c("script","plain")) wordcloud(d$word,d$freq,c(8,.3),2,100,true,,.15,pal,vfont=c("serif","plain")) ## End(Not run) } wordlayout Word Layout finds text plot layout coordinates such that no text overlaps

8 wordlayout wordlayout(x, y, words, cex=1, rotate90 = FALSE, xlim=c(-inf,inf), ylim=c(-inf,inf), tstep=.1, rstep=.1,...) x y words cex rotate90 xlim ylim tstep rstep x coordinates y coordinates the text to plot font size a value or vector indicating whether words should be rotated 90 degrees x axis bounds for text y axis bounds for text the angle (theta) step size as the algorithm spirals out the radius step size (in standard deviations) as the algorithm spirals out... Additional parameters to be passed to strwidth and strheight. A matrix with columns representing x, y width and height. #calculate standardized MDS coordinates dat <- sweep(usarrests,2,colmeans(usarrests)) dat <- sweep(dat,2,sqrt(diag(var(dat))),"/") loc <- cmdscale(dist(dat)) x <- loc[,1] y <- loc[,2] w <- rownames(loc) #plot with no overlap and all words visible plot(x,y,type="n",xlim=c(-3,3),ylim=c(-3,2)) lay <- wordlayout(x,y,w,xlim=c(-3,3),ylim=c(-3,2)) text(lay[,1]+.5*lay[,3],lay[,2]+.5*lay[,4],w) #notice north dakota is only partially visible textplot(x,y,w)

Index commonality.cloud, 2 comparison.cloud, 3 SOTU, 4 text, 6 textplot, 4 wordcloud, 5 wordlayout, 7 9