Research and Practice
Associate Editor's Column
Dave L. Edyburn, University of Wisconsin-Milwaukee

Word Clouds: Valuable Tools When You Can't See the Ideas Through the Words

Columnist: Dave L. Edyburn

You've probably heard the statement, "He can't see the forest for the trees." It describes a situation where one is so close to the problem that it is not possible to discern the big picture. Consider the struggling reader who is able to demonstrate some level of oral fluency but has little comprehension of what he or she has just read. At another level, consider the graduate student who has collected a wealth of text-based data but now has trouble extracting themes from the large data files. The individuals in these situations are overwhelmed with data. As a result, they are drowning in a sea of words because they don't have a way of understanding the key ideas.

Many experts have noted that information overload is a formidable problem for citizens living in the Information Age (Willinksy, 1999; Wurman, 2001). Generally, society has made slow progress in addressing the need for new tools to manage the glut of information that grows exponentially each year. The purpose of this article is to highlight tools and applications of a text analysis technology known as word clouds. We will conclude by exploring possible applications of these tools for practitioners and researchers.

Text Analysis Technologies

A core design principle that emerged in conjunction with Web 2.0 applications involves allowing users to add meta-tags to documents, images, and files. Involving end users in this task, rather than relying on a single point of authoritative review (i.e., controlled vocabulary), means that tagging can be an ongoing activity. In addition, by allowing user communities to do the tagging, the workload of the information provider is significantly reduced.
Perhaps the most important contribution of meta-tags is that they provide new ways to use search engines for locating and retrieving relevant information. For example, consider how Flickr (http://flickr.com) and Google (http://google.com) can now be used to search for images by content (e.g., red wagon), date, or location (i.e., GPS coordinates) because of the way meta-tags can be assigned to images. Keyword searching has always involved text. However, this system fails when the desired word is not used explicitly in the text. Meta-tags allow users to add meaning to a text document by describing the content in ways that go beyond the actual words in that text.

Word clouds involve text analysis algorithms. Three general approaches are used to obtain the source text: analyzing meta-tags that have been assigned to a blog posting, analyzing text that has been extracted from a Web site via an RSS feed, or copying and pasting a corpus of text to be analyzed. The source information is then subjected to word frequency analysis, and the most frequently found words are highlighted using different font sizes and colors. The result is a word picture, or word cloud, of the source information (see Figure 1). The larger and louder (i.e., bolder) a term is, the more important it is presumed to be. A number of tools have been created that offer serious scientific analysis; there also are tools that make it easy and interesting to turn a text document into art.

68 JSET 2010 Volume 25, Number 2

Figure 1. Word cloud created using text from the first three paragraphs of this article using Wordle (http://wordle.net).

Word Cloud Tools

A number of free online tools are available for creating word clouds. The information that follows briefly describes eight common text analysis tools and key features of each product that may influence decisions about which tool to use.

Wordle (http://wordle.net) allows users to paste text or create a wordle from a URL for any blog, blog feed, or Atom/RSS feed. The basic interface is easy to use (see Figure 2). The advanced features interface allows for customization of the size, color, and word frequency inclusion/exclusion rules. The output can be printed, posted to a public gallery, or captured using screen capture software. Images created by the Wordle.net Web application are licensed under a Creative Commons Attribution 3.0 United States License.

TagCrowd (http://tagcrowd.com) allows users to paste text (up to 3 MB), create a word cloud from a URL, or upload a file (up to 6 MB). The results can be saved as HTML code to embed in a Web page, printed, or saved as a PDF file.

TagCloud Generator (http://www.tag-cloud.de) is a tool used for creating word clouds using text found on a Web page. Users can choose two forms of output. One option is to download a Flash file that features the word cloud as a movie. The other option is to download HTML code to embed the word cloud in a Web page.
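The tools described in this section share the same underlying step: count how often each word occurs, then display the most frequent words at larger sizes. The following Python sketch illustrates that idea only; it is not the method used by Wordle or any other tool named here. The stop-word list, the function names, the 12 to 48 point range, and the linear scaling rule are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed stop-word list for illustration; real tools use much longer lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "that", "on"}


def word_frequencies(text, max_words=10):
    """Count the words that are not stop words and return the most frequent."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(max_words)


def font_sizes(frequencies, min_pt=12, max_pt=48):
    """Scale each count linearly to a font size between min_pt and max_pt.

    The point range and the linear rule are hypothetical choices.
    """
    if not frequencies:
        return {}
    highest = max(count for _, count in frequencies)
    lowest = min(count for _, count in frequencies)
    span = (highest - lowest) or 1  # avoid dividing by zero when all counts match
    return {
        word: round(min_pt + (count - lowest) / span * (max_pt - min_pt))
        for word, count in frequencies
    }


sample = "Word clouds show words. The more often a word appears, the larger the word."
top_words = word_frequencies(sample)
sizes = font_sizes(top_words)
for word, count in top_words:
    print(f"{word}: count={count}, size={sizes[word]}pt")
```

In this sample, "word" occurs three times and every other term once, so "word" receives the largest size and the remaining terms the smallest. Note that "word" and "words" are counted separately; whether to merge such variants is one of the inclusion/exclusion decisions that differ from tool to tool.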
Figure 2. Creation interface of Wordle (http://wordle.net).

TextTagCloud (http://www.artviper.net/texttagcloud/) allows users to paste text and then control various output variables. The output is HTML code that is used to embed the word cloud in a Web page. A major drawback to this tool is the lack of a preview option for viewing your word cloud.

Tagxedo (http://www.tagxedo.com/) is powerful and easy to use. The interface offers an extensive collection of customization tools (e.g., font, color, theme, text direction, shape, and history) similar to a design studio. It requires installation of Microsoft Silverlight. Output can be downloaded and saved as a JPEG or PNG file.

Tagul (http://tagul.com/) seeks to be more than a word toy. The core technology is an application programming interface (API) that is licensed to Web site developers. Once installed on a Web page, it permits the user to access hyperlinks embedded in words found in the cloud (e.g., link to a definition or resource).

Tag Cloud Generator (http://www.tag-cloud-generator.com/?l=2) is a design studio that is used to create tags of a Web page. Users save the output as HTML code to embed in a Web page.

Many Eyes, a research and development tool created by IBM (http://manyeyes.alphaworks.ibm.com/manyeyes/), requires users to establish a free account. Data can be uploaded and results can be viewed in several ways: word tree, word cloud, phrase net, and tag cloud.

Readers interested in learning more about the history of word clouds and the extensive array of free tools that can be used to create word clouds, or in reviewing creative uses for these tools, are encouraged to explore several noteworthy resources (Friedman, 2007; Lamantia, 2006; Tag Cloud, 2010).

Applications for Practice

Students may find word clouds useful for academic activities, visual arts, and a variety of personal and social activities (e.g., art for their locker). Creating word clouds as a prereading activity may be useful for many students to ensure that they have the proper vocabulary and conceptual understanding to read a text passage. Creating word clouds also can be a useful reading comprehension activity to support students in the process of identifying themes and patterns in single or multiple texts. Students also may find word clouds useful for obtaining feedback on their own writing. Clearly, word clouds have significant potential for helping students learn about visual design and the aesthetics associated with meaningful words.

Teachers may find word clouds useful for analyzing readings to identify key vocabulary words that may need preteaching. The various features of the many tools will challenge teachers to think about the best tool for the task (e.g., create a word cloud from information in a text file that can be saved on a Web page). Finally, teachers will want to think creatively about how these word cloud tools can be used to engage students in meaningful understanding of text and written expression (see Figure 3).

Figure 3. A thematic Wordle, "Love, Love, LOVE," by Anonymous. Created using Wordle (http://wordle.net).

Applications for Research

Little is presently known about the use of word clouds in the K-12 or teacher education curriculum. As a result, survey research could help the profession understand the current status of technology integration for these types of tools. Researchers working with text-based data sets may find word cloud tools useful for quick analysis of simple word frequency patterns within one or more texts. Consideration of the source file (e.g., RSS feed, Web page, copy and paste text) as well as the intended use of the output file (e.g., simply view on screen, save as HTML, Flash, PDF, etc.) will affect the selection of the tool. Most likely, researchers will find the need for more specific tools for qualitative data analysis.

References

Friedman, V. (2007, November 7). Examples and best practices. Retrieved May 27, 2010, from http://www.smashingmagazine.com/2007/11/07/tag-clouds-gallery-examples-and-good-practices/

Lamantia, J. (2006, February 22). Understanding tag clouds. Retrieved May 27, 2010, from http://www.joelamantia.com/ideas/tag-clouds-evolve-understanding-tag-clouds

Tag cloud. (2010, May 20). Retrieved May 27, 2010, from http://en.wikipedia.org/wiki/tag_cloud

Willinksy, J. (1999). Technologies of knowing. Boston: Beacon Press.

Wurman, R. S. (2001). Information anxiety 2. Indianapolis, IN: Que.

Author Notes

If you have a research and practice topic that you would like to see covered or if you are interested in being a guest writer, please send your comments to: Dave L. Edyburn, Department of Exceptional Education, University of Wisconsin-Milwaukee, P.O. Box 413, Milwaukee, WI 53201-0413. Email: edyburn@uwm.edu