Exercises for Data Visualisation for Analysis in Scholarly Research



Similar documents
Learning Services IT Guide. Access 2013

MicroStrategy Desktop

Google Drive: Access and organize your files

Human Resources Website Drupal User Guide

Choose the Reports Tab and then the Export/Ad hoc file button. Export Ad-hoc to Excel - 1

Introduction to Microsoft Access 2013

Creating a Newsletter with Microsoft Word

MicroStrategy Quick Guide: Running the PI Report ITU Data Mart Support Group Go to reporting.gmu.edu and click on Login to Microstrategy

Piktochart 101 Create your first infographic in 15 minutes

Reduced Quality Sample

Frog VLE Update. Latest Features and Enhancements. September 2014

Chapter 4 Displaying and Describing Categorical Data

Finance Reporting. Millennium FAST. User Guide Version 4.0. Memorial University of Newfoundland. September 2013

NDSU Technology Learning & Media Center. Introduction to Google Sites

Creating a Gradebook in Excel

1. Contents What is AGITO Translate? Supported formats Translation memory & termbase Access, login and support...

Logging in to Google Chrome

Excel Intermediate Session 2: Charts and Tables

BID2WIN Workshop. Advanced Report Writing

1. Right click using your mouse on the desktop and select New Shortcut.

Create Mailing Labels from an Electronic File

So you want to create an a Friend action

Introduction to Microsoft Access XP

Appointment Scheduler

EndNote Introduction for Referencing and Citing. EndNote Online Practical exercises

emarketing Manual- Creating a New

Rochester Institute of Technology. Finance and Administration. Drupal 7 Training Documentation

Chapter 14: Links. Types of Links. 1 Chapter 14: Links

Microsoft Access 2010 Overview of Basics

LEARNING RESOURCE CENTRE. Guide to Microsoft Office Online and One Drive

Using the NuView Employee Self-Service Module Time Attendance / Time Entry. Introduction. Logging In

Joomla! 2.5.x Training Manual

How To Avoid Spam On Mailchimp

TRIM: Web Tool. Web Address The TRIM web tool can be accessed at:

BI 4.1 Quick Start Java User s Guide

Test Generator. Creating Tests

WebSphere Business Monitor V6.2 Business space dashboards

Using Excel as a Management Reporting Tool with your Minotaur Data. Exercise 1 Customer Item Profitability Reporting Tool for Management

How to Excel with CUFS Part 2 Excel 2010

Introduction To Microsoft Office PowerPoint Bob Booth July 2008 AP-PPT5

Microsoft Excel Basics

MICROSOFT ACCESS STEP BY STEP GUIDE

Google Sites: Site Creation and Home Page Design

Introduction to Microsoft Access 2010

Creating a website using Voice: Beginners Course. Participant course notes

Jadu Content Management Systems Web Publishing Guide. Table of Contents (click on chapter titles to navigate to a specific chapter)

Advanced Presentation Features and Animation

ADA Applicant Business Process Guide

History Explorer. View and Export Logged Print Job Information WHITE PAPER

Microsoft Access 2007

Excel Using Pivot Tables

WebSphere Business Monitor V7.0 Business space dashboards

Magenta CMS Training: RAF Station/ RAF Sport websites

Quick and Easy Web Maps with Google Fusion Tables. SCO Technical Paper

Learning Management System User Guide. version

6. If you want to enter specific formats, click the Format Tab to auto format the information that is entered into the field.

Microsoft Access 2007 Introduction

OUTLOOK WEB APP 2013 ESSENTIAL SKILLS

Word 2007: Basics Learning Guide

STC: Descriptive Statistics in Excel Running Descriptive and Correlational Analysis in Excel 2013

Google Docs A Tutorial

EXCEL PIVOT TABLE David Geffen School of Medicine, UCLA Dean s Office Oct 2002

A) What Web Browser do I need? B) Why I cannot view the most updated content? C) What can we find on the school website? Index Page Layout:

Cascade Content Management System Training

Software Application Tutorial

Microsoft Access 2010 handout

EXPORTING THE SCHOOL ROSTER INTO AN EXCEL SPREADSHEET

SellerDeck 2013 Reviewer's Guide

In this example, Mrs. Smith is looking to create graphs that represent the ethnic diversity of the 24 students in her 4 th grade class.

Concession FTP User Guide May 2011 Version 1.2

MS Word 2007 practical notes

Chapter 1 Kingsoft Office for Android: A Close Look. Compatible with Microsoft Office: With Kingsoft Office for Android, users are allowed to create,

Setting Up Person Accounts

INTRODUCTION PARKING REPORTING: OVERVIEW PARKING REPORTING: STATISTICS MY DOMAINS: PARKING OPTIMIZATIONS...

Office 365 Training. Contents

MailChimp Instruction Manual

Intellect Platform - Tables and Templates Basic Document Management System - A101

User Guide. User Guide Title Page Page i

OnBase Client for Department Admins GEISEL FINANCE CENTER NOVEMBER 2015

Windows Movie Maker 2012

CHROMEBOOK TIPS & TRICKS

REUTERS/TIM WIMBORNE SCHOLARONE MANUSCRIPTS COGNOS REPORTS

Single Property Website Quickstart Guide

Office 2013 files: Storing, accessing and sharing on the network and the cloud

Switching to Gmail from Microsoft Outlook

Using Excel for descriptive statistics

Google Docs Basics Website:

Support/ User guide HMA Content Management System

Bosco Internet Setup Guide For Zest Apartments. Windows XP, Vista and Mac OS X

Business Objects Version 5 : Introduction

Downloading & Using Data from the STORET Warehouse: An Exercise

Mail Chimp Basics. Glossary

Workspaces Creating and Opening Pages Creating Ticker Lists Looking up Ticker Symbols Ticker Sync Groups Market Summary Snap Quote Key Statistics

Business Objects. Report Writing - CMS Net and CCS Claims

Search help. More on Office.com: images templates

VDF Query User Manual

UCL INFORMATION SERVICES DIVISION INFORMATION SYSTEMS. Silva. Introduction to Silva. Document No. IS-130

MicroStrategy Quick Guide: Reconciliation Expense Report. Contents

Create a survey using Google Forms

Chapter 12 Creating Web Pages

Transcription:

Exercises for Data Visualisation for Analysis in Scholarly Research Exercise 1: exploring network visualisations 1. In your browser, go to http://bit.ly/11qqxuj 2. Scroll down the page to the network graph. 3. Take a few minutes to explore the visualisation: try holding the cursor over items, clicking, dragging, etc. 4. You can see the same data represented in other graphs below. 5. Discuss with your neighbour: does interacting with the network graph give you more or less information than the other visualisations? Does it open up new questions? Time: approx 5 minutes Exercise 2: comparing N-gram tools NB: in both tools, copyright affects the availability of 20th century books. Transcription errors may also affect results, particularly for older books (e.g. the 'long s' vs f) 1. Think of two words or phrases you d like to compare over time (e.g. Burma, Burmah). 2. Open two browser windows 3. In one, go to http://books.google.com/ngrams 4. In the other, go to http://bookworm.culturomics.org 5. Enter your words or phrases in each and compare the results 6. Discuss with your neighbour: what differences did you find, and what might have caused them? Google Ngram tips: http://books.google.com/ngrams/info Background information on Bookworm: http://bookworm.culturomics.org/ol.html Bookworm tips: click the 'cog' icon next to the 'i' to change the time period or click the underlined words next to the search term to change which books are searched (e.g. subject, language, country, gender of author). [Image on slide] Time: approx 5 minutes 1

Exercise 3: scholarly visualisations In pairs, you will be asked to pair with your neighbour and explore and discuss one of the following visualisations: University of Richmond, Visualizing Emancipation http://www.americanpast.org/emancipation/ Further information: http://dsl.richmond.edu/emancipation/about/ http://dirt.terrypbrock.com/2012/04/visualizing- emancipation- examining- its- process- through- digital- tools/ Stanford "Mapping the Republic of Letters" http://www.stanford.edu/group/toolingup/rplviz/rplviz.swf Further information: http://openglam.org/2012/03/21/mapping- the- republic- of- letters/, http://danbri.org/words/2010/11/22/603, http://republicofletters.stanford.edu/tools/ GAPVis http://gap.alexandriaarchive.org/gapvis/index.html#index Further information: http://googleancientplaces.wordpress.com/ Digital Harlem :: Everyday Life 1915-1930 http://www.acl.arts.usyd.edu.au/harlem/ Further information: http://digitalharlemblog.wordpress.com/, http://writinghistory.trincoll.edu/evidence/robertson- 2012- spring/ Instructions 1. In your browser, go to suggested site 2. Take a few minutes to explore the visualisation 3. Discuss with your neighbour: what do you think is being presented here? What stories or trends can you start to see? Does it work better at one scale over another? Do you find it more effective at aggregate or detail level? Does it present an argument or provide a space to develop and explore one? If it was designed to present an argument or investigate a particular question, what do you think that was? What have you learned from visualisation that you might not have learned from looking at the data or reading text about it? 4. Report back to the group: summarise the site's purpose, visualisation formats and data types in a sentence, then share the most interesting parts of your discussion Time: approx 10 minutes 2

Exercise 4: cleaning data in Refine In this exercise, we will be cleaning an existing dataset to prepare it for visualisation as a chart and map in the next exercise. We want to investigate different disciplines within this data later, so we will review the data, remove rows that are missing certain values (in real life we would try to do some research to fill in the gaps before resorting to this!), and make the remaining values more consistent. Don't worry if you make a mistake, you can click 'Undo/Redo' and go back as many steps in your history as you need. 1. Start the Refine application. (If there isn't a shortcut for it on the computer desktop, go to the Start menu and search for Refine). Double- click the.exe file (on Windows) to start the application. It will open a page in your browser. Tip: if it opens in Internet Explorer, copy the page address from the URL/location bar and paste it into Chrome or Firefox. 2. Create a new project. [See slide 'Exercise 4: data cleaning in Refine'] a. On the left- hand side of the browser page, click on 'Create project'. Under 'Get data from' select 'This Computer'. Browse to the file Inspiring Women through History_April2013.xlsx, load and select 'Next' b. If the preview rows look ok, click 'Create Project'. c. Spend some time looking over the data. You will notice that not every field is been filled in for each row, and that the level of detail varies within a column (e.g. locations). 3. Start 'cleaning' the data: remove rows where 'Discipline' has been left blank. a. Click the down arrow in the Discipline column; select Facet; Text Facet [see slide] b. Refine should open a new block on the left- hand side of the screen, listing all the different disciplines in the dataset. c. Scroll to the bottom of the box and click on '(blank)'. d. This should change the rows on the right- hand side so only rows without a discipline are listed. e. To remove these rows from the dataset, click the down arrow next to 'All' (start of the row); select Edit rows; Remove all matching rows. f. This should remove 23 rows. 4. Let Refine find close matches in column values. a. Click 'reset' in the blue bar of the left- hand Discipline box (this resets the previously selected rows on the right- hand side). b. Click 'Cluster' on the left- hand Discipline box. c. A new block should pop- up, showing 'Painter' and 'Painter.', which Refine has detected as being close matches. d. Tick the 'merge' box and then click 'Merge selected and close'. e. Congratulations! You have merged some messy records! 3

5. Manually review and simplify the number of disciplines listed. In this exercise we will manually reduce the number of Disciplines in the dataset by merging them so that it's easier for them to be displayed in a simple visualisation. a. Click on 'count' in the line 'Sort by name count'. b. The most common disciplines should be at the top. c. Review the disciplines. There is one row each for Biologist, Botanist, Chemist, Natural Historian and Inventor. We will merge each of them with 'Scientist'. d. Hover your cursor over 'Inventor' so that 'edit' appears as an option. Click 'edit'. e. In the box that appears, type 'Scientist' and click 'Apply'. f. Repeat for the rows for Biologist, Botanist, Chemist and Natural Historian. g. There should now be 13 rows for Scientist. h. Edit any other Disciplines as you wish. You can review the values by clicking on the Discipline name and scrolling through the rows to work out which values to simplify. For example, Poet might become Writer, Surgeon and Doctor might both become Medicine. 6. When you're done tidying the values for Discipline, click the 'x' next to Discipline in the left- hand side box to 'remove this facet'. 7. Repeat the steps above to remove rows where 'Birth_place' is missing. 8. If you wish, repeat step 4 (Let Refine find close values) on the Birth place field to tidy up uncertain values like 'London?' 9. Optional: remove numeric rows with missing values. Because we need a year to build a timeline, we'll remove rows without a currently known birth year. a. Scroll sideways through the columns until 'Birth_year' is on the screen. b. Click the down arrow next to 'Birth_year'; select Facet; Numeric facet. c. A 'Birth_year' block should appear on the left- hand side d. Untick the 'numeric' box. e. Only rows without a number in 'Birth_year' should be shown now. f. Scroll back to the left until 'All' is showing. To remove these rows from the dataset, click the down arrow next to 'All' (start of the row); select Edit rows; Remove all matching rows. g. Close the 'Birth_year' box on the left- hand side by clicking the 'x'. 10. To export your data (ready for use in another application), click on 'Export' on the right- hand side, select an option like Excel or comma- separated value and save the file. Congratulations, you have cleaned data with Refine! We now have data we can use to make a nice timeline. You can use the same principles to clean or reduce the complexity of huge datasets. You can find out more at http://freeyourmetadata.org/cleanup/ or https://github.com/openrefine/openrefine/wiki/documentation- For- Users Time: approx 20 minutes 4

Exercise 5: trying entity recognition 1. In your browser, go to http://nlp.stanford.edu:8080/corenlp/process 2. Find a short paragraph of text (e.g. from a news site or digitised text) to paste into the box 3. How many of the things (concepts, people, places, events, references to time or dates, etc) you recognise did it pick up? Is any of the other information presented useful? Time: approx 5 minutes Exercise 6: create a pie chart using Google Fusion Tables 1. Go to https://drive.google.com/ and log into Google (if you aren't already) 2. Go to http://bit.ly/xw0znj (or https://www.google.com/fusiontables/datasource?dsrcid=implicit&redirectpat h=data&usp=apps_start&hl=en) to access Fusion from your account. 3. You should see a screen 'Import new table' and 'From this computer'. 4. Click 'Choose file', find the file you exported from Refine on your computer and upload it. 5. Leave the default options and click 'Next' until the screen asks you to choose a table name, 'attribute data to', enter a description, etc. Fill in the values as appropriate and click 'Finish'. 6. Check that you are in 'Classic' rather than 'New' mode in Google Drive. If the menu items along the page say 'File', 'View', 'Edit', 'Visualize', 'Merge', 'Labs', you are in Classic mode. If you are in New mode, click on the 'Help' menu and select 'Back to Classic look'. 7. From the Visualize menu, choose 'Pie'. [See slide 'Pre- aggregation view of a pie chart'] 8. Click 'options' then 'Aggregate'. 9. In the 'Aggregated by' section, select 'Discipline', and click 'Apply'. [Slide: 'Change options, aggregate by Discipline'] 10. Set the Entity to Discipline and the value to Count. 11. You should have a pie chart of your data! Time: approx 5 minutes 5

Exercise 7: geocoding data and creating a map using Google Fusion Tables Google Fusion Tables can geocode data directly from the table, but it sometimes needs some help. Fusion will have recognised some columns as containing location data, but it will not know much about those locations. The best way to see how it copes with a dataset is to geocode it and look over the resulting map to check that records have ended up in the right place. Before you begin, remove any aggregated fields or filters set in the last exercise: click 'options', then 'Clear filter' in the Filter tab view and 'Clear aggregation' on the 'Aggregate' tab view. Change into 'New' from 'Classic' mode by clicking 'Switch to new look' on the top right- hand corner of the browser window. If you can't see that option, let me know! Geocode your data 1. From the Visualize menu, choose 'Table' 2. From the File menu, choose 'Geocode' 3. Change the value for 'Location column' to 'Combined_birthplace_location' and click 'Begin geocoding'. (The Combined_birthplace_location field hopefully contains enough information to stop London, England, being confused with London, Ontario.) 4. Wait a bit... 5. Congratulations, you have geocoded some data! Create a map from your geocoded data 6. Click the '+' at the end of the row that starts with the File option 7. Click the down arrow next to the new maps name and change the value for 'Select location' to the field Combined_birthplace_location. 8. Congratulations! You've created a map! If you have extra time or curiosity, try changing the options under 'Change info window layout...' and 'Change map styles...'. Time: approx 10 minutes 6

Exercise 8: Choose your own adventure Options: explore and analyse more visualisations try making different visualisations with the current dataset try creating visualisations with your own data Exploring and analysing more visualisations There are links to visualisation blogs and other specialist sites on the Resources post at http://bit.ly/ujwgez (i.e. http://www.miaridge.com/resources- for- data- visualisation- for- analysis- in- scholarly- research/) For each visualisation, you might like to consider: what sources have they used? Do they explain how they've prepared them? What effect have their choices of visualisation formats and tools had? What data or queries are prioritised, and which are more difficult or impossible? If you have a particular type of data, process, format or audience in mind, ask for suggestions for sites to look at! Making more visualisations ManyEyes is a useful tool for learning, but as it uses Java it can be tricky in classroom situations. You can create visualisations from data other people have uploaded without signing up at ManyEyes: http://www- 958.ibm.com/software/analytics/manyeyes/datasets You can also try visualising the data from the British Library Pin- a- tale project, available in Google Docs at http://bit.ly/wt1ai5 Try a visualisation and evaluate the results. Is more cleaning or transformation needed? You may need to iterate with different versions of your data after cleaning or enhancing it. If you have your own dataset, review the 'planning' slides. What do you want to learn or express about your data? The ManyEyes site provides some guidance on the best visualisation types for different types of data: http://www- 958.ibm.com/software/analytics/manyeyes/page/Visualization_Options.html which can also be useful for planning visualisations in Google Fusion. More data cleaning and linking (reconciling) to other data You may want to go back to Refine and do more cleaning, or try reconciling the data to pull in more related information. 7

For example, you could find rows with more than one Spouse or Place lived listed, and separate the values into different columns. You could experiment with different levels of location values - does it matter whether England or United Kingdom is listed? You can find out more about cleaning inconsistent values at https://github.com/openrefine/openrefine/wiki/clustering and https://github.com/openrefine/openrefine/wiki/clustering- In- Depth There are some instructions for reconciling data at http://freeyourmetadata.org/reconciliation/ 8