Team Members: Christopher Copper, Philip Eittreim, Jeremiah Jekich, Andrew Reisdorph
Client: Brian Krzys
June 17, 2014
Introduction

Newmont Mining is a resource extraction company with a research and development office located in Englewood, Colorado. Founded in 1921, Newmont has acquired a large quantity of data over the years and has an interest in analyzing that data to inform business decisions. An increasingly effective means of enabling this analysis is data visualization. The main goal of this summer field session project is to develop an instant visualization tool that gathers and analyzes user-identified data and provides the user with a best-guess visualization of that data that is both informative and interactive. Once provided with the initial visualization, the user should be able to select alternative visualizations to explore various methods of displaying the data.

Requirements

Listed below are the functional and non-functional requirements of the Instant Visualization tool.

Functional
- Create a website with a textbox requesting a URL.
- Scrape the URL provided for tabular data.
- For each set of tabular data found, analyze it and make a best guess at the best way to display it.
- Produce an interactive graph based on the best guess for the data provided.
- Display additional information when the user mouses over a section of the graph, such as the data associated with the selected data point.
- Allow the user to select alternate visualizations if the best-guess display is not optimal.
- Redisplay the scraped data alongside the visualization.
- Allow the user to customize aesthetic properties of the visualization.
- Provide a means for the user to download a static version of the visualization they create.

Non-Functional
- Graphs are produced using a client-side library.
- GitHub is used for source control.
- Documentation of the software is done primarily with JSDoc.
- The system consists of four separate modules, which perform the following:
  - Parsing and scraping the data.
  - Formatting the data and making graphing decisions.
  - Displaying the graphs.
  - Facilitating communication between the other modules.
- Tabular data is parsed into JSON for further processing (visualization production).
- Testing and deployment are performed on an Amazon EC2 server provided by Newmont.
- Selecting alternate visualizations is accomplished client-side in real time.
- Comprehensive integration and unit testing has been performed.

System Architecture

Figure 1: Instant Visualization Project Architecture
Figure 1 shows the overall system architecture of the Instant Visualization tool. The system consists of four primary modules: Controller, Parser, Analyzer, and Visualizer. Additionally, the web page component exists in two different states: Entry and Visualization.

Entry Page

Figure 2: Entry Web Page

The Entry web page for the Instant Visualization tool, shown in Figure 2, is the user's first view of the tool. It consists of the following:
- The title/logo of the tool.
- Instructions on how to submit data to the tool.
- A textbox into which the user can enter a URL.
Analyzer

The Analyzer module is responsible for taking data from the Parser and generating instructions for the Visualizer. These instructions specify which columns to use for each applicable visualization type for each dataset.

To make decisions, the Analyzer relies heavily on the analysis of column data types. To do this, the Analyzer uses a type-handling system that assigns a type to every value in the data. It can handle a variety of types, such as integers, floating point values, and strings. In addition to checking types, the type handler gives cells with no data default values and removes completely empty rows from the dataset. Once a table has been fully typed, it is checked for valid dimensions and values. Tables that contain only a single row or column are discarded. Similarly, tables that contain only string data are discarded, because string data cannot be used as a dependent variable.

The other primary responsibility of the Analyzer is data analysis. The Analyzer looks at the newly formatted data and attempts to determine which subsets of the data are useful and capable of being visualized. Many web pages contain more than one set of data. It is the Analyzer's responsibility to identify these sets and make judgements about which ones can be adequately visualized. The uniqueness of each column is taken into account when making these decisions. Uniqueness is determined by counting the unique entries in a given column and expressing that count as a percentage of the total entries. Some visualization types, such as the bar chart, work best with the least unique variable as the independent variable. Other visualization types, such as the line graph, make use of the most unique variable as the dependent variable.

Once the Analyzer has identified relevant data sets, it determines the various ways the data can be visualized. Of the possibilities generated for each dataset, the Analyzer determines which option is most likely to be considered by humans to be the best visualization method. This is called the best-guess visualization. The Analyzer packages all relevant table and visualization data and sends it to the Visualizer module via the Controller. The data is passed to the Visualizer according to the API in Figures 5 and 6.
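
The typing, validity, and uniqueness checks described for the Analyzer can be illustrated with a minimal sketch. It is written in Python for brevity and is not the tool's own code; the function names (infer_type, drop_empty_rows, column_uniqueness, is_visualizable) are hypothetical, and it assumes that a table arrives as a list of rows of raw strings with the header row already removed.

```python
import re
from typing import List, Sequence

# Assumed patterns for numeric cells; the real type handler may accept more forms.
INT_RE = re.compile(r"[+-]?\d+")
FLOAT_RE = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")


def infer_type(cell: str) -> str:
    """Classify one raw cell as 'empty', 'integer', 'float', or 'string'."""
    text = cell.strip()
    if not text:
        return "empty"
    if INT_RE.fullmatch(text):
        return "integer"
    if FLOAT_RE.fullmatch(text):
        return "float"
    return "string"


def drop_empty_rows(rows: Sequence[Sequence[str]]) -> List[List[str]]:
    """Remove rows in which every cell is blank."""
    return [list(row) for row in rows if any(cell.strip() for cell in row)]


def column_uniqueness(values: Sequence[str]) -> float:
    """Unique entries in a column as a percentage of the total entries."""
    if not values:
        return 0.0
    return 100.0 * len(set(values)) / len(values)


def is_visualizable(rows: Sequence[Sequence[str]]) -> bool:
    """Reject tables with a single row or column, or with no numeric data."""
    if len(rows) < 2 or len(rows[0]) < 2:
        return False
    return any(
        infer_type(cell) in ("integer", "float") for row in rows for cell in row
    )


if __name__ == "__main__":
    table = drop_empty_rows([["A", "10"], ["B", "20"], ["A", "30"], ["", ""]])
    for index in range(len(table[0])):
        column = [row[index] for row in table]
        print(index, round(column_uniqueness(column), 1), is_visualizable(table))
```

In this sketch, a column of category labels that repeat scores a low uniqueness, which is the kind of column the text describes as a good independent variable for a bar chart.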
Visualization Page

Figure 3: Visualization Web Page

The Visualization web page, shown in Figure 3, displays to the user the visualizations generated by the Visualizer. It consists of the following:
- The title/logo of the tool.
- A selection box that allows the user to select which table of data from the source URL to visualize.
- A data table that redisplays the data collected from the selected table.
- An icon that allows the user to toggle the data table on and off.
- A set of icons that allow the user to switch between different visualization types.
- A pane containing options to customize aesthetic properties of the visualization, including colors, size, and margin dimensions.
- An icon that allows the user to toggle the customization pane on and off.
- The best-guess visualization for the currently selected table.
- An icon that, when selected, will download an image of the current visualization.

The Visualization web page is the final state of the tool. Here, the user can interact with the visualization and the data table, and the system pipeline has reached its end.
Technical Design

Two of the Instant Visualization tool's more distinctive components are the server-side web page parsing and the visualization rendering.

Web Page Parsing

Figure 4: Server Side Architecture

The first step in creating visualizations for a given URL is to scrape and parse that URL for data. The Instant Visualization tool scrapes URLs for tabular data and returns this data to the client for visualization. First, the server receives a REST (representational state transfer) API call from a client, which contains the URL to parse. REST is an architectural style for building APIs on top of a web server, using HTTP methods such as GET, POST, PUT, and DELETE to perform the API functions. The API endpoint is a PHP script that validates the request. If the request is valid, the script launches a PhantomJS process to parse and scrape the URL.
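
The validate-then-scrape flow above can be sketched in a few lines. This is an illustration only: the tool itself uses a PHP endpoint and a PhantomJS process, whereas the sketch below swaps in Python's standard library, does not fetch or render the page, and does not handle nested tables. The names is_valid_request_url, TableExtractor, and tables_to_json are hypothetical, and the validation rule (an http or https URL with a host) is an assumption about what "valid" means.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urlparse


def is_valid_request_url(url: str) -> bool:
    """Assumed validation rule: accept only http(s) URLs that name a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


class TableExtractor(HTMLParser):
    """Collect each <table> as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None and self.tables:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._close_row()
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td> or </th>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def tables_to_json(html: str) -> str:
    """Return every table found in the HTML as a JSON array of row arrays."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return json.dumps(parser.tables)


if __name__ == "__main__":
    page = (
        "<table><tr><th>Year</th><th>Gold</th></tr>"
        "<tr><td>2013</td><td>5.1</td></tr></table>"
    )
    if is_valid_request_url("http://example.com/data"):
        print(tables_to_json(page))
```

A headless browser such as PhantomJS has the advantage of seeing tables that are built by scripts after the page loads, which a plain HTML parser like the one in this sketch would miss.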
Visualization Rendering

Figure 5: Visualization Object Definition

Figure 6: Data Object Definition

The final stage in the Instant Visualization tool pipeline is the Visualizer module. This component is responsible for drawing the visualization to the web page based on the instructions (Figure 5) and data (Figure 6) it received from the Controller, which were originally produced by the Analyzer. The Visualizer needed to be able to produce a set of six different graph types: bar chart, line graph, scatter plot, bubble chart, pie chart, and a basic treemap. It also needed to be able to present either one or two data sets, so as to allow the user to make comparisons between various parts of the data. Additionally, given the wide range of potential inputs the Instant Visualization tool could receive, the Visualizer needed to be able to draw visualizations regardless of the size of the data set. A final requirement of the Visualizer
Results

The Newmont Instant Visualization tool has been tested on Firefox (version 30), Chrome (version 35), QupZilla (using WebKit 534), and Internet Explorer 11. No differences in the tool's behavior were found across these browsers. All features worked as expected regardless of the client in which the tool was running.

The tool's aim is to handle any valid and accessible URL, and as such, the potential number of URLs that could be provided to the system is extremely large. Additionally, there are a variety of ways in which an HTML table can be formed. The tool has been given a wide range of URLs with varying table formats as test cases, and it proved to work well within the project requirements.

The goals of the project were successfully met. The tool consists of a front-end web page allowing the user to enter a URL into a textbox. The tool then verifies that the provided URL is accessible and, if it is, scrapes the web page located at the URL for tabular data. If usable data is found and scraped, each set of tabular data is analyzed and estimations are made indicating which portions of each data set are most useful for visualization. Based on the analysis, an interactive graph visualizing the data is produced, and additional information is displayed based on user interaction. Finally, the user is able to select alternate visualizations of the data if they are not satisfied with the initial graph. The product is hosted on a machine that is controlled by the client.

Throughout development, the core aspects of the design remained intact. However, items such as the user interface and the level of control the user has over the data changed over time. Such additions are listed below.

Alongside the visualization, the user is presented with a data table displaying the information scraped from the source URL. The user is able to interact with this table, altering the table title, column titles, and even individual cell values. Additionally, the user can delete entire rows of the table. These changes are reflected in the visualization, which updates as the table is altered.

The user is also able to change the color palette of the visualization by choosing a color scheme from a selection of presets displayed on the web page.

The user interface evolved over the course of development. At first, the interface simply presented data from a single table. It was then refactored to display data from multiple tables. The interface then changed a final time to accommodate the inclusion of the data table alongside the visualization.

Due to the modular nature of the project, new components can easily be added, such as new visualization types or new parsers for other data sources such as CSV files or a database server. Given the scope of URLs and tables the tool could encounter, testing the system is an ongoing process. Other areas for future work include expanding support for older browsers, such as ones that are not HTML5 compatible, and allowing the user to save and share the dataset as a whole and not just a static visualization.