High-Volume Hypothesis Testing for Large-Scale Web Log Analysis
|
|
|
- Augustine Chase
- 9 years ago
- Views:
Transcription
1 High-Volume Hypothesis Testing for Large-Scale Web Log Analysis Sana Malik Human-Computer Interaction Lab University of Maryland College Park, MD 20742, USA Eunyee Koh Adobe Research San Jose, CA 95113, USA Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for thirdparty components of this work must be honored. For all other uses, contact the Owner/Author. Copyright is held by the owner/author(s). CHI'16 Extended Abstracts, May 07-12, 2016, San Jose, CA, USA ACM /16/05. Abstract Time-stamped event sequence data is being generated across many domains: shopping transactions, web traffic logs, medical histories, etc. Oftentimes, analysts are interested in comparing the similarities and differences between two or more groups of event sequences to better understand processes that lead to different outcomes (e.g., a customer did or did not make a purchase). CoCo is a visual analytics tool for Cohort Comparison that combines automated highvolume hypothesis testing (HVHT) with and interactive visualization and user interface for improved exploratory data analysis. This paper covers the first case study of CoCo for large-scale web log analysis and the challenges that arise when scaling a visual analytics tool to large datasets. The direct contributions of this paper are: (1) solutions to 7 challenges of scaling a visual analytics tool to larger datasets, and (2) a case study with three real-world analysts with these solutions implemented. Author Keywords Hypothesis testing; visual analytics; time-stamped event sequences; cohort comparison ACM Classification Keywords H.5.2. Information interfaces and presentation (e.g., HCI): User interfaces
2 Key Terms We define some key terms in the context of event sequence analysis. Event type. The category of a time-stamped occurrence. Event. A specific instance of an event type associated with a timestamp. Record. All events in a single user s history. Sequence. Two or more events. Consecutive. A sequence of events uninterrupted by other events. Concurrent. Two or more events that occur at exactly the same timestamp. Cohort. A group of records. Prevalence. The percent of records containing an event type or sequence. Frequency. The number of times per record an event type or sequence occurs. Introduction Time-stamped event sequence data is generated across many domains. Business analysts use customer transaction histories to predict future purchases. Medical researchers and doctors use Electronic Health Records (EHRs) to find patterns in side effects among a group of patients. Oftentimes, analysts are interested in comparing two or more groups of event sequences to better understand the similarities and differences between them and how these histories affect outcomes. With more complex data in larger volumes than ever, exploratory data analysis is becoming prevalent and visual analytics are increasingly important. Visualization plays an important role in allowing analysts to explore their data more organically, allowing them to discover things they did not know existed, easily identify anomalies, and generate hypotheses. CoCo, for Cohort Comparison, is a visual analytics tool for exploring results of high-volume hypothesis tests with an interactive interface, curated sorted and filtering, and a pre-defined taxonomy of metrics. Previous work introduced CoCo and evaluated its efficacy in the medical domain through case studies [3,10]. This paper covers the first case study of CoCo to demonstrate the value of high-volume hypothesis testing when paired with an interactive user interface in the web log analysis domain. The direct contributions of this paper are: 1. Proposed solutions to 7 challenges when extending HVHT to larger datasets, and 2. A case study with real-world analysts using the solutions implemented in a visual analytics tool. We begin by reviewing related work in event sequence analysis and visualization. We briefly describe CoCo. From there, we describe the challenges of scaling highvolume hypothesis testing for large-scale web logs and our iterative approach in solving these issues. Lastly, we implement these changes in CoCo and evaluate their efficacy in a case study with three analysts. Related Work Temporal Data Mining Automated hypothesis testing is closely related to data mining. Traditional data mining algorithms include frequent sequence mining [7,9] and association rule (itemset) mining [1]. Most of these techniques mine sequences in a single dataset rather than compare differences and similarities across two datasets. While a data mining technique can be used in tandem to facilitate similar comparisons (e.g. comparing frequent sequence results between two datasets), more specialized methods are needed to answer questions about which sequences occur significantly differently between datasets? Bay and Pazzani introduce contrast mining sets [2], an algorithm for detecting differences between groups based on record attributes (i.e., age, gender, or occupation). In addition to record attributes, we look at differences in the event sequences themselves, based on both occurrence and timestamps. Data mining algorithms are often a blackbox, allowing little user involvement during the process. Recent work has been done on interactive sequence mining [5,8,12,14], though these system focus primarily on mining frequent patterns in a single dataset, not differences between two datasets.
3 Overview of CoCo CoCo is comprised of five main panels that enable users to systematically explore results of high-volume hypothesis tests: (1) A scattergram that displays all sequences found in the dataset and their occurrence in each of the cohorts provides a way to control sample size, (2) an aggregated overview of each of the cohorts allows users to see broad trends, (3) flexible methods for filtering and sorting the result set based on p-value or metric enable users to easily highlight potentially meaningful results, (4) a visualization to easily scan the result set, and (5) details-ondemand allow users to explore a selected result in more detail. Figure 1. All data shown is synthetic. An extended description is given in the sidebar. Event Sequence Visualization & Comparison Gleicher et al. [6] provide an extensive survey of visual comparison techniques categorized into three metods: juxtaposition, superposition, and explicit encoding. Visualization tools have been designed for event sequence data [11,13], however there has been little research on visualizing event sequence comparisons. Zhao et al. [16] design MatrixWave to compare the flow of users in clickstream datasets. MatrixWave focuses on differences in the occurrence of immediate, pairwise steps in the event stream, whereas we generalize to differences in sequences of any length, as well as differences dealing with time. Vrotsou et al. [14] introduce a set of event sequence similarity metrics; we focus on difference metrics, but believe our work can applied to these metrics as well. Towards large-scale event analytics and visualization, Wongsuphasawat and Lin [15] develop two interactive visualizations for common analysis tasks and Du et al. [4] propose 14 strategies for coping with volume and variety in large event sequence datasets. Overview of System In this section, we provide a brief overview of the CoCo s metrics and interface (Figure 1). A complete taxonomy is discussed in previous work [10], in addition to the motivation for the interface design.
4 Figure 2. A table of the p-value distribution across metrics provides a way for analysts to understand if the dataset contains more than expected false positives. The table also serves as a filter, where users can click to remove a p-value group or a metric group. CoCo includes five metrics in its current version: prevalence of events, sequences, and non-consecutive event pairs; duration of consecutive event pairs; and frequency of events. Interface Top, left: A scatterplot of all sequences, colored by consecutive or non-consecutive. Each axis is the number of records that contain the record in alpha cohort and beta cohort. Top, center: Aggregated EventFlow [11] overviews. Top, right: Methods for filtering and sorting the result set. Filtering is possible by event type, record coverage, significance, metric type, or sequence type. Users can sort results based by ratio and significance, significance only, ratio only, alpha value, or beta value. Users may use the default p-values or apply a correction. Bottom, left: The results are displayed in a ranked list. Each row represents a result, and the hypothesis is shown in the middle. Icons to the left indicate metric type: Percent (%) for record coverage, clock for time, and round arrow for frequency. The sequence is shown to the right of the metric. Each bar represents an event, colored according to the legend (left). A bar grows to the left or right indicating the ratio of values between the two groups. The bars are colored by significance (black is p < 0.01). Bottom, right: Details-on-demand of the result are shown to the right when a user clicks a specific event. For average metrics (e.g., time or frequency), users can see a distribution of the value in each cohort. Scaling HVHT to Larger Datasets Previous versions of CoCo had only been used on relatively small datasets of up to 2,000 records per cohort and up to 50 event types. Web log datasets, on the other hand, record millions of users who access the website per day and hundreds of clickstream events per user. The increased volume of records and variety of event types presented new challenges for CoCo on both the front- and back-ends. Through an iterative process over six weeks, analysts highlighted challenges they faced with CoCo s current implementation, we proposed solutions, implemented them into CoCo, and received further feedback from analysts about these solutions and additional challenges. We cover the seven major challenges and their solutions, divided into two main areas: efficiency on the backend (1 3) and the analytical process on the frontend (4 7). CHALLENGE 1: LONG WAIT TIMES FOR COMPUTATION Long wait times can cause an analyst to lose concentration and incur more time recalling their task. In an effort to minimize long waits, we implemented two timesaving changes. The original version of CoCo counted every sequence that appeared in the loaded datasets. However, in the clickstream data, there were some records that had as many as 320 events. Based on observations of the previous three case studies, we found the analysts often did not look at results of sequences of length more than four. Typically, longer sequences were more obscure and analysts were not able to derive meaningful insights from them. We chose to implement a sequence length limit of 10, to be adequately long enough for the longer sequences found in clickstreams. In the use of this new limited version, analysts still looked mostly at sequences of length 4 5 at most, so there was no need to extend the range beyond length 10. We did not further reduce the length limit because performance at this stage was reasonable. Limiting the
5 sequence length offered a speed up of about 15x. If future datasets would benefit from a shorter limit, we will leave determining the ideal limit for future work. CHALLENGE 2: BROWSER DATA TRANSFER LIMITATIONS Larger datasets require more hypotheses to be tested, thus larger result sets to return to the browser. Due to some browser limitations, it is not possible to send data over a certain size. Thus, to reduce the volume of the result set, we automatically filter those sequences that occur in less than 1% of the records. CHALLENGE 3: PRECISION LOSS WITH CORRECTIONS When a statistical correction, such as Bonferroni Corrections, is applied, the p-values are divided by the number of hypotheses that are tested. This results in very small values. Previously, CoCo showed p-value precision up to two decimal points, but now shows four. CHALLENGE 4: VISUALIZING BOTH COHORTS With a large dataset, an analyst may not know what his or her data looks like. We embedded EventFlow [11] displays for each cohort to provide this overview. EventFlow was chosen because many of our users are familiar with it, and its aggregate display provides an overview of the most frequent patterns across the cohorts in a compact view that will scale to large datasets without using more space. CHALLENGE 5: CHANCE OF FALSE POSITIVES IS INCREASED We highlight the potential for false positives by providing the distribution of p-values to the user in a filterable table (Figure 2). Two statistical experts that we consulted with suggested this, because with any statistical test that is applied many times to a single dataset, there is some likelihood of false positives. By providing the user the distribution of the resulting p- values, the user can see if the actual distribution of p- values is what would be expected by random chance or if it is in fact affected by the content of the dataset. CHALLENGE 6: LARGE NUMBER OF EVENT TYPES By default, CoCo starts by showing only the results for single event types (sequences length 1) so analysts can make informed decisions about which events occur frequently and which might be important to the analysis. After determining if any events can be dismissed, analysts can filter out those events that they deem unrelated or unimportant to their questions. CHALLENGE 7: BEGINNING ANALYSIS IS DAUNTING With hundreds of thousands of hypothesis results, it might be daunting for analysts to know where to start with their analysis. To simplify this process, we suggest two methods. First, we recommend a process model and arrange the layout to match this process. We rearranged the panels on CoCo to suggest the order in which analysts should explore their dataset. We first provide methods for seeing an overview of the all the data (scattergram and cohort overviews) on the top left, followed by more detailed views of the result set. Controls for filtering and sorting this list are prominently displayed on the top right. Second, we provide default values for all filters and sorting methods. While these filters are customizable, the default values provide the simplest starting point for the users. It is important that the default values are carefully chosen. For example, for sequence length, we decided to start with length 1, since users are often
6 overwhelmed after looking for at the long results list. Starting with length 1 allows users to get a bearing on the events in their cohorts and allowing them to choose when they are ready to move onto the next result set. Case Study Three analysts used CoCo with a real-world event sequence dataset. One analyst was an experienced user of CoCo and two were novice users. The dataset contained users events on a product website, such as viewing the display ads, signing up for promotions or free trials, and purchasing products, and all three analysts used the same dataset to compare the group of users who purchased the products without using trials versus with using product trials. In particular, the analysts explored the occurrence of the display ads and retargeting events (e.g., an ad for a product the user has already viewed). By exploring events that are statistically significant in the result panel, analysts found one group viewed display ads more than the other group, and that group also contained more retargeting events than the other group. By investigating more on other events such as product trial and adoption using CoCo, analysts hypothesized that the first group, who viewed the display ads more, seemed fairly new to the websites product offerings ( explorers ) while the other group, who were exposed to fewer display ads and retargeting, seemed to have good knowledge about the websites products and offerings ( experienced users ). Since the datasets contained many events (over 150), the analysts found the event filtering most helpful and they were able to focus the analysis on specific events. In addition, the reduced metric calculation time provided a much better user experience for data analysis, as the analysts did not need to wait for CoCo to load data and finish hypothesis testing before they could begin their explorations. Analysts all mentioned that the results were a bit linear, and they would prefer to have more freeform exploration and interactions to explore individual sequences. Conclusions and Future Work This work explores the applicability of high-volume hypothesis testing (HVHT) to large-scale web logs. Through an iterative design process, we identify 7 challenges for extending a visual analytics tool, CoCo, to larger datasets and proposed solutions. Though we apply these solutions to a single analytics tool, we feel this can be extended to any tool that involves a large number of computations and statistical uncertainty. Providing more control over finding the sequences and allowing users to select sequences of interests would provide a more freeform analysis. Additionally, there are a limited amount of colors that can be used for the event types, which may cause ambiguity without the use or patterns, borders, or shapes. Lastly, we use a specific set of metrics for event sequence comparison, but the interface could be extended to display results of more traditional data mining techniques or metrics for datasets besides event sequences (e.g., networks). Acknowledgements The authors would like to thank Ben Shneiderman, Catherine Plaisant, and Fan Du for their feedback during the design and implementation of CoCo. We also gratefully acknowledge the support of Adobe, Oracle, and the University of Maryland s Center for Healthrelated Informatics and Bioimaging.
7 References 1. Rakesh Agrawal, Tomasz Imieli nski, and Arun Swami Mining association rules between sets of items in large databases. Proceedings of the 1993 ACM International Conference on Management of Data, ACM, Stephen D Bay and Michael J Pazzani Detecting group differences: Mining contrast sets. Data Mining and Knowledge Discovery 5, 3: M Bjarnadottir, S Malik, E Onukwugha, T Gooden, and C Plaisant Understanding Adherence and Prescription Patterns Using Large Scale Claims Data. PharmacoEconomics. 4. Fan Du, Ben Shneiderman, Catherine Plaisant, Sana Malik, and Adam Perer Coping with Volume and Variety in Temporal Event Sequences: Strategies for Sharpening Analytic Focus. Visualization and Computer Graphics, IEEE Transactions on. 5. Paolo Federico, Jürgen Unger, Albert Amor- Amorós, Lucia Sacchi, Denis Klimov, and Silvia Miksch Gnaeus: Utilizing clinical guidelines for knowledge-assisted visualisation of EHR cohorts. Proceedings of the EuroVis Workshop on Visual Analytics (EuroVA 15), The Eurographics Association Michael Gleicher, Danielle Albers, Rick Walker, I. Jusufi, C. D. Hansen, and Jonathan C Roberts Visual comparison for information visualization. Information Visualization 10, 4: Jiawei Han, Hong Cheng, Dong Xin, and Xifeng Yan Frequent pattern mining: current status and future directions. Data Mining and Knowledge Discovery 15, 1: Tim Lammarsch, Wolfgang Aigner, Alessio Bertone, Silvia Miksch, and Alexander Rind Special Section on Visual Analytics: Mind the time: Unleashing temporal aspects in pattern discovery. Computer Graphics 38: Nizar R Mabroukeh and Christie I Ezeife A taxonomy of sequential pattern mining algorithms. ACM Computing Surveys 43, 1: 3:1 3: Sana Malik, Fan Du, Megan Monroe, Eberechukwu Onukwugha, Catherine Plaisant, and Ben Shneiderman Cohort comparison of event sequences with balanced integration of visual analytics and statistics. Proceedings of the 20th International Conference on Intelligent User Interfaces, ACM, Megan Monroe, Rongjian Lan, Hanseung Lee, Catherine Plaisant, and Ben Shneiderman Temporal event sequence simplification. IEEE Transactions on Visualization and Computer Graphics 19, 12: Adam Perer and Fei Wang Frequence: Interactive mining and visualization of temporal frequent event sequences. Proceedings of the 19th International Conference on Intelligent User Interfaces, ACM, Charles D Stolper, Adam Perer, and David Gotz Progressive Visual Analytics: User-Driven Visual Exploration of In-Progress Analytics. {IEEE} Trans. Vis. Comput. Graph., Katerina Vrotsou and Aida Nordman Interactive visual sequence mining based on pattern-growth IEEE Conference on Visual
8 Analytics Science and Technology (VAST 14), Krist Wongsuphasawat and Jimmy Lin Using Visualizations to Monitor Changes and Harvest Insights from a Global-Scale Logging Infrastructure at Twitter IEEE Conference on Visual Analytics Science and Technology (VAST): Jian Zhao, Zhicheng Liu, Mira Dontcheva, Aaron Hertzmann, and Alan Wilson MatrixWave: Visual comparison of event sequence data. Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, ACM, For more information about CoCo, please visit:
THE growing volume and variety of data presents both
1 Coping with Volume and Variety in Temporal Event Sequences: Strategies for Sharpening Analytic Focus Fan Du, Ben Shneiderman, Catherine Plaisant, Sana Malik, and Adam Perer Abstract The growing volume
Interactive Information Visualization of Trend Information
Interactive Information Visualization of Trend Information Yasufumi Takama Takashi Yamada Tokyo Metropolitan University 6-6 Asahigaoka, Hino, Tokyo 191-0065, Japan [email protected] Abstract This paper
Extend Table Lens for High-Dimensional Data Visualization and Classification Mining
Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du [email protected] University of British Columbia
Understanding Web personalization with Web Usage Mining and its Application: Recommender System
Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,
Big Data in Pictures: Data Visualization
Big Data in Pictures: Data Visualization Huamin Qu Hong Kong University of Science and Technology What is data visualization? Data visualization is the creation and study of the visual representation of
ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL
International Journal Of Advanced Technology In Engineering And Science Www.Ijates.Com Volume No 03, Special Issue No. 01, February 2015 ISSN (Online): 2348 7550 ASSOCIATION RULE MINING ON WEB LOGS FOR
White Paper April 2006
White Paper April 2006 Table of Contents 1. Executive Summary...4 1.1 Scorecards...4 1.2 Alerts...4 1.3 Data Collection Agents...4 1.4 Self Tuning Caching System...4 2. Business Intelligence Model...5
Single Level Drill Down Interactive Visualization Technique for Descriptive Data Mining Results
, pp.33-40 http://dx.doi.org/10.14257/ijgdc.2014.7.4.04 Single Level Drill Down Interactive Visualization Technique for Descriptive Data Mining Results Muzammil Khan, Fida Hussain and Imran Khan Department
Microsoft Office Live Meeting Events User s Guide
Microsoft Office Live Meeting Events User s Guide Information in this document, including URL and other Internet Web site references, is subject to change without notice. Unless otherwise noted, the companies,
Website Personalization using Data Mining and Active Database Techniques Richard S. Saxe
Website Personalization using Data Mining and Active Database Techniques Richard S. Saxe Abstract Effective website personalization is at the heart of many e-commerce applications. To ensure that customers
VACA: A Tool for Qualitative Video Analysis
VACA: A Tool for Qualitative Video Analysis Brandon Burr Stanford University 353 Serra Mall, Room 160 Stanford, CA 94305 USA [email protected] Abstract In experimental research the job of analyzing data
A GENERAL TAXONOMY FOR VISUALIZATION OF PREDICTIVE SOCIAL MEDIA ANALYTICS
A GENERAL TAXONOMY FOR VISUALIZATION OF PREDICTIVE SOCIAL MEDIA ANALYTICS Stacey Franklin Jones, D.Sc. ProTech Global Solutions Annapolis, MD Abstract The use of Social Media as a resource to characterize
Executive Dashboard. User Guide
Executive Dashboard User Guide 2 Contents Executive Dashboard Overview 3 Naming conventions 3 Getting started 4 Welcome to Socialbakers Executive Dashboard! 4 Comparison View 5 Setting up a comparison
Investigating Clinical Care Pathways Correlated with Outcomes
Investigating Clinical Care Pathways Correlated with Outcomes Geetika T. Lakshmanan, Szabolcs Rozsnyai, Fei Wang IBM T. J. Watson Research Center, NY, USA August 2013 Outline Care Pathways Typical Challenges
How can you unlock the value in real-world data? A novel approach to predictive analytics could make the difference.
How can you unlock the value in real-world data? A novel approach to predictive analytics could make the difference. What if you could diagnose patients sooner, start treatment earlier, and prevent symptoms
OECD.Stat Web Browser User Guide
OECD.Stat Web Browser User Guide May 2013 May 2013 1 p.10 Search by keyword across themes and datasets p.31 View and save combined queries p.11 Customise dimensions: select variables, change table layout;
How To Solve The Kd Cup 2010 Challenge
A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China [email protected] [email protected]
Static Data Mining Algorithm with Progressive Approach for Mining Knowledge
Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 85-93 Research India Publications http://www.ripublication.com Static Data Mining Algorithm with Progressive
Divide-n-Discover Discretization based Data Exploration Framework for Healthcare Analytics
for Healthcare Analytics Si-Chi Chin,KiyanaZolfaghar,SenjutiBasuRoy,AnkurTeredesai,andPaulAmoroso Institute of Technology, The University of Washington -Tacoma,900CommerceStreet,Tacoma,WA980-00,U.S.A.
Fairfield Public Schools
Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity
Innovative Information Visualization of Electronic Health Record Data: a Systematic Review
Innovative Information Visualization of Electronic Health Record Data: a Systematic Review Vivian West, David Borland, W. Ed Hammond February 5, 2015 Outline Background Objective Methods & Criteria Analysis
Mining an Online Auctions Data Warehouse
Proceedings of MASPLAS'02 The Mid-Atlantic Student Workshop on Programming Languages and Systems Pace University, April 19, 2002 Mining an Online Auctions Data Warehouse David Ulmer Under the guidance
Build Your First Web-based Report Using the SAS 9.2 Business Intelligence Clients
Technical Paper Build Your First Web-based Report Using the SAS 9.2 Business Intelligence Clients A practical introduction to SAS Information Map Studio and SAS Web Report Studio for new and experienced
Search Result Optimization using Annotators
Search Result Optimization using Annotators Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,
Improving SAS Global Forum Papers
Paper 3343-2015 Improving SAS Global Forum Papers Vijay Singh, Pankush Kalgotra, Goutam Chakraborty, Oklahoma State University, OK, US ABSTRACT Just as research is built on existing research, the references
Types of Studies. Systematic Reviews and Meta-Analyses
Types of Studies Systematic Reviews and Meta-Analyses Important medical questions are typically studied more than once, often by different research teams in different locations. A systematic review is
Visualizing Repertory Grid Data for Formative Assessment
Visualizing Repertory Grid Data for Formative Assessment Kostas Pantazos 1, Ravi Vatrapu 1, 2 and Abid Hussain 1 1 Computational Social Science Laboratory (CSSL) Department of IT Management, Copenhagen
3D Interactive Information Visualization: Guidelines from experience and analysis of applications
3D Interactive Information Visualization: Guidelines from experience and analysis of applications Richard Brath Visible Decisions Inc., 200 Front St. W. #2203, Toronto, Canada, [email protected] 1. EXPERT
Principles of Data Visualization for Exploratory Data Analysis. Renee M. P. Teate. SYS 6023 Cognitive Systems Engineering April 28, 2015
Principles of Data Visualization for Exploratory Data Analysis Renee M. P. Teate SYS 6023 Cognitive Systems Engineering April 28, 2015 Introduction Exploratory Data Analysis (EDA) is the phase of analysis
Welcome to Harcourt Mega Math: The Number Games
Welcome to Harcourt Mega Math: The Number Games Harcourt Mega Math In The Number Games, students take on a math challenge in a lively insect stadium. Introduced by our host Penny and a number of sporting
Dynamic Visualization and Time
Dynamic Visualization and Time Markku Reunanen, [email protected] Introduction Edward Tufte (1997, 23) asked five questions on a visualization in his book Visual Explanations: How many? How often? Where? How
Scatter Plots with Error Bars
Chapter 165 Scatter Plots with Error Bars Introduction The procedure extends the capability of the basic scatter plot by allowing you to plot the variability in Y and X corresponding to each point. Each
TIBCO Spotfire Guided Analytics. Transferring Best Practice Analytics from Experts to Everyone
TIBCO Spotfire Guided Analytics Transferring Best Practice Analytics from Experts to Everyone Introduction Business professionals need powerful and easy-to-use data analysis applications in order to make
Describing, Exploring, and Comparing Data
24 Chapter 2. Describing, Exploring, and Comparing Data Chapter 2. Describing, Exploring, and Comparing Data There are many tools used in Statistics to visualize, summarize, and describe data. This chapter
Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis
Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis Yue Dai, Ernest Arendarenko, Tuomo Kakkonen, Ding Liao School of Computing University of Eastern Finland {yvedai,
Spiel. Connect to people by sharing stories through your favorite discoveries
Spiel Connect to people by sharing stories through your favorite discoveries Addison Leong Joanne Jang Katherine Liu SunMi Lee Development & user Development & user Design & product Development & testing
Bitrix Site Manager 4.1. User Guide
Bitrix Site Manager 4.1 User Guide 2 Contents REGISTRATION AND AUTHORISATION...3 SITE SECTIONS...5 Creating a section...6 Changing the section properties...8 SITE PAGES...9 Creating a page...10 Editing
Last Updated: 08/27/2013. Measuring Social Media for Social Change A Guide for Search for Common Ground
Last Updated: 08/27/2013 Measuring Social Media for Social Change A Guide for Search for Common Ground Table of Contents What is Social Media?... 3 Structure of Paper... 4 Social Media Data... 4 Social
Hypothesis testing. c 2014, Jeffrey S. Simonoff 1
Hypothesis testing So far, we ve talked about inference from the point of estimation. We ve tried to answer questions like What is a good estimate for a typical value? or How much variability is there
SPSS Explore procedure
SPSS Explore procedure One useful function in SPSS is the Explore procedure, which will produce histograms, boxplots, stem-and-leaf plots and extensive descriptive statistics. To run the Explore procedure,
Distinguishing Humans from Robots in Web Search Logs: Preliminary Results Using Query Rates and Intervals
Distinguishing Humans from Robots in Web Search Logs: Preliminary Results Using Query Rates and Intervals Omer Duskin Dror G. Feitelson School of Computer Science and Engineering The Hebrew University
MicroStrategy Course Catalog
MicroStrategy Course Catalog 1 microstrategy.com/education 3 MicroStrategy course matrix 4 MicroStrategy 9 8 MicroStrategy 10 table of contents MicroStrategy course matrix MICROSTRATEGY 9 MICROSTRATEGY
MicroStrategy Analytics Express User Guide
MicroStrategy Analytics Express User Guide Analyzing Data with MicroStrategy Analytics Express Version: 4.0 Document Number: 09770040 CONTENTS 1. Getting Started with MicroStrategy Analytics Express Introduction...
Clinic + - A Clinical Decision Support System Using Association Rule Mining
Clinic + - A Clinical Decision Support System Using Association Rule Mining Sangeetha Santhosh, Mercelin Francis M.Tech Student, Dept. of CSE., Marian Engineering College, Kerala University, Trivandrum,
Adobe Insight, powered by Omniture
Adobe Insight, powered by Omniture Accelerating government intelligence to the speed of thought 1 Challenges that analysts face 2 Analysis tools and functionality 3 Adobe Insight 4 Summary Never before
Data Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania [email protected] Over
Enterprise Resource Planning Analysis of Business Intelligence & Emergence of Mining Objects
Enterprise Resource Planning Analysis of Business Intelligence & Emergence of Mining Objects Abstract: Build a model to investigate system and discovering relations that connect variables in a database
Tie Visualization in NodeXL
Tie Visualization in NodeXL Nick Gramsky ngramsky at cs.umd.edu CMSC 838C Social Computing University of Maryland College Park Abstract: The ability to visualize a network as it varies over time has become
INTRODUCTION TO DATA MINING SAS ENTERPRISE MINER
INTRODUCTION TO DATA MINING SAS ENTERPRISE MINER Mary-Elizabeth ( M-E ) Eddlestone Principal Systems Engineer, Analytics SAS Customer Loyalty, SAS Institute, Inc. AGENDA Overview/Introduction to Data Mining
Doing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices:
Doing Multiple Regression with SPSS Multiple Regression for Data Already in Data Editor Next we want to specify a multiple regression analysis for these data. The menu bar for SPSS offers several options:
STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and
Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table
WIMP: Windows, Icons, Menus (or mice), Pointers (or pull-down menus) Kathy Lynch and Julie Fisher 2004. Topic Overview. Suggested Readings.
IMS3470 Human-computer interaction WIMP: Windows, Icons, Menus (or mice), Pointers (or pull-down menus) Kathy Lynch and Julie Fisher 2004 Topic Overview WIMP or is it GUI? (interface model using direct
Frequence: Interactive Mining and Visualization of Temporal Frequent Event Sequences
Frequence: Interactive Mining and Visualization of Temporal Frequent Event Sequences Adam Perer IBM T.J. Watson Research Center Yorktown Heights, New York, USA [email protected] Fei Wang IBM T.J. Watson
ISSN: 2348 9510. A Review: Image Retrieval Using Web Multimedia Mining
A Review: Image Retrieval Using Web Multimedia Satish Bansal*, K K Yadav** *, **Assistant Professor Prestige Institute Of Management, Gwalior (MP), India Abstract Multimedia object include audio, video,
Visual Analysis of Massive Web Session Data
Visual Analysis of Massive Web Session Data Zeqian Shen ebay Research Labs Jishang Wei University of California, Davis Neel Sundaresan ebay Research Labs Kwan-Liu Ma University of California, Davis ABSTRACT
A Visualization Technique for Monitoring of Network Flow Data
A Visualization Technique for Monitoring of Network Flow Data Manami KIKUCHI Ochanomizu University Graduate School of Humanitics and Sciences Otsuka 2-1-1, Bunkyo-ku, Tokyo, JAPAPN [email protected]
1&1 SEO Tool Expert Call
1&1 SEO Tool Expert Call Introduction Search Engine Optimization (SEO) is one of the most effective marketing tactics to generate leads and sales for your business website. Here are some few important
Intelligent Process Management & Process Visualization. TAProViz 2014 workshop. Presenter: Dafna Levy
Intelligent Process Management & Process Visualization TAProViz 2014 workshop Presenter: Dafna Levy The Topics Process Visualization in Priority ERP Planning Execution BI analysis (Built-in) Discovering
Welcome to Ipswitch Instant Messaging
Welcome to Ipswitch Instant Messaging What is Instant Messaging (IM), anyway? In a lot of ways, IM is like its cousin: e-mail. E-mail, while it's certainly much faster than the traditional post office
Unit 9 Describing Relationships in Scatter Plots and Line Graphs
Unit 9 Describing Relationships in Scatter Plots and Line Graphs Objectives: To construct and interpret a scatter plot or line graph for two quantitative variables To recognize linear relationships, non-linear
UFORIA - A FLEXIBLE VISUALISATION PLATFORM FOR DIGITAL FORENSICS AND E-DISCOVERY
UFORIA - A FLEXIBLE VISUALISATION PLATFORM FOR DIGITAL FORENSICS AND E-DISCOVERY Arnim Eijkhoudt & Sijmen Vos Amsterdam University of Applied Sciences Amsterdam, The Netherlands [email protected], [email protected]
Sonatype CLM Server - Dashboard. Sonatype CLM Server - Dashboard
Sonatype CLM Server - Dashboard i Sonatype CLM Server - Dashboard Sonatype CLM Server - Dashboard ii Contents 1 Introduction 1 2 Accessing the Dashboard 3 3 Viewing CLM Data in the Dashboard 4 3.1 Filters............................................
SAS Visual Analytics dashboard for pollution analysis
SAS Visual Analytics dashboard for pollution analysis Viraj Kumbhakarna VP Sr. Analytical Data Consultant MUFG Union Bank N.A., San Francisco, CA Baskar Anjappan, VP SAS Developer MUFG Union Bank N.A.,
Visualization Quick Guide
Visualization Quick Guide A best practice guide to help you find the right visualization for your data WHAT IS DOMO? Domo is a new form of business intelligence (BI) unlike anything before an executive
SAS Analyst for Windows Tutorial
Updated: August 2012 Table of Contents Section 1: Introduction... 3 1.1 About this Document... 3 1.2 Introduction to Version 8 of SAS... 3 Section 2: An Overview of SAS V.8 for Windows... 3 2.1 Navigating
Empowering the Masses with Analytics
Empowering the Masses with Analytics THE GAP FOR BUSINESS USERS For a discussion of bridging the gap from the perspective of a business user, read Three Ways to Use Data Science. Ask the average business
Working with telecommunications
Working with telecommunications Minimizing churn in the telecommunications industry Contents: 1 Churn analysis using data mining 2 Customer churn analysis with IBM SPSS Modeler 3 Types of analysis 3 Feature
CentralMass DataCommon
CentralMass DataCommon User Training Guide Welcome to the DataCommon! Whether you are a data novice or an expert researcher, the CentralMass DataCommon can help you get the information you need to learn
InfiniteInsight 6.5 sp4
End User Documentation Document Version: 1.0 2013-11-19 CUSTOMER InfiniteInsight 6.5 sp4 Toolkit User Guide Table of Contents Table of Contents About this Document 3 Common Steps 4 Selecting a Data Set...
So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)
Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we
Remote Usability Evaluation of Mobile Web Applications
Remote Usability Evaluation of Mobile Web Applications Paolo Burzacca and Fabio Paternò CNR-ISTI, HIIS Laboratory, via G. Moruzzi 1, 56124 Pisa, Italy {paolo.burzacca,fabio.paterno}@isti.cnr.it Abstract.
A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains
A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains Dr. Kanak Saxena Professor & Head, Computer Application SATI, Vidisha, [email protected] D.S. Rajpoot Registrar,
Exercises for Data Visualisation for Analysis in Scholarly Research
Exercises for Data Visualisation for Analysis in Scholarly Research Exercise 1: exploring network visualisations 1. In your browser, go to http://bit.ly/11qqxuj 2. Scroll down the page to the network graph.
Web Advertising Personalization using Web Content Mining and Web Usage Mining Combination
8 Web Advertising Personalization using Web Content Mining and Web Usage Mining Combination Ketul B. Patel 1, Dr. A.R. Patel 2, Natvar S. Patel 3 1 Research Scholar, Hemchandracharya North Gujarat University,
DICON: Visual Cluster Analysis in Support of Clinical Decision Intelligence
DICON: Visual Cluster Analysis in Support of Clinical Decision Intelligence Abstract David Gotz, PhD 1, Jimeng Sun, PhD 1, Nan Cao, MS 2, Shahram Ebadollahi, PhD 1 1 IBM T.J. Watson Research Center, New
User Guide. Analytics Desktop Document Number: 09619414
User Guide Analytics Desktop Document Number: 09619414 CONTENTS Guide Overview Description of this guide... ix What s new in this guide...x 1. Getting Started with Analytics Desktop Introduction... 1
Teacher Dashboard User Guide. Instructional Support Systems
Instructional Support Systems Teacher Dashboard User Guide Instructional Support Systems 1 Table of Contents A STUDENT A DAY... 3 HOW TO ACCESS THE INSTRUCTIONAL SUPPORT SYSTEM... 3 BRIEF HISTORY OF THE
Statistical Impact of Slip Simulator Training at Los Alamos National Laboratory
LA-UR-12-24572 Approved for public release; distribution is unlimited Statistical Impact of Slip Simulator Training at Los Alamos National Laboratory Alicia Garcia-Lopez Steven R. Booth September 2012
Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Job Streaming User Guide
Job Streaming User Guide By TOPS Software, LLC Clearwater, Florida Document History Version Edition Date Document Software Trademark Copyright First Edition 08 2006 TOPS JS AA 3.2.1 The names of actual
Context-aware Library Management System using Augmented Reality
International Journal of Electronic and Electrical Engineering. ISSN 0974-2174 Volume 7, Number 9 (2014), pp. 923-929 International Research Publication House http://www.irphouse.com Context-aware Library
Access Tutorial 13: Event-Driven Programming Using Macros
Access Tutorial 13: Event-Driven Programming Using Macros 13.1 Introduction: What is eventdriven programming? In conventional programming, the sequence of operations for an application is determined by
Adobe Analytics Business Practitioner Adobe Certified Expert Exam Guide. Exam number: 9A0-381
Adobe Analytics Business Practitioner Adobe Certified Expert Exam Guide Exam number: 9A0-381 Revised 24 June 2015 About Adobe Certified Expert Exams To be an Adobe Certified Expert is to demonstrate expertise
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Dr. Lisa White [email protected]
Dr. Lisa White [email protected] edu Associate Dean College of Science and Engineering San Francisco State University Purpose of a Poster To communicate/publicize to others your research/experiment results
Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing
Chapter 8 Hypothesis Testing 1 Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing 8-3 Testing a Claim About a Proportion 8-5 Testing a Claim About a Mean: s Not Known 8-6 Testing
