Big Data: Rethinking Text Visualization


Dr. Anton Heijs
Treparel
April 8, 2013

Abstract

In this white paper we discuss text visualization approaches and why they are important for text analytics as done with the KMX technology of Treparel. Text visualization is most powerful when it supports the understanding of complex patterns in data and supports decision making. Statistical and machine learning techniques are used to find patterns and relationships that can then be visualized. Classification and clustering are two fundamental approaches in text analytics, and the visualizations of classified and clustered documents are therefore the two important visualization approaches discussed here.

1 Introduction: seeing the unseen

Visualization is the process of constructing a visual image in the mind to understand the data better. Although this is an accurate description of the word, the task of visualization has increasingly become an external process rather than a purely mental one. This shift indicates that a broader definition of the term is needed, such as:

Visualization is a method of computing. It transforms the symbolic into the geometric, enabling researchers to observe their simulations and computations. Visualization offers a method for seeing the unseen. It enriches the process of scientific discovery and fosters profound and unexpected insights.

Visualization enables us to comprehend large data sets, including data sets that are too large to grasp by mental imagination. Visualization enables the discovery of previously unknown properties of the data set that may not have been anticipated. The perception of these properties or patterns can lead the user to develop new insights. Visualization often reveals inherent problems in the data; for instance, errors and artifacts may be readily revealed.
Visualization enables the examination of both the large-scale features and the local features of a data set, allowing the user to see local features within a larger frame of reference. Visualization allows the user to form hypotheses based on newly observed phenomena or newly developed insights. Ideally, visualization should provide a means to overview, explore and navigate large multidimensional data sets.

Visualization is needed to understand data, and data is becoming more important but also more complex. We need to understand our data, in both unstructured and structured form, and extract all relevant patterns and trends from it to obtain a clear picture and make well-judged decisions. For this we need data analytics, and especially data and text mining and machine learning techniques that can analyze large, complex data sets. Machine learning algorithms help us to model a pattern in the data set and describe it mathematically, which is very powerful because a mathematical description of a pattern or a trend is the most precise way of capturing and describing the data. We can then compare multiple patterns in the data at the level where one can draw conclusions, taking into account all relevant information captured by these models. This is the level where the model provides a meaning (interpretation) in the real world: the semantic level of the pattern or trend in the data. This is also the level where reasoning about patterns in the data can start.

What kinds of patterns can exist? This amounts to asking what kinds of models we can find when analyzing any kind of data. We can find linear models and non-linear models and the relationships between these models. Their mathematical description is well suited for further analysis, but visualization is a very powerful way to understand the models by seeing the pattern or trend in the data. The visualization techniques needed to support this must show the type of relationships between the data points (documents, in the case of text analytics). These relationships can be supervised (categorical), unsupervised (similarity relationships, as in clusters), hierarchical, or relationships over time. There are visualizations of:

- table data (rows and columns)
- hierarchical data
- classified (categorical) data
- clustered data
- data over time
- correlation data

The relationships in the data that these visualization techniques reveal can be linked, which is very important for drawing conclusions from the data. Interaction in one visualization is therefore coupled to the other visualizations in KMX. This is called multiple coupled view visualization, and it has recently become more important because in data analysis one looks at all related data, which also means analyzing multiple data sets in combination. The scientific literature contains many papers on the details behind the visualization techniques mentioned here.
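The coupled-view idea described above can be sketched as a shared selection that every view subscribes to. This is a minimal illustration only, not the KMX implementation: the class names `SharedSelection` and `View`, the `brush` method and the document ids are all hypothetical.

```python
from typing import Callable, Iterable, List, Set


class SharedSelection:
    """Holds the selected document ids and notifies every registered view."""

    def __init__(self) -> None:
        self._selected: Set[str] = set()
        self._listeners: List[Callable[[Set[str]], None]] = []

    def subscribe(self, listener: Callable[[Set[str]], None]) -> None:
        self._listeners.append(listener)

    def select(self, doc_ids: Iterable[str]) -> None:
        self._selected = set(doc_ids)
        for listener in self._listeners:
            listener(set(self._selected))  # each view receives its own copy


class View:
    """A hypothetical visualization that highlights the shared selection."""

    def __init__(self, name: str, selection: SharedSelection) -> None:
        self.name = name
        self.highlighted: Set[str] = set()
        self._selection = selection
        selection.subscribe(self._on_selection_changed)

    def _on_selection_changed(self, doc_ids: Set[str]) -> None:
        self.highlighted = doc_ids

    def brush(self, doc_ids: Iterable[str]) -> None:
        # A selection made in this view is routed through the shared model,
        # so every other view is updated as well.
        self._selection.select(doc_ids)


selection = SharedSelection()
treemap_view = View("tree map", selection)
cluster_view = View("cluster", selection)
cluster_view.brush(["doc-1", "doc-7"])
```

After the call to `brush`, both views highlight the same two documents, regardless of which view the selection came from.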
KMX technology uses these advanced visualizations as part of the analysis pipeline, with support for multiple selections of data points (which can be patents, research, legal or news documents) in different visualizations. With these multiple coupled views, when a user selects one or more documents the selection is shown in all available visualizations, and the interaction is supported from every visualization.

Figure 1: GUI showing multiple views with different visualizations of a large patent data set.

Figure 2 (a, b): (a) An example of a tree map showing patents on chemistry. Interaction with the tree map visualization also requires support for drilling down into the data: if one wants to see all patents in C07, one can update the visualization to show patents deeper in the C07 classification tree. This is one of the strengths of the tree map algorithm. (b) Coloring can be used to show a parameter such as the number of patents in a class, here from green (large number) to black (small number of patents in that subclass).

Let us first take a brief look at how we arrive at a visualization from the original raw data. The visualization pipeline is the name for the sequence of processes that creates a visual representation of data. Before the visualization pipeline is entered, a quantity of data is generated, either from databases or by any other means of data collection. The visualization pipeline consists of four steps.

1. Data analysis is the first step in the visualization pipeline. During data analysis the data is prepared for visualization, which means that a number of operations can be performed on the data to make it more suitable for visualization.
2. After the data analysis step is completed, the raw data has been transformed into data that can be visualized. This does not mean that all of the data is of interest. Only the portions of the data that are of interest should be visualized, so the second step in the visualization pipeline is a data selection step, after which only the focus data remains in the pipeline. This part of the pipeline usually features some user interaction to decide on the sections of interest.
3. Once it has been decided which data is the focus data, the next step is the mapping step. In this part of the pipeline the data is mapped to renderable representations. These representations are geometric primitives such as lines, surfaces, points and voxels, with attributes such as color, position, size, transparency and texture.
4. After the data mapping, all that remains is the final rendering of the geometric data. Rendering is creating an image from a model. Operations performed here include viewing transformations, lighting calculations, hidden surface removal, scan conversion and anti-aliasing. The final visualization is created and either written to file or displayed on the screen.
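The pipeline steps described above (analysis, selection, mapping, rendering) can be sketched as four small functions chained together. This is only an illustration under stated assumptions: the record fields (`year`, `score`), the color rule and the text-based "rendering" are hypothetical stand-ins, not part of KMX.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Point:
    """A geometric primitive with position and color attributes."""
    x: float
    y: float
    color: str


def analyze(raw: List[Dict]) -> List[Dict]:
    """Step 1, data analysis: prepare the data (here: drop records without a score)."""
    return [r for r in raw if r.get("score") is not None]


def select(records: List[Dict], min_year: int) -> List[Dict]:
    """Step 2, data selection: keep only the focus data."""
    return [r for r in records if r["year"] >= min_year]


def map_to_geometry(records: List[Dict]) -> List[Point]:
    """Step 3, mapping: turn records into renderable primitives."""
    return [
        Point(x=float(r["year"]), y=r["score"],
              color="green" if r["score"] >= 0.5 else "black")
        for r in records
    ]


def render(points: List[Point]) -> str:
    """Step 4, rendering: produce the final output (text here, an image in practice)."""
    return "\n".join(f"{p.color} point at ({p.x:.0f}, {p.y:.2f})" for p in points)


raw_data = [
    {"id": "A", "year": 2005, "score": 0.9},
    {"id": "B", "year": 2003, "score": 0.4},
    {"id": "C", "year": 2007, "score": None},
    {"id": "D", "year": 2006, "score": 0.2},
]
image = render(map_to_geometry(select(analyze(raw_data), min_year=2005)))
print(image)
```

Keeping each step as a separate function mirrors the text: user interaction would normally change only the arguments of the selection step, while the other steps stay untouched.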
The resulting visualization should ideally be expressive, effective and appropriate. Expressive means that the visualization displays only the relevant information of a data set. Effective means that it complements the user's capabilities of perception and the mental image that the user has of the visualization. Finally, an appropriate visualization is one in which the effort of creating the visualization does not outweigh its benefits.

The first step in the visualization of patent data is often searching or filtering the data. In extracting the patterns, text mining can strongly contribute to the visualization. Some important analysis tasks for a user are:

- the visualization of a document collection against a known set of classes
- the visualization of a document collection against an unknown set of classes
- the visualization of a document collection in the context of its hierarchy
- the visualization of a document collection over time

1.1 Treemap visualization

The first two tasks can be implemented using supervised and unsupervised machine learning techniques, through which automatic classification and clustering of the data is done. This data is then processed in the visualization pipeline to provide insight into the classified and clustered patent data. Since patent data contains classification codes, the data can be ordered hierarchically, for instance in the IPC classification. To provide insight into a collection of patent data we also provide an approach to visualize hierarchical patent data using a tree map algorithm. Patent data also contains time stamps, so a collection of patents can be analyzed over time. For this we implemented a visualization of the change over time in the number of patents from a collection that belong to a patent class.

Tree mapping is a method for displaying tree-structured data using nested rectangles, which provide an overview and a way to select data points. An example is given in figure 6 below, where there are documents in classes A and H, and class A has three subcategories (A1, A2 and A3), one of which is selected so that all documents in that class are shown in red. Within the tree map the user has an overview of the classes and the number of patents in those classes for the full collection. With a mouse-over the user can get additional information about a patent, and can add or remove one or more patents from the currently selected set. Selecting one box selects one document (such as EP in the example). The tree map visualization is very powerful because, in a fixed screen space, the tree mapping algorithm can show all hierarchical data points (patent documents) and provide both an overview and a good selection mechanism.
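To make the nested-rectangle idea concrete, here is a minimal sketch of the classic slice-and-dice tree map layout. The text does not say which layout variant KMX uses, so this is an assumption chosen for brevity; the patent counts per class are hypothetical, and every count is assumed to be positive.

```python
from typing import Any, Dict, List, Tuple

Rect = Tuple[str, float, float, float, float]  # label, x, y, width, height


def weight(node: Any) -> int:
    """Number of patents below a node: a leaf is a count, a dict is a class."""
    if isinstance(node, dict):
        return sum(weight(child) for child in node.values())
    return node


def slice_and_dice(tree: Dict[str, Any], x: float, y: float,
                   w: float, h: float, depth: int = 0) -> List[Rect]:
    """Split the rectangle among the children in proportion to their weight,
    alternating between horizontal and vertical splits at each level."""
    rects: List[Rect] = []
    total = weight(tree)
    offset = 0.0
    for label, child in tree.items():
        share = weight(child) / total
        if depth % 2 == 0:   # even depth: divide the width
            cx, cy, cw, ch = x + offset * w, y, share * w, h
        else:                # odd depth: divide the height
            cx, cy, cw, ch = x, y + offset * h, w, share * h
        if isinstance(child, dict):
            rects.extend(slice_and_dice(child, cx, cy, cw, ch, depth + 1))
        else:
            rects.append((label, cx, cy, cw, ch))
        offset += share
    return rects


# Classes A (with subclasses A1, A2, A3) and H, as in the example in the text;
# the counts are made up for illustration.
classes = {"A": {"A1": 2, "A2": 1, "A3": 1}, "H": 4}
rects = slice_and_dice(classes, 0.0, 0.0, 100.0, 100.0)
for rect in rects:
    print(rect)
```

Each leaf rectangle has an area proportional to its patent count, which is what lets a fixed screen area show the whole hierarchy at once.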
Figure 3: Combined use of two visualizations in KMX (tree map and clustering) to show the patent data hierarchically (tree map) and unsupervised (3D clustering, where the height is the density of the patents, especially prominent for inorganic and organic chemistry); the color is used to display the pattern in the patent data over time.

We can also combine two visualizations, as shown in Figure 3, where the tree map coloring shows the patents over three years (2005, 2006, 2007) and the cluster visualization shows the same documents by their similarity as calculated by the machine learning algorithm for clustering in KMX. Clustering helps to analyze a collection of patents and gain insight into the natural grouping of the patents. In the cluster visualization, the user can easily select documents by brushing, i.e. selecting them using the mouse. By brushing in a cluster or a parallel coordinates visualization the user gets feedback about the selected documents, which greatly helps in the selection of documents and is an example of the multiple coupled views support mentioned earlier. One can use multiple brushes to make a rough selection and a more precise selection, which gives the user feedback on a larger selected set of documents and also on a smaller set. Multiple brushes also help the analyst to explore the documents directly in a tree map visualization and in a visualization of the documents over time. This helps to understand whether a brushed set of documents that are close together in a cluster visualization are also hierarchically close together in the tree map visualization. One can also analyze this over time, to see whether documents that are clustered close together are also close together in time. If one wants to check whether there is a trend in a certain technology over time, this is a logical way to analyze and explore it.

1.2 Parallel coordinates visualization

When we have a set of documents selected, for instance by filtering or brushing (see the cluster image on the right), we can show the distribution of the classification score for the selected set (in the example below, the documents on Ebola, SARS and H5N1). This is done using three parallel, vertically oriented coordinate axes. The classification score, from 0.0 (bottom) to 100 (top), is shown for each document, and each document is a line passing through the three axes. One can immediately see the documents that are selected in one cluster and that have a high score on one class and a low score on the other classes. In the KMX example shown below this is true for all classes, which shows the high performance of the classifiers. Parallel coordinates is a very general visualization technique and can map multivariate data belonging to text data.
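As a rough sketch of the parallel coordinates mapping just described, the code below turns each document's three classification scores into the vertices of one polyline and implements a simple brush on one axis. The function names, the plot dimensions and the scores are hypothetical; y is measured from the bottom of the plot, so a real screen renderer would flip it.

```python
from typing import Dict, List, Tuple

AXES = ["Ebola", "SARS", "H5N1"]  # one vertical axis per class, as in the text


def polyline(scores: Dict[str, float], width: float = 200.0,
             height: float = 100.0) -> List[Tuple[float, float]]:
    """Map scores in the range 0..100 to one vertex per axis."""
    step = width / (len(AXES) - 1)  # horizontal distance between axes
    return [(i * step, height * scores[axis] / 100.0)
            for i, axis in enumerate(AXES)]


def brush(documents: Dict[str, Dict[str, float]], axis: str,
          low: float, high: float) -> List[str]:
    """Return the ids of documents whose score on one axis lies in [low, high]."""
    return [doc_id for doc_id, scores in documents.items()
            if low <= scores[axis] <= high]


# Hypothetical classification scores for two documents.
documents = {
    "doc-1": {"Ebola": 95.0, "SARS": 5.0, "H5N1": 10.0},
    "doc-2": {"Ebola": 8.0, "SARS": 90.0, "H5N1": 12.0},
}
line = polyline(documents["doc-1"])
selected = brush(documents, "Ebola", 80.0, 100.0)
```

A document that scores high on one class and low on the others produces a line with a single peak, which is the pattern the text describes for well-separated classes.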
Here we have explained parallel coordinates with an example related to clustering, classification and two types of visualization.

When we have classified all patents in KMX, we can use the classification scores to calculate the correlation between all patents and visualize it. This provides valuable insights into aspects that cannot be determined with a query-based approach, as shown below. The vertical and horizontal axes of the correlation visualization (a matrix) both carry the classification codes (IPC, for instance), and the visualization is therefore symmetric. Some documents are in different classes (as with pesticides) and can still share a strong correlation, as shown for patents in classes C07K02 and A61K05, which have a correlation coefficient of 0.75 in the visualization below. Seeing where these strongly correlated classes are is easy and valuable, and this information cannot be determined with a query-based approach. Where many correlating classes occur is also visible directly in one picture, which shows the strength of overview first and details on demand later when using visualizations.
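A class-by-class correlation matrix of this kind can be sketched as follows. The text does not say which correlation coefficient KMX computes, so the Pearson coefficient is an assumption here; the scores are invented, and the class code H01L21 is a hypothetical third class added only for contrast. Each class is assumed to have non-constant scores, otherwise the denominator would be zero.

```python
from math import sqrt
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(scores: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Correlate every class with every other class; the result is symmetric."""
    return {a: {b: pearson(scores[a], scores[b]) for b in scores} for a in scores}


# Hypothetical classification scores of the same four patents for three classes.
scores = {
    "C07K02": [0.9, 0.8, 0.1, 0.2],
    "A61K05": [0.8, 0.9, 0.2, 0.1],
    "H01L21": [0.1, 0.2, 0.9, 0.8],
}
matrix = correlation_matrix(scores)
```

Because the same class codes label both rows and columns, `matrix[a][b]` equals `matrix[b][a]`, which is why only half of such a matrix needs to be read.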
1.3 Visualization of patterns over time in a document set

When one wants to understand patent data over time, it is valuable to be able to analyze the patents as part of a class that captures documents about the same subject, classifications and concepts. This can be done using classification and/or clustering, after which we can visualize the increase or decrease of the patents over time, where the band width of the classes shows the trends. This is shown below for patent classification classes but can also be done on non-patent literature, for instance using the MeSH terms of PubMed documents.
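The data behind such a trend view can be sketched by counting patents per class per year and scaling each count to a band width. The function names, the `unit` scale factor and the sample patents are assumptions made for illustration; how KMX actually draws the bands is not described in the text.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def counts_per_year(patents: List[Tuple[str, int]]) -> Dict[str, Dict[int, int]]:
    """Count the patents of each class in each year."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for patent_class, year in patents:
        counts[patent_class][year] += 1
    return {c: dict(sorted(by_year.items())) for c, by_year in counts.items()}


def band_widths(counts: Dict[str, Dict[int, int]], years: List[int],
                unit: float = 1.0) -> Dict[str, List[float]]:
    """Width of each class band per year, proportional to the patent count."""
    return {c: [unit * by_year.get(y, 0) for y in years]
            for c, by_year in counts.items()}


# Hypothetical (class, year) pairs for nine patents.
patents = [
    ("C07", 2005), ("C07", 2006), ("C07", 2006),
    ("C07", 2007), ("C07", 2007), ("C07", 2007),
    ("A61", 2005), ("A61", 2005), ("A61", 2007),
]
counts = counts_per_year(patents)
bands = band_widths(counts, [2005, 2006, 2007], unit=4.0)
```

A band that widens from year to year (class C07 in this made-up data) indicates an increasing trend, and a narrowing band indicates a decreasing one.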

Figure 4 (a, b): Parallel coordinates visualization and cluster visualization of three Medline clusters (Ebola (purple), H5N1 (blue) and SARS (yellow)).

Figure 5: Parallel coordinates where the scores for the patents are sorted to the most important coordinate classes. The decay shows that all patents belong distinctively to the first class shown (the first coordinate) and thereafter to one or two additional classes, but less dominantly. The gray cylinders indicate the number of patents in that range of the classification score, which helps to read and interpret the patent data.

Figure 6 (a, b): (a) Correlation visualization between many patents and their patent classes. (b) Trends of the patents over time for different patent classes.

2 Text Analytics visualizations

Treparel's KMX big data text analytics solution is a client-server based software platform. The KMX API makes the system open for integration with existing technologies. The client GUI is a native Windows application, of which a screenshot is shown below. The solution is a very flexible and scalable system in terms of performance and system management. Its scalability allows it to handle both the growing amount of data and the growing complexity of the data at predictable cost.

Figure 7: Overview of the KMX Patent Analytics GUI showing patent titles and their labels, the cluster visualization, and a section of the full text of a selected patent (see the cross hair in the visualization). The classification score is shown from blue (positive) to yellow (negative) in the patent landscape. The training documents of the classifier are indicated by the color of the brushes (green and red).

About Treparel

Treparel is a leading global software provider in Big Data Text Analytics and Visualization. The KMX platform allows organizations to enhance innovation processes, improve competitive advantage, mitigate litigation risk and cost, and manage interactions with customers by gaining insights from numerous sources of unstructured data (text, application notes, images, blogs, and patents). Global companies, government agencies, software vendors and data publishers use Treparel KMX text analysis software to gain faster, more reliable and more precise insights into large, complex unstructured data sets, allowing them to make better-informed decisions.

For more information contact or go to


More information

Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management

Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management Paper Jean-Louis Amat Abstract One of the main issues of operators

More information

A Capability Model for Business Analytics: Part 2 Assessing Analytic Capabilities

A Capability Model for Business Analytics: Part 2 Assessing Analytic Capabilities A Capability Model for Business Analytics: Part 2 Assessing Analytic Capabilities The first article of this series presented the capability model for business analytics that is illustrated in Figure One.

More information

Environmental Remote Sensing GEOG 2021

Environmental Remote Sensing GEOG 2021 Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class

More information

Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014

Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014 Big Data Analytics An Introduction Oliver Fuchsberger University of Paderborn 2014 Table of Contents I. Introduction & Motivation What is Big Data Analytics? Why is it so important? II. Techniques & Solutions

More information

Learning outcomes. Knowledge and understanding. Competence and skills

Learning outcomes. Knowledge and understanding. Competence and skills Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges

More information

3D Interactive Information Visualization: Guidelines from experience and analysis of applications

3D Interactive Information Visualization: Guidelines from experience and analysis of applications 3D Interactive Information Visualization: Guidelines from experience and analysis of applications Richard Brath Visible Decisions Inc., 200 Front St. W. #2203, Toronto, Canada, rbrath@vdi.com 1. EXPERT

More information

Professional Organization Checklist for the Computer Science Curriculum Updates. Association of Computing Machinery Computing Curricula 2008

Professional Organization Checklist for the Computer Science Curriculum Updates. Association of Computing Machinery Computing Curricula 2008 Professional Organization Checklist for the Computer Science Curriculum Updates Association of Computing Machinery Computing Curricula 2008 The curriculum guidelines can be found in Appendix C of the report

More information

<no narration for this slide>

<no narration for this slide> 1 2 The standard narration text is : After completing this lesson, you will be able to: < > SAP Visual Intelligence is our latest innovation

More information

Adobe Insight, powered by Omniture

Adobe Insight, powered by Omniture Adobe Insight, powered by Omniture Accelerating government intelligence to the speed of thought 1 Challenges that analysts face 2 Analysis tools and functionality 3 Adobe Insight 4 Summary Never before

More information

Delivering Smart Answers!

Delivering Smart Answers! Companion for SharePoint Topic Analyst Companion for SharePoint All Your Information Enterprise-ready Enrich SharePoint, your central place for document and workflow management, not only with an improved

More information

Managing a Portfolio of Products

Managing a Portfolio of Products Managing a Portfolio of Products What is product portfolio management? Imagine you have six products. How should you allocate your limited marketing resources among them? Should you invest in each product

More information

Data Mining Applications in Higher Education

Data Mining Applications in Higher Education Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2

More information

DATA ANALYTICS USING R

DATA ANALYTICS USING R DATA ANALYTICS USING R Duration: 90 Hours Intended audience and scope: The course is targeted at fresh engineers, practicing engineers and scientists who are interested in learning and understanding data

More information

IBM Social Media Analytics

IBM Social Media Analytics IBM Social Media Analytics Analyze social media data to better understand your customers and markets Highlights Understand consumer sentiment and optimize marketing campaigns. Improve the customer experience

More information

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first

More information

GEO-VISUALIZATION SUPPORT FOR MULTIDIMENSIONAL CLUSTERING

GEO-VISUALIZATION SUPPORT FOR MULTIDIMENSIONAL CLUSTERING Geoinformatics 2004 Proc. 12th Int. Conf. on Geoinformatics Geospatial Information Research: Bridging the Pacific and Atlantic University of Gävle, Sweden, 7-9 June 2004 GEO-VISUALIZATION SUPPORT FOR MULTIDIMENSIONAL

More information

NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE. Venu Govindaraju

NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE. Venu Govindaraju NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE Venu Govindaraju BIOMETRICS DOCUMENT ANALYSIS PATTERN RECOGNITION 8/24/2015 ICDAR- 2015 2 Towards a Globally Optimal Approach for Learning Deep Unsupervised

More information

Situational Awareness Through Network Visualization

Situational Awareness Through Network Visualization CYBER SECURITY DIVISION 2014 R&D SHOWCASE AND TECHNICAL WORKSHOP Situational Awareness Through Network Visualization Pacific Northwest National Laboratory Daniel M. Best Bryan Olsen 11/25/2014 Introduction

More information

SURVEY REPORT DATA SCIENCE SOCIETY 2014

SURVEY REPORT DATA SCIENCE SOCIETY 2014 SURVEY REPORT DATA SCIENCE SOCIETY 2014 TABLE OF CONTENTS Contents About the Initiative 1 Report Summary 2 Participants Info 3 Participants Expertise 6 Suggested Discussion Topics 7 Selected Responses

More information

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

More information

DATA MINING AND WAREHOUSING CONCEPTS

DATA MINING AND WAREHOUSING CONCEPTS CHAPTER 1 DATA MINING AND WAREHOUSING CONCEPTS 1.1 INTRODUCTION The past couple of decades have seen a dramatic increase in the amount of information or data being stored in electronic format. This accumulation

More information

SAS VISUAL ANALYTICS AN OVERVIEW OF POWERFUL DISCOVERY, ANALYSIS AND REPORTING

SAS VISUAL ANALYTICS AN OVERVIEW OF POWERFUL DISCOVERY, ANALYSIS AND REPORTING SAS VISUAL ANALYTICS AN OVERVIEW OF POWERFUL DISCOVERY, ANALYSIS AND REPORTING WELCOME TO SAS VISUAL ANALYTICS SAS Visual Analytics is a high-performance, in-memory solution for exploring massive amounts

More information

TIBCO Spotfire Business Author Essentials Quick Reference Guide. Table of contents:

TIBCO Spotfire Business Author Essentials Quick Reference Guide. Table of contents: Table of contents: Access Data for Analysis Data file types Format assumptions Data from Excel Information links Add multiple data tables Create & Interpret Visualizations Table Pie Chart Cross Table Treemap

More information

BioVisualization: Enhancing Clinical Data Mining

BioVisualization: Enhancing Clinical Data Mining BioVisualization: Enhancing Clinical Data Mining Even as many clinicians struggle to give up their pen and paper charts and spreadsheets, some innovators are already shifting health care information technology

More information

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing Introduction to Data Mining and Machine Learning Techniques Iza Moise, Evangelos Pournaras, Dirk Helbing Iza Moise, Evangelos Pournaras, Dirk Helbing 1 Overview Main principles of data mining Definition

More information

Introduction to Computer Graphics

Introduction to Computer Graphics Introduction to Computer Graphics Torsten Möller TASC 8021 778-782-2215 torsten@sfu.ca www.cs.sfu.ca/~torsten Today What is computer graphics? Contents of this course Syllabus Overview of course topics

More information

TEXT ANALYTICS INTEGRATION

TEXT ANALYTICS INTEGRATION TEXT ANALYTICS INTEGRATION A TELECOMMUNICATIONS BEST PRACTICES CASE STUDY VISION COMMON ANALYTICAL ENVIRONMENT Structured Unstructured Analytical Mining Text Discovery Text Categorization Text Sentiment

More information

Making confident decisions with the full spectrum of analysis capabilities

Making confident decisions with the full spectrum of analysis capabilities IBM Software Business Analytics Analysis Making confident decisions with the full spectrum of analysis capabilities Making confident decisions with the full spectrum of analysis capabilities Contents 2

More information

DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support

DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support Rok Rupnik, Matjaž Kukar, Marko Bajec, Marjan Krisper University of Ljubljana, Faculty of Computer and Information

More information

Oracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc.

Oracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc. Oracle9i Data Warehouse Review Robert F. Edwards Dulcian, Inc. Agenda Oracle9i Server OLAP Server Analytical SQL Data Mining ETL Warehouse Builder 3i Oracle 9i Server Overview 9i Server = Data Warehouse

More information

Visual Data Mining with Pixel-oriented Visualization Techniques

Visual Data Mining with Pixel-oriented Visualization Techniques Visual Data Mining with Pixel-oriented Visualization Techniques Mihael Ankerst The Boeing Company P.O. Box 3707 MC 7L-70, Seattle, WA 98124 mihael.ankerst@boeing.com Abstract Pixel-oriented visualization

More information

Data Clustering. Dec 2nd, 2013 Kyrylo Bessonov

Data Clustering. Dec 2nd, 2013 Kyrylo Bessonov Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

More information

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance. Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics

More information

Visual Mining of E-Customer Behavior Using Pixel Bar Charts

Visual Mining of E-Customer Behavior Using Pixel Bar Charts Visual Mining of E-Customer Behavior Using Pixel Bar Charts Ming C. Hao, Julian Ladisch*, Umeshwar Dayal, Meichun Hsu, Adrian Krug Hewlett Packard Research Laboratories, Palo Alto, CA. (ming_hao, dayal)@hpl.hp.com;

More information

DataPA OpenAnalytics End User Training

DataPA OpenAnalytics End User Training DataPA OpenAnalytics End User Training DataPA End User Training Lesson 1 Course Overview DataPA Chapter 1 Course Overview Introduction This course covers the skills required to use DataPA OpenAnalytics

More information

For example, estimate the population of the United States as 3 times 10⁸ and the

For example, estimate the population of the United States as 3 times 10⁸ and the CCSS: Mathematics The Number System CCSS: Grade 8 8.NS.A. Know that there are numbers that are not rational, and approximate them by rational numbers. 8.NS.A.1. Understand informally that every number

More information

Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

Towards Event Sequence Representation, Reasoning and Visualization for EHR Data

Towards Event Sequence Representation, Reasoning and Visualization for EHR Data Towards Event Sequence Representation, Reasoning and Visualization for EHR Data Cui Tao Dept. of Health Science Research Mayo Clinic Rochester, MN Catherine Plaisant Human-Computer Interaction Lab ABSTRACT

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful

More information

Information Visualization Multivariate Data Visualization Krešimir Matković

Information Visualization Multivariate Data Visualization Krešimir Matković Information Visualization Multivariate Data Visualization Krešimir Matković Vienna University of Technology, VRVis Research Center, Vienna Multivariable >3D Data Tables have so many variables that orthogonal

More information

Advanced Big Data Analytics with R and Hadoop

Advanced Big Data Analytics with R and Hadoop REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional

More information

Crime Pattern Analysis

Crime Pattern Analysis Crime Pattern Analysis Megaputer Case Study in Text Mining Vijay Kollepara Sergei Ananyan www.megaputer.com Megaputer Intelligence 120 West Seventh Street, Suite 310 Bloomington, IN 47404 USA +1 812-330-01

More information

Digging for Gold: Business Usage for Data Mining Kim Foster, CoreTech Consulting Group, Inc., King of Prussia, PA

Digging for Gold: Business Usage for Data Mining Kim Foster, CoreTech Consulting Group, Inc., King of Prussia, PA Digging for Gold: Business Usage for Data Mining Kim Foster, CoreTech Consulting Group, Inc., King of Prussia, PA ABSTRACT Current trends in data mining allow the business community to take advantage of

More information

Topographic Change Detection Using CloudCompare Version 1.0

Topographic Change Detection Using CloudCompare Version 1.0 Topographic Change Detection Using CloudCompare Version 1.0 Emily Kleber, Arizona State University Edwin Nissen, Colorado School of Mines J Ramón Arrowsmith, Arizona State University Introduction CloudCompare

More information

A Short Introduction on Data Visualization. Guoning Chen

A Short Introduction on Data Visualization. Guoning Chen A Short Introduction on Data Visualization Guoning Chen Data is generated everywhere and everyday Age of Big Data Data in ever increasing sizes need an effective way to understand them History of Visualization

More information

ifinder ENTERPRISE SEARCH

ifinder ENTERPRISE SEARCH DATA SHEET ifinder ENTERPRISE SEARCH ifinder - the Enterprise Search solution for company-wide information search, information logistics and text mining. CUSTOMER QUOTE IntraFind stands for high quality

More information

Principles of Data Visualization for Exploratory Data Analysis. Renee M. P. Teate. SYS 6023 Cognitive Systems Engineering April 28, 2015

Principles of Data Visualization for Exploratory Data Analysis. Renee M. P. Teate. SYS 6023 Cognitive Systems Engineering April 28, 2015 Principles of Data Visualization for Exploratory Data Analysis Renee M. P. Teate SYS 6023 Cognitive Systems Engineering April 28, 2015 Introduction Exploratory Data Analysis (EDA) is the phase of analysis

More information

Data Mining Techniques and Opportunities for Taxation Agencies

Data Mining Techniques and Opportunities for Taxation Agencies Data Mining Techniques and Opportunities for Taxation Agencies Florida Consultant In This Session... You will learn the data mining techniques below and their application for Tax Agencies ABC Analysis

More information

Fight fire with fire when protecting sensitive data

Fight fire with fire when protecting sensitive data Fight fire with fire when protecting sensitive data White paper by Yaniv Avidan published: January 2016 In an era when both routine and non-routine tasks are automated such as having a diagnostic capsule

More information

DeCyder Extended Data Analysis module Version 1.0

DeCyder Extended Data Analysis module Version 1.0 GE Healthcare DeCyder Extended Data Analysis module Version 1.0 Module for DeCyder 2D version 6.5 User Manual Contents 1 Introduction 1.1 Introduction... 7 1.2 The DeCyder EDA User Manual... 9 1.3 Getting

More information

Using Tableau Software with Hortonworks Data Platform

Using Tableau Software with Hortonworks Data Platform Using Tableau Software with Hortonworks Data Platform September 2013 2013 Hortonworks Inc. http:// Modern businesses need to manage vast amounts of data, and in many cases they have accumulated this data

More information

Face detection is a process of localizing and extracting the face region from the

Face detection is a process of localizing and extracting the face region from the Chapter 4 FACE NORMALIZATION 4.1 INTRODUCTION Face detection is a process of localizing and extracting the face region from the background. The detected face varies in rotation, brightness, size, etc.

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

Machine Learning and Data Mining. Fundamentals, robotics, recognition

Machine Learning and Data Mining. Fundamentals, robotics, recognition Machine Learning and Data Mining Fundamentals, robotics, recognition Machine Learning, Data Mining, Knowledge Discovery in Data Bases Their mutual relations Data Mining, Knowledge Discovery in Databases,

More information

Sanjeev Kumar. contribute

Sanjeev Kumar. contribute RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a

More information

Consumption of OData Services of Open Items Analytics Dashboard using SAP Predictive Analysis

Consumption of OData Services of Open Items Analytics Dashboard using SAP Predictive Analysis Consumption of OData Services of Open Items Analytics Dashboard using SAP Predictive Analysis (Version 1.17) For validation Document version 0.1 7/7/2014 Contents What is SAP Predictive Analytics?... 3

More information

Data Visualization Techniques

Data Visualization Techniques Data Visualization Techniques From Basics to Big Data with SAS Visual Analytics WHITE PAPER SAS White Paper Table of Contents Introduction.... 1 Generating the Best Visualizations for Your Data... 2 The

More information