Big Data: Rethinking Text Visualization
Dr. Anton Heijs
Treparel
April 8, 2013

Abstract

In this white paper we discuss text visualization approaches and why they are important for text analytics as done with the KMX technology of Treparel. Text visualization is most powerful when it supports the understanding of complex patterns in data and supports decision making. Statistical and machine learning techniques are used to find patterns and relationships that can then be visualized. Classification and clustering are two fundamental approaches in text analytics, and the visualization of classified documents and of clustered documents are therefore the two important visualization approaches discussed here.

1 Introduction: seeing the unseen

Visualization is the process of constructing a visual image in the mind to understand the data better. Although this is an accurate description of the word, visualization has increasingly become an external process rather than a purely mental one. This shift indicates that a broader definition of the term is needed, such as:

Visualization is a method of computing. It transforms the symbolic into the geometric, enabling researchers to observe their simulations and computations. Visualization offers a method for seeing the unseen. It enriches the process of scientific discovery and fosters profound and unexpected insights.

Visualization enables people to comprehend large data sets that are too large to grasp by mental imagination. It enables the discovery of previously unknown properties of the data set that may not have been anticipated. The perception of these properties or patterns can lead the user to develop new insights. Visualization often reveals inherent problems in the data; for instance, errors and artifacts may be readily revealed.
Visualization enables the examination of both the large-scale features of the data set and its local features, allowing the user to see local features in a larger-scale frame of reference. Visualization allows the user to form hypotheses based on the (newly) observed phenomena or developed insights. Ideally, visualization should provide a means to overview, explore and navigate large multidimensional data sets.

Visualization is needed to understand data, and data is becoming more important but also more complex. We need to understand our data, in both unstructured and structured form, and extract all relevant patterns and trends from it to obtain a clear picture and be able to make well-judged decisions. For this we need data analytics, and especially data/text mining and machine learning techniques that can analyze large complex data sets. Machine learning algorithms help us to model a pattern in the data set and describe it mathematically, which is very powerful since a mathematical description of a pattern or a trend is the most precise way of capturing and describing the data. We can then compare multiple patterns in the data at the level where one can draw conclusions taking into account all relevant information (captured by these models). This is the level where the model provides a meaning (interpretation) in the real world, the semantic level of the pattern or trend in the data. This is also the level where reasoning on patterns in the data can start.

What kinds of patterns can exist? This amounts to asking what kinds of models we can find when analyzing any kind of data. We can find linear models and non-linear models and the relationships between these models. Their mathematical description is well suited for further analysis, but visualization is very powerful for understanding the models by seeing the pattern or trend in the data. The visualization techniques needed to support this are techniques that can show the type of relationships between the data points (documents in the case of text analytics). These relationships can be supervised (categorical) or unsupervised (similarity relationships as in clusters), but also hierarchical relationships or relationships over time. There are visualizations of:

- table data (rows and columns)
- hierarchical data
- classified data (categorical)
- clustered data
- data over time
- correlation data

The relationships in the data that these visualization techniques reveal can be linked, which is very important for drawing conclusions on the data; therefore an interaction in one visualization is coupled to the other visualizations in KMX. This is called multiple coupled view visualization. It has recently become more important because in data analysis one looks at all related data, which also means analyzing multiple data sets in combination. The scientific literature contains many papers on the details behind the visualization techniques mentioned here.
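The coupling between views can be illustrated with a small sketch. It is not the KMX implementation; the class names `SelectionModel` and `View` and the document ids are hypothetical. It only shows the general idea that all views observe one shared selection, so a selection made in any view is reflected in every other view.

```python
from typing import Callable, List, Set


class SelectionModel:
    """Shared selection of document ids, observed by every registered view."""

    def __init__(self) -> None:
        self._selected: Set[str] = set()
        self._listeners: List[Callable[[Set[str]], None]] = []

    def subscribe(self, listener: Callable[[Set[str]], None]) -> None:
        self._listeners.append(listener)

    def select(self, doc_ids: Set[str]) -> None:
        # Replace the selection and notify all views, including the sender.
        self._selected = set(doc_ids)
        for listener in self._listeners:
            listener(set(self._selected))


class View:
    """Hypothetical view that only remembers which documents to highlight."""

    def __init__(self, name: str, model: SelectionModel) -> None:
        self.name = name
        self.highlighted: Set[str] = set()
        self._model = model
        model.subscribe(self._on_selection)

    def _on_selection(self, doc_ids: Set[str]) -> None:
        self.highlighted = doc_ids

    def brush(self, doc_ids: Set[str]) -> None:
        # A user interaction in this view updates the shared model.
        self._model.select(doc_ids)


model = SelectionModel()
treemap = View("treemap", model)
clusters = View("clusters", model)
clusters.brush({"doc-1", "doc-7"})  # selecting in one view highlights in both
```

Keeping the selection in one shared object, rather than letting views call each other, means a new visualization can be added without changing the existing ones.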
KMX technology uses these advanced visualizations as part of the analysis pipeline, with support for multiple selections of data points (which can be patents, research, legal or news documents) in different visualizations. With multiple coupled views, when a user selects one or more documents, the selection is shown in all available visualizations, and the interaction is supported from all visualizations.

Figure 1: GUI showing multiple views with different visualizations of a large patent data set.

Figure 2: (a) An example of a tree map showing patents on chemistry. Interaction with the tree map visualization also requires support for drilling down into the data: if one wants to see all patents in C07, the visualization can be updated to show patents deeper in the C07 classification tree. This is one of the strengths of the tree map algorithm. (b) Coloring can be used to show a parameter such as the number of patents in a class, shown from green (large number) to black (small number of patents in that subclass).

Let us first take a brief look at how we arrive at a visualization from the original raw data. The visualization pipeline is the name of the sequence of processes that creates a visual representation of data. Before the visualization pipeline is entered, a quantity of data is generated, either from databases or by any other means of data collection. The visualization pipeline basically consists of four steps:

1. Data analysis is the first step in the visualization pipeline. During data analysis the data is prepared for visualization. Basically this means that a number of operations can be performed on the data to make it more suitable for visualization.
2. After the data analysis step, the raw data has been transformed into data that can be visualized. However, this does not mean that all of the data is of interest. Only the portions of the data that are of interest should be visualized, and hence the second step is a data selection step, so that only focus data remains in the pipeline. Usually this part of the pipeline features some user interaction to decide on the sections of interest.
3. Once the focus data has been decided, the next step is the mapping step. In this part of the pipeline the data is mapped to renderable representations. These representations are geometric primitives such as lines, surfaces, points and voxels, with attributes such as color, position, size, transparency and texture.
4. After the data mapping, all that remains is the final rendering of the geometric data. Rendering is creating an image from a model. Operations performed here include viewing transformations, lighting calculations, hidden surface removal, scan conversion and anti-aliasing. The final visualization is created and either written to file or displayed on the screen.
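The four pipeline steps (analysis, selection, mapping, rendering) can be sketched as four small functions chained together. This is a minimal illustration under assumed names and data: the record fields `cls` and `score`, the `Glyph` type and the text-based `render` function are hypothetical stand-ins, not part of KMX.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Glyph:
    """A renderable primitive: here a point with position, size and color."""
    x: float
    y: float
    size: float
    color: str


def analyze(raw: List[Dict]) -> List[Dict]:
    # Step 1: prepare the data, here by dropping records without a score.
    return [r for r in raw if r.get("score") is not None]


def select(records: List[Dict], wanted_class: str) -> List[Dict]:
    # Step 2: keep only the focus data chosen by the user.
    return [r for r in records if r["cls"] == wanted_class]


def map_to_glyphs(records: List[Dict]) -> List[Glyph]:
    # Step 3: map data attributes to geometric primitives and their attributes.
    return [
        Glyph(x=float(i), y=r["score"], size=1.0,
              color="green" if r["score"] >= 0.5 else "black")
        for i, r in enumerate(records)
    ]


def render(glyphs: List[Glyph]) -> str:
    # Step 4: produce an image; a text listing stands in for real rendering.
    return "\n".join(f"{g.color} point at ({g.x:.1f}, {g.y:.2f})" for g in glyphs)


# Hypothetical raw records.
raw_data = [
    {"id": "p1", "cls": "C07", "score": 0.9},
    {"id": "p2", "cls": "A61", "score": 0.4},
    {"id": "p3", "cls": "C07", "score": None},
    {"id": "p4", "cls": "C07", "score": 0.2},
]
image = render(map_to_glyphs(select(analyze(raw_data), "C07")))
print(image)
```

Because each step only consumes the output of the previous one, user interaction can re-run the pipeline from the selection step onward without repeating the data analysis.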
The resulting visualization should ideally be expressive, effective and appropriate. Expressive means that the visualization should display only the relevant information of a data set. Effective means that it complements the user's capabilities of perception and the mental image that the user has of the visualization. Finally, an appropriate visualization is one in which the effort of creating it does not outweigh the benefits of the result.

The first step in the visualization of patent data is often searching/filtering the data. To extract the patterns, text mining can strongly contribute to the visualization. Some important analysis tasks for a user are:

- the visualization of a document collection against a known set of classes
- the visualization of a document collection against an unknown set of classes
- the visualization of a document collection in the context of its hierarchy
- the visualization of a document collection over time

1.1 Treemap visualization
The first two tasks can be implemented using supervised and unsupervised machine learning techniques, through which automatic classification and clustering of the data is done. This data is then processed in the visualization pipeline to provide insight into the classified and clustered patent data. Since patent data contains classification codes, the data can be ordered hierarchically, for instance in the IPC classification. To provide insight into a collection of patent data we also provide an approach to visualize hierarchical patent data using a tree map algorithm. The patent data also contains time stamps, through which a collection of patents can be analyzed over time. For this we implemented a visualization of the change over time in the number of patents from a collection that belong to a patent class.

Tree mapping is a method for displaying tree-structured data using nested rectangles, which provide overview and selection of data points. An example is given in figure 6 below, where there are documents in classes A and H, and class A has three subcategories (A1, A2 and A3), of which one is selected and all documents in that class are shown in red. Within the tree map the user has an overview of the classes and the number of patents in those classes for the full collection. With a mouse-over the user can get additional information about a patent and can add or remove one or more patents from the currently selected set. Selecting one box selects one document (such as EP in the example). The tree map visualization is very powerful since, in a fixed screen space, the tree mapping algorithm can show all hierarchical data points (patent documents) and provide both an overview and a good selection mechanism.
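The nested-rectangle idea can be made concrete with a short sketch. The paper does not say which tree map layout KMX uses; the example below assumes the simple slice-and-dice variant, which alternates horizontal and vertical splits per level. The tree, the document counts and the function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Rect = Tuple[float, float, float, float]  # x, y, width, height


def weight(node: dict) -> float:
    """Number of documents under a node (leaves carry a 'size')."""
    if "children" in node:
        return sum(weight(child) for child in node["children"])
    return float(node["size"])


def slice_and_dice(node: dict, rect: Rect, depth: int = 0,
                   out: Optional[Dict[str, Rect]] = None) -> Dict[str, Rect]:
    """Give each node a rectangle whose area is proportional to its weight."""
    if out is None:
        out = {}
    out[node["name"]] = rect
    total = weight(node)
    x, y, w, h = rect
    offset = 0.0
    for child in node.get("children", []):
        share = weight(child) / total
        if depth % 2 == 0:   # split the width on even levels
            child_rect = (x + offset * w, y, share * w, h)
        else:                # split the height on odd levels
            child_rect = (x, y + offset * h, w, share * h)
        slice_and_dice(child, child_rect, depth + 1, out)
        offset += share
    return out


# Hypothetical hierarchy: classes A and H, with A split into A1, A2 and A3.
tree = {"name": "all", "children": [
    {"name": "A", "children": [
        {"name": "A1", "size": 2},
        {"name": "A2", "size": 1},
        {"name": "A3", "size": 1},
    ]},
    {"name": "H", "size": 4},
]}
layout = slice_and_dice(tree, (0.0, 0.0, 8.0, 4.0))
```

Every node always fits inside its parent's rectangle, which is why the full hierarchy stays visible in a fixed screen area and why drilling down only requires calling the layout again on a subtree.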
Figure 3: Combined use of two visualizations in KMX (tree map and clustering) to show the patent data hierarchically (tree map) and unsupervised (3D clustering, where the height is the density of the patents, especially prominent for inorganic and organic chemistry); the color is used to display the pattern in the patent data over time.

We can also combine two visualizations, as shown in Figure 3, where the tree map coloring is used to show the patents over three years (2005, 2006, 2007) and the cluster visualization shows the same documents by their similarity as calculated by the machine learning algorithm for clustering in KMX. The clustering of documents helps to analyze a collection of patents and gain insight into the natural grouping of the patents. In the cluster visualization, the user can easily select documents by brushing, i.e. selecting them using the mouse. By brushing in a cluster or a parallel coordinates visualization the user gets feedback about the selected documents, which greatly helps in the selection of documents and is an example of the multiple coupled views support mentioned earlier. One can use multiple brushes to make a rough selection and a more precise selection, which gives the user feedback on both a larger and a smaller selected set of documents. The use of multiple brushes also helps the analyst to explore the documents directly in a tree map visualization and in a visualization of the documents over time. This helps to understand whether a brushed set of documents that are close together in a cluster visualization are also hierarchically close together in the tree map visualization. One can additionally analyze whether documents that are clustered close together are also close together over time. If one wants to check whether there is a trend in a certain technology over time, this would be a logical way to analyze and explore it.

1.2 Parallel coordinates visualization

When we have a set of documents selected, for instance by filtering or brushing (see right cluster image), we can show the distribution of the classification scores for the selected set (in the example below, the documents on Ebola, SARS and H5N1). This is done using three parallel, vertically oriented coordinate axes on which the classification score, from 0.0 (bottom) to 100 (top), is shown for each document; each document is a line passing through the three axes. One can immediately see that the documents selected in one cluster have a high score for one class and a low score for the other classes. In the KMX example shown below this holds for all classes, which shows the high performance of the classifiers. Parallel coordinates is a very general visualization technique and can map multivariate data belonging to text data.
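As a rough sketch of the parallel coordinates mapping described above, each document's three classification scores can be turned into one polyline, and a brush on an axis becomes a range filter on that axis. The scores, document ids, axis spacing and function names below are hypothetical and only illustrate the geometry.

```python
from typing import Dict, List, Tuple

AXES = ["Ebola", "SARS", "H5N1"]  # one vertical axis per class


def polyline(scores: Dict[str, float], axis_gap: float = 100.0,
             height: float = 200.0) -> List[Tuple[float, float]]:
    """Map one document's scores (0 to 100) to a line through all axes."""
    return [(i * axis_gap, scores[axis] * height / 100.0)
            for i, axis in enumerate(AXES)]


def brush(docs: Dict[str, Dict[str, float]], axis: str,
          low: float, high: float) -> List[str]:
    """Return ids of documents whose score on one axis lies in [low, high]."""
    return sorted(d for d, s in docs.items() if low <= s[axis] <= high)


# Hypothetical classification scores for three documents.
docs = {
    "doc-a": {"Ebola": 95.0, "SARS": 10.0, "H5N1": 5.0},
    "doc-b": {"Ebola": 8.0, "SARS": 90.0, "H5N1": 12.0},
    "doc-c": {"Ebola": 4.0, "SARS": 15.0, "H5N1": 88.0},
}
line_a = polyline(docs["doc-a"])               # high on the first axis, low on the rest
high_sars = brush(docs, "SARS", 80.0, 100.0)   # documents brushed near the top of SARS
```

A well-separated classifier shows up as lines that peak on exactly one axis, which is the pattern the text describes.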
The example above explained parallel coordinates in relation to clustering, classification and two types of visualization.

When we have classified all patents in KMX we can use the classification scores to calculate the correlation between all patents and visualize it. This provides valuable insights into aspects that cannot be determined with a query-based approach, as shown below. The vertical and horizontal axes of the correlation visualization (matrix) both carry the classification codes (IPC, for instance), and therefore the visualization is symmetric. There are documents that are in different classes (as with pesticides) yet still share a strong correlation, as shown for patents in classes C07K02 and A61K05, which have a correlation coefficient of 0.75 in the visualization below. Seeing where these strongly correlated classes are is easy and valuable, and this information cannot be determined by a query-based approach. Where many correlating classes occur is also seen directly in one picture, which shows the strength of overview first and details on demand later when using visualizations.
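One way to obtain such a symmetric class-by-class matrix is sketched below. The paper does not state which correlation measure KMX uses; this example assumes the Pearson coefficient computed over per-patent classification scores, and the class labels and score values are hypothetical.

```python
from math import sqrt
from typing import Dict, List


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def correlation_matrix(scores: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Class-by-class matrix; scores[c][i] is patent i's score for class c."""
    classes = sorted(scores)
    return {r: {c: pearson(scores[r], scores[c]) for c in classes}
            for r in classes}


# Hypothetical classification scores of four patents for three classes.
scores = {
    "A61K": [0.9, 0.8, 0.2, 0.1],
    "C07K": [0.8, 0.9, 0.1, 0.2],
    "H01L": [0.1, 0.2, 0.9, 0.8],
}
matrix = correlation_matrix(scores)
```

In this toy data the first two classes score the same patents highly and therefore correlate strongly, while the third class correlates negatively with both; rendering each cell as a colored square gives the matrix view.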
1.3 Visualization of patterns over time in a document set

When one wants to understand patent data over time, it is valuable to be able to analyze the patents as part of a class that captures documents about the same subject, classifications and concepts. This can be done using classification and/or clustering, after which we can visualize the increase or decrease of the patents over time, where the band width of the classes shows the trends. This is shown below for patent classification classes but can also be done on non-patent literature, for instance on the MeSH terms of PubMed documents.
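The band widths in such a trend view come down to counting patents per class per year. The sketch below shows that counting step only, on hypothetical data; the function name and the (class, year) input format are assumptions, and drawing the bands is left to whatever plotting tool is at hand.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def band_widths(patents: Iterable[Tuple[str, int]],
                years: List[int]) -> Dict[str, List[int]]:
    """For every class, count patents per year; each count is a band width."""
    counts = Counter(patents)                      # (class, year) -> count
    classes = sorted({cls for cls, _ in counts})
    return {cls: [counts[(cls, year)] for year in years] for cls in classes}


# Hypothetical (class, publication year) pairs.
patents = [
    ("C07", 2005), ("C07", 2006), ("C07", 2006),
    ("C07", 2007), ("C07", 2007), ("C07", 2007),
    ("A61", 2005), ("A61", 2007),
]
trend = band_widths(patents, [2005, 2006, 2007])
```

Here the band for class C07 widens from one patent to three over the three years, while the band for A61 disappears in 2006 and returns in 2007.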
Figure 4: Parallel coordinates visualization and cluster visualization of 3 Medline clusters (Ebola (purple), H5N1 (blue) and SARS (yellow)).

Figure 5: Parallel coordinates in which the scores of the patents are sorted by the most important coordinate classes. The decay shows that all patents belong distinctively to the first class shown (first coordinate) and thereafter to one or two additional classes, though less dominantly. The gray cylinders indicate the number of patents in that range of the classification score, which helps to read and interpret the patent data.
Figure 6: (a) Correlation visualization between many patents and their patent classes. (b) Trends of the patents over time for different patent classes.
2 Text Analytics visualizations

Treparel's KMX big data text analytics solution is a client-server based software platform. The KMX API makes the system open for integration with existing technologies. The client GUI is a native Windows application, of which a screenshot is shown below. The solution is a very flexible and scalable system in terms of performance and system management. Its scalability allows it to handle both the growing amount of data and the growing complexity of the data at hand at predictable cost.

Figure 7: Overview of the KMX Patent Analytics GUI showing patent titles and their labels, the cluster visualization, a section of the full text of a selected patent (see the crosshair in the visualization) and the brushes (green and red) indicating the training documents of the classifier. The classification score is shown from blue (positive) to yellow (negative) in the patent landscape.

About Treparel

Treparel is a leading global software provider in Big Data Text Analytics and Visualization. The KMX platform allows organizations to enhance innovation processes, improve competitive advantage, mitigate litigation risk and cost, and manage interactions with customers by gaining insights from numerous sources of unstructured data (text, application notes, images, blogs, and patents). Global companies, government agencies, software vendors and data publishers use Treparel KMX text analysis software to gain faster, reliable, precise insights into large, complex unstructured data sets, allowing them to make better-informed decisions. For more information contact info@treparel.com or go to
A Short Introduction on Data Visualization Guoning Chen Data is generated everywhere and everyday Age of Big Data Data in ever increasing sizes need an effective way to understand them History of Visualization
More informationSURVEY REPORT DATA SCIENCE SOCIETY 2014
SURVEY REPORT DATA SCIENCE SOCIETY 2014 TABLE OF CONTENTS Contents About the Initiative 1 Report Summary 2 Participants Info 3 Participants Expertise 6 Suggested Discussion Topics 7 Selected Responses