FOUNDATIONAL SYSTEMS BRIDGING DATA MINING AND VISUAL ANALYTICS

JAEGUL CHOO
RESEARCH STATEMENT

My primary research goal is to develop new methods and systems that firmly unify data mining and visual analytics to solve challenging problems in big data. Data mining has long proposed scalable methods for big data. However, real-world data do not necessarily satisfy the assumptions and conditions these methods require. Furthermore, users often have little or no idea which problems to solve for a given data set, which makes existing methods less useful. Visual analytics, a newly emerging discipline, handles these situations by allowing users to explore and understand data via interactive visualization. However, visual analytics cannot easily accommodate big data because human perception and computer screen space scale poorly.

An ideal solution is to combine these two complementary disciplines. Data mining methods can address the scalability issue in visual analytics by summarizing large-scale complex data and extracting intelligent information beyond the raw data. Visual analytics can provide users with intuitive visual access to data mining outputs as well as interactive control over data mining methods for users' intended tasks. Yet the two areas have seen little integration so far. Based on my research across both, I think the main hurdles lie in (1) difficulties in understanding and interacting with data mining methods and their outputs and (2) the significant computational time the methods require.

My research intends to remedy these issues via the following interrelated threads: (1) a foundational visual analytics system providing easy access to a wide variety of data mining methods, (2) novel methodologies achieving flexible interactivity and real-time response of data mining methods, and (3) scalable visual analytics systems targeting real-world domains. Below I describe specific projects in each thread.
FOUNDATIONAL SYSTEMS BRIDGING DATA MINING AND VISUAL ANALYTICS

Big data, e.g., text documents, images, and biological data, are often represented in a high-dimensional space. In visual analytics for large-scale high-dimensional data, dimension reduction and clustering are key techniques: the former visualizes high-dimensional data in a 2D/3D space, while the latter reduces numerous data items to a small number of groups. Recent advances in these methods from the data mining and machine learning communities have not been fully transferred to many real-world applications. The Testbed system [1] is a foundational visual analytics system that fills this gap. It integrates more than 20 dimension reduction methods, including the two-stage methods I developed [2], and about 10 clustering methods, allowing users to apply different methods to their own data with little effort and perform analysis with the most suitable ones. To facilitate intuitive comparisons, the system also aligns outputs from different methods based on manifold alignment techniques [3].

Fig 1. The Testbed system providing a visual overview and data details on demand.

The impact of the Testbed system is two-fold. First, it works as a base for experimenting with and improving new dimension reduction and clustering methods in a visual analytics environment. Because of the flexible software architecture of the system, one can seamlessly integrate and evaluate new methods [4]. Second, it can be applied to a wide range of applications and provide deep insight about data. For instance, I applied the system to a novel domain of protein disorder prediction [5], where the knowledge obtained via interactive visualization significantly improved prediction performance over state-of-the-art methods. The system is currently being applied to many other domains, such as healthcare and computer networks, in collaboration with Samsung Electronics and Prof. Nick Feamster at Georgia Tech.

DATA MINING METHODS SUPPORTING FLEXIBLE AND REAL-TIME INTERACTIONS

Significant noise in real-world data often causes data mining methods to generate unsatisfactory results. Being able to interact with the methods and the data is critical for steering the results in users' own way toward the most meaningful output. However, most methods are not designed to incorporate the various needs of users. In addition, interaction with the methods may be inefficient because recomputing them repeatedly is slow. Thus, I developed novel data mining methods, and a framework for integrating them into visual analytics, that support flexible and real-time interaction.

p-ISOMAP. An essential interaction with data mining methods is to change their parameters. To make this interaction fast, I have proposed a dynamic parametric updating algorithm for a widely used dimension reduction method, ISOMAP [4]. The proposed approach involves sophisticated algorithmic modules, such as efficient shortest-path updates upon edge addition/removal, and it has achieved a speed-up of up to around 100x over the original ISOMAP.

PIVE. I also developed a fundamental methodology called PIVE [6, 7], a Per-Iteration Visualization Environment, which enables continuous real-time interactions with data mining methods. PIVE exploits the fact that many modern data mining algorithms run iteratively until convergence and that major changes in the solution occur mostly during the early iterations.
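As a minimal sketch of this observation (an illustration only, not the PIVE implementation), k-means can be written as a Python generator that hands back its intermediate result after every iteration, so that a caller could draw it or stop early. The function name `kmeans_iterations`, its parameters, and the toy data are assumptions made for this example.

```python
import random


def kmeans_iterations(points, k, max_iter=50, seed=0):
    """Yield (iteration, centroids, labels, changed) after every k-means iteration.

    `changed` counts the points whose cluster label differs from the
    previous iteration; it typically drops quickly in early iterations.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)  # -1 means "not assigned yet"
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        changed = sum(1 for old, new in zip(labels, new_labels) if old != new)
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Hand the intermediate result to the caller before continuing.
        yield iteration, [list(c) for c in centroids], list(labels), changed
        if changed == 0:
            break  # converged: no label changed in this iteration


# Toy data: two small groups of 2D points (made up for this example).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
history = list(kmeans_iterations(points, k=2))
for iteration, centroids, labels, changed in history:
    print(f"iteration {iteration}: {changed} labels changed")
```

In an interactive setting, the `for` loop over the generator is where a view would be redrawn and where user input could be checked between iterations, instead of collecting the whole history first.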
Motivated by this idea, PIVE visualizes intermediate results from algorithm iterations in real time, so that users can efficiently interact with the method without having to wait for convergence. PIVE changes the paradigm of interacting with data mining methods: in the past, such continuous real-time interaction was considered impractical because the methods ran too slowly. To demonstrate the advantage with actual methods, we recently developed user interaction capabilities, such as repositioning of data items and cluster splitting/merging, in t-distributed stochastic neighbor embedding, k-means, and latent Dirichlet allocation [7].

Weakly Supervised NMF. Nonnegative matrix factorization (NMF) is a popular method in data mining tasks including clustering, collaborative filtering, and outlier detection. Weakly supervised NMF (WS-NMF) [8] is a novel method that supports various user interactions in the context of clustering and topic modeling. Unlike other semi-supervised methods, the underlying philosophy of WS-NMF is to reflect semantically meaningful feedback from users' viewpoints instead of requiring method-centric constraints. We demonstrated the capabilities of WS-NMF, such as incorporating information from other sources, exemplar data items, and features of interest. This work is currently under review at the journal Data Mining and Knowledge Discovery (DMKD), and it has also led us to an interactive topic modeling system called UTOPIAN [9].

REAL-WORLD VISUAL ANALYTICS SYSTEMS

Based on the above-mentioned foundational research, I have built mature visual analytics systems in diverse real-world applications. First, I have focused on two representative machine learning tasks: classification and clustering. These tasks are usually performed in a fully automated manner, but in practice, many algorithms do not properly handle noisy real-world data.
iVisClassifier [10] and iVisClustering [11] are systems that leverage human-in-the-loop processes in classification (e.g., facial recognition) and clustering (e.g., document clustering), respectively. iVisClassifier, which uses regularized linear discriminant analysis to visualize data with class information, allows users to visually analyze the relationships between classes and interactively improve classifier performance. iVisClustering, by enhancing latent Dirichlet allocation (LDA), a popular document topic modeling method, supports various important interactions such as cluster keyword refinement and hierarchical cluster management.

More recently, I have proposed a system called UTOPIAN (User-driven Topic modeling based on Interactive NMF) [9]. In general, given a large-scale document corpus, it is burdensome to go through individual documents to make sense of them and find those of interest to users. Topic modeling is useful in this context, but derived topics are often unclear for real-world data. To tackle this fundamental problem, UTOPIAN provides useful interaction capabilities in topic modeling, such as topic merging/splitting and topic creation via seed keywords/documents. This work also highlights the important advantages of NMF over LDA in terms of algorithmic consistency against noisy document data. Furthermore, the interactions offered by UTOPIAN are performed efficiently owing to the incorporated PIVE framework. Since UTOPIAN was published in VAST '13/TVCG [9], the novel idea of bringing NMF into the visual analytics context has received enormous interest from many researchers, which has opened up collaboration opportunities with the research groups of Prof. Daniel Keim at University of Konstanz, Prof. Niklaus Elmqvist at Purdue University, and other researchers.

Fig 2. The UTOPIAN system visualizing a topic summary with various interaction capabilities.
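To make the NMF-based topic modeling mentioned above more concrete, the following is a minimal sketch of plain, unsupervised NMF on a tiny term-by-document matrix. It uses the standard multiplicative update rules, which is a swapped-in textbook technique and not the algorithm behind UTOPIAN or WS-NMF. The helper names (`matmul`, `transpose`, `frobenius_error`, `nmf`), the vocabulary, and the counts are all made up for illustration.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Frobenius norm of V - W H, i.e. the reconstruction error."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2
               for row_v, row_wh in zip(V, WH)
               for v, wh in zip(row_v, row_wh)) ** 0.5


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Approximate a nonnegative m x n matrix V by W (m x k) times H (k x n)."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    errors = []
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
        errors.append(frobenius_error(V, W, H))
    return W, H, errors


# Hypothetical vocabulary and term counts: rows are terms, columns are documents.
terms = ["cluster", "topic", "pixel", "image", "document", "color"]
V = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
    [3, 3, 0, 0],
    [0, 0, 2, 2],
]
W, H, errors = nmf(V, k=2)
for t in range(2):
    # The largest entries in column t of W are the representative keywords of topic t.
    top = sorted(range(len(terms)), key=lambda i: W[i][t], reverse=True)[:3]
    print(f"topic {t}:", [terms[i] for i in top])
```

Each column of `W` can be read as a topic (a weighting over terms) and each column of `H` as the topic mixture of one document. Because the updates are iterative, the factors after every pass could also be shown to a user in the per-iteration style described for PIVE.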
RESEARCH AGENDA

My long-term goal is to develop methods and systems that take the best advantage of both data mining and visual analytics for big data: leveraging computational methods to sift through huge data and reveal underlying insight, and enabling humans to exploit their visual perception and intuition to delve into data. Although I have taken the first steps toward this goal with my previous research, I plan to broaden and deepen this investigation, including both fundamental redesign of computational methods and application of visual analytics to unexplored domains. In the following, I describe a few of my research directions.

Scaling up Visual Analytics. My future research will proactively scale up visual analytics. Scalability issues arise from two perspectives: back-end computation and front-end interactive visualization. For the former, data mining methods have to scale up to large-scale data. Ongoing efforts include the parallel distributed NMF algorithms that I currently work on as a co-PI of the DARPA XDATA project. For the latter, visual analytics systems should support fast interactive visualization of numerous data items. For example, an interactive visual document recommender system, VisIRR [12], which I am currently developing, handles about half a million documents. I plan to further explore various research problems in scalable visual analytics.

Revolutionizing Computing Paradigms in Visual Analytics. Because data mining methods were not originally designed for visual analytics, exploiting inherent characteristics of visual analytics could significantly decrease computational time. My future research will make heavy use of the fact that human perception and screen space do not require fully accurate results from computations. I envision a completely new paradigm that allows computational methods to immediately generate approximate solutions and incrementally refine them until users are satisfied. To realize this idea, I am looking into the literature from other fields, e.g., adaptive mesh refinement in numerical analysis and wavelet transformation in image processing. I have recently published some promising results [6], and I will continue this investigation in my future research.

Visualizing the Quality of Computational Output. When humans face computational outputs, it is crucial to inform them of the output quality. For instance, in dimension reduction, the output quality corresponds to how well given relationships are preserved in a low-dimensional space. In clustering, it would be how clear and meaningful the resulting clusters are. This notion of output quality can be further applied at the different levels of an individual data item, a cluster, and a data set. The current practice of plugging data mining methods into visual analytics does not effectively reveal such information, yet a poor-quality output could significantly mislead subsequent analyses. A 2D snapshot of high-dimensional data that severely distorts their original relationships would not help in understanding the data. Clusters computed from data with no real clusters, e.g., uniformly distributed data, do not convey any meaningful information. My research will focus on how to visualize this quality information along with the output to properly guide human analyses.

Building Visual Analytics for Data Comparison and Contrast. At the heart of many analysis tasks is comparing and contrasting different data groups to acquire comprehensive knowledge. I plan to develop fundamental data mining methods and visual analytics systems to support these analyses. One method I am currently working on is joint-discriminative topic modeling using NMF, which simultaneously identifies both common and distinct topics among multiple document data sets.
Equipping it with a highly interactive visual environment, where users can dynamically create and compare multiple data groups, will be a promising research direction. Together with Prof. Haesun Park (Georgia Tech) and Prof. Chandan Reddy (Wayne State University), I am preparing to submit an NSF proposal based on this idea in January.

Broadening Real-world Impact. I will continuously widen the real-world applicability of my research. I plan to carry this out by (1) pioneering novel domains and (2) developing web-based systems. For example, I have recently analyzed novel social media data about nonprofit micro-financing activities available at Kiva.org. This work, reported in papers accepted at WSDM '14 [13] and WWW '14 [14], is one of the very first studies to apply machine learning techniques in this domain. I plan to perform deeper analysis of this application using visual analytics approaches as well. In addition, I am currently extending my visual analytics systems to web-based systems. In collaboration with Georgia Tech Research Institute, a web-based version of the Testbed system is under active development. Additionally, I am collaborating with Prof. Ji Soo Yi (Purdue University) and Dr. Bum Chul Kwon (University of Konstanz) on a project to build a website where users can interactively label positive and negative aspects with rich content when writing reviews or answers. In this project, we also plan to integrate the interactive topic modeling capabilities of UTOPIAN for the visual summary of reviews/answers.

Cross-disciplinary research between data mining and visual analytics has given me deep interest and motivation, and I still see its tremendous potential for big data. I have collaborated with more than 40 researchers and engineers in universities, national labs, and companies, who have constantly inspired me with new ideas and directions.
I am also involved in various research funding proposals for NSF, DARPA, NIH, ONR, and industry. For example, we recently received a $2.7 million award from the DARPA XDATA program for big data.

In conclusion, my research seeks new methods and systems that synthesize data mining and visual analytics to accomplish interactive, in-depth analysis of big data. I hope that my experience and insights spanning both fields will continue to grow and prove the true value of such a synthesis.

SELECTED REFERENCES

1. An Interactive Visual Testbed System for Dimension Reduction and Clustering of Large-Scale High-Dimensional Data. Jaegul Choo, Hanseung Lee, Zhicheng Liu, John T. Stasko, Haesun Park. SPIE Conference on Visualization and Data Analysis (VDA). Software is available at
2. Two-stage Framework for Visualization of Clustered High-Dimensional Data. Jaegul Choo, Shawn Bohn, Haesun Park. IEEE Symposium on Visual Analytics Science and Technology (VAST).
3. Heterogeneous Data Fusion via Space Alignment Using Nonmetric Multidimensional Scaling. Jaegul Choo, Shawn Bohn, Grant C. Nakamura, Amanda M. White, Haesun Park. SIAM International Conference on Data Mining (SDM).
4. p-ISOMAP: An Efficient Parametric Update for ISOMAP for Visual Analytics. Jaegul Choo, Hanseung Lee, Chandan K. Reddy, Haesun Park. SIAM International Conference on Data Mining (SDM).
5. A Visual Analytics Approach for Protein Disorder Prediction. Jaegul Choo, Fuxin Li, Kihyung Joo, Haesun Park. SIAM Expanding the Frontiers of Visual Analytics and Visualization (Book Chapter).
6. Screen Space- and Perception-Based Framework for Efficient Computational Algorithms in Large-Scale Visual Analytics. Jaegul Choo, Haesun Park. IEEE Computer Graphics and Applications (CG&A).
7. PIVE: A Per-Iteration Visualization Environment for Supporting Real-time Interactions with Computational Methods. Jaegul Choo, Changhyun Lee, Haesun Park. Technical Report, Georgia Institute of Technology.
8. Weakly Supervised Nonnegative Matrix Factorization for User-driven Clustering. Jaegul Choo, Changhyun Lee, Chandan K. Reddy, Haesun Park. Data Mining and Knowledge Discovery (DMKD) 2013, Under Review.
9. UTOPIAN: User-driven Topic Modeling Based on Interactive Nonnegative Matrix Factorization. Jaegul Choo, Changhyun Lee, Chandan K. Reddy, Haesun Park. IEEE Transactions on Visualization and Computer Graphics (TVCG).
10. iVisClassifier: An Interactive Visual Analytics System for Classification based on Supervised Dimension Reduction. Jaegul Choo, Hanseung Lee, Jaeyeon Kihm, Haesun Park. IEEE Conference on Visual Analytics Science and Technology (VAST).
11. iVisClustering: An Interactive Visual Clustering for Documents via Topic Modeling. Hanseung Lee, Jaeyeon Kihm, Jaegul Choo, John T. Stasko, Haesun Park. Computer Graphics Forum (CGF).
12. VisIRR: Interactive Visual Information Retrieval and Recommendation for Large-Scale Document Data. Jaegul Choo, Changhyun Lee, Edward Clarkson, Zhicheng Liu, Hanseung Lee, Duen Horng (Polo) Chau, Fuxin Li, Ramakrishnan Kannan, Charles D. Stolper, David Inouye, Nishant Mehta, Hua Ouyang, Subhojit Som, Alexander Gray, John T. Stasko, Haesun Park. Computer Graphics Forum (EuroVis / CGF) 2014, Under Review.
13. Understanding and Promoting Micro-finance Activities in Kiva.org. Jaegul Choo, Changhyun Lee, Daniel Lee, Hongyuan Zha, Haesun Park. ACM Conference on Web Search and Data Mining (WSDM) 2014, Accepted.
14. To Gather Together for a Better World: Understanding and Leveraging Communities in Micro-Lending Recommendation. Jaegul Choo, Daniel Lee, Bistra Dilkina, Hongyuan Zha, Haesun Park. International Conference on World Wide Web (WWW) 2014, Accepted.


More information

Reinventing Business Intelligence through Big Data

Reinventing Business Intelligence through Big Data Reinventing Business Intelligence through Big Data Dr. Flavio Villanustre VP, Technology and lead of the Open Source HPCC Systems initiative LexisNexis Risk Solutions Reed Elsevier LEXISNEXIS From RISK

More information

CONNECTING DATA WITH BUSINESS

CONNECTING DATA WITH BUSINESS CONNECTING DATA WITH BUSINESS Big Data and Data Science consulting Business Value through Data Knowledge Synergic Partners is a specialized Big Data, Data Science and Data Engineering consultancy firm

More information

Sentiment Analysis on Big Data

Sentiment Analysis on Big Data SPAN White Paper!? Sentiment Analysis on Big Data Machine Learning Approach Several sources on the web provide deep insight about people s opinions on the products and services of various companies. Social

More information

Sanjeev Kumar. contribute

Sanjeev Kumar. contribute RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a

More information

Big Data, Physics, and the Industrial Internet! How Modeling & Analytics are Making the World Work Better."

Big Data, Physics, and the Industrial Internet! How Modeling & Analytics are Making the World Work Better. Big Data, Physics, and the Industrial Internet! How Modeling & Analytics are Making the World Work Better." Matt Denesuk! Chief Data Science Officer! GE Software! October 2014! Imagination at work. Contact:

More information

Specific Usage of Visual Data Analysis Techniques

Specific Usage of Visual Data Analysis Techniques Specific Usage of Visual Data Analysis Techniques Snezana Savoska 1 and Suzana Loskovska 2 1 Faculty of Administration and Management of Information systems, Partizanska bb, 7000, Bitola, Republic of Macedonia

More information

DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM

DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM M. Mayilvaganan 1, S. Aparna 2 1 Associate

More information

Microsoft Services Exceed your business with Microsoft SharePoint Server 2010

Microsoft Services Exceed your business with Microsoft SharePoint Server 2010 Microsoft Services Exceed your business with Microsoft SharePoint Server 2010 Business Intelligence Suite Alexandre Mendeiros, SQL Server Premier Field Engineer January 2012 Agenda Microsoft Business Intelligence

More information

Personalization of Web Search With Protected Privacy

Personalization of Web Search With Protected Privacy Personalization of Web Search With Protected Privacy S.S DIVYA, R.RUBINI,P.EZHIL Final year, Information Technology,KarpagaVinayaga College Engineering and Technology, Kanchipuram [D.t] Final year, Information

More information

A Survey on Product Aspect Ranking

A Survey on Product Aspect Ranking A Survey on Product Aspect Ranking Charushila Patil 1, Prof. P. M. Chawan 2, Priyamvada Chauhan 3, Sonali Wankhede 4 M. Tech Student, Department of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra,

More information

Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool.

Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool. International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 9, Issue 8 (January 2014), PP. 19-24 Comparative Analysis of EM Clustering Algorithm

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:

More information

Machine Learning with MATLAB David Willingham Application Engineer

Machine Learning with MATLAB David Willingham Application Engineer Machine Learning with MATLAB David Willingham Application Engineer 2014 The MathWorks, Inc. 1 Goals Overview of machine learning Machine learning models & techniques available in MATLAB Streamlining the

More information

Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014

Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014 Big Data Analytics An Introduction Oliver Fuchsberger University of Paderborn 2014 Table of Contents I. Introduction & Motivation What is Big Data Analytics? Why is it so important? II. Techniques & Solutions

More information

An Ontology Based Text Analytics on Social Media

An Ontology Based Text Analytics on Social Media , pp.233-240 http://dx.doi.org/10.14257/ijdta.2015.8.5.20 An Ontology Based Text Analytics on Social Media Pankajdeep Kaur, Pallavi Sharma and Nikhil Vohra GNDU, Regional Campus, GNDU, Regional Campus,

More information

The Data Mining Process

The Data Mining Process Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

More information

How To Write A Summary Of A Review

How To Write A Summary Of A Review PRODUCT REVIEW RANKING SUMMARIZATION N.P.Vadivukkarasi, Research Scholar, Department of Computer Science, Kongu Arts and Science College, Erode. Dr. B. Jayanthi M.C.A., M.Phil., Ph.D., Associate Professor,

More information

Cisco Data Preparation

Cisco Data Preparation Data Sheet Cisco Data Preparation Unleash your business analysts to develop the insights that drive better business outcomes, sooner, from all your data. As self-service business intelligence (BI) and

More information

Decision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010

Decision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010 Decision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010 Ernst van Waning Senior Sales Engineer May 28, 2010 Agenda SPSS, an IBM Company SPSS Statistics User-driven product

More information

Research of Postal Data mining system based on big data

Research of Postal Data mining system based on big data 3rd International Conference on Mechatronics, Robotics and Automation (ICMRA 2015) Research of Postal Data mining system based on big data Xia Hu 1, Yanfeng Jin 1, Fan Wang 1 1 Shi Jiazhuang Post & Telecommunication

More information

agility made possible

agility made possible SOLUTION BRIEF Flexibility and Choices in Infrastructure Management can IT live up to business expectations with soaring infrastructure complexity and challenging resource constraints? agility made possible

More information

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data

More information

Big Data 101: Harvest Real Value & Avoid Hollow Hype

Big Data 101: Harvest Real Value & Avoid Hollow Hype Big Data 101: Harvest Real Value & Avoid Hollow Hype 2 Executive Summary Odds are you are hearing the growing hype around the potential for big data to revolutionize our ability to assimilate and act on

More information

White Paper. Version 1.2 May 2015 RAID Incorporated

White Paper. Version 1.2 May 2015 RAID Incorporated White Paper Version 1.2 May 2015 RAID Incorporated Introduction The abundance of Big Data, structured, partially-structured and unstructured massive datasets, which are too large to be processed effectively

More information

Internship Opportunities Xerox Research Centre India (XRCI), Bangalore Analytics Research Group

Internship Opportunities Xerox Research Centre India (XRCI), Bangalore Analytics Research Group Analytics Research Group The Analytics Research Group in Xerox Research Centre India (XRCI) is seeking bright Undergraduate, Masters and PhD students for research internships to participate in exciting

More information

A Semantic Marketplace of Peers Hosting Negotiating Intelligent Agents

A Semantic Marketplace of Peers Hosting Negotiating Intelligent Agents A Semantic Marketplace of Peers Hosting Negotiating Intelligent Agents Theodore Patkos and Dimitris Plexousakis Institute of Computer Science, FO.R.T.H. Vassilika Vouton, P.O. Box 1385, GR 71110 Heraklion,

More information

An Implementation of Active Data Technology

An Implementation of Active Data Technology White Paper by: Mario Morfin, PhD Terri Chu, MEng Stephen Chen, PhD Robby Burko, PhD Riad Hartani, PhD An Implementation of Active Data Technology October 2015 In this paper, we build the rationale for

More information

Big Data in Pictures: Data Visualization

Big Data in Pictures: Data Visualization Big Data in Pictures: Data Visualization Huamin Qu Hong Kong University of Science and Technology What is data visualization? Data visualization is the creation and study of the visual representation of

More information

A Framework for End-to-End Proactive Network Management

A Framework for End-to-End Proactive Network Management A Framework for End-to-End Proactive Network Management S. Hariri, Y. Kim, P. Varshney, Department of Electrical Engineering and Computer Science Syracuse University, Syracuse, NY 13244 {hariri, yhkim,varshey}@cat.syr.edu

More information

Text Analytics. A business guide

Text Analytics. A business guide Text Analytics A business guide February 2014 Contents 3 The Business Value of Text Analytics 4 What is Text Analytics? 6 Text Analytics Methods 8 Unstructured Meets Structured Data 9 Business Application

More information

Dynamic Data in terms of Data Mining Streams

Dynamic Data in terms of Data Mining Streams International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining

More information

Augmented Search for Software Testing

Augmented Search for Software Testing Augmented Search for Software Testing For Testers, Developers, and QA Managers New frontier in big log data analysis and application intelligence Business white paper May 2015 During software testing cycles,

More information

Interactive Visual Data Analysis in the Times of Big Data

Interactive Visual Data Analysis in the Times of Big Data Interactive Visual Data Analysis in the Times of Big Data Cagatay Turkay * gicentre, City University London Who? Lecturer (Asst. Prof.) in Applied Data Science Started December 2013 @ the gicentre (gicentre.net)

More information

Why your business decisions still rely more on gut feel than data driven insights.

Why your business decisions still rely more on gut feel than data driven insights. Why your business decisions still rely more on gut feel than data driven insights. THERE ARE BIG PROMISES FROM BIG DATA, BUT FEW ARE CONNECTING INSIGHTS TO HIGH CONFIDENCE DECISION-MAKING 85% of Business

More information

Component visualization methods for large legacy software in C/C++

Component visualization methods for large legacy software in C/C++ Annales Mathematicae et Informaticae 44 (2015) pp. 23 33 http://ami.ektf.hu Component visualization methods for large legacy software in C/C++ Máté Cserép a, Dániel Krupp b a Eötvös Loránd University mcserep@caesar.elte.hu

More information

Visual Analytics. Daniel A. Keim, Florian Mansmann, Andreas Stoffel, Hartmut Ziegler University of Konstanz, Germany http://infovis.uni-konstanz.

Visual Analytics. Daniel A. Keim, Florian Mansmann, Andreas Stoffel, Hartmut Ziegler University of Konstanz, Germany http://infovis.uni-konstanz. Visual Analytics Daniel A. Keim, Florian Mansmann, Andreas Stoffel, Hartmut Ziegler University of Konstanz, Germany http://infovis.uni-konstanz.de SYNONYMS Visual Analysis; Visual Data Analysis; Visual

More information

SPATIAL DATA CLASSIFICATION AND DATA MINING

SPATIAL DATA CLASSIFICATION AND DATA MINING , pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal

More information

Intinno: A Web Integrated Digital Library and Learning Content Management System

Intinno: A Web Integrated Digital Library and Learning Content Management System Intinno: A Web Integrated Digital Library and Learning Content Management System Synopsis of the Thesis to be submitted in Partial Fulfillment of the Requirements for the Award of the Degree of Master

More information

SharePoint for Engineering Document Management & Control

SharePoint for Engineering Document Management & Control SharePoint for Engineering Document Management & Control Managing and controlling engineering documents and drawings with SharePoint A white paper by Cadac Organice BV Date: 01-03-2012 Table of contents

More information

Statistics for BIG data

Statistics for BIG data Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before

More information

Big Data: Rethinking Text Visualization

Big Data: Rethinking Text Visualization Big Data: Rethinking Text Visualization Dr. Anton Heijs anton.heijs@treparel.com Treparel April 8, 2013 Abstract In this white paper we discuss text visualization approaches and how these are important

More information

ADVANCED MACHINE LEARNING. Introduction

ADVANCED MACHINE LEARNING. Introduction 1 1 Introduction Lecturer: Prof. Aude Billard (aude.billard@epfl.ch) Teaching Assistants: Guillaume de Chambrier, Nadia Figueroa, Denys Lamotte, Nicola Sommer 2 2 Course Format Alternate between: Lectures

More information

Politecnico di Torino. Porto Institutional Repository

Politecnico di Torino. Porto Institutional Repository Politecnico di Torino Porto Institutional Repository [Proceeding] NEMICO: Mining network data through cloud-based data mining techniques Original Citation: Baralis E.; Cagliero L.; Cerquitelli T.; Chiusano

More information

A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks

A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks Text Analytics World, Boston, 2013 Lars Hard, CTO Agenda Difficult text analytics tasks Feature extraction Bio-inspired

More information

Clustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012

Clustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012 Clustering Big Data Anil K. Jain (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012 Outline Big Data How to extract information? Data clustering

More information

MS1b Statistical Data Mining

MS1b Statistical Data Mining MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to

More information

Lluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining

Lluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining Lluis Belanche + Alfredo Vellido Intelligent Data Analysis and Data Mining a.k.a. Data Mining II Office 319, Omega, BCN EET, office 107, TR 2, Terrassa avellido@lsi.upc.edu skype, gtalk: avellido Tels.:

More information

large-scale machine learning revisited Léon Bottou Microsoft Research (NYC)

large-scale machine learning revisited Léon Bottou Microsoft Research (NYC) large-scale machine learning revisited Léon Bottou Microsoft Research (NYC) 1 three frequent ideas in machine learning. independent and identically distributed data This experimental paradigm has driven

More information

Supply Chains: From Inside-Out to Outside-In

Supply Chains: From Inside-Out to Outside-In Supply Chains: From Inside-Out to Outside-In Table of Contents Big Data and the Supply Chains of the Process Industries The Inter-Enterprise System of Record Inside-Out vs. Outside-In Supply Chain How

More information

Crime Pattern Analysis

Crime Pattern Analysis Crime Pattern Analysis Megaputer Case Study in Text Mining Vijay Kollepara Sergei Ananyan www.megaputer.com Megaputer Intelligence 120 West Seventh Street, Suite 310 Bloomington, IN 47404 USA +1 812-330-01

More information

Supply Chain Platform as a Service: a Cloud Perspective on Business Collaboration

Supply Chain Platform as a Service: a Cloud Perspective on Business Collaboration Supply Chain Platform as a Service: a Cloud Perspective on Business Collaboration Guopeng Zhao 1, 2 and Zhiqi Shen 1 1 Nanyang Technological University, Singapore 639798 2 HP Labs Singapore, Singapore

More information

Accelerate your Big Data Strategy. Execute faster with Capgemini and Cloudera s Enterprise Data Hub Accelerator

Accelerate your Big Data Strategy. Execute faster with Capgemini and Cloudera s Enterprise Data Hub Accelerator Accelerate your Big Data Strategy Execute faster with Capgemini and Cloudera s Enterprise Data Hub Accelerator Enterprise Data Hub Accelerator enables you to get started rapidly and cost-effectively with

More information

How To Use Neural Networks In Data Mining

How To Use Neural Networks In Data Mining International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and

More information

Application of Business Intelligence in Transportation for a Transportation Service Provider

Application of Business Intelligence in Transportation for a Transportation Service Provider Application of Business Intelligence in Transportation for a Transportation Service Provider Mohamed Sheriff Business Analyst Satyam Computer Services Ltd Email: mohameda_sheriff@satyam.com, mail2sheriff@sify.com

More information

Big Data Analytics for Healthcare

Big Data Analytics for Healthcare Big Data Analytics for Healthcare Jimeng Sun Chandan K. Reddy Healthcare Analytics Department IBM TJ Watson Research Center Department of Computer Science Wayne State University 1 Healthcare Analytics

More information

Visualization of large data sets using MDS combined with LVQ.

Visualization of large data sets using MDS combined with LVQ. Visualization of large data sets using MDS combined with LVQ. Antoine Naud and Włodzisław Duch Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland. www.phys.uni.torun.pl/kmk

More information

Questions to be responded to by the firm submitting the application

Questions to be responded to by the firm submitting the application Questions to be responded to by the firm submitting the application Why do you think this project should receive an award? How does it demonstrate: innovation, quality, and professional excellence transparency

More information