FOUNDATIONAL SYSTEMS BRIDGING DATA MINING AND VISUAL ANALYTICS
JAEGUL CHOO RESEARCH STATEMENT

My primary research goal is to develop new methods and systems that firmly unify data mining and visual analytics for solving challenging problems in big data. Data mining has long offered scalable methods for big data. However, real-world data do not necessarily satisfy the assumptions and conditions these methods require. Furthermore, given a data set, users often have little or no idea what problems to solve, which makes existing methods less useful. Visual analytics, a newly emerging discipline, can handle these situations by allowing users to explore and understand data via interactive visualization. However, visual analytics cannot easily accommodate big data because human perception and computer screen space scale poorly.

An ideal solution is to combine these two complementary disciplines. Data mining methods can solve the scalability issue in visual analytics by summarizing large-scale complex data and extracting intelligent information beyond the raw data. Visual analytics can give users intuitive visual access to data mining outputs as well as interactive control over data mining methods for the users' intended tasks. In practice, however, the two areas have seen little integration so far. Based on my research across both, I think the main hurdles lie in (1) difficulties in understanding and interacting with data mining methods and their outputs and (2) the significant computational time these methods require.

My research intends to remedy these issues via the following interrelated threads: (1) a foundational visual analytics system providing easy access to a wide variety of data mining methods, (2) novel methodologies achieving flexible interactivity and real-time response of data mining methods, and (3) scalable visual analytics systems targeting real-world domains. Below I describe specific projects in each thread.
FOUNDATIONAL SYSTEMS BRIDGING DATA MINING AND VISUAL ANALYTICS

Big data, e.g., text documents, images, and biological data, are often represented in a high-dimensional space. In visual analytics for large-scale high-dimensional data, dimension reduction and clustering are key techniques: the former visualizes high-dimensional data in a 2D/3D space, while the latter reduces numerous data items to a small number of groups. Recent advances in these methods from the data mining and machine learning communities have not been fully transferred to many real-world applications.

The Testbed system [1] is a foundational visual analytics system that fills this gap. It integrates more than 20 dimension reduction methods, including the two-stage methods I developed [2], and about 10 clustering methods, allowing users to effortlessly apply different methods to their own data and perform analysis with the most suitable ones. To facilitate intuitive comparisons, the system also aligns outputs from different methods using manifold alignment techniques [3].

Fig 1. The Testbed system providing a visual overview and data details on demand.

The impact of the Testbed system is two-fold. First, it serves as a base for experimenting with and improving new dimension reduction and clustering methods in a visual analytics environment. Because of the flexible software architecture of the system, one can seamlessly integrate and evaluate new methods [4]. Second, it can be applied to a wide range of applications and provide deep insight about data. For instance, I applied the system to a novel domain, protein disorder prediction [5], where the knowledge obtained via interactive visualization significantly improved prediction performance over state-of-the-art methods. The system is currently being applied to many other domains, such as healthcare and computer networking, in collaboration with Samsung Electronics and Prof. Nick Feamster at Georgia Tech.

DATA MINING METHODS SUPPORTING FLEXIBLE AND REAL-TIME INTERACTIONS

Significant noise in real-world data often causes data mining methods to generate unsatisfactory results. Being able to interact with the methods and the data is critical for steering the results in the users' own way to obtain the most meaningful output. However, most methods are not designed to incorporate the various needs of users. In addition, interaction with the methods may be inefficient because recomputing them repeatedly is slow. Thus, I developed novel data mining methods, together with a framework for integrating them into visual analytics, that support flexible and real-time interaction.

p-isomap. An essential interaction with data mining methods is changing their parameters. To make this interaction fast, I proposed a dynamic parametric updating algorithm for a widely used dimension reduction method, ISOMAP [4]. The approach involves sophisticated algorithmic modules, such as efficient shortest-path updates after edge addition/removal, and it has achieved a speed-up of up to roughly 100x over the original ISOMAP.

PIVE. I also developed a fundamental methodology called PIVE [6, 7], a Per-Iteration Visualization Environment, which enables continuous real-time interactions with data mining methods. PIVE exploits the fact that many modern data mining algorithms run iteratively until convergence and that major changes in the solution occur mostly during the early iterations.
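The observation that most of the change happens in the early iterations can be illustrated with a small sketch. The code below is not the PIVE implementation; it is a minimal, dependency-free k-means loop written as a generator, under the assumption that a visual front end would redraw after each yielded step. The function name `kmeans_iterations`, the synthetic data, and all parameters are hypothetical.

```python
import random


def kmeans_iterations(points, k, max_iter=50, seed=0):
    """Yield (iteration, labels, centers, changed) after every Lloyd iteration."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for it in range(1, max_iter + 1):
        # Assignment step: index of the nearest center for every point.
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        changed = sum(1 for old, new in zip(labels, new_labels) if old != new)
        labels = new_labels
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        # Hand the intermediate result to the caller before continuing.
        yield it, list(labels), [list(c) for c in centers], changed
        if changed == 0:
            break


# Hypothetical toy data: three Gaussian blobs in 2D.
data_rng = random.Random(1)
points = [
    (data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
    for cx, cy in ((0, 0), (4, 0), (2, 4))
    for _ in range(100)
]

for it, labels, centers, changed in kmeans_iterations(points, k=3):
    # A visual front end would redraw the scatter plot here instead of printing.
    print(f"iteration {it}: {changed} of {len(points)} points changed cluster")
```

Because the loop yields control after every iteration, a caller could also modify the data or the centers between steps, which is the kind of interaction the per-iteration design is meant to allow.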
Motivated by this idea, PIVE visualizes intermediate results from algorithm iterations in real time, so users can interact with the method without waiting for it to converge. PIVE changes the paradigm of interacting with data mining methods: such continuous real-time interaction was previously considered impractical because the methods ran too slowly. To demonstrate the advantage with actual methods, we recently developed user interaction capabilities, such as repositioning data items and splitting/merging clusters, in t-distributed stochastic neighbor embedding, k-means, and latent Dirichlet allocation [7].

Weakly Supervised NMF. Nonnegative matrix factorization (NMF) is a popular method for data mining tasks including clustering, collaborative filtering, and outlier detection. Weakly supervised NMF (WS-NMF) [8] is a novel method that supports various user interactions in the context of clustering and topic modeling. Unlike other semi-supervised methods, the underlying philosophy of WS-NMF is to reflect semantically meaningful feedback from the users' viewpoints instead of requiring method-centric constraints. We demonstrated the capabilities of WS-NMF, such as incorporating information from other sources, exemplar data items, and features of interest. This work is currently under review at the DMKD journal, and it has also led us to an interactive topic modeling system called UTOPIAN [9].

REAL-WORLD VISUAL ANALYTICS SYSTEMS

Based on the above-mentioned foundational research, I have built mature visual analytics systems for diverse real-world applications. First, I have focused on two representative machine learning tasks: classification and clustering. These tasks are usually performed in a fully automated manner, but in practice, many algorithms do not properly handle noisy real-world data.
ivisclassifier [10] and ivisclustering [11] are systems that leverage human-in-the-loop processes in classification (e.g., facial recognition) and clustering (e.g., document clustering), respectively. ivisclassifier, which uses regularized linear discriminant analysis to visualize data with class information, allows users to visually analyze the relationships between classes and interactively improve classifier performance. ivisclustering, by enhancing latent Dirichlet allocation (LDA), a popular document topic modeling method, supports various important interactions such as cluster keyword refinement and hierarchical cluster management.

More recently, I have proposed a system called UTOPIAN (User-driven Topic modeling based on Interactive NMF) [9]. Given a large-scale document corpus, it is generally burdensome to go through individual documents to make sense of them and find those of interest to users. Topic modeling is useful in this context, but the derived topics are often unclear for real-world data. To tackle this fundamental problem, UTOPIAN provides useful interaction capabilities in topic modeling, such as topic merging/splitting and topic creation via seed keywords/documents. This work also highlights the important advantages of NMF over LDA in terms of algorithmic consistency against noisy document data. Furthermore, the interactions offered by UTOPIAN are performed efficiently owing to the incorporated PIVE framework.

Fig 2. The UTOPIAN system visualizing a topic summary with various interaction capabilities.

Since UTOPIAN was published in VAST '13/TVCG [9], the novel idea of bringing NMF into the visual analytics context has received considerable interest from many researchers, which has opened up collaboration opportunities with the research groups of Prof. Daniel Keim at University of Konstanz, Prof. Niklaus Elmqvist at Purdue University, and other researchers.
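To make the NMF-based topic modeling discussed above more concrete, here is a minimal sketch that factors a tiny term-by-document matrix A into nonnegative factors W (terms by topics) and H (topics by documents). It uses the classic multiplicative update rules as a stand-in; the statement does not say which NMF algorithm UTOPIAN uses, so this should not be read as its implementation. The toy matrix, the term list, and all function names are hypothetical.

```python
import random


def matmul(A, B):
    """Multiply two dense matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(A, W, H):
    """Frobenius norm of A - WH."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5


def nmf(A, k, n_iter=200, seed=0, eps=1e-9):
    """Factor nonnegative A (terms x docs) as W (terms x k) times H (k x docs)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T A) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[h * a / (d + eps) for h, a, d in zip(rh, ra, rd)]
             for rh, ra, rd in zip(H, num, den)]
        # W <- W * (A H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (d + eps) for w, a, d in zip(rw, ra, rd)]
             for rw, ra, rd in zip(W, num, den)]
    return W, H


# Hypothetical term-by-document counts: rows are terms, columns are documents.
terms = ["cluster", "topic", "document", "protein", "gene", "disorder"]
A = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [2, 2, 0, 1],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
    [0, 1, 2, 2],
]

W, H = nmf(A, k=2)
for topic in range(2):
    # Rank terms by their weight in this topic's column of W.
    ranked = sorted(range(len(terms)), key=lambda i: W[i][topic], reverse=True)
    print(f"topic {topic}:", [terms[i] for i in ranked[:3]])
print("reconstruction error:", round(frobenius_error(A, W, H), 3))
```

Each column of W can be read as a topic (a weighting over terms) and each column of H as a document's mixture of topics. Because the updates are iterative, the loop could also expose W and H after every pass, which is what makes this family of methods a natural fit for per-iteration interaction.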
RESEARCH AGENDA

My long-term goal is to develop methods and systems that take the best advantage of both data mining and visual analytics for big data: leveraging computational methods to sift through huge data and reveal underlying insight, and enabling humans to exploit their visual perception and intuition to delve into data. Although I have taken the first steps toward this goal with my previous research, I plan to broaden and deepen this investigation, including both a fundamental redesign of computational methods and the application of visual analytics to unexplored domains. In the following, I describe a few of my research directions.

Scaling up Visual Analytics. My future research will proactively scale up visual analytics. Scalability issues arise from two perspectives: back-end computation and front-end interactive visualization. For the former, data mining methods have to scale up to large-scale data. Ongoing efforts include the parallel distributed NMF algorithms that I currently work on as a co-PI of the DARPA XDATA project. For the latter, visual analytics systems should support fast interactive visualization of numerous data items. For example, an interactive visual document recommender system, VisIRR [12], which I am currently developing, handles about half a million documents. I plan to further explore various research problems in scalable visual analytics.

Revolutionizing Computing Paradigms in Visual Analytics. Because data mining methods were not originally designed for visual analytics, exploiting the inherent characteristics of visual analytics could significantly decrease computational time. My future research will exploit the fact that human perception and screen space do not require fully accurate results from computations. I envision a completely new paradigm in which computational methods immediately generate approximate solutions and incrementally refine them until users are satisfied. To realize this idea, I am looking into literature from other fields, e.g., adaptive mesh refinement in numerical analysis and wavelet transformation in image processing. I have recently published some promising results [6], and I will continue this investigation in my future research.

Visualizing the Quality of Computational Output. When humans face computational outputs, it is crucial to inform them of the output quality. For instance, in dimension reduction, the output quality corresponds to how well given relationships are preserved in the low-dimensional space. In clustering, it would be how clear and meaningful the resulting clusters are. This notion of output quality applies at different levels: an individual data item, a cluster, and a whole data set. The current practice of plugging data mining methods into visual analytics does not effectively reveal such information, yet a poor-quality output could significantly mislead subsequent analyses. A 2D snapshot of high-dimensional data that severely distorts the original relationships would not help in understanding the data. Clusters computed from data with no real clusters, e.g., uniformly distributed data, do not convey any meaningful information. My research will focus on how to visualize this quality information along with the output to properly guide human analyses.

Building Visual Analytics for Data Comparison and Contrast. At the heart of analysis tasks is comparing and contrasting different data groups to acquire comprehensive knowledge. I plan to develop fundamental data mining methods and visual analytics systems to support these analyses. One method I am currently working on is joint-discriminative topic modeling using NMF, which simultaneously identifies both common and distinct topics among multiple document data sets.
Equipping it with a highly interactive visual environment, where users can dynamically create and compare multiple data groups, will be a promising research direction. Together with Prof. Haesun Park (Georgia Tech) and Prof. Chandan Reddy (Wayne State University), I am preparing to submit an NSF proposal based on this idea in January.

Broadening Real-world Impact. I will continuously widen the real-world applicability of my research. I plan to carry this out by (1) pioneering novel domains and (2) developing web-based systems. For example, I have recently analyzed novel social media data about nonprofit micro-financing activities available at Kiva.org. This work, the papers about which were accepted at WSDM '14 [13] and WWW '14 [14], is one of the very first studies to apply machine learning techniques in this domain. I plan to perform deeper analysis of this application using visual analytics approaches as well. In addition, I am currently extending my visual analytics systems to web-based systems. In collaboration with Georgia Tech Research Institute, a web-based version of the Testbed system is under active development. I am also collaborating with Prof. Ji Soo Yi (Purdue University) and Dr. Bum Chul Kwon (University of Konstanz) on a project to build a website where users can interactively label positive and negative aspects with rich content when writing reviews or answers. In this project, we also plan to integrate the interactive topic modeling capabilities of UTOPIAN for the visual summary of reviews/answers.

Cross-disciplinary research between data mining and visual analytics has given me deep interest and motivation, and I still see its tremendous potential for big data. I have collaborated with more than 40 researchers and engineers in universities, national labs, and companies, who have constantly inspired me with new ideas and directions.
I am also involved in various research funding proposals for NSF, DARPA, NIH, ONR, and industry. For example, we recently received a $2.7 million award from the DARPA XDATA program for big data.

In conclusion, my research seeks new methods and systems that synthesize data mining and visual analytics to accomplish interactive, in-depth analysis of big data. I hope that my experience and insights spanning both fields will continue to grow and demonstrate the true value of such a synthesis.
SELECTED REFERENCES

1. An Interactive Visual Testbed System for Dimension Reduction and Clustering of Large-Scale High-Dimensional Data. Jaegul Choo, Hanseung Lee, Zhicheng Liu, John T. Stasko, Haesun Park. SPIE Conference on Visualization and Data Analysis (VDA). Software is available at
2. Two-stage Framework for Visualization of Clustered High-Dimensional Data. Jaegul Choo, Shawn Bohn, Haesun Park. IEEE Symposium on Visual Analytics Science and Technology (VAST).
3. Heterogeneous Data Fusion via Space Alignment Using Nonmetric Multidimensional Scaling. Jaegul Choo, Shawn Bohn, Grant C. Nakamura, Amanda M. White, Haesun Park. SIAM International Conference on Data Mining (SDM).
4. p-isomap: An Efficient Parametric Update for ISOMAP for Visual Analytics. Jaegul Choo, Hanseung Lee, Chandan K. Reddy, Haesun Park. SIAM International Conference on Data Mining (SDM).
5. A Visual Analytics Approach for Protein Disorder Prediction. Jaegul Choo, Fuxin Li, Kihyung Joo, Haesun Park. SIAM Expanding the Frontiers of Visual Analytics and Visualization (Book Chapter).
6. Screen Space- and Perception-Based Framework for Efficient Computational Algorithms in Large-Scale Visual Analytics. Jaegul Choo, Haesun Park. IEEE Computer Graphics and Applications (CG&A).
7. PIVE: A Per-Iteration Visualization Environment for Supporting Real-time Interactions with Computational Methods. Jaegul Choo, Changhyun Lee, Haesun Park. Technical Report, Georgia Institute of Technology.
8. Weakly Supervised Nonnegative Matrix Factorization for User-driven Clustering. Jaegul Choo, Changhyun Lee, Chandan K. Reddy, Haesun Park. Data Mining and Knowledge Discovery (DMKD) 2013, Under Review.
9. UTOPIAN: User-driven Topic Modeling Based on Interactive Nonnegative Matrix Factorization. Jaegul Choo, Changhyun Lee, Chandan K. Reddy, Haesun Park. IEEE Transactions on Visualization and Computer Graphics (TVCG).
10. ivisclassifier: An Interactive Visual Analytics System for Classification based on Supervised Dimension Reduction. Jaegul Choo, Hanseung Lee, Jaeyeon Kihm, Haesun Park. IEEE Conference on Visual Analytics Science and Technology (VAST).
11. ivisclustering: An Interactive Visual Clustering for Documents via Topic Modeling. Hanseung Lee, Jaeyeon Kihm, Jaegul Choo, John T. Stasko, Haesun Park. Computer Graphics Forum (CGF).
12. VisIRR: Interactive Visual Information Retrieval and Recommendation for Large-Scale Document Data. Jaegul Choo, Changhyun Lee, Edward Clarkson, Zhicheng Liu, Hanseung Lee, Duen Horng (Polo) Chau, Fuxin Li, Ramakrishnan Kannan, Charles D. Stolper, David Inouye, Nishant Mehta, Hua Ouyang, Subhojit Som, Alexander Gray, John T. Stasko, and Haesun Park. Computer Graphics Forum (Eurovis / CGF) 2014, Under Review.
13. Understanding and Promoting Micro-finance Activities in Kiva.org. Jaegul Choo, Changhyun Lee, Daniel Lee, Hongyuan Zha, Haesun Park. ACM Conference on Web Search and Data Mining (WSDM) 2014, Accepted.
14. To Gather Together for a Better World: Understanding and Leveraging Communities in Micro-Lending Recommendation. Jaegul Choo, Daniel Lee, Bistra Dilkina, Hongyuan Zha, Haesun Park. International Conference on World Wide Web (WWW) 2014, Accepted.
More informationIntroduction to Data Mining
Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:
More informationMachine Learning with MATLAB David Willingham Application Engineer
Machine Learning with MATLAB David Willingham Application Engineer 2014 The MathWorks, Inc. 1 Goals Overview of machine learning Machine learning models & techniques available in MATLAB Streamlining the
More informationBig Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014
Big Data Analytics An Introduction Oliver Fuchsberger University of Paderborn 2014 Table of Contents I. Introduction & Motivation What is Big Data Analytics? Why is it so important? II. Techniques & Solutions
More informationAn Ontology Based Text Analytics on Social Media
, pp.233-240 http://dx.doi.org/10.14257/ijdta.2015.8.5.20 An Ontology Based Text Analytics on Social Media Pankajdeep Kaur, Pallavi Sharma and Nikhil Vohra GNDU, Regional Campus, GNDU, Regional Campus,
More informationThe Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
More informationHow To Write A Summary Of A Review
PRODUCT REVIEW RANKING SUMMARIZATION N.P.Vadivukkarasi, Research Scholar, Department of Computer Science, Kongu Arts and Science College, Erode. Dr. B. Jayanthi M.C.A., M.Phil., Ph.D., Associate Professor,
More informationCisco Data Preparation
Data Sheet Cisco Data Preparation Unleash your business analysts to develop the insights that drive better business outcomes, sooner, from all your data. As self-service business intelligence (BI) and
More informationDecision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010
Decision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010 Ernst van Waning Senior Sales Engineer May 28, 2010 Agenda SPSS, an IBM Company SPSS Statistics User-driven product
More informationResearch of Postal Data mining system based on big data
3rd International Conference on Mechatronics, Robotics and Automation (ICMRA 2015) Research of Postal Data mining system based on big data Xia Hu 1, Yanfeng Jin 1, Fan Wang 1 1 Shi Jiazhuang Post & Telecommunication
More informationagility made possible
SOLUTION BRIEF Flexibility and Choices in Infrastructure Management can IT live up to business expectations with soaring infrastructure complexity and challenging resource constraints? agility made possible
More informationData Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data
More informationBig Data 101: Harvest Real Value & Avoid Hollow Hype
Big Data 101: Harvest Real Value & Avoid Hollow Hype 2 Executive Summary Odds are you are hearing the growing hype around the potential for big data to revolutionize our ability to assimilate and act on
More informationWhite Paper. Version 1.2 May 2015 RAID Incorporated
White Paper Version 1.2 May 2015 RAID Incorporated Introduction The abundance of Big Data, structured, partially-structured and unstructured massive datasets, which are too large to be processed effectively
More informationInternship Opportunities Xerox Research Centre India (XRCI), Bangalore Analytics Research Group
Analytics Research Group The Analytics Research Group in Xerox Research Centre India (XRCI) is seeking bright Undergraduate, Masters and PhD students for research internships to participate in exciting
More informationA Semantic Marketplace of Peers Hosting Negotiating Intelligent Agents
A Semantic Marketplace of Peers Hosting Negotiating Intelligent Agents Theodore Patkos and Dimitris Plexousakis Institute of Computer Science, FO.R.T.H. Vassilika Vouton, P.O. Box 1385, GR 71110 Heraklion,
More informationAn Implementation of Active Data Technology
White Paper by: Mario Morfin, PhD Terri Chu, MEng Stephen Chen, PhD Robby Burko, PhD Riad Hartani, PhD An Implementation of Active Data Technology October 2015 In this paper, we build the rationale for
More informationBig Data in Pictures: Data Visualization
Big Data in Pictures: Data Visualization Huamin Qu Hong Kong University of Science and Technology What is data visualization? Data visualization is the creation and study of the visual representation of
More informationA Framework for End-to-End Proactive Network Management
A Framework for End-to-End Proactive Network Management S. Hariri, Y. Kim, P. Varshney, Department of Electrical Engineering and Computer Science Syracuse University, Syracuse, NY 13244 {hariri, yhkim,varshey}@cat.syr.edu
More informationText Analytics. A business guide
Text Analytics A business guide February 2014 Contents 3 The Business Value of Text Analytics 4 What is Text Analytics? 6 Text Analytics Methods 8 Unstructured Meets Structured Data 9 Business Application
More informationDynamic Data in terms of Data Mining Streams
International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining
More informationAugmented Search for Software Testing
Augmented Search for Software Testing For Testers, Developers, and QA Managers New frontier in big log data analysis and application intelligence Business white paper May 2015 During software testing cycles,
More informationInteractive Visual Data Analysis in the Times of Big Data
Interactive Visual Data Analysis in the Times of Big Data Cagatay Turkay * gicentre, City University London Who? Lecturer (Asst. Prof.) in Applied Data Science Started December 2013 @ the gicentre (gicentre.net)
More informationWhy your business decisions still rely more on gut feel than data driven insights.
Why your business decisions still rely more on gut feel than data driven insights. THERE ARE BIG PROMISES FROM BIG DATA, BUT FEW ARE CONNECTING INSIGHTS TO HIGH CONFIDENCE DECISION-MAKING 85% of Business
More informationComponent visualization methods for large legacy software in C/C++
Annales Mathematicae et Informaticae 44 (2015) pp. 23 33 http://ami.ektf.hu Component visualization methods for large legacy software in C/C++ Máté Cserép a, Dániel Krupp b a Eötvös Loránd University mcserep@caesar.elte.hu
More informationVisual Analytics. Daniel A. Keim, Florian Mansmann, Andreas Stoffel, Hartmut Ziegler University of Konstanz, Germany http://infovis.uni-konstanz.
Visual Analytics Daniel A. Keim, Florian Mansmann, Andreas Stoffel, Hartmut Ziegler University of Konstanz, Germany http://infovis.uni-konstanz.de SYNONYMS Visual Analysis; Visual Data Analysis; Visual
More informationSPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
More informationIntinno: A Web Integrated Digital Library and Learning Content Management System
Intinno: A Web Integrated Digital Library and Learning Content Management System Synopsis of the Thesis to be submitted in Partial Fulfillment of the Requirements for the Award of the Degree of Master
More informationSharePoint for Engineering Document Management & Control
SharePoint for Engineering Document Management & Control Managing and controlling engineering documents and drawings with SharePoint A white paper by Cadac Organice BV Date: 01-03-2012 Table of contents
More informationStatistics for BIG data
Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before
More informationBig Data: Rethinking Text Visualization
Big Data: Rethinking Text Visualization Dr. Anton Heijs anton.heijs@treparel.com Treparel April 8, 2013 Abstract In this white paper we discuss text visualization approaches and how these are important
More informationADVANCED MACHINE LEARNING. Introduction
1 1 Introduction Lecturer: Prof. Aude Billard (aude.billard@epfl.ch) Teaching Assistants: Guillaume de Chambrier, Nadia Figueroa, Denys Lamotte, Nicola Sommer 2 2 Course Format Alternate between: Lectures
More informationPolitecnico di Torino. Porto Institutional Repository
Politecnico di Torino Porto Institutional Repository [Proceeding] NEMICO: Mining network data through cloud-based data mining techniques Original Citation: Baralis E.; Cagliero L.; Cerquitelli T.; Chiusano
More informationA Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks
A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks Text Analytics World, Boston, 2013 Lars Hard, CTO Agenda Difficult text analytics tasks Feature extraction Bio-inspired
More informationClustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012
Clustering Big Data Anil K. Jain (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012 Outline Big Data How to extract information? Data clustering
More informationMS1b Statistical Data Mining
MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to
More informationLluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining
Lluis Belanche + Alfredo Vellido Intelligent Data Analysis and Data Mining a.k.a. Data Mining II Office 319, Omega, BCN EET, office 107, TR 2, Terrassa avellido@lsi.upc.edu skype, gtalk: avellido Tels.:
More informationlarge-scale machine learning revisited Léon Bottou Microsoft Research (NYC)
large-scale machine learning revisited Léon Bottou Microsoft Research (NYC) 1 three frequent ideas in machine learning. independent and identically distributed data This experimental paradigm has driven
More informationSupply Chains: From Inside-Out to Outside-In
Supply Chains: From Inside-Out to Outside-In Table of Contents Big Data and the Supply Chains of the Process Industries The Inter-Enterprise System of Record Inside-Out vs. Outside-In Supply Chain How
More informationCrime Pattern Analysis
Crime Pattern Analysis Megaputer Case Study in Text Mining Vijay Kollepara Sergei Ananyan www.megaputer.com Megaputer Intelligence 120 West Seventh Street, Suite 310 Bloomington, IN 47404 USA +1 812-330-01
More informationSupply Chain Platform as a Service: a Cloud Perspective on Business Collaboration
Supply Chain Platform as a Service: a Cloud Perspective on Business Collaboration Guopeng Zhao 1, 2 and Zhiqi Shen 1 1 Nanyang Technological University, Singapore 639798 2 HP Labs Singapore, Singapore
More informationAccelerate your Big Data Strategy. Execute faster with Capgemini and Cloudera s Enterprise Data Hub Accelerator
Accelerate your Big Data Strategy Execute faster with Capgemini and Cloudera s Enterprise Data Hub Accelerator Enterprise Data Hub Accelerator enables you to get started rapidly and cost-effectively with
More informationHow To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
More informationApplication of Business Intelligence in Transportation for a Transportation Service Provider
Application of Business Intelligence in Transportation for a Transportation Service Provider Mohamed Sheriff Business Analyst Satyam Computer Services Ltd Email: mohameda_sheriff@satyam.com, mail2sheriff@sify.com
More informationBig Data Analytics for Healthcare
Big Data Analytics for Healthcare Jimeng Sun Chandan K. Reddy Healthcare Analytics Department IBM TJ Watson Research Center Department of Computer Science Wayne State University 1 Healthcare Analytics
More informationVisualization of large data sets using MDS combined with LVQ.
Visualization of large data sets using MDS combined with LVQ. Antoine Naud and Włodzisław Duch Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland. www.phys.uni.torun.pl/kmk
More informationQuestions to be responded to by the firm submitting the application
Questions to be responded to by the firm submitting the application Why do you think this project should receive an award? How does it demonstrate: innovation, quality, and professional excellence transparency
More information