Project Participants




Annual Report for Period: 10/2006-09/2007
Submitted on: 08/15/2007
Principal Investigator: Yang, Li
Award ID: 0414857
Organization: Western Michigan Univ
Title: Projection and Interactive Exploration of Large Relational Data

Project Participants

Senior Personnel
Name: Yang, Li

Post-doc

Graduate Student
Name: Sanver, Mustafa
Mustafa Sanver is a Ph.D. student who has been working on both data visualization and database components.
Name: Zhao, Dongfang
Dongfang Zhao worked on the data embedding component and was supported in the 2005-06 academic year.
Name: Hua, Danyang
Danyang Hua works on the database component and has been supported since 2007.

Undergraduate Student

Technician, Programmer

Other Participant

Research Experience for Undergraduates

Organizational Partners

Other Collaborators or Contacts

Activities and Findings

Research and Education Activities:

This research project consists of three technical components: data visualization, database support, and data embedding. Please refer to the Publications section for related references.

In the data visualization component, we have been developing a visualization tool [Yang & Sanver TVCG'07] that takes multi-resolution aggregated data as input. Two interactive visualization techniques, density-based parallel coordinates and footprint splatting with grand tour, have been extended to support the rendering of data aggregated at multiple resolutions. The tool supports overview-and-drill-down of large relational data and allows users to interactively specify subsets of data for further visualization, possibly at more detailed resolutions. With further development, the visualization tool can be used by industry and other agencies for scalable interactive data visualization and exploration. We have also developed techniques for pruning and visualizing frequent itemsets and many-to-many association rules [Yang TKDE'03]. Future work in this component includes a usability study and better GUI design.

In the database component, we have studied multi-resolution data aggregation as a common representation of data between database and visualization tools [Yang & Sanver TVCG'07]. Data aggregated at multiple resolutions are piggybacked onto internal nodes of a k-d-b tree. The k-d-b tree structure is extended to improve query performance and node fan-outs while keeping data aggregation information. We have conducted experiments on both synthetic and real-world data sets, testing the performance of data access and index maintenance. Future work includes the study of better indexing mechanisms and of data mining techniques that use multi-resolution data aggregation as input.
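The idea of piggybacking aggregates onto internal index nodes can be illustrated with a small sketch. This is not the project's extended k-d-b tree: it assumes a plain in-memory binary k-d partition with median splits, and the names `Node`, `build`, `aggregate`, and `drill_down` are hypothetical. Each node stores a record count and a bounding box, so a coarse overview can be read from the upper levels of the tree without touching the raw records.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]
Cell = Tuple[Point, Point, int]  # (box minimum, box maximum, record count)


@dataclass
class Node:
    count: int                             # number of records below this node
    lo: Point                              # per-dimension minimum of those records
    hi: Point                              # per-dimension maximum of those records
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    points: Optional[List[Point]] = None   # raw records, kept only at leaves


def build(points: List[Point], leaf_size: int = 2, depth: int = 0) -> Node:
    """Build a binary k-d partition (assumes leaf_size >= 1 and a non-empty input).

    Every node, internal or leaf, carries its own aggregate (count and box).
    """
    dims = len(points[0])
    lo = tuple(min(p[d] for p in points) for d in range(dims))
    hi = tuple(max(p[d] for p in points) for d in range(dims))
    node = Node(count=len(points), lo=lo, hi=hi)
    if len(points) <= leaf_size:
        node.points = list(points)
        return node
    axis = depth % dims                    # cycle through the dimensions
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                # median split
    node.left = build(ordered[:mid], leaf_size, depth + 1)
    node.right = build(ordered[mid:], leaf_size, depth + 1)
    return node


def aggregate(node: Node, resolution: int) -> List[Cell]:
    """Overview: return the aggregated cells found `resolution` levels down."""
    if resolution == 0 or node.left is None:
        return [(node.lo, node.hi, node.count)]
    return (aggregate(node.left, resolution - 1)
            + aggregate(node.right, resolution - 1))


def overlaps(node: Node, q_lo: Point, q_hi: Point) -> bool:
    """True if the node's bounding box intersects the query box."""
    return all(node.lo[d] <= q_hi[d] and q_lo[d] <= node.hi[d]
               for d in range(len(q_lo)))


def drill_down(node: Node, q_lo: Point, q_hi: Point, resolution: int) -> List[Cell]:
    """Drill-down: aggregated cells at `resolution`, restricted to a query box."""
    if not overlaps(node, q_lo, q_hi):
        return []
    if resolution == 0 or node.left is None:
        return [(node.lo, node.hi, node.count)]
    return (drill_down(node.left, q_lo, q_hi, resolution - 1)
            + drill_down(node.right, q_lo, q_hi, resolution - 1))
```

In this sketch, calling `aggregate(root, 0)` yields a single cell covering all records, while larger `resolution` values yield progressively finer cells. `drill_down` applies the same traversal to a user-selected region, which corresponds to the overview-and-drill-down access pattern described above.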
In the data embedding component, four methods (k-mst [Yang ICPR'04], min-k-st [Yang TPAMI'05], k-ec [Yang PRL'05], and k-vc [Yang SIGKDD'05, Yang TPAMI'06]) were proposed to build connected neighborhood graphs for robust and reliable dimensionality reduction. A new locally isometric embedding method, LMDS [Yang ICPR'06, Yang TPAMI'07], has been developed. Incremental methods [Zhao & Yang ICPR'06, Zhao & Yang TPAMI'07] have been developed for neighborhood graph construction and projection of large data and data streams. Future work includes systemization and evaluation of existing data embedding methods.

Assessment of Project's Status:
Most activities of this project, as defined in the research and education plan in the original proposal in 01/2004 and revised in 07/2004, have been completed. Compared with the original research objectives, the following list highlights some of the major accomplishments:

1. To support interactive exploration of large relational data, we have studied multi-resolution data aggregation and used a high-dimensional partition-based tree index to piggyback the aggregated data as an intermediate representation of large relational data for interactive visualization.
2. Two visualization techniques, footprint splatting with grand tour and parallel coordinates, have been extended to visualize multidimensional aggregated data.
3. A client-server visualization tool has been developed to demonstrate the feasibility and effectiveness of this approach in multi-resolution visualization of large relational data. Multiple visualization clients can get data from a data server using TCP/IP connections. The visualization tool supports many graphical user interactions, including overview-and-drill-down, by allowing users to interactively specify subsets of data for further visualization. The software design allows easy integration of new data visualization techniques into the tool.
4. Four methods were proposed to build connected neighborhood graphs for data embedding. A locally isometric embedding method, LMDS, has been proposed. Incremental methods have been developed for projection of large data sets and data streams.

The following summarizes our ongoing work. This is what we expect to accomplish during the No-Cost Extension period:

1. Better GUI and query interface design; documentation for end users and developers;
2. Conducting a user study;
3. Performance experiments on large data sets of up to a few terabytes;
4. Data clustering and other data mining algorithms using multi-resolution data aggregation as input;
5. Ongoing data embedding research.

Findings:

Visual exploration of large relational data poses fundamental challenges to both data visualization and database management systems. A major finding of this project is the density-based methodology to interactively explore large relational data sets. It uses multi-resolution data aggregation as a common representation of data between relational databases and visualization tools. Data aggregated at multiple resolutions are stored in internal nodes of a partition-based high-dimensional tree index. Such a piggyback ride of aggregated data supports the overview-and-drill-down data access pattern for interactive data exploration. It has built-in support for visual interaction and data scalability. Existing visualization techniques are extended to support this data representation.

In addition, the proposed multi-resolution data representation has potential applications in accelerating data aggregation queries and OLAP queries. It can be used as input for efficient mining of large data sets. It also provides support for privacy preservation, where permissions can be granted to users based on data resolutions. New techniques and algorithms in these areas are part of our ongoing research.

We have developed a set of algorithms and methods for nonlinear data embedding and dimensionality reduction. This research calls for new ideas from differential geometry and may have a fundamental impact on multivariate data mining and data processing. This is an important part of our ongoing work.

Training and Development:
Three Ph.D. students (Mustafa Sanver, Dongfang Zhao, and Danyang Hua) are supported by this grant. M.S. students doing thesis work have also greatly benefited from the research supported by this grant. Parts of this research are used in two courses (CS6430 - Advanced DBMS and CS6030 - Knowledge Discovery and Data Mining) taught at the Department of Computer Science, Western Michigan University (2005-2007).
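The connectivity problem that motivates the neighborhood graph methods reported under Findings can be shown with a short sketch. It is an illustration only and does not implement k-mst, min-k-st, k-ec, or k-vc: as a stand-in, it builds an ordinary k-nearest-neighbor graph, shows that the graph can fall apart into separate components on clustered data, and then restores connectivity by adding the edges of a Euclidean minimum spanning tree. All function names here are hypothetical.

```python
import math
from itertools import combinations


class DisjointSet:
    """Union-find structure used to track connected components."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets of a and b; return True if they were separate."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def knn_edges(points, k):
    """Undirected edges linking every point to its k nearest neighbors."""
    edges = set()
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        nearest = sorted(others, key=lambda j: math.dist(p, points[j]))[:k]
        for j in nearest:
            edges.add((min(i, j), max(i, j)))
    return edges


def count_components(n, edges):
    """Number of connected components of a graph on n vertices."""
    ds = DisjointSet(n)
    merges = sum(ds.union(a, b) for a, b in edges)
    return n - merges


def mst_edges(points):
    """Edges of a Euclidean minimum spanning tree (Kruskal's algorithm)."""
    ds = DisjointSet(len(points))
    candidates = sorted(
        combinations(range(len(points)), 2),
        key=lambda e: math.dist(points[e[0]], points[e[1]]),
    )
    return {e for e in candidates if ds.union(*e)}


# Two well-separated clusters: the 2-nearest-neighbor graph never links them.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
knn = knn_edges(points, k=2)
connected = knn | mst_edges(points)
```

On this example the k-nearest-neighbor graph has two components, so geodesic distances between the clusters are undefined and a distance-based embedding cannot place them relative to each other. The union with the spanning tree yields a single component. `math.dist` requires Python 3.8 or later.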
Outreach Activities:
We have established collaborations with the Department of Business Information Systems, College of Business and the Department of Educational Leadership, Research and Technology, College of Education. Through such collaborations, we expect to have access to real-world data sets and applications and to conduct a user study with participation from students with diversified backgrounds.

Journal Publications

Li Yang, "Distance-preserving projection of high dimensional data for nonlinear dimensionality reduction", IEEE Transactions on Pattern Analysis and Machine Intelligence, p. 1243, vol. 26, (2004). Published, 10.1109/TPAMI.2004.66

Li Yang, "Pruning and visualizing generalized association rules in parallel coordinates", IEEE Transactions on Knowledge and Data Engineering, p. 60, vol. 17, (2005). Published, 10.1109/TKDE.2005.14

Li Yang, "Building k-edge-connected neighborhood graphs for distance-based data projection", Pattern Recognition Letters, p. 2, vol. 26, (2005). Published, 10.1016/j.patrec.2005.03.021

Li Yang, "Building k edge-disjoint spanning trees of minimum total length for isometric data embedding", IEEE Transactions on Pattern Analysis and Machine Intelligence, p. 16, vol. 27, (2005). Published, 10.1109/TPAMI.2005.192

Li Yang, "Data embedding techniques and applications", Proceedings of the 2nd International Workshop on Computer Vision meets Databases (CVDB'2005), Baltimore, MD, June 2005, p. 29, (2005). Published, 10.1145/1160939.1160948

Li Yang, "Building connected neighborhood graphs for isometric data embedding", Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'2005), Chicago, IL, August 2005, p. 722, (2005). Published, 10.1145/1081870.1081963

Li Yang, "Building k-connected neighborhood graphs for isometric data embedding", IEEE Transactions on Pattern Analysis and Machine Intelligence, p. 827, vol. 28, (2006). Published, 10.1109/TPAMI.2006.89

Li Yang, "Alignment of overlapping locally scaled patches for multidimensional scaling and dimensionality reduction", IEEE Transactions on Pattern Analysis and Machine Intelligence, (2007). Accepted, 10.1109/TPAMI.2007.70706

Li Yang, "k-edge connected neighborhood graph for geodesic distance estimation and nonlinear data projection", Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), Cambridge, UK, August 2004, p. 196, vol. 1, (2004). Published, 10.1109/ICPR.2004.1334057

Li Yang, "Sammon's nonlinear mapping using geodesic distances", Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), Cambridge, UK, August 2004, p. 303, vol. 2, (2004). Published, 10.1109/ICPR.2004.1334180

Dongfang Zhao, Li Yang, "Incremental construction of neighborhood graphs for nonlinear dimensionality reduction", Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06), Hong Kong, China, August 2006, p. 177, vol. 3, (2006). Published, 10.1109/ICPR.2006.707

Li Yang, "Building connected neighborhood graphs for locally linear embedding", Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06), Hong Kong, China, August 2006, p. 194, vol. 4, (2006). Published, 10.1109/ICPR.2006.345

Li Yang, "Locally multidimensional scaling for nonlinear dimensionality reduction", Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06), Hong Kong, China, August 2006, p. 202, vol. 4, (2006). Published, 10.1109/ICPR.2006.774

Dongfang Zhao, Li Yang, "Incremental isometric embedding of high dimensional data using connected neighborhood graphs", IEEE Transactions on Pattern Analysis and Machine Intelligence, (2007). Submitted

Li Yang, Mustafa Sanver, "Multiresolution data aggregation for visual exploration of large relational data", IEEE Transactions on Visualization and Computer Graphics, (2007). Submitted

Books or Other One-time Publications

Li Yang, "Data projection techniques and their application in sensor array data processing", (2005). Book chapter, Published
Editor(s): Mehmed Kantardzic, Jozef Zurada
Collection: Next Generation of Data Mining Applications
Bibliography: pages 57-77, Wiley-IEEE Press

Li Yang, Tosiyasu L. Kunii, "Visual database", (2007). Book chapter, Submitted
Editor(s): Benjamin Wah, Jeffrey Tsai
Collection: Wiley Encyclopedia of Computer Science and Engineering
Bibliography: John Wiley & Sons Inc

Web/Internet Site

URL(s): http://www.cs.wmich.edu/~yang
Description: A dedicated web site will be set up once we finish the development of the first release of the software tool.

Other Specific Products

Contributions

Contributions within Discipline:
We have devised multi-resolution data aggregation and have used a high-dimensional partition-based tree index to piggyback the data aggregated at multiple resolutions as an intermediate representation of large relational data for interactive visualization. Two visualization techniques, footprint splatting with grand tour and parallel coordinates, are extended to visualize the multi-resolution aggregated data. A client/server visualization tool is developed to demonstrate the feasibility and effectiveness of this approach. We have developed an approach to visualize generalized association rules in parallel coordinates. In data embedding, a set of algorithms and methods is developed for building connected neighborhood graphs and for locally isometric data embedding. Incremental methods are developed to project large data sets and data streams.

Contributions to Other Disciplines:
The proposed multi-resolution data representation has potential applications in: (1) optimization of traditional database queries such as data aggregation queries and OLAP queries; (2) efficient mining of large data sets; (3) privacy-preserving data mining.

Contributions to Human Resource Development:
Three Ph.D. students (Mustafa Sanver, Dongfang Zhao, and Danyang Hua) are supported by this grant. The PI and the students have gained valuable research experience from working on this project. M.S. students doing thesis work have also greatly benefited from the research supported by this grant.

Contributions to Resources for Research and Education:
Parts of this research are used in two courses (CS6430 - Advanced DBMS and CS6030 - Knowledge Discovery and Data Mining) taught at the Department of Computer Science, Western Michigan University (2005-2007). Students in these courses have benefited from the results of this research.

Contributions Beyond Science and Engineering:

Special Requirements

Special reporting requirements: None
Change in Objectives or Scope: None
Unobligated funds: less than 20 percent of current funds
Animal, Human Subjects, Biohazards: None

Categories for which nothing is reported:
Organizational Partners
Any Product
Contributions: To Any Beyond Science and Engineering