Research Summary. Tao Li. March 31, Matrix-based Algorithms for Data Mining 2. 2 General Data Mining Methods 2
|
|
- Lydia James
- 7 years ago
- Views:
Transcription
1 Research Summary Tao Li March 31, 2013 Research Overview I have been pursuing and will continue to pursue a competitive research agenda. My research explores two related topics on mining from data how to efficiently discover useful patterns and how to effectively retrieve information. The interests lie broadly in data mining and information retrieval studying both the algorithmic and application issues. I focus strongly on research challenges grounded in real-world problems, and work to validate my research in this context. Major Research Achievements Contents 1 Matrix-based Algorithms for Data Mining 2 2 General Data Mining Methods 2 3 System Log and Event Mining 2 4 Text Data Mining 3 5 Music Data Mining 3 6 Malware Detection 4 7 Applications Disaster Management Recommendation Systems Database Exploration and Navigation Business Intelligence Bioinformatics
2 1 Matrix-based Algorithms for Data Mining Matrix-based methodologies are rapidly becoming a significant part of data mining. I have been pursuing a strong research agenda in matrix-based data mining algorithms and applications and in the fore-front of this arena. My accomplishments are summarized below. I presented a general framework for clustering based on matrix computation and provided characterizations of different clustering methods within the framework. My recent studies demonstrate the clustering capability of Non-negative matrix Factorization (NMF) and establish the theoretical foundation for NMF to solve unsupervised learning problems: NMF with the sum of squared error cost function is equivalent to a relaxed K-means clustering, the most widely used unsupervised learning algorithm; NMF with the I-divergence cost function is equivalent to probabilistic latent semantic indexing, another unsupervised learning method popularly used in text analysis. I provided several important variants of NMF including Tri-factor NMF, Semi-NMF, and Convex NMF for enhancing clustering capability and improving clustering interpretation and developed a series of computational algorithms for NMF-type factorization with correctness and convergence analysis. I also extended NMF for solving many other data mining problems including, semi-supervised clustering, consensus clustering, dimensionality reduction, and clique finding. Many of these research works are performed in collaboration with Dr. Chris Ding (University of Texas at Arlington) and Dr. Michael I. Jordan (University of California at Berkeley). 2 General Data Mining Methods I have also developed some novel techniques for performing general data mining tasks under different scenarios. These techniques include 1) adaptive dimension reduction methods by combining linear discriminant analysis (LDA) and K-means clustering into a coherent framework to adaptively select the most discriminative subspace; 2) novel clustering and semi-supervised learning methods from multiple information sources; 3) wavelet-based methods for general data mining and data pre-processing; 4) a general framework for combining hierarchical clustering and partitional clustering and algorithms for performing semi-supervised hierarchical clustering; and 5) a general framework for generating multiple clustering views by combining meta-clustering with consensus clustering. 3 System Log and Event Mining Many systems, from computing systems, physical systems, business systems, to social systems, are only observable indirectly from the events they emit. Events are naturally temporal and are stored as logs, e.g., computer system logs, HTTP requests, database queries, network traffic data, etc. These events capture system states and activities over time. To understand the system behaviors and dynamics, one has to mine event logs to uncover patterns from events. In collaboration with IBM research and Xerox Research, I have developed an integrated framework along with a series of algorithms and tools on mining log data for computing system and service management and is 2
3 one of the leaders in this direction. Specifically, I have conducted research in the following themes: (i) design and develop log organization methods that can transform log data in disparate formats and contents into a canonical form and extract system events; (ii) design and develop methods including temporal dependency pattern mining and event summarization for data-driven pattern discovery and problem determination; and (iii) design and develop tools and methods to bridge the gap between the applications and intelligent algorithms. Our recent work on optimizing monitoring situations based on the ticket resolutions to reduce the number of non-actionable tickets has been incorporated into the IBM Tivoli Monitoring System products. 4 Text Data Mining The explosive growth of the volume and complexity of textual data (e.g., news, s, blogs, web pages, twitter streams) causes the data overload problem. It has become a necessity to semantically understand documents and deliver meaningful information to users. My research goal in text mining is to help users better understand and utilize large real document data sets via document clustering, categorization, and summarization. More specifically, there are four closely related dimensions of this research theme: 1) Summarization: Given a collection of documents, how to generate a concise yet comprehensive summary to present the information organized around some key aspects or topics? 2) Categorization and Clustering: Given a collection of documents, how to organize them into a list of meaningful categories? 3) Evolution: Given a collection of documents from diverse sources or documents reporting on temporal events, how to characterize the difference or the evolution? 4) Applications/Systems: How to build useful applications and systems? My students and I have developed effective data mining algorithms for 1) clustering high-dimensional documents by utilizing the dual relationship in word-document matrix; 2) hierarchically categorizing documents with a large number of classes; 3) summarizing documents in multiple scenarios to reflect the major or topic-relevant information contained in the document collection; 4) integrating document clustering and summarization to obtain meaningful document clusters with summarized interpretation; and 5) summarizing the difference and evolution of different document sources. We have also developed two software systems: a) ihelp: an intelligent online helpdesk system to automatically find problem-solution patterns from past customer-representative interactions; and b) Sumview: a web-based review summarization system to effectively and efficiently extract the most representative sentences in the reviews on various product features. 5 Music Data Mining The rapid growth of the Internet and the advancements of the Internet technologies have made it possible for users to have access to large amounts of on-line music data. The multifaceted nature of music information provides a wealth of opportunities for mining useful information and utilizing it to create novel ways of interaction with large music collections. In collaboration with Dr. Mitsunori Ogihara (University of Miami), I have been developing advanced data mining techniques in the context of music processing. We have been awarded a patent on using wavelet histograms 3
4 from music feature representation and are one of the leading groups in content-based music genre classification. We are one of the first to develop data mining methods for computational music emotion detection and many papers have followed our seminal work. We have also developed novel data mining methods that integrate features from different sources (e.g., acoustic signals, lyrics, meta-data and user tags) for music information retrieval. 6 Malware Detection Due to its damage to computer security, malware (such as virus and Trojan Horses) has caught the attention of computer security researchers for decades. The malware s trend towards stealth has motivated much research in intelligent malware analysis, where data mining techniques are used to deal with obfuscation. Having collaborated with the anti-virus laboratory of KingSoft Corporation ( and Comodo Corporation ( // for a long time, we have developed 1) a series of data mining techniques for building automatic malware detection systems; 2) an ensemble classification framework to combine heterogeneous base-level classifiers; 3) a principled cluster ensemble framework for combining individual clustering solutions for malware categorization; and 4) a semiparametric classifier model to combine file content and file relations together for malware detection. Our research has had important theoretical and practical impacts and many related systems have been incorporated into popular industrial products. For example, our cluster ensemble based malware detection scheme has been used in the Comodo file verdict service (see: http: //valkyrie.comodo.com). 7 Applications One of the important characteristics of data mining research is the combination of theory and applications. I focus strongly on research challenges grounded in real-world problems, and work to validate my research in this context. I am always looking for novel applications where data mining methods can help. 7.1 Disaster Management Over the last six years, in collaboration with Dr. Shu-ching Chen and Mr. Steve Luis, I have been developing data-driven techniques for disaster management. We have been working with government and industry partners to build community-driven services and tools for information sharing and exchange to operate during and after a hurricane recovery period. Our developed systems create collaborative solutions using advanced data mining and information retrieval techniques to help impacted communities better understand the current disaster situation and how the community is recovering. The systems are also able to facilitate professional organizations like Chambers of Commerce to assist their members, and help government agencies to assess damage and prioritize recovery needs. Our work has been recognized by FEMA (Federal Emergency 4
5 Management Agency) Private Sector Office as a model in assistance of Public-Private Partnerships (see: under Miami-Dade County). 7.2 Recommendation Systems Recommendation systems have gained increasing attention in recent years. My students and I have developed effective techniques for personalized news recommendation and for reciprocal recommendation. Different from traditional recommendation systems, news recommendation systems need effective news representation and processing and reciprocal recommender systems involve two different parties and focus on the preferences of both parties simultaneously. Our developed recommendation techniques have been deployed in Xiamen Rencai Network (XMRC) for Xiamen Talent Service Center (see: Database Exploration and Navigation My research on database exploration and navigation aims to use data mining and information retrieval techniques to help user quickly find useful information from the large volume of data stored in many different databases. In collaboration with Dr. Zhiyuan Chen (University of Maryland Baltimore County), I have developed several approaches to organize and rank SQL query results and to dynamically generate query forms. The novelty of our approaches is: (1) they take into account the diversity of user preferences, and (2) they use probabilistic models and are robust to poor-quality data. 7.4 Business Intelligence The leading indicators within Business Intelligence (BI) systems are one type of Key Performance Indicators (KPIs) that present key drivers of business value and offer the organization the unique opportunity to positively effect. My students and I have developed a semi-automatic system for analyzing operational metrics, factoring out the key performance indicators (KPIs), and then further discovering leading indicators by developing data mining techniques incorporated with the domain knowledge. We have also developed a system for collecting and analyzing the customer feedbacks or requirements (e.g., voice of customers) with the purpose of extracting useful knowledge nuggets and turning them into desired features in the products, solutions, or services. 7.5 Bioinformatics I have been developing computational tools for microarray data analysis and protein-protein interaction inference. I have performed a comprehensive comparative study on feature selection methods as well as state-of-the-art classification methods on various multi-class gene expression datasets and proposed uncorrelated discriminant analysis for multi-class expression data classification. I have also developed several gene selection algorithms to identify marker genes from microarray data with different characteristics and studied the problem of gene functional classification from multiple information sources. 5
Semi-Supervised and Unsupervised Machine Learning. Novel Strategies
Brochure More information from http://www.researchandmarkets.com/reports/2179190/ Semi-Supervised and Unsupervised Machine Learning. Novel Strategies Description: This book provides a detailed and up to
More informationMultichannel Customer Listening and Social Media Analytics
( Multichannel Customer Listening and Social Media Analytics KANA Experience Analytics Lite is a multichannel customer listening and social media analytics solution that delivers sentiment, meaning and
More informationUnsupervised and supervised dimension reduction: Algorithms and connections
Unsupervised and supervised dimension reduction: Algorithms and connections Jieping Ye Department of Computer Science and Engineering Evolutionary Functional Genomics Center The Biodesign Institute Arizona
More informationMachine Learning for Data Science (CS4786) Lecture 1
Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 10:10 to 11:25 AM Hollister B14 Instructors : Lillian Lee and Karthik Sridharan ROUGH DETAILS ABOUT THE COURSE Diagnostic assignment 0 is out:
More informationA Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization
A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization Ángela Blanco Universidad Pontificia de Salamanca ablancogo@upsa.es Spain Manuel Martín-Merino Universidad
More informationData, Measurements, Features
Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are
More informationCHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful
More informationANALYTICS IN BIG DATA ERA
ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNT OF DATA MAURIZIO SALUSTI SAS Copyr i g ht 2012, SAS Ins titut
More informationAn Introduction to Data Mining
An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
More informationA Study of Web Log Analysis Using Clustering Techniques
A Study of Web Log Analysis Using Clustering Techniques Hemanshu Rana 1, Mayank Patel 2 Assistant Professor, Dept of CSE, M.G Institute of Technical Education, Gujarat India 1 Assistant Professor, Dept
More informationData Mining Analytics for Business Intelligence and Decision Support
Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing
More informationKeywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.
Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics
More informationCS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing
CS Master Level Courses and Areas The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate
More informationFlorida International University - University of Miami TRECVID 2014
Florida International University - University of Miami TRECVID 2014 Miguel Gavidia 3, Tarek Sayed 1, Yilin Yan 1, Quisha Zhu 1, Mei-Ling Shyu 1, Shu-Ching Chen 2, Hsin-Yu Ha 2, Ming Ma 1, Winnie Chen 4,
More informationIntroduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
More informationClustering Technique in Data Mining for Text Documents
Clustering Technique in Data Mining for Text Documents Ms.J.Sathya Priya Assistant Professor Dept Of Information Technology. Velammal Engineering College. Chennai. Ms.S.Priyadharshini Assistant Professor
More information12 A framework for knowledge management
365 12 A framework for knowledge management As those who work in organizations know, organizations are not homogenous entities where grand theoretical systems are easily put in place. Change is difficult.
More informationThe Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
More informationHexaware E-book on Predictive Analytics
Hexaware E-book on Predictive Analytics Business Intelligence & Analytics Actionable Intelligence Enabled Published on : Feb 7, 2012 Hexaware E-book on Predictive Analytics What is Data mining? Data mining,
More informationData Warehousing and Data Mining in Business Applications
133 Data Warehousing and Data Mining in Business Applications Eesha Goel CSE Deptt. GZS-PTU Campus, Bathinda. Abstract Information technology is now required in all aspect of our lives that helps in business
More informationHow To Create A Text Classification System For Spam Filtering
Term Discrimination Based Robust Text Classification with Application to Email Spam Filtering PhD Thesis Khurum Nazir Junejo 2004-03-0018 Advisor: Dr. Asim Karim Department of Computer Science Syed Babar
More informationCS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott
More informationUnsupervised Data Mining (Clustering)
Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in
More informationIntroduction to Data Mining
Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:
More informationLearning outcomes. Knowledge and understanding. Competence and skills
Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges
More informationFacebook Friend Suggestion Eytan Daniyalzade and Tim Lipus
Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information
More information1. What are the uses of statistics in data mining? Statistics is used to Estimate the complexity of a data mining problem. Suggest which data mining
1. What are the uses of statistics in data mining? Statistics is used to Estimate the complexity of a data mining problem. Suggest which data mining techniques are most likely to be successful, and Identify
More informationClustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016
Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with
More informationIntroduction. A. Bellaachia Page: 1
Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.
More informationA Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks
A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks Text Analytics World, Boston, 2013 Lars Hard, CTO Agenda Difficult text analytics tasks Feature extraction Bio-inspired
More informationWhite Paper April 2006
White Paper April 2006 Table of Contents 1. Executive Summary...4 1.1 Scorecards...4 1.2 Alerts...4 1.3 Data Collection Agents...4 1.4 Self Tuning Caching System...4 2. Business Intelligence Model...5
More informationNetwork Big Data: Facing and Tackling the Complexities Xiaolong Jin
Network Big Data: Facing and Tackling the Complexities Xiaolong Jin CAS Key Laboratory of Network Data Science & Technology Institute of Computing Technology Chinese Academy of Sciences (CAS) 2015-08-10
More informationIJCSES Vol.7 No.4 October 2013 pp.165-168 Serials Publications BEHAVIOR PERDITION VIA MINING SOCIAL DIMENSIONS
IJCSES Vol.7 No.4 October 2013 pp.165-168 Serials Publications BEHAVIOR PERDITION VIA MINING SOCIAL DIMENSIONS V.Sudhakar 1 and G. Draksha 2 Abstract:- Collective behavior refers to the behaviors of individuals
More informationData Mining Yelp Data - Predicting rating stars from review text
Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University rchada@cs.stonybrook.edu Chetan Naik Stony Brook University cnaik@cs.stonybrook.edu ABSTRACT The majority
More informationMorphological analysis on structural MRI for the early diagnosis of neurodegenerative diseases. Marco Aiello On behalf of MAGIC-5 collaboration
Morphological analysis on structural MRI for the early diagnosis of neurodegenerative diseases Marco Aiello On behalf of MAGIC-5 collaboration Index Motivations of morphological analysis Segmentation of
More informationData Discovery, Analytics, and the Enterprise Data Hub
Data Discovery, Analytics, and the Enterprise Data Hub Version: 101 Table of Contents Summary 3 Used Data and Limitations of Legacy Analytic Architecture 3 The Meaning of Data Discovery & Analytics 4 Machine
More informationConclusions and Future Directions
Chapter 9 This chapter summarizes the thesis with discussion of (a) the findings and the contributions to the state-of-the-art in the disciplines covered by this work, and (b) future work, those directions
More informationDatabase Marketing, Business Intelligence and Knowledge Discovery
Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
More informationDistance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center
Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center 1 Outline Part I - Applications Motivation and Introduction Patient similarity application Part II
More informationAn Ants Algorithm to Improve Energy Efficient Based on Secure Autonomous Routing in WSN
An Ants Algorithm to Improve Energy Efficient Based on Secure Autonomous Routing in WSN *M.A.Preethy, PG SCHOLAR DEPT OF CSE #M.Meena,M.E AP/CSE King College Of Technology, Namakkal Abstract Due to the
More informationChapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More informationData Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data
More informationWeb Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it
Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content
More informationPSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software
More informationNon-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning
Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step
More informationdm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING
dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING ABSTRACT In most CRM (Customer Relationship Management) systems, information on
More informationBig Data in the Mathematical Sciences
Big Data in the Mathematical Sciences Wednesday 13 November 2013 Sponsored by: Extract from Campus Map Note: Walk from Zeeman Building to Arts Centre approximately 5 minutes ZeemanBuilding BuildingNumber38
More informationCourse 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing
More informationSearch and Information Retrieval
Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search
More informationInternational Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 442 ISSN 2229-5518
International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 442 Over viewing issues of data mining with highlights of data warehousing Rushabh H. Baldaniya, Prof H.J.Baldaniya,
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 by Tan, Steinbach, Kumar 1 What is Cluster Analysis? Finding groups of objects such that the objects in a group will
More informationRequirements Analysis Concepts & Principles. Instructor: Dr. Jerry Gao
Requirements Analysis Concepts & Principles Instructor: Dr. Jerry Gao Requirements Analysis Concepts and Principles - Requirements Analysis - Communication Techniques - Initiating the Process - Facilitated
More information131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10
1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom
More information15.00 15.30 30 XML enabled databases. Non relational databases. Guido Rotondi
Programme of the ESTP training course on BIG DATA EFFECTIVE PROCESSING AND ANALYSIS OF VERY LARGE AND UNSTRUCTURED DATA FOR OFFICIAL STATISTICS Rome, 5 9 May 2014 Istat Piazza Indipendenza 4, Room Vanoni
More informationOverview. Background. Data Mining Analytics for Business Intelligence and Decision Support
Mining Analytics for Business Intelligence and Decision Support Chid Apte, PhD Manager, Abstraction Research Group IBM TJ Watson Research Center apte@us.ibm.com http://www.research.ibm.com/dar Overview
More informationIntroduction to Pattern Recognition
Introduction to Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)
More informationPractical Applications of DATA MINING. Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING
Practical Applications of DATA MINING Sang C Suh Texas A&M University Commerce r 3 JONES & BARTLETT LEARNING Contents Preface xi Foreword by Murat M.Tanik xvii Foreword by John Kocur xix Chapter 1 Introduction
More informationGraduate Co-op Students Information Manual. Department of Computer Science. Faculty of Science. University of Regina
Graduate Co-op Students Information Manual Department of Computer Science Faculty of Science University of Regina 2014 1 Table of Contents 1. Department Description..3 2. Program Requirements and Procedures
More informationCrime Pattern Analysis
Crime Pattern Analysis Megaputer Case Study in Text Mining Vijay Kollepara Sergei Ananyan www.megaputer.com Megaputer Intelligence 120 West Seventh Street, Suite 310 Bloomington, IN 47404 USA +1 812-330-01
More informationTIETS34 Seminar: Data Mining on Biometric identification
TIETS34 Seminar: Data Mining on Biometric identification Youming Zhang Computer Science, School of Information Sciences, 33014 University of Tampere, Finland Youming.Zhang@uta.fi Course Description Content
More informationJoint Feature Learning and Clustering Techniques for Clustering High Dimensional Data: A Review
International Journal of Computer Sciences and Engineering Open Access Review Paper Volume-4, Issue-03 E-ISSN: 2347-2693 Joint Feature Learning and Clustering Techniques for Clustering High Dimensional
More informationIMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH Kalinka Mihaylova Kaloyanova St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics Sofia 1164, Bulgaria
More informationChapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
More informationVerifying Business Processes Extracted from E-Commerce Systems Using Dynamic Analysis
Verifying Business Processes Extracted from E-Commerce Systems Using Dynamic Analysis Derek Foo 1, Jin Guo 2 and Ying Zou 1 Department of Electrical and Computer Engineering 1 School of Computing 2 Queen
More informationIndex Contents Page No. Introduction . Data Mining & Knowledge Discovery
Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.
More informationSearch Taxonomy. Web Search. Search Engine Optimization. Information Retrieval
Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!
More informationVisualization methods for patent data
Visualization methods for patent data Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes
More informationMaster s Program in Information Systems
The University of Jordan King Abdullah II School for Information Technology Department of Information Systems Master s Program in Information Systems 2006/2007 Study Plan Master Degree in Information Systems
More informationUsing multiple models: Bagging, Boosting, Ensembles, Forests
Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or
More informationThe Big Data Paradigm Shift. Insight Through Automation
The Big Data Paradigm Shift Insight Through Automation Agenda The Problem Emcien s Solution: Algorithms solve data related business problems How Does the Technology Work? Case Studies 2013 Emcien, Inc.
More informationIT services for analyses of various data samples
IT services for analyses of various data samples Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical
More informationService courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
More informationSPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
More informationLluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining
Lluis Belanche + Alfredo Vellido Intelligent Data Analysis and Data Mining a.k.a. Data Mining II Office 319, Omega, BCN EET, office 107, TR 2, Terrassa avellido@lsi.upc.edu skype, gtalk: avellido Tels.:
More informationAn Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
More informationMachine Learning and Data Mining. Fundamentals, robotics, recognition
Machine Learning and Data Mining Fundamentals, robotics, recognition Machine Learning, Data Mining, Knowledge Discovery in Data Bases Their mutual relations Data Mining, Knowledge Discovery in Databases,
More informationUser Authentication/Identification From Web Browsing Behavior
User Authentication/Identification From Web Browsing Behavior US Naval Research Laboratory PI: Myriam Abramson, Code 5584 Shantanu Gore, SEAP Student, Code 5584 David Aha, Code 5514 Steve Russell, Code
More informationComplexity and Scalability in Semantic Graph Analysis Semantic Days 2013
Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013 James Maltby, Ph.D 1 Outline of Presentation Semantic Graph Analytics Database Architectures In-memory Semantic Database Formulation
More informationCourse 6234A: Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services
Course 6234A: Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services Length: Delivery Method: 3 Days Instructor-led (classroom) About this Course Elements of this syllabus are subject
More informationTopics in basic DBMS course
Topics in basic DBMS course Database design Transaction processing Relational query languages (SQL), calculus, and algebra DBMS APIs Database tuning (physical database design) Basic query processing (ch
More informationIndustrial Cyber Security Risk Manager. Proactively Monitor, Measure and Manage Industrial Cyber Security Risk
Industrial Cyber Security Risk Manager Proactively Monitor, Measure and Manage Industrial Cyber Security Risk Industrial Attacks Continue to Increase in Frequency & Sophistication Today, industrial organizations
More informationContent Analyst's Cerebrant Combines SaaS Discovery, Machine Learning, and Content to Perform Next-Generation Research
INSIGHT Content Analyst's Cerebrant Combines SaaS Discovery, Machine Learning, and Content to Perform Next-Generation Research David Schubmehl IDC OPINION Organizations are looking for better ways to perform
More informationSocial Media Implementations
SEM Experience Analytics Social Media Implementations SEM Experience Analytics delivers real sentiment, meaning and trends within social media for many of the world s leading consumer brand companies.
More informationDe la Business Intelligence aux Big Data. Marie- Aude AUFAURE Head of the Business Intelligence team Ecole Centrale Paris. 22/01/14 Séminaire Big Data
De la Business Intelligence aux Big Data Marie- Aude AUFAURE Head of the Business Intelligence team Ecole Centrale Paris 22/01/14 Séminaire Big Data 1 Agenda EvoluHon of Business Intelligence SemanHc Technologies
More informationMicrosoft Dynamics NAV
Microsoft Dynamics NAV Maximizing value through business insight Business Intelligence White Paper November 2011 The information contained in this document represents the current view of Microsoft Corporation
More informationA Capability Model for Business Analytics: Part 2 Assessing Analytic Capabilities
A Capability Model for Business Analytics: Part 2 Assessing Analytic Capabilities The first article of this series presented the capability model for business analytics that is illustrated in Figure One.
More informationFinal Project Report
CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes
More informationDelivering Smart Answers!
Companion for SharePoint Topic Analyst Companion for SharePoint All Your Information Enterprise-ready Enrich SharePoint, your central place for document and workflow management, not only with an improved
More informationTIBCO Spotfire Guided Analytics. Transferring Best Practice Analytics from Experts to Everyone
TIBCO Spotfire Guided Analytics Transferring Best Practice Analytics from Experts to Everyone Introduction Business professionals need powerful and easy-to-use data analysis applications in order to make
More informationCLUSTER ANALYSIS WITH R
CLUSTER ANALYSIS WITH R [cluster analysis divides data into groups that are meaningful, useful, or both] LEARNING STAGE ADVANCED DURATION 3 DAY WHAT IS CLUSTER ANALYSIS? Cluster Analysis or Clustering
More informationDistances, Clustering, and Classification. Heatmaps
Distances, Clustering, and Classification Heatmaps 1 Distance Clustering organizes things that are close into groups What does it mean for two genes to be close? What does it mean for two samples to be
More informationText Mining for Health Care and Medicine. Sophia Ananiadou Director National Centre for Text Mining www.nactem.ac.uk
Text Mining for Health Care and Medicine Sophia Ananiadou Director National Centre for Text Mining www.nactem.ac.uk The Need for Text Mining MEDLINE 2005: ~14M 2009: ~18M Overwhelming information in textual,
More informationA SURVEY ON WEB MINING TOOLS
IMPACT: International Journal of Research in Engineering & Technology (IMPACT: IJRET) ISSN(E): 2321-8843; ISSN(P): 2347-4599 Vol. 3, Issue 10, Oct 2015, 27-34 Impact Journals A SURVEY ON WEB MINING TOOLS
More informationSocial Media Analytics
Social Media Analytics Raghu Krishnapuram and Jitendra Ajmera IBM Research - India 2011 IBM Corporation Convergence of Social and Analytic Technologies Transform the Way the World Operates Socially Synergistic
More informationMachine Learning Log File Analysis
Machine Learning Log File Analysis Research Proposal Kieran Matherson ID: 1154908 Supervisor: Richard Nelson 13 March, 2015 Abstract The need for analysis of systems log files is increasing as systems
More informationPrinciples of Data Mining by Hand&Mannila&Smyth
Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences
More informationDATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
More information