Data Mining as Part of Knowledge Discovery in Databases (KDD)
- Elizabeth Newman
- 10 years ago
Presented by Naci Akkøk as part of INF4180/3180, Advanced Database Systems, fall 2003 (based on slightly modified foils of Dr. Denise Ecklund from 6 November 2002)

Lecture Contents
- KDD: Knowledge Discovery in Databases
  - Definition and Applications
- OLAP: On-Line Analytical Processing
- Architectures for OLAP and KDD
- KDD Life Cycle
- Data Mining Mechanisms
- Implications for the Database System

Literature:
- Elmasri & Navathe - Section 26.2
- Garcia-Molina, Ullman, & Widom - Section 20.6
Definition - KDD
- Knowledge Discovery in Databases (KDD): the non-trivial extraction of implicit, previously unknown and potentially useful knowledge from data
- Why? To find trends and correlations in existing databases that help you change structures and processes in human organizations
  - to be more effective
  - to save time
  - to make more money
  - to improve product quality
  - etc.

Applications - KDD
- Marketing
  - Find the most important segmentation for classifying my customers
  - Predict future sales for a specific product
- Product Maintenance
  - Define a service maintenance contract of interest to a majority of my customers
- Human Resource Management
  - Define an employee compensation package to increase employee retention to at least 5 years of service
  - For each university department, predict the number of new students that will major in that subject area
- Finance
  - Detect fraudulent use of credit cards
Where SQL Stops Short
- Activities named in the figure: Observe, Analyze, Theorize, Predict
- The tool hierarchy, from top to bottom:
  - Data Mining Tools: create new knowledge in the form of predictions, classifications, associations, time-based patterns
  - OLAP Tools: analyze a multi-dimensional view of existing knowledge
  - DSS-Extended SQL Queries: extend the DB schema to a limited, multi-dimensional view for Decision Support Systems (DSS)
  - Complex SQL Queries: the DB schema describes the structure you already know
  - Database
- Successful KDD uses the entire tool hierarchy

OnLine Analytical Processing (OLAP)
- OLAP: the dynamic synthesis, analysis, and consolidation of large volumes of multi-dimensional data
- Focuses on multi-dimensional relationships among existing data records
- Differs from extended SQL for data warehouses
  - DW operations: roll-up, drill-down, slice, dice, pivot
  - OLAP packages add application-specific analysis
    - Finance: depreciation, currency conversion, ...
    - Building regulations: usable space, electrical loading, ...
    - Computer Capacity Planning: disk storage requirements, failure rate estimation, ...
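The data warehouse operations named above (roll-up, drill-down, slice, dice) can be illustrated on a tiny in-memory fact table. This is a minimal sketch, not how an OLAP package implements them: the fact records, the dimension names and the helper functions `roll_up`, `slice_` and `dice` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical fact records: (year, region, product, sales). Illustrative only.
FACTS = [
    (2002, "North", "Car", 10),
    (2002, "North", "Sports", 5),
    (2002, "South", "Car", 7),
    (2003, "North", "Car", 12),
    (2003, "South", "Sports", 4),
]
DIMS = ("year", "region", "product")


def roll_up(facts, keep):
    """Sum sales over every dimension that is not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    cube = defaultdict(int)
    for row in facts:
        cube[tuple(row[i] for i in idx)] += row[-1]
    return dict(cube)


def slice_(facts, dim, value):
    """Fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [row for row in facts if row[i] == value]


def dice(facts, **allowed):
    """Keep rows whose value is in the allowed set for each named dimension."""
    return [
        row for row in facts
        if all(row[DIMS.index(d)] in values for d, values in allowed.items())
    ]


by_year = roll_up(FACTS, ("year",))                   # roll-up to the year level
by_year_region = roll_up(FACTS, ("year", "region"))   # drill-down adds a dimension
north_only = slice_(FACTS, "region", "North")         # slice on region
north_cars = dice(FACTS, region={"North"}, product={"Car"})
```

Drill-down is expressed here simply as a roll-up that keeps one more dimension; pivot is omitted because it only changes how the same cells are displayed.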
OnLine Analytic Processing (OLAP)
- OLAP differs from data mining
  - OLAP tools provide quantitative analysis of multi-dimensional data relationships
  - Data mining tools create and evaluate a set of possible problem solutions (and rank them)
  - Ex: Propose 3 marketing strategies and order them based on marketing cost and likely sales income
- Three system architectures are used for OLAP
  - Relational OLAP (ROLAP)
  - Multi-dimensional OLAP (MOLAP)
  - Managed Query Environment (MQE)

Relational OLAP
[Figure: an analysis request goes to the ROLAP server and analysis application, which sends on-demand queries to a relational database system, receives result relations, converts them into data cubes (with an optional cache), and returns the analysis output.]
- Runtime Steps:
  - Translate client request into 1 or more SQL queries
  - Present SQL queries to a back-end RDBMS
  - Convert result relations into multi-dimensional datacubes
  - Execute the analysis application and return the results
- Advantages:
  - No special data storage structures, use standard relational storage structures
  - Uses most current data from the OLTP server
- Disadvantages:
  - Typically accesses only one RDBMS => no data integration over multiple DBSs
  - Conversion from flat relations to memory-based datacubes is slow and complex
  - Poor performance if large amounts of data are retrieved from the RDBMS
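The four ROLAP runtime steps listed above can be sketched with Python's standard `sqlite3` module standing in for the back-end RDBMS. The `sales` table, its columns and the `analysis_request` function are hypothetical names introduced for illustration; a real ROLAP server is far more elaborate.

```python
import sqlite3

# A stand-in back-end RDBMS with a hypothetical sales relation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "Car", 10), ("North", "Sports", 5),
     ("South", "Car", 7), ("North", "Car", 2)],
)

ALLOWED_DIMS = {"region", "product"}


def analysis_request(conn, dims):
    """Serve one client request for total sales along the given dimensions."""
    if not dims or not set(dims) <= ALLOWED_DIMS:
        raise ValueError("unknown dimension requested")
    # Step 1: translate the client request into an SQL query.
    cols = ", ".join(dims)
    sql = f"SELECT {cols}, SUM(amount) FROM sales GROUP BY {cols}"
    # Step 2: present the query to the back-end RDBMS.
    rows = conn.execute(sql).fetchall()
    # Step 3: convert the flat result relation into a memory-based data cube,
    # here a dict keyed by dimension coordinates.
    return {tuple(row[:-1]): row[-1] for row in rows}


cube = analysis_request(conn, ["region", "product"])

# Step 4: run the analysis application on the cube (here: share of total sales).
total = sum(cube.values())
shares = {cell: amount / total for cell, amount in cube.items()}
```

Because every request goes back to the relational store, the sketch also shows where the listed disadvantages come from: the conversion in step 3 happens at request time, and its cost grows with the size of the result relation.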
Multi-dimensional OLAP
[Figure: source data is fetched from data sources and a database system, data cubes are created and stored in a data warehouse, and the MOLAP server and analysis application turns analysis requests into analysis output.]
- Preprocessing Steps:
  - Extract data from multiple data sources
  - Store as a data warehouse, using custom storage structures
- Runtime Steps:
  - Access datacubes through special index structures
  - Execute the analysis application and return the results
- Advantages:
  - Special, multi-dimensional storage structures give good retrieval performance
  - Warehouse integrates clean data from multiple data sources
- Disadvantages:
  - Inflexible, multi-dimensional storage structures support only one application well
  - Requires people and software to maintain the data warehouse

Managed Query Environment (MQE)
[Figure: a client with a data cube cache and analysis application retrieves data from a MOLAP server (data cubes) or directly from a relational database system.]
- Runtime Steps:
  - Fetch data from MOLAP Server, or RDBMS directly
  - Build memory-based data structures, as required
  - Execute the analysis application
- Advantages:
  - Distributes workload to the clients, offloading the servers
  - Simple to install and maintain => reduced cost
- Disadvantages:
  - Provides limited analysis capability (i.e., client is less powerful than a server)
  - Lots of redundant data stored on the client systems
  - Client-defined and cached datacubes can cause inconsistent data
  - Uses lots of network bandwidth
KDD System Architecture
[Figure: a KDD server hosting data mining tools (DM-T1 to DM-T4), OLAP tools (OLAP T1, OLAP T2), an SQL interface and nd SQL receives knowledge requests and returns new knowledge. It fetches data cubes from a data warehouse, multi-dimensional values from ROLAP and MOLAP servers, and relations or data from a transaction DB, a multimedia DB and web sources.]
- Need to mine all types of data sources
- DM tools require multi-dimensional input data
- General KDD systems also host OLAP tools, an SQL interface, and SQL for DWs (e.g., nd SQL)

Typical KDD Deployment Architecture
[Figure: a KDD server hosting tools T1 and T2 receives a knowledge request, queries the data warehouse server, fetches data cubes from the warehouse, and returns knowledge.]
- Runtime Steps:
  - Submit knowledge request
  - Fetch datacubes from the warehouse
  - Execute knowledge acquisition tools
  - Return findings to the client for display
- Advantages:
  - Data warehouse provides clean, maintained, multi-dimensional data
  - Data retrieval is typically efficient
  - Data warehouse can be used by other applications
  - Easy to add new KDD tools
- Disadvantages:
  - KDD is limited to data selected for inclusion in the warehouse
  - If DW is not available, use MOLAP server or provide warehouse on KDD server
KDD Life Cycle
- Data preparation phases:
  - Data Selection and Extraction
  - Data Cleaning
  - Data Enrichment
  - Data Coding
- Some preparation phases are commonly part of data warehouse preparation; the others are additional preparation for KDD
- Data Mining
- Result Reporting
- Data preparation and result reporting are 80% of the work!
- Based on results you may decide to:
  - Get more data from internal data sources
  - Get additional data from external sources
  - Recode the data

Data Enrichment
- Integrating additional data from external sources
- Sources are public and private agencies
  - Government, Credit bureau, Research lab, ...
- Typical data examples:
  - Average income by city, occupation, or age group
  - Percentage of homeowners and car owners by ...
  - A person's credit rating, debt level, loan history, ...
  - Geographical density maps (for population, occupations, ...)
- New data extends each record from internal sources
- Database issues:
  - More heterogeneous data formats to be integrated
  - Understand the semantics of the external source data
Data Coding
- Goal: to streamline the data for effective and efficient processing by the target KDD application
- Steps:
  1) Delete records with many missing values
     - But in fraud detection, missing values are indicators of the behavior you want to discover!
  2) Delete extra attributes
     - Ex: delete customer names if looking at customer classes
  3) Code detailed data values into categories or ranges based on the types of knowledge you want to discover
     - Ex: divide specific ages into age ranges, 0-10, 11-20, ...
     - map home addresses to regional codes
     - convert homeownership to yes or no
     - convert purchase date to a month number starting from Jan

The Data Mining Process
- Based on the questions being asked and the required form of the output
  1) Select the data mining mechanisms you will use
  2) Make sure the data is properly coded for the selected mechanisms
     - Ex: A tool may accept numeric input only
  3) Perform rough analysis using traditional tools
     - Create a naive prediction using statistics, e.g., averages
     - The data mining tools must do better than the naive prediction or you are not learning more than the obvious!
  4) Run the tool and examine the results
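The three coding steps from the Data Coding slide can be sketched as a small record transformer. Everything specific here is an assumption made for illustration: the field names, the `max_missing` threshold, and the base year used for the month number (the slide does not say which January the count starts from).

```python
from datetime import date


def age_range(age, width=10):
    """Map an age to the ranges 0-10, 11-20, 21-30, ..."""
    if age <= width:
        return f"0-{width}"
    low = ((age - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


def month_number(d, base_year=2000):
    """Months counted from January of base_year (assumed), January = 1."""
    return (d.year - base_year) * 12 + d.month


def code_record(rec, max_missing=1):
    """Return the coded record, or None if too many values are missing."""
    # Step 1: delete records with many missing values.
    if sum(value is None for value in rec.values()) > max_missing:
        return None
    houses = rec["houses_owned"]
    purchase = rec["purchase_date"]
    # Step 2: the customer name is an extra attribute and is simply not copied.
    # Step 3: code detailed values into ranges and categories.
    return {
        "age_range": None if rec["age"] is None else age_range(rec["age"]),
        "homeowner": None if houses is None else ("yes" if houses > 0 else "no"),
        "purchase_month": None if purchase is None else month_number(purchase),
    }


raw = {"name": "A. Customer", "age": 37, "houses_owned": 2,
       "purchase_date": date(2001, 3, 15)}
coded = code_record(raw)
sparse = code_record({"name": None, "age": None, "houses_owned": None,
                      "purchase_date": None})
```

For a fraud-detection application the first step would be skipped, since the slide points out that missing values are themselves the signal there.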
Data Mining - Mechanisms
- Predictive Modeling
  - Purpose/Use: To define classes and predict future behavior of existing instances.
  - Mechanisms: Decision Trees, Neural Networks, Regression Analysis, Genetic Algorithms
- Database Segmentation
  - Purpose/Use: To define classes and categorize new instances based on the classes.
  - Mechanisms: K-nearest Neighbors, Neural Networks, Kohonen Maps
- Link Analysis
  - Purpose/Use: To identify a cause and predict the effect.
  - Mechanisms: Negative Association, Association Discovery, Sequential Pattern Discovery, Matching Time Sequence Discovery
- Deviation Detection
  - Purpose/Use: To define classes and detect new and old instances that lie outside the classes.
  - Mechanisms: Statistics, Visualization, Genetic Algorithms

Configuring the KDD Server
- Data mining mechanisms are not application-specific, they depend on the target knowledge type
- The application area impacts the type of knowledge you are seeking, so the application area guides the selection of data mining mechanisms that will be hosted on the KDD server.
- To configure a KDD server
  - Select an application area
  - Select (or build) a data source
  - Select N knowledge types (types of questions you will ask)
  - For each knowledge type, select 1 or more mining tools for that knowledge type
- Example:
  - Application area: marketing
  - Data Source: data warehouse of current customer info
  - Knowledge types: classify current customers, predict future sales, predict new customers
  - Data Mining Tools: decision trees, and neural networks
Data Mining Example - Database Segmentation
- Given: a coded database with 1 million records on subscriptions to five types of magazines
- Goal: to define a classification of the customers that can predict what types of new customers will subscribe to which types of magazines
[Table: sample records with columns Client#, Age, Income, Credit, Car Owner, House Owner, Home Area, Car Magaz, House Magaz, Sports Magaz, Music Magaz, Comic Magaz; the row values were lost in extraction.]

Decision Trees for DB Segmentation
- Which attribute(s) best predict which magazine(s) a customer subscribes to (sensitivity analysis)
  - Attributes: age, income, credit, car-owner, house-owner, area
- Classify people who subscribe to a car magazine
[Figure: decision tree for Car Magaz. The root splits on Age > 44.5 versus Age <= 44.5; the older branch splits again on Age > 48.5 versus Age <= 48.5; the younger branch splits on Income > 34.5 versus Income <= 34.5, and the lower-income branch splits on Age > 31.5 versus Age <= 31.5. Most node percentages were lost in extraction.]
- Only 1% of the people over 44.5 years of age buy a car magazine
- 62% of the people under 44.5 years of age buy a car magazine
  - This is the target advertising segment for car magazines
- No one over 48.5 years of age buys a car magazine
- Everyone in this DB who has an income of less than $34,500 AND buys a car magazine is less than 31.5 years of age
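How a decision tree picks a split such as "Age > 44.5" can be sketched by scoring every candidate threshold for one attribute. The slides do not say which splitting criterion the tool uses; weighted Gini impurity is assumed here as one common choice, and the eight customer records are invented for illustration.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means a pure group)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, attr, target):
    """Return (threshold, weighted impurity) of the best binary split on attr."""
    values = sorted({row[attr] for row in rows})
    best = None
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [row[target] for row in rows if row[attr] <= threshold]
        right = [row[target] for row in rows if row[attr] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical coded records: age and whether the customer buys a car magazine.
ROWS = [
    {"age": 25, "car_magazine": 1}, {"age": 30, "car_magazine": 1},
    {"age": 40, "car_magazine": 0}, {"age": 44, "car_magazine": 1},
    {"age": 45, "car_magazine": 0}, {"age": 50, "car_magazine": 0},
    {"age": 60, "car_magazine": 0}, {"age": 70, "car_magazine": 0},
]

threshold, impurity = best_split(ROWS, "age", "car_magazine")
```

Running the same function for each attribute and keeping the lowest impurity gives the root split; repeating it inside each branch grows the tree.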
Neural Networks
- Input nodes are connected to output nodes by a set of hidden nodes and edges
- Inputs describe DB instances
- Outputs are the categories we want to recognize
- Hidden nodes assign weights to each edge so they represent the weight of relationships between the input and the output of a large set of training data
[Figure: input nodes, hidden layer nodes, and output nodes labeled Car, House, Sports, Music, Comic.]

Training and Mining
[Figure: input nodes for coded age ranges, coded income ranges, Car and House; hidden layer nodes; output nodes Car, House, Sports, Music, Comic. The range boundaries were partly lost in extraction.]
- Training Phase (learning)
  - Code all DB data as 1's and 0's
  - Set all edge weights to prob = 0
  - Input each coded database record
  - Check that the output is correct
  - The system adjusts the edge weights to get the correct answer
- Mining Phase (recognition)
  - Input a new instance coded as 1's and 0's
  - Output is the classification of the new instance
- Issues:
  - Training sample must be large to get good results
  - Network is a black box, it does not tell why an instance gives a particular output (no theory)
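The training loop described above (start with zero weights, feed each coded record, check the output, adjust the weights) can be sketched with a single-output perceptron. This is a deliberate simplification and not the network on the slide: it has no hidden layer and recognizes only one category. The three 0/1 input attributes and the training labels are hypothetical.

```python
def predict(weights, bias, inputs):
    """Mining phase: classify one instance coded as 1's and 0's."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train(samples, epochs=100, rate=1.0):
    """Training phase: adjust edge weights until every record is classified correctly."""
    weights = [0.0] * len(samples[0][0])  # all edge weights start at 0
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            if error:
                mistakes += 1
                # Nudge each weight toward the correct answer.
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
                bias += rate * error
        if mistakes == 0:
            break
    return weights, bias


# Hypothetical coded records: (age_under_45, income_over_35, car_owner) -> buys car magazine.
# The label is 1 only for young car owners, so the data is linearly separable.
SAMPLES = [
    ((a, i, c), 1 if a and c else 0)
    for a in (0, 1) for i in (0, 1) for c in (0, 1)
]

weights, bias = train(SAMPLES)
new_instance = (1, 0, 1)  # young, lower income, owns a car
label = predict(weights, bias, new_instance)
```

The learned weights illustrate the black-box issue in miniature: they classify new instances correctly but offer no explanation in terms the business can read.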
An Explosion of Mining Results
- Data Mining tools can output thousands of rules and associations (collectively called patterns)
- When is a pattern interesting?
  1) If it can be understood by humans
  2) The pattern is strong (i.e., valid for many new data records)
  3) It is potentially useful for your business needs
  4) It is novel (i.e., previously unknown)
- Metrics of interestingness
  - Simplicity: Rule Length for (A => B) is the number of conjunctive conditions or the number of attributes in the rule
  - Confidence: Rule Strength for (A => B) is the conditional probability that A implies B: (#recs with A & B)/(#recs with A)
  - Support: Support for (A => B) is the number or percentage of DB records that include A and B
  - Novelty: Rule Uniqueness for (A => B) means no other rule includes or implies this rule

Genetic Algorithms
- Based on Darwin's theory of survival of the fittest
  - Living organisms reproduce, individuals evolve/mutate, individuals survive or die based on fitness
- The output of a genetic algorithm is the set of fittest solutions that will survive in a particular environment
- The input is an initial set of possible solutions
- The process
  - Produce the next generation (by a cross-over function)
  - Evolve solutions (by a mutation function)
  - Discard weak solutions (based on a fitness function)
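The support and confidence metrics from the interestingness slide can be computed directly from their definitions. The five subscription records below are hypothetical, and rule length is taken here as the number of attributes in A and B together, one of the two readings the slide allows.

```python
# Hypothetical records: the set of magazine types each customer subscribes to.
RECORDS = [
    {"car", "sports"},
    {"car", "music"},
    {"car", "sports", "comic"},
    {"house"},
    {"sports"},
]


def count(records, items):
    """Number of records that include every item in `items`."""
    return sum(1 for record in records if items <= record)


def support(records, a, b):
    """Fraction of DB records that include both A and B."""
    return count(records, a | b) / len(records)


def confidence(records, a, b):
    """(#recs with A & B) / (#recs with A)."""
    return count(records, a | b) / count(records, a)


def rule_length(a, b):
    """Simplicity: number of attributes in the rule."""
    return len(a) + len(b)


A, B = {"car"}, {"sports"}
rule_support = support(RECORDS, A, B)        # 2 of 5 records
rule_confidence = confidence(RECORDS, A, B)  # 2 of the 3 car records
```

A mining tool would typically discard rules whose support or confidence falls below chosen thresholds, which is one way to tame the thousands of patterns mentioned above.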
Classification Using Genetic Algorithms
- Suppose we have money for 5 marketing campaigns, so we wish to cluster our magazine customers into 5 target marketing groups
- Customers with similar attribute values form a cluster (assumes similar attributes => similar behavior)
- Preparation:
  - Define an encoding to represent solutions (i.e., use a character sequence to represent a cluster of customers)
  - Create 5 possible initial solutions (and encode them as strings)
  - Define the 3 genetic functions to operate on a cluster encoding: Cross-over( ), Mutate( ), Fitness_Test( )

Genetic Algorithms - Initialization
- Define 5 initial solutions
  - Use a subset of the database to create a 2-dim scatter plot
    - Map customer attributes to 2 dimensions
  - Divide the plot into 5 regions
  - Calculate an initial solution point (guide point) in each region
    - Equidistant from region lines
- Define an encoding for the solutions
  - Strings for the customer attribute values
  - Encode each guide point
[Figure: Voronoi diagram of the 5 regions; Guide Point #1 is encoded as 10 attribute values.]
Genetic Algorithms - Evolution
[Figure: first-generation solutions undergo cross-over (creating 2 children) and mutation to produce second-generation solutions, each annotated with a fitness value; the solution encodings and several fitness values were lost in extraction.]
- Cross-over function
  - Create 2 children
  - Take 6 attribute values from one parent and 4 from the other
- Mutate function
  - Randomly switch several attribute values from values in the sample subspace
- Fitness function: average distance between the solution point and all the points in the sample subspace
- Stop when the solutions change very little

Selecting a Data Mining Mechanism
- Multiple mechanisms can be used to answer a question
  - Select the mechanism based on your requirements
[Table: Decision Trees, Neural Networks and Genetic Algs rated Good, Avg or Poor on: Quality of Input (many recs, many attrs, numeric values, string values), Quality of Output (learn rules, learn incre, est stat signif), Learning Performance (L-Perf disk, L-Perf cpu) and Application Performance (A-Perf disk, A-Perf cpu). The individual ratings were lost in extraction.]
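The evolution loop above can be sketched with solutions encoded as tuples of 10 attribute values. Several choices here are assumptions not stated on the slides: a lower average distance is treated as fitter, parents compete with their children for survival, the sample points are random numbers, and the population, generation limit and tolerance are arbitrary.

```python
import math
import random


def fitness(solution, sample):
    """Average distance from the solution point to all sample points (lower is fitter)."""
    return sum(math.dist(solution, point) for point in sample) / len(sample)


def cross_over(parent1, parent2):
    """Create 2 children: 6 attribute values from one parent and 4 from the other."""
    return parent1[:6] + parent2[6:], parent2[:6] + parent1[6:]


def mutate(solution, sample, rng, switches=2):
    """Switch several attribute values to values taken from the sample subspace."""
    values = list(solution)
    for position in rng.sample(range(len(values)), switches):
        values[position] = rng.choice(sample)[position]
    return tuple(values)


def evolve(population, sample, rng, max_generations=50, tolerance=1e-6):
    size = len(population)
    best = min(fitness(s, sample) for s in population)
    for _ in range(max_generations):
        children = []
        for _ in range(size):
            parent1, parent2 = rng.sample(population, 2)
            for child in cross_over(parent1, parent2):
                children.append(mutate(child, sample, rng))
        # Discard weak solutions: keep only the fittest of parents and children.
        ranked = sorted(population + children, key=lambda s: fitness(s, sample))
        population = ranked[:size]
        new_best = fitness(population[0], sample)
        if best - new_best < tolerance:  # stop when the solutions change very little
            break
        best = new_best
    return population


rng = random.Random(0)
sample = [tuple(rng.random() for _ in range(10)) for _ in range(30)]
initial = [tuple(rng.random() for _ in range(10)) for _ in range(5)]
initial_best = min(fitness(s, sample) for s in initial)
final = evolve(initial, sample, rng)
```

Keeping parents in the survivor pool guarantees that the best fitness never gets worse from one generation to the next, which makes the stopping rule well behaved.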
So...
- Artificial Intelligence is fun, ... but what does this have to do with database?
  - Data mining is just another database application
  - Data mining applications have requirements
  - The database system can help mining applications meet their requirements
- So, what are the challenges for data mining systems and how can the database system help?

Database Support for DM Challenges
- Data Mining Challenge: Support many data mining mechanisms in a KDD system
  - Database Support: Support multiple DB interfaces (for different DM mechanisms)
- Data Mining Challenge: Support interactive mining and incremental mining
  - Database Support: Intelligent caching and support for changing views over query results
- Data Mining Challenge: Guide the mining process by integrity constraints
  - Database Support: Make integrity constraints query-able (meta-data)
- Data Mining Challenge: Determine usefulness of a data mining result
  - Database Support: Gather and output runtime statistics on DB support, confidence, and other metrics
- Data Mining Challenge: Help humans to better understand mining results (e.g., visualization)
  - Database Support: Prepare output data for selectable presentation formats
Database Support for DM Challenges
- Data Mining Challenge: Accurate, efficient, and flexible methods for data cleaning and data encoding
  - Database Support: Programmable data warehouse tools for data cleaning and encoding; fast, runtime re-encoding
- Data Mining Challenge: Improved performance for data mining mechanisms
  - Database Support: Parallel data retrieval and support for incremental query processing
- Data Mining Challenge: Ability to mine a wide variety of data types
  - Database Support: New indexing and access methods for non-traditional data types
- Data Mining Challenge: Support web mining
  - Database Support: Extend XML and WWW database technology to support large, long running queries
- Data Mining Challenge: Define a data mining query language
  - Database Support: Extend SQL, OQL and other interfaces to support data mining

Commercial Data Mining Products and Tools
- Some DB companies with Data Mining products:
  - Oracle: Oracle 9i, with BI-Beans (an OLAP toolset)
  - IBM: Miner for Data and Miner for Text
  - NCR: TeraMiner for the Teradata warehouse
- Companies with Data Mining tools:
  - COGNOS: Scenario, a set of DM and OLAP tools
  - Elseware: Classpad (classification) and Previa (prediction)
  - Logic Programming Associates, Ltd.: Datamite (clustering)
  - Prudential System Software GmbH: credit card fraud
  - RedShed Software: Dowser (association discovery)
Conclusions
- To support data mining, we should
  - Enhance database technologies developed for
    - Data warehouses
    - Very large database systems
    - Object-oriented (navigational) database systems
    - View Management mechanisms
    - Heterogeneous databases
  - Create new access methods based on mining access
  - Develop query languages or other interfaces to better support data mining mechanisms
  - Select and integrate the needed DB technologies

Studying for the exam?
- Send questions to the appropriate lecturer.
- There will be a review session, hopefully with questions and answers.
- For the success of the last session, please send questions in advance, preferably by mail and preferably to the right lecturer.
Data Mining. Vera Goebel. Department of Informatics, University of Oslo
Data Mining Vera Goebel Department of Informatics, University of Oslo 2011 1 Lecture Contents Knowledge Discovery in Databases (KDD) Definition and Applications OLAP Architectures for OLAP and KDD KDD
Knowledge Discovery in Databases (KDD) - Data Mining (DM)
Knowledge Discovery in Databases (KDD) - Data Mining (DM) Vera Goebel Department of Informatics, University of Oslo 2015 1! Why Data Mining / KDD? Applications Banking: loan/credit card approval predict
OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP
Data Warehousing and End-User Access Tools OLAP and Data Mining Accompanying growth in data warehouses is increasing demands for more powerful access tools providing advanced analytical capabilities. Key
Introduction. A. Bellaachia Page: 1
Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.
Introduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
A Review of Data Mining Techniques
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
Week 3 lecture slides
Week 3 lecture slides Topics Data Warehouses Online Analytical Processing Introduction to Data Cubes Textbook reference: Chapter 3 Data Warehouses A data warehouse is a collection of data specifically
Data W a Ware r house house and and OLAP II Week 6 1
Data Warehouse and OLAP II Week 6 1 Team Homework Assignment #8 Using a data warehousing tool and a data set, play four OLAP operations (Roll up (drill up), Drill down (roll down), Slice and dice, Pivot
Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing
Foundations of Business Intelligence: Databases and Information Management
Foundations of Business Intelligence: Databases and Information Management Problem: HP s numerous systems unable to deliver the information needed for a complete picture of business operations, lack of
Data Warehouse design
Data Warehouse design Design of Enterprise Systems University of Pavia 21/11/2013-1- Data Warehouse design DATA PRESENTATION - 2- BI Reporting Success Factors BI platform success factors include: Performance
Chapter 5. Warehousing, Data Acquisition, Data. Visualization
Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization 5-1 Learning Objectives
Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier
Data Mining: Concepts and Techniques Jiawei Han Micheline Kamber Simon Fräser University К MORGAN KAUFMANN PUBLISHERS AN IMPRINT OF Elsevier Contents Foreword Preface xix vii Chapter I Introduction I I.
from Larson Text By Susan Miertschin
Decision Tree Data Mining Example from Larson Text By Susan Miertschin 1 Problem The Maximum Miniatures Marketing Department wants to do a targeted mailing gpromoting the Mythic World line of figurines.
not possible or was possible at a high cost for collecting the data.
Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day
DATA WAREHOUSING AND OLAP TECHNOLOGY
DATA WAREHOUSING AND OLAP TECHNOLOGY Manya Sethi MCA Final Year Amity University, Uttar Pradesh Under Guidance of Ms. Shruti Nagpal Abstract DATA WAREHOUSING and Online Analytical Processing (OLAP) are
Database Marketing, Business Intelligence and Knowledge Discovery
Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski
Data Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania [email protected] Over
Fluency With Information Technology CSE100/IMT100
Fluency With Information Technology CSE100/IMT100 ),7 Larry Snyder & Mel Oyler, Instructors Ariel Kemp, Isaac Kunen, Gerome Miklau & Sean Squires, Teaching Assistants University of Washington, Autumn 1999
Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms
Data Mining Techniques forcrm Data Mining The non-trivial extraction of novel, implicit, and actionable knowledge from large datasets. Extremely large datasets Discovery of the non-obvious Useful knowledge
BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT
BUILDING BLOCKS OF DATAWAREHOUSE G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT 1 Data Warehouse Subject Oriented Organized around major subjects, such as customer, product, sales. Focusing on
DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.
DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,
Building Data Cubes and Mining Them. Jelena Jovanovic Email: [email protected]
Building Data Cubes and Mining Them Jelena Jovanovic Email: [email protected] KDD Process KDD is an overall process of discovering useful knowledge from data. Data mining is a particular step in the
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
5.5 Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall. Figure 5-2
Class Announcements TIM 50 - Business Information Systems Lecture 15 Database Assignment 2 posted Due Tuesday 5/26 UC Santa Cruz May 19, 2015 Database: Collection of related files containing records on
Importance or the Role of Data Warehousing and Data Mining in Business Applications
Journal of The International Association of Advanced Technology and Science Importance or the Role of Data Warehousing and Data Mining in Business Applications ATUL ARORA ANKIT MALIK Abstract Information
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
Data Warehousing and Data Mining in Business Applications
133 Data Warehousing and Data Mining in Business Applications Eesha Goel CSE Deptt. GZS-PTU Campus, Bathinda. Abstract Information technology is now required in all aspect of our lives that helps in business
An Overview of Database management System, Data warehousing and Data Mining
An Overview of Database management System, Data warehousing and Data Mining Ramandeep Kaur 1, Amanpreet Kaur 2, Sarabjeet Kaur 3, Amandeep Kaur 4, Ranbir Kaur 5 Assistant Prof., Deptt. Of Computer Science,
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH Kalinka Mihaylova Kaloyanova St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics Sofia 1164, Bulgaria
Information Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])
Data Mining System, Functionalities and Applications: A Radical Review
Data Mining System, Functionalities and Applications: A Radical Review Dr. Poonam Chaudhary System Programmer, Kurukshetra University, Kurukshetra Abstract: Data Mining is the process of locating potentially
Outlines. Business Intelligence. What Is Business Intelligence? Data mining life cycle
Outlines Business Intelligence Lecture 15 Why integrate BI into your smart client application? Integrating Mining into your application Integrating into your application What Is Business Intelligence?
DATA WAREHOUSE E KNOWLEDGE DISCOVERY
DATA WAREHOUSE E KNOWLEDGE DISCOVERY Prof. Fabio A. Schreiber Dipartimento di Elettronica e Informazione Politecnico di Milano DATA WAREHOUSE (DW) A TECHNIQUE FOR CORRECTLY ASSEMBLING AND MANAGING DATA
OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH
OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH 1 Online Analytic Processing OLAP 2 OLAP OLAP: Online Analytic Processing OLAP queries are complex queries that Touch large amounts of data Discover
CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing
CSE 544 Principles of Database Management Systems Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing Class Projects Class projects are going very well! Project presentations: 15 minutes On Wednesday
DATA WAREHOUSING - OLAP
http://www.tutorialspoint.com/dwh/dwh_olap.htm DATA WAREHOUSING - OLAP Copyright tutorialspoint.com Online Analytical Processing Server OLAP is based on the multidimensional data model. It allows managers,
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
Foundations of Business Intelligence: Databases and Information Management
Foundations of Business Intelligence: Databases and Information Management Content Problems of managing data resources in a traditional file environment Capabilities and value of a database management
Distance Learning and Examining Systems
Lodz University of Technology Distance Learning and Examining Systems - Theory and Applications edited by Sławomir Wiak Konrad Szumigaj HUMAN CAPITAL - THE BEST INVESTMENT The project is part-financed
Week 13: Data Warehousing. Warehousing
1 Week 13: Data Warehousing Warehousing Growing industry: $8 billion in 1998 Range from desktop to huge: Walmart: 900-CPU, 2,700 disk, 23TB Teradata system Lots of buzzwords, hype slice & dice, rollup,
Data Mining for Fun and Profit
Data Mining for Fun and Profit Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. - Ian H. Witten, Data Mining: Practical Machine Learning Tools
SPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
