CSC 177 Data Warehouse and Mining Project. Pooja Vora, Vishma Shah. Guided by Prof. Meiliu Lu





Agenda
- Data Warehouse Project: Introduction, Background, Scope of study, Implementation (Data Cleaning and Preprocessing, Data Mart)
- Data Mining Project: Introduction, Background, Scope of study, Implementation (Data mining)
- Learning experience
- Future Scope
- References

Data Warehouse: Introduction
- The objective of our project is to create a data mart with a star schema.
- The data mart will be used to answer questions about various key company factors and statistics.

Background
- Source website: Navathe company schema
- Dataset: Company dataset
- Company dataset: fact table with 7 attributes, 1000 entries

Scope of Study
- Data Preprocessing: Microsoft Office Excel, Microsoft SQL Server
- Data Mart: Microsoft SQL Server, Visio, convertcsvtosql
- OLAP Operations: SQL Server queries

Implementation
- Data Cleaning & Preprocessing
- Data Mart
- OLAP Operations

Data Cleaning & Preprocessing
The company schema had different tables, as per Navathe. We also added a few dimensions for analytical processing and created a fact table to form a star schema.

Data Mart
Our data mart has 5 dimension tables and one fact table, which together form a star schema. The fact table consists of around 1000 rows holding details such as ssn, project, work_id, etc.
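As a rough illustration of the layout described on this slide, the sketch below builds one fact table surrounded by five dimension tables in an in-memory SQLite database. Only the names EmpFactTable, DimTime, DimEmp_work_record, ssn, work_id, date_key, date_year, date_month and numberofproduct appear in the slides; DimEmployee, DimProject, DimProduct and all other columns are hypothetical placeholders, and SQLite stands in for Microsoft SQL Server purely so the example is self-contained.

```python
import sqlite3

# EmpFactTable, DimTime and DimEmp_work_record are named in the slides.
# DimEmployee, DimProject, DimProduct and most columns are hypothetical.
SCHEMA = """
CREATE TABLE DimTime (date_key INTEGER PRIMARY KEY, date_year INTEGER, date_month INTEGER);
CREATE TABLE DimEmployee (ssn TEXT PRIMARY KEY, name TEXT);
CREATE TABLE DimProject (project_id INTEGER PRIMARY KEY, project_name TEXT);
CREATE TABLE DimProduct (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE DimEmp_work_record (work_id INTEGER PRIMARY KEY, numberofproduct INTEGER);
CREATE TABLE EmpFactTable (
    ssn        TEXT    REFERENCES DimEmployee(ssn),
    project_id INTEGER REFERENCES DimProject(project_id),
    product_id INTEGER REFERENCES DimProduct(product_id),
    work_id    INTEGER REFERENCES DimEmp_work_record(work_id),
    date_key   INTEGER REFERENCES DimTime(date_key)
);
"""


def build_star_schema():
    """Create the sketch schema in a fresh in-memory database."""
    connection = sqlite3.connect(":memory:")
    connection.executescript(SCHEMA)
    return connection


conn = build_star_schema()
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
dimension_tables = [name for name in tables if name.startswith("Dim")]
print(dimension_tables)
```

Every foreign key in the fact table points at exactly one dimension table, and no dimension references another dimension; that is what makes the layout a star rather than a snowflake.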

Star Schema

Data Mart Question-Answers
- How many products were produced over the months? Operation: rollup
- How do we find an employee's current working project? Operation: slicing on the employee dimension
- How do we find the statistics of days where more than 5 products were produced? Operation: dicing on the product and work dimensions
- On which days, and in what quantity, was a particular product produced? Operation: scoping
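To make the difference between slicing and dicing concrete, here is a minimal sketch using an in-memory SQLite table. The flattened table `fact`, its columns `product` and `work_day`, and all sample rows are invented for illustration; they are not the project's data or queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact (ssn TEXT, product TEXT, work_day TEXT, numberofproduct INTEGER);
INSERT INTO fact VALUES
  ('111', 'widget', '2014-01-05', 7),
  ('111', 'gadget', '2014-01-06', 3),
  ('222', 'widget', '2014-01-05', 6),
  ('222', 'widget', '2014-01-07', 2);
""")

# Slice: fix a single value on ONE dimension (here, the employee).
slice_rows = conn.execute(
    "SELECT product, work_day, numberofproduct FROM fact "
    "WHERE ssn = ? ORDER BY work_day",
    ("111",),
).fetchall()

# Dice: restrict TWO dimensions at once (here, product and work record).
dice_rows = conn.execute(
    "SELECT ssn, work_day, numberofproduct FROM fact "
    "WHERE product = ? AND numberofproduct > 5 ORDER BY ssn",
    ("widget",),
).fetchall()

print(slice_rows)
print(dice_rows)
```

A slice keeps the full range of every other dimension, while a dice carves out a smaller sub-cube by constraining several dimensions together.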

OLAP Operations Example: Roll Up

    select t.date_year, t.date_month, sum(w.numberofproduct) as 'No. Of Products'
    from EmpFactTable f, DimTime t, DimEmp_work_record w
    where f.date_key = t.date_key and f.work_id = w.work_id
    group by date_year, date_month with rollup

    date_year  date_month  No. Of Products
    2014       1           980
    2014       2           761
    2014       3           1274
    2014       4           240
    2014       NULL        3255
    NULL       NULL        3255

Winning month: 2014, month 3 (1274 products, the highest monthly total).
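The NULL rows in the result are the subtotals that `with rollup` appends: one per year, plus a grand total. The sketch below reproduces that behaviour in plain Python from the four monthly totals shown on the slide. The function name `rollup` and the use of `None` in place of NULL are choices made for this illustration only.

```python
from collections import defaultdict

# Monthly totals copied from the result table on the slide.
monthly = [(2014, 1, 980), (2014, 2, 761), (2014, 3, 1274), (2014, 4, 240)]


def rollup(rows):
    """Append a subtotal row per year and a grand-total row.

    Assumes each (year, month) pair appears once, i.e. the rows are
    already aggregated per month. None plays the role of SQL NULL.
    """
    per_year = defaultdict(list)
    for year, month, total in rows:
        per_year[year].append((month, total))

    result = []
    grand_total = 0
    for year in sorted(per_year):
        year_total = 0
        for month, total in sorted(per_year[year]):
            result.append((year, month, total))
            year_total += total
        result.append((year, None, year_total))   # subtotal for the year
        grand_total += year_total
    result.append((None, None, grand_total))      # grand total
    return result


rollup_rows = rollup(monthly)
best_month = max(monthly, key=lambda row: row[2])

for row in rollup_rows:
    print(row)
print("highest month:", best_month)
```

With a single year in the data, the year subtotal and the grand total coincide, which is why 3255 appears twice in the slide's output.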

Quiz
Which dimension was used for slicing the cube?
- Employee
- Time
- Work
- Product
Answer: Employee

Data Mining Project

Introduction
- Perform data mining on a data set to discover knowledge.
- Apply data mining algorithms using tools.
- Compare the performance of the algorithms using these tools.
- Compare the performance of the tools.

Background
- Source website: www.data.gov
- Dataset: Consumer complaints
- Data: 14 attributes, 55000 entries (data from 2012 to 2014)

Scope of Study
- Data Preprocessing: Microsoft Office Excel, tools (Weka, RapidMiner)
- Data Mining tools: Weka, RapidMiner
- Data Mining algorithms: K-Means, Naïve Bayes

Implementation
- Data Cleaning & Preprocessing
- Data Mining
- Tools Comparison

Data Cleaning & Preprocessing
- Data cleaning: replaced missing values with "unknown"
- Data selection: selected two months of consumer complaints data (Sept, Oct) for mining
- Sample data: 3000 rows selected
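The three preprocessing steps can be sketched in a few lines of Python. The slides say this work was done in Excel and the mining tools, so the code below is only an equivalent illustration. The column names `month`, `product` and `state` are hypothetical, and random sampling is an assumption, since the slides do not say how the 3000 rows were chosen.

```python
import random


def clean_and_sample(rows, months=(9, 10), sample_size=3000, seed=0):
    """Keep rows from the chosen months, fill gaps, then sample.

    rows is a list of dicts; the 'month' key is a hypothetical column name.
    """
    cleaned = []
    for row in rows:
        if row.get("month") not in months:
            continue  # data selection: keep only Sept and Oct
        # data cleaning: replace missing values with "unknown"
        cleaned.append(
            {key: (value if value not in (None, "") else "unknown")
             for key, value in row.items()}
        )
    if len(cleaned) > sample_size:
        # assumption: a uniform random sample; the slides do not specify
        cleaned = random.Random(seed).sample(cleaned, sample_size)
    return cleaned


demo = [
    {"month": 9, "product": "Mortgage", "state": ""},
    {"month": 10, "product": None, "state": "CA"},
    {"month": 3, "product": "Credit card", "state": "NY"},
]
result = clean_and_sample(demo, sample_size=2)
print(result)
```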

Data Mining
We used one classification algorithm and one clustering algorithm:
- Classification: Naïve Bayes
- Clustering: K-means

Data Mining Demo

Tools Comparison: K-Means (RapidMiner, Weka)
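For readers who want to see what both tools compute, here is a minimal K-means sketch (Lloyd's algorithm) on numeric points. It is not the Weka or RapidMiner implementation. Starting from the first k points is a simplification chosen to keep the example deterministic, and the sample points are invented. The complaint data in the project is largely categorical, which the tools handle through their own encodings.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=20):
    """Cluster points into k groups; returns (centroids, assignment)."""
    # Simplification: use the first k points as starting centroids.
    centroids = list(points[:k])
    assignment = []
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(point, centroids[c]))
            for point in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assignment


# Invented sample: two well-separated groups, interleaved.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.9, 8.1)]
centroids, assignment = kmeans(points, k=2)
print(assignment)
```

Because the result depends on the starting centroids, two tools run on the same data can legitimately report different clusters unless their seeds and initialisation strategies match.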

Tools Comparison: Naïve Bayes (RapidMiner, Weka)
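As a companion to the comparison, the following is a small categorical Naïve Bayes sketch with add-one (Laplace) smoothing. It illustrates the general technique only and is not the code inside either tool. The two features (a product and a submission channel), the yes/no label and every sample row are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    values_seen = defaultdict(set)         # feature index -> distinct values
    for row, label in zip(rows, labels):
        for index, value in enumerate(row):
            feature_counts[(index, label)][value] += 1
            values_seen[index].add(value)
    return class_counts, feature_counts, values_seen


def predict(model, row):
    """Return the label with the highest log posterior score."""
    class_counts, feature_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for index, value in enumerate(row):
            # add-one smoothing so unseen values do not zero out the product
            numerator = feature_counts[(index, label)][value] + 1
            denominator = count + len(values_seen[index])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: (product, channel) -> label
rows = [("mortgage", "web"), ("mortgage", "phone"), ("card", "web"),
        ("card", "web"), ("mortgage", "phone")]
labels = ["no", "yes", "no", "no", "yes"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("card", "web")))
print(predict(model, ("mortgage", "phone")))
```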

Quiz
Which clustering algorithm was used for data mining?
- K-Means
- EM
Answer: K-means

Learning Experience
- Learned analytical processing through the data mart project.
- Improved our knowledge of database statistics.
- Learned to extract information from query results.
- Learned different data mining tools, such as Weka and RapidMiner.
- Improved our understanding of various algorithms and their practical implementation through tools.
- Learned to make sense of the results obtained from the tools.

Future Scope
- Data Warehouse: create a snowflake schema by introducing dimensions such as employee type (contractor/full-time), then take it further for analytical processing of different statistics.
- Data Mining: implement other algorithms and try other tools, such as Orange.
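One way the proposed snowflake extension could look is sketched below: the employee dimension is normalised by moving the employee type into its own table, and a query then aggregates production by type. The table DimEmployeeType, the reduced column lists and all sample rows are hypothetical, and SQLite is used only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical sub-dimension: this extra level turns the star into a snowflake.
CREATE TABLE DimEmployeeType (type_id INTEGER PRIMARY KEY, type_name TEXT);
CREATE TABLE DimEmployee (
    ssn     TEXT PRIMARY KEY,
    type_id INTEGER REFERENCES DimEmployeeType(type_id)
);
CREATE TABLE EmpFactTable (
    ssn             TEXT REFERENCES DimEmployee(ssn),
    numberofproduct INTEGER
);
INSERT INTO DimEmployeeType VALUES (1, 'fulltime'), (2, 'contractor');
INSERT INTO DimEmployee VALUES ('111', 1), ('222', 2), ('333', 1);
INSERT INTO EmpFactTable VALUES ('111', 4), ('222', 6), ('333', 5), ('222', 1);
""")

# Reaching the employee type now takes two joins instead of one.
totals = conn.execute("""
    SELECT et.type_name, SUM(f.numberofproduct)
    FROM EmpFactTable f
    JOIN DimEmployee e      ON f.ssn = e.ssn
    JOIN DimEmployeeType et ON e.type_id = et.type_id
    GROUP BY et.type_name
    ORDER BY et.type_name
""").fetchall()
print(totals)
```

The trade-off is the usual one: the snowflake removes repeated type names from the employee dimension at the cost of an extra join in every query that groups by type.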

References
- Elmasri and Navathe, Fundamentals of Database Systems, 6th Edition, Addison-Wesley Publishing
- OLAP Courseware: http://athena.ecs.csus.edu/~olap/olap/introduction.php
- DM dataset: http://www.data.gov/consumer/
- Data Mining Courseware: http://athena.ecs.csus.edu/~datamini
- https://rapidminer.com/wpcontent/uploads/2013/10/rapidminer_rapidminerinacademicuse_en.pdf

Questions.