Data Mining and Data Warehousing on US Farmers' Data * Guide: Dr. Meiliu Lu * Presented by: Yogesh Isawe, Kalindi Mehta, Aditi Kulkarni
Agenda * Data Warehousing Project * Introduction * Background * Technologies Explored * Implementation Steps * Demo * Future Scope * Data Mining Project * Objective * Algorithms Applied * Demo * Learning Experience * References
Introduction * The primary objective of our project is to design a data mart. * We used a star schema to build it. * The data mart answers questions about US farmers markets.
Background * Source: http://catalog.data.gov/dataset/farmers-markets-geographic-data * Dataset: US Farmers Market Data * Farmers Market dataset: fact table, 5 dimensions, 1907 records
Technologies Explored * Data Preprocessing: Microsoft Excel Spreadsheet, MySQL Server * Data Mart: MySQL Server, CSV to SQL Converter, PHP, Ajax, jQuery, Twitter Bootstrap * OLAP Operations: SQL Server Queries
Implementation Steps * Data Cleaning and Preprocessing * Data Mart * OLAP Operations
Data Cleaning and Preprocessing * The original data had 8000 rows; we trimmed it to 1907 rows. * Missing values were filled in using an SQL script. * Season duration was not recorded consistently, so we added two columns, season start and season end.
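The season split described above can be sketched as follows. This is only an illustration: the deck does this step with an SQL script that is not reproduced here, and both the helper name `split_season` and the raw format `MM/DD/YYYY to MM/DD/YYYY` are assumptions about how the season field might look.

```python
from datetime import date, datetime
from typing import Optional, Tuple


def split_season(raw: Optional[str]) -> Tuple[Optional[date], Optional[date]]:
    """Split a 'start to end' season string into (season_start, season_end).

    Returns (None, None) when the value is missing or does not match the
    assumed 'MM/DD/YYYY to MM/DD/YYYY' format, so inconsistent rows can be
    reviewed separately instead of being guessed at.
    """
    if not raw or " to " not in raw:
        return None, None
    start_text, end_text = (part.strip() for part in raw.split(" to ", 1))
    try:
        start = datetime.strptime(start_text, "%m/%d/%Y").date()
        end = datetime.strptime(end_text, "%m/%d/%Y").date()
    except ValueError:
        # Free-text values such as month names fall through to here.
        return None, None
    return start, end
```

Rows that come back as `(None, None)` are the ones that would still need manual cleaning or a default value.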
SQL Script
Data Mart * The data mart is implemented on a star schema. * It provides the following information to the user: market name, address, available goods and nutrition programs, and season details. * Results can be selected on the basis of the following attributes: * State * City * Goods * Nutrition Program * Season Duration * Location Type
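Selecting by one attribute and summarizing by another correspond to the OLAP slice and roll-up operations. The sketch below shows both on a cut-down version of the data mart. It uses SQLite in memory as a stand-in for MySQL, and the sample rows are hypothetical, not taken from the dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location (location_id INTEGER PRIMARY KEY, location_type TEXT, state TEXT);
CREATE TABLE fact_table (market_id INTEGER, location_id INTEGER);
-- Hypothetical sample rows for illustration only.
INSERT INTO location VALUES
    (1, 'Private parking lot', 'California'),
    (2, 'Closed-off public street', 'California'),
    (3, 'Private parking lot', 'Oregon');
INSERT INTO fact_table VALUES (10, 1), (20, 2), (30, 3);
""")

# Roll-up: aggregate individual markets up to the state level.
rollup = conn.execute("""
    SELECT l.state, COUNT(*) AS market_count
    FROM fact_table AS f
    JOIN location AS l ON l.location_id = f.location_id
    GROUP BY l.state
    ORDER BY l.state
""").fetchall()

# Slice: fix one value of the location-type dimension.
slice_rows = conn.execute("""
    SELECT f.market_id
    FROM fact_table AS f
    JOIN location AS l ON l.location_id = f.location_id
    WHERE l.location_type = ?
    ORDER BY f.market_id
""", ("Private parking lot",)).fetchall()
```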
Star Schema * Fact Table: Market_ID, Location_ID, Season_ID, Program_ID, Goods_ID * Market: Market_ID, Market_Name, Website * Goods: Goods_ID, Bakedgoods, Cheese, Meat, Wine * Location: Location_ID, Location_Type, Street, State, Zip * Program: Program_ID, WIC, WICCash, SNAP, SFMNP * Season: Season_ID, Season_start, Season_end
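As a rough sketch, the star schema on this slide could be declared like this. SQLite in memory stands in for the project's MySQL server, the column types and lowercase names are assumptions, and only the goods columns shown on the slide are included.

```python
import sqlite3

# One dimension table per point of the star, plus the central fact table
# that holds only foreign keys to the dimensions.
SCHEMA = """
CREATE TABLE market   (market_id INTEGER PRIMARY KEY, market_name TEXT, website TEXT);
CREATE TABLE location (location_id INTEGER PRIMARY KEY, location_type TEXT,
                       street TEXT, state TEXT, zip TEXT);
CREATE TABLE season   (season_id INTEGER PRIMARY KEY, season_start TEXT, season_end TEXT);
CREATE TABLE program  (program_id INTEGER PRIMARY KEY, wic TEXT, wiccash TEXT,
                       snap TEXT, sfmnp TEXT);
CREATE TABLE goods    (goods_id INTEGER PRIMARY KEY, bakedgoods TEXT, cheese TEXT,
                       meat TEXT, wine TEXT);
CREATE TABLE fact_table (
    market_id   INTEGER REFERENCES market(market_id),
    location_id INTEGER REFERENCES location(location_id),
    season_id   INTEGER REFERENCES season(season_id),
    program_id  INTEGER REFERENCES program(program_id),
    goods_id    INTEGER REFERENCES goods(goods_id)
);
"""


def build_star_schema() -> sqlite3.Connection:
    """Create an empty in-memory database with the star schema tables."""
    connection = sqlite3.connect(":memory:")
    connection.executescript(SCHEMA)
    return connection


conn = build_star_schema()
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
fact_columns = [row[1] for row in conn.execute("PRAGMA table_info(fact_table)")]
```

Keeping the fact table to keys only means every question the data mart answers is a join from `fact_table` out to one or more dimensions.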
Database Queries
select m.market_id, m.market_name,
  CONCAT_WS(', ', l.street, l.city, l.state, l.zip) AS Address,
  s.season_start, s.season_end, l.location_type,
  p.wic, p.wicash, p.sfmnp, p.snap,
  g.bakedgoods, g.cheese, g.crafts, g.flowers, g.eggs, g.seafood, g.herbs,
  g.vegetables, g.honey, g.jams, g.maple, g.meat, g.nursery, g.nuts, g.plants,
  g.ploutry, g.prepared, g.soap, g.trees, g.wine
from fact_table as f
  join Season as s on s.season_id = f.season_id
  join market_details as m on m.market_id = f.market_id
  join program as p on p.program_id = f.program_id
  join location as l on l.location_id = f.location_id
  join goods as g on g.goods_id = f.goods_id
where s.season_start > '$season_start' and s.season_end < '$season_end'
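The query above places the user-supplied `$season_start` and `$season_end` values directly into the SQL text. A common alternative is to bind them as parameters, which avoids quoting problems and SQL injection. The sketch below shows the idea with Python's `sqlite3` module and `?` placeholders. The function name `markets_in_season` and the sample rows are hypothetical, and the project's PHP code would need the equivalent prepared-statement API of its own database driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE season (season_id INTEGER PRIMARY KEY, season_start TEXT, season_end TEXT);
CREATE TABLE market_details (market_id INTEGER PRIMARY KEY, market_name TEXT);
CREATE TABLE fact_table (market_id INTEGER, season_id INTEGER);
-- Hypothetical sample rows, not taken from the dataset.
INSERT INTO season VALUES (1, '2014-05-01', '2014-10-31'), (2, '2014-01-15', '2014-12-20');
INSERT INTO market_details VALUES (10, 'Sample Market A'), (20, 'Sample Market B');
INSERT INTO fact_table VALUES (10, 1), (20, 2);
""")


def markets_in_season(connection, season_start, season_end):
    """Return markets whose season lies strictly inside the given window.

    Dates are ISO 'YYYY-MM-DD' strings, so string comparison matches
    chronological order.
    """
    query = """
        SELECT m.market_id, m.market_name, s.season_start, s.season_end
        FROM fact_table AS f
        JOIN season AS s ON s.season_id = f.season_id
        JOIN market_details AS m ON m.market_id = f.market_id
        WHERE s.season_start > ? AND s.season_end < ?
        ORDER BY m.market_id
    """
    # The driver binds the two values; they never become part of the SQL text.
    return connection.execute(query, (season_start, season_end)).fetchall()


rows = markets_in_season(conn, "2014-04-01", "2014-11-30")
```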
Fun Quiz * How many dimensions have we used for the star schema? A. 6 B. 5
DEMO
Future Scope * Allow privileged users to insert new records * Integrate Google Maps for location and directions * Develop a mobile application * Add UI validation and filtering options for the data
DATA MINING PROJECT
Objective * Mine the available data to extract knowledge. * Explore different data mining tools. * Apply different data mining algorithms to the US Farmers Market data.
Algorithms Applied * Tool Used: Weka * Classification Algorithms: Logistic Regression, J48 * Clustering Algorithms: K-Means, EM
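To give a feel for what the K-Means clustering step does, here is a minimal sketch of the standard algorithm (Lloyd's iteration) in plain Python. It is not Weka's implementation: it takes the first `k` points as starting centroids to stay deterministic, and it assumes the attributes have already been encoded as numbers (for example, Y/N columns as 1/0).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points: Sequence[Point], k: int, max_iter: int = 100):
    """Cluster points into k groups; returns (centroids, assignments)."""
    # Assumption: use the first k points as initial centroids.
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    assignments: List[int] = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # No point changed cluster, so the result is stable.
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


sample = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, assignments = k_means(sample, k=2)
```

On the four sample points the two nearby pairs end up in separate clusters, even though the starting centroids both come from the first pair.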
Fun Quiz * Which tool is used for Data Mining? 1. Weka 2. RapidMiner
DEMO
Classification Algorithms
Histogram of states on goods class
Logistic Regression on class SFMNP
J48 Algorithm with class Bakedgoods
Decision Tree for class Bakedgoods
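J48 is Weka's implementation of the C4.5 decision tree learner, which picks the attribute to split on using an entropy-based measure (C4.5 uses gain ratio, a normalized form of information gain). The sketch below computes plain information gain only, to illustrate the idea. The rows are hypothetical Y/N records, not values from the dataset.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(rows: List[Dict[str, str]], attribute: str, target: str) -> float:
    """Reduction in entropy of `target` after splitting rows on `attribute`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical records: cheese mirrors the class exactly, wine tells nothing.
rows = [
    {"cheese": "Y", "wine": "Y", "bakedgoods": "Y"},
    {"cheese": "Y", "wine": "N", "bakedgoods": "Y"},
    {"cheese": "N", "wine": "Y", "bakedgoods": "N"},
    {"cheese": "N", "wine": "N", "bakedgoods": "N"},
]
gain_cheese = information_gain(rows, "cheese", "bakedgoods")
gain_wine = information_gain(rows, "wine", "bakedgoods")
```

An attribute with higher gain separates the classes better, so a tree learner of this kind would test `cheese` before `wine` on these rows.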
Clustering Algorithms
Simple K-Means Algorithm
EM Algorithm applied on Nutrition Program
Learning Experience * Analytical processing * Learned different data mining tools such as Weka and RapidMiner * Learned about real-world applications of different data mining algorithms * Learned new technologies such as PHP, Ajax, jQuery, and Twitter Bootstrap
References * Data Source: http://catalog.data.gov/dataset/farmers-markets-geographic-data * Weka Tutorial: http://youtu.be/m7kpibgedki * RapidMiner Tutorial: https://www.youtube.com/watch?v=eyyghzsvzpm&list=pllyinnlbo1evvz2wjlwrp_jwgg5It1O6
Questions