TURKISH ORACLE USER GROUP




Data Mining in 30 Minutes
Husnu Sensoy
Global Maksimum Data & Information Tech.
Founder, VLDB Expert

Agenda
- Who am I?
- Different problems of Data Mining
- In-database data mining?!?
- German Credit
- Ad-hoc Attacks
- DBMS_PREDICTIVE_ANALYTICS
- Attribute Importance
- Training/Validation Set
- Training
- Evaluation
- Conclusion

Who am I?
- Co-Founder of TROUG
- Oracle ACED on BI
- DBA of the Year 2009
- Senior Member of Oracle DWH CAB
- Exadata Implementation Specialist

Data Mining
Data -> Information -> Knowledge
- In DWH 1.0 we accumulated a sufficient number of columns and rows.
- Classical reporting is nothing but rotating, folding, cutting, and pasting the same data again and again. It is just DATA TRANSFORMATION.
- The user has to infer the information and knowledge, if lucky.
- Data mining is all about creating information and insight about your business.
- Data scientists are, or will be, the actual founders of the BI environment we envisioned a few decades ago.

Different Problems of Data Mining
- Classification
- Regression
- Outlier Detection
- Basket Analysis
- Social Network Analysis
- Sentiment Analysis

In-Database Data Mining
- 90% of data mining is about finding the correct inputs.
- Contrary to common belief, using fancy algorithms will not improve your results by large factors.
- Finding the correct inputs is a matter of:
  - Join
  - Group By
  - Densification
- Database management systems are still the best place to handle those operations.

German Credit Scoring
SOLVING A SAMPLE PROBLEM

Details of SAMPLE DATA
- 20 different inputs. A few examples:
  - Status of existing checking account
  - Credit history
  - Purpose
  - Credit amount
- Details: http://archive.ics.uci.edu/ml/datasets/statlog+%28german+credit+data%29
- Classification target: 1 for good credit, 2 for bad credit

Ad-hoc Attacks
- The first trials are always (and should be) ad hoc.
- What is the distribution of good and bad candidates? (Prior)
  - Do we need any stratified sampling?
- What is the distribution of good and bad candidates given that a variable X takes value Y? (Posterior)
- What is the correlation between each variable and the target value?
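
The prior and posterior questions above boil down to GROUP BY queries. Here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the database. The table name german_credit, the column names checking_status and target, and the eight toy rows are assumptions made for illustration, not the real data set.

```python
import sqlite3

# Hypothetical toy rows: (checking_status, target); target 1 = good, 2 = bad.
ROWS = [
    ("A11", 2), ("A11", 1), ("A11", 2), ("A12", 1),
    ("A12", 1), ("A12", 2), ("A14", 1), ("A14", 1),
]


def load(rows):
    """Create an in-memory table holding the toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE german_credit (checking_status TEXT, target INTEGER)"
    )
    conn.executemany("INSERT INTO german_credit VALUES (?, ?)", rows)
    return conn


def prior(conn):
    """P(target): share of each class over all rows."""
    sql = """
        SELECT target,
               COUNT(*) * 1.0 / (SELECT COUNT(*) FROM german_credit)
        FROM german_credit
        GROUP BY target
    """
    return dict(conn.execute(sql).fetchall())


def posterior(conn):
    """P(target | checking_status): class share within each input value."""
    sql = """
        SELECT g.checking_status, g.target, COUNT(*) * 1.0 / t.n
        FROM german_credit g
        JOIN (SELECT checking_status, COUNT(*) AS n
              FROM german_credit
              GROUP BY checking_status) t
          ON t.checking_status = g.checking_status
        GROUP BY g.checking_status, g.target, t.n
    """
    return {(status, y): p for status, y, p in conn.execute(sql)}


conn = load(ROWS)
print(prior(conn))      # class shares over the whole table
print(posterior(conn))  # class shares per checking_status value
```

A prior far from 50/50 is a hint that stratified sampling may be worth considering. A posterior that differs strongly from the prior suggests the input carries information about the target.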

Data Mining at the Speed of Light
DBMS_PREDICTIVE_ANALYTICS procedures allow us to perform mining activity very quickly:
- PREDICT: Support Vector Machine (SVM) model to perform credit score prediction.
- PROFILE: Decision tree based explanatory model.
- EXPLAIN: Minimum Description Length (MDL) based attribute importance algorithm.

Attribute Importance
Some mining problems may contain an extremely high number of attributes:
- Amazon Access Sample: 20,000 attributes
- Amazon Commerce Reviews Set: 10,000 attributes
- URL Reputation: 3,231,961 attributes
Reducing the number of attributes before any analysis will let you:
- See the trees in the forest
- Move quickly
- Use fewer resources
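
To make the idea of ranking attributes concrete, here is a small sketch that scores each discrete attribute by its mutual information with the target and sorts the attributes by that score. This is a swapped-in technique chosen for brevity; it is not the MDL-based algorithm behind EXPLAIN. The function names and the toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """I(X; Y) in bits for two equally long sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def rank_attributes(columns, target):
    """Return (name, score) pairs, most informative attribute first."""
    scores = {
        name: mutual_information(values, target)
        for name, values in columns.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: target 1 = good, 2 = bad.
target = [1, 1, 2, 2, 1, 2]
columns = {
    "informative": ["a", "a", "b", "b", "a", "b"],  # mirrors the target
    "noise": ["x", "y", "x", "y", "x", "y"],        # weakly related
    "constant": ["k"] * 6,                          # carries no information
}

for name, score in rank_attributes(columns, target):
    print(f"{name}: {score:.3f} bits")
```

Attributes that score near zero are candidates to drop before modeling. On small samples a pure-noise attribute can still score slightly above zero, so treat the ranking as a guide rather than a rule.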

Training vs. Validation Set
- To deliver unbiased performance results for data models, training and validation sets should be mutually exclusive.
- Different techniques are used in the literature:
  - X% validation vs. (100-X)% training
  - K-fold cross validation
- Ensure that your method is:
  - Suitable for your problem type
  - Statistically stable and sound
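
Both splitting techniques above can be sketched in a few lines of Python over a list of case identifiers. The function names, the fixed seed, and the identifiers 1..1000 are assumptions for illustration. The split is purely random, so a stratified variant would be needed if the class shares must be preserved in each part.

```python
import random


def holdout_split(case_ids, validation_pct, seed=42):
    """Return (training, validation) with validation_pct% held out."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    n_val = round(len(ids) * validation_pct / 100)
    return ids[n_val:], ids[:n_val]


def k_fold(case_ids, k, seed=42):
    """Yield k (training, validation) pairs; each id is validated once."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [c for j, fold in enumerate(folds) if j != i for c in fold]
        yield training, validation


# Hypothetical case identifiers 1..1000.
training, validation = holdout_split(range(1, 1001), validation_pct=30)
print(len(training), len(validation))

for fold_no, (tr, va) in enumerate(k_fold(range(1, 1001), k=5), start=1):
    print(fold_no, len(tr), len(va))
```

Splitting on identifiers rather than on rows keeps the two sets exclusive by construction, since each identifier lands in exactly one part.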

Model Build
- Oracle Data Miner offers several algorithms for data modeling:
  - Naïve Bayes
  - Decision Tree
  - Generalized Linear Model (GLM)
  - Support Vector Machine
- Remember that every model requires a unique identifier in the data set.

Evaluation & Scoring
- Obviously, the final point is how well you did with your model. This step is usually called evaluation.
- Once you are sure that your model is sufficiently accurate, the final step is to score a given customer for credit:
  - Batch
  - Real-time
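
As a rough illustration of what evaluation measures, the sketch below compares actual and predicted labels on a validation set and derives a confusion matrix, overall accuracy, and per-class recall. The labels follow the 1 = good, 2 = bad convention of the sample data. The function names and the eight toy predictions are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count occurrences of each (actual, predicted) label pair."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Share of cases where the prediction matches the actual label."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


def recall(actual, predicted, label):
    """Share of cases with the given actual label that were predicted so."""
    cm = confusion_matrix(actual, predicted)
    total = sum(c for (a, _), c in cm.items() if a == label)
    return cm[(label, label)] / total if total else 0.0


# Hypothetical validation results: 1 = good credit, 2 = bad credit.
actual = [1, 1, 1, 1, 2, 2, 2, 1]
predicted = [1, 1, 2, 1, 2, 1, 2, 1]

print(confusion_matrix(actual, predicted))
print("accuracy:", accuracy(actual, predicted))
print("recall (bad):", recall(actual, predicted, 2))
```

In credit scoring the two kinds of error rarely cost the same, so per-class recall is usually worth reading alongside accuracy.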

Conclusion
- Remember that 90% of data modeling is all about ad hoc attacks. That makes in-database mining very appealing.
- A crude understanding of your data might save a huge amount of time.
- Some problems may ask for input set reduction.
- DBMS_PREDICTIVE_ANALYTICS is the ad hoc way of data modeling.
- For model evaluation and scoring, use the PREDICTION and PREDICTION_PROBABILITY operators of SQL.

THANK YOU
Husnu Sensoy
http://husnusensoy.wordpress.com