OLSUG Workshop Oracle Data Mining
2 OLSUG Workshop: Oracle Data Mining. Charlie Berger, Sr. Director of Product Mgmt, Life Sciences and Data Mining, Oracle Corporation; Dr. Lutz Hamel, Asst. Professor, Computer Science, University of Rhode Island; Carolyn K. Hamm, Ph.D., Chief, Decision Support Center, Walter Reed Army Medical Center, Washington, DC, Carolyn.Hamm@NA.AMEDD.ARMY.MIL
3 Oracle Data Mining Workshop Oracle Data Mining overview Data mining process & example use cases Explore, build, test, cluster, etc. Clustering and more at URI
4 Oracle Data Mining. Platform for data mining: PL/SQL API, Java API, Oracle Data Miner (GUI). Wide range of algorithms: Classification (Support Vector Machines, Naïve Bayes, Adaptive Bayes Networks); Attribute Importance; Association Rules; Clustering (Enhanced K-Means, Orthogonal Clustering); Nonnegative Matrix Factorization (feature extraction); BLAST (sequence similarity search & alignment)
5 Oracle Data Mining Algorithms & Example Applications. Attribute Importance: identify the most influential attributes for a target attribute (factors associated with a disease; promising leads). Classification and Prediction: predict who is most likely to do something (doctors who prescribe a new drug; patients who respond to a treatment). Regression: predict a numeric value (predict how much a tumor will be reduced in size).
6 Oracle Data Mining Algorithms & Example Applications. Clustering: find naturally occurring groups (gene clusters; disease subgroups; distinguishing normal from non-normal behavior). Association Rules: find co-occurring items (suggest interactions). Feature Extraction: reduce a large dataset into representative new attributes; useful for clustering and text mining.
7 Oracle Data Mining Algorithms & Example Applications. Text Mining: combine data and text for better models, e.g. add unstructured text (physician's notes) to structured data (age, weight, height, etc.) to predict outcomes; classify and cluster documents; combine with Oracle Text to develop advanced text mining applications, e.g. on Medline. BLAST: sequence matching and alignment; find genes and proteins that are similar. Example sequences: ATGCAATGCCAGGATTTCCA CTGCAAGGCCAGGAAGTTCCA ATGCGTTGCCAC ATTTCCA GGC..TGCAATGCCAGGATGACCA ATGCAATGTTAGGACCTCCA
8 10g Statistics & SQL Analytics. Ranking functions: rank, dense_rank, cume_dist, percent_rank, ntile. Window aggregate functions (moving and cumulative): avg, sum, min, max, count, variance, stddev, first_value, last_value. LAG/LEAD functions: direct inter-row reference using offsets. Reporting aggregate functions: sum, avg, min, max, variance, stddev, count, ratio_to_report. Statistical aggregates: correlation, linear regression family, covariance. Linear regression: fits an ordinary-least-squares regression line to a set of number pairs; frequently combined with the COVAR_POP, COVAR_SAMP, and CORR functions. Descriptive statistics: average, standard deviation, variance, min, max, median (via percentile_cont), mode, group-by & roll-up. DBMS_STAT_FUNCS: summarizes numerical columns of a table and returns count, min, max, range, mean, stats_mode, variance, standard deviation, median, quantile values, +/- 3 sigma values, top/bottom 5 values. Correlations: Pearson's correlation coefficient, plus Spearman's and Kendall's (both nonparametric). Cross tabs: enhanced with % statistics (chi-squared, phi coefficient, Cramer's V, contingency coefficient, Cohen's kappa). Hypothesis testing: Student t-test, F-test, binomial test, Wilcoxon signed ranks test, chi-square, Mann-Whitney test, Kolmogorov-Smirnov test, one-way ANOVA. Distribution fitting: Kolmogorov-Smirnov test, Anderson-Darling test, chi-squared test; Normal, Uniform, Weibull, Exponential. Pareto analysis (documented): 80:20 rule, cumulative results table.
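As a rough illustration of what the linear regression and correlation aggregates on this slide compute, the sketch below fits an ordinary-least-squares line and a Pearson coefficient in plain Python. It is not Oracle code: the function name `ols_and_corr` and the sample pairs are invented for illustration, and inside the database the same quantities would come from SQL aggregates such as CORR.

```python
import math


def ols_and_corr(pairs):
    """Return (slope, intercept, pearson_r) for a list of (x, y) number pairs.

    The line y = slope * x + intercept is the ordinary-least-squares fit.
    """
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    pearson_r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, pearson_r


# Hypothetical (dose, response) pairs, made up for this sketch.
sample_pairs = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
slope, intercept, r = ols_and_corr(sample_pairs)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
```

The point of the in-database versions is that the sums above are computed where the rows live, so only the three resulting numbers leave the database.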
9 Statistics: enables analytic pipelines for simple analyses (e.g. hypothesis testing) without moving data out to statistical packages
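To make "simple analyses (e.g. hypothesis testing)" concrete, here is a minimal sketch of a pooled-variance two-sample Student t statistic in plain Python. The function name `pooled_t_statistic` and the two sample groups are hypothetical, and the sketch stops at the statistic and degrees of freedom; it does not compute a p-value.

```python
import math
import statistics


def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for an equal-variance two-sample t-test."""
    n1, n2 = len(group_a), len(group_b)
    var1 = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var2 = statistics.variance(group_b)
    df = n1 + n2 - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_err
    return t, df


# Hypothetical measurements for treated vs. untreated patients.
treated = [5.1, 4.8, 5.6, 5.0, 5.3]
untreated = [4.2, 4.5, 4.1, 4.7, 4.4]
t_value, dof = pooled_t_statistic(treated, untreated)
print(f"t={t_value:.3f} df={dof}")
```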
10 Workshop Outline Explore the data View data, simple graphs, ranges, etc. Cluster the data (undirected) & look for interesting patterns Determine problem to be solved What factors are associated with target 1, target 2, etc. Predict patients likely to respond to treatment Data transformations Building Models Classification Models Build, Test, Apply Throw out attributes e.g. 100% correlations etc. Classification Models w/ unstructured data (text) Mining Activity Guides 10gR2 Preview Decision Trees (10gR2) Anomaly Detection (10gR2)
11 Explore the data View data, simple graphs, ranges, etc. Lymphoma_7 data Bpress default Relative value data (WRMC) Cluster the data (undirected) looking for interesting patterns
12 Determine problem State the problem in terms of data mining What factors are associated with target 1, target 2, etc. Predict patients likely to respond to treatment Find new disease subgroups
13 Building Models Classification Build, Test, Apply Use SVM on Brain Tumor data w/o TEXT and SVM on Brain Tumor w/ TEXT Throw out attributes e.g. 100% correlations etc. Use Diabetes data Classification Models w/ unstructured data (text)
14 Mining Activity Guides Step by step guidance to achieve a goal and increase the likelihood of successful data mining
15 "The future of data mining lies in predictive analytics." From "The Future of Data Mining: Predictive Analytics" by Lou Agosta, DM Review Magazine, August 2004 issue
16 What is Predictive Analytics? "One-click" data mining: automatically selects an appropriate algorithm, automates all advanced algorithm settings, and automates the Train, Test, and Apply steps. Power data analysts can use Oracle Data Miner (wizard-driven GUI), the PL/SQL API, or the Java API. Premise: performing predictive analytics is better than doing nothing.
17 Oracle Data Mining Algorithms & Example Applications, with their Predictive Analytics (PA) "Easy Button" counterparts. Attribute Importance (PA Easy Button: Explain): identify the most influential attributes for a target attribute (factors associated with a disease; promising leads). Classification & Regression (PA Easy Button: Predict): predict who is most likely to do something (doctors who prescribe a new drug; patients who respond to a treatment), or predict a numeric value (the size of a tumor reduction).
18 Life Sciences Oracle Data Miner 10g Release 2 Preview
19 Oracle Data Mining Decision Trees. Popular algorithm; human-readable rules; builds classification trees in the database; parallel implementation. Problem: find profiles of high-risk patients. [Figure: example decision tree with splits on Status (Infection / No Infection), Age (>45 / <45 and >35 / <=35), Temp (<100 / >100), Gender (F / M), and Days ICU (>4 / <=4), ending in leaves Risk = 0 or Risk = 1.] Example rule: IF (Age > 45 AND Status = Infection AND Temp > 100) THEN P(High Risk = 1) = .77, Support = 250
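The human-readable rule on this slide translates directly into code. The sketch below applies that single rule in Python; the `Patient` class and `high_risk_rule` function are hypothetical names, the probability (.77) and support (250) are the values quoted on the slide, and the rest of the tree is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Patient:
    age: int
    status: str   # e.g. "Infection" or "No Infection"
    temp: float   # body temperature, in the units used on the slide


def high_risk_rule(patient: Patient) -> Optional[Tuple[float, int]]:
    """Return (P(High Risk = 1), support) if the slide's rule fires, else None."""
    if patient.age > 45 and patient.status == "Infection" and patient.temp > 100:
        return 0.77, 250
    return None  # the rule says nothing about patients it does not cover


# Hypothetical patients, made up for this sketch.
print(high_risk_rule(Patient(age=62, status="Infection", temp=101.3)))
print(high_risk_rule(Patient(age=30, status="Infection", temp=101.3)))
```

Because each path from root to leaf is a conjunction of simple tests like this one, a clinician can read and challenge a tree's output in a way that is harder with less transparent models.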
20 Oracle Data Mining 10g Release 2 New Features: Anomaly Detection. One-class classification: builds SVM classification models where examples of only one class (e.g. 0s) exist. Problem: detect rare cases. Applications: network intrusion detection; disease outbreaks; outlier detection; rare events, true novelty.
21 Oracle Data Mining 10g Release 2 New Features (Continued). Oracle Predictive Analytics PL/SQL packages (available now on OTN): the EXPLAIN and PREDICT PL/SQL packages completely automate data mining; an Oracle Spreadsheet Add-In for Predictive Analytics is on OTN. Prediction operator, a SQL-level data mining capability: fast, SQL-level data mining prediction ("Apply") functions that can be used to pipeline predictions, e.g. select customers where Churner_predicted > .80 AND Customer_value_prediction > $500 AND Response_likelihood > .6. Java Data Mining (JDM) compliant Java API: Oracle Database 10g R2 provides a Java Data Mining (JDM) JSR-73 compliant Java API, implemented on top of the DBMS_DATA_MINING PL/SQL API; it unifies the overall product, enabling interoperability of mining models between APIs.
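The "pipeline predictions" example on this slide combines three model outputs in one filter. The sketch below mimics that filter in Python over already-scored rows; it is only an analogy for the SQL-level operators, and the function name, the `customer_id` field, and the sample rows are all hypothetical. The thresholds are the ones quoted on the slide.

```python
def select_customers(scored_rows):
    """Return ids of customers passing all three prediction thresholds."""
    return [
        row["customer_id"]
        for row in scored_rows
        if row["churner_predicted"] > 0.80            # likely to churn
        and row["customer_value_prediction"] > 500    # predicted value in dollars
        and row["response_likelihood"] > 0.6          # likely to respond
    ]


# Hypothetical scored rows; in the database each score would come from a model.
scored = [
    {"customer_id": 1, "churner_predicted": 0.91,
     "customer_value_prediction": 820.0, "response_likelihood": 0.72},
    {"customer_id": 2, "churner_predicted": 0.95,
     "customer_value_prediction": 120.0, "response_likelihood": 0.88},
    {"customer_id": 3, "churner_predicted": 0.40,
     "customer_value_prediction": 900.0, "response_likelihood": 0.90},
]
print(select_customers(scored))
```

In the SQL-level version the three scores are computed as part of the query itself, so no intermediate scored table has to be materialized first.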
22 Oracle Data Mining 10g Release 2: Updated Oracle Data Miner (GUI). Ability to mine text columns; anomaly detection; decision trees; predictive analytics ("one-click" data mining)
23 Oracle Data Mining 10g Release 2 Decision Trees (10gR2) Anomaly Detection (10gR2)
25 Questions & Answers
27 Additional Life Sciences Use Case Slides
28 Life Sciences Use Cases 1. Gene expression analysis 2. Clinical treatment outcome analysis 3. Classification of Multiple Tumor Types 4. Medline text mining
29 Oracle Data Mining in the Life Sciences: Gene expression analysis. Problem 1: Given thousands of gene expression values for each patient, can a small subset of the expressions be identified that can be used to distinguish one type of leukemia from another? Solution: Apply ODM's Attribute Importance algorithm to the data to decrease the size of the problem, then build an Adaptive Bayes Network classification model to predict disease type from the gene expressions.
30 Oracle Data Mining in the Life Sciences Gene expression analysis Top Genes (of ~7000) for Classifying Leukemia Gene Expression Relative Importance V00594_s_at D43950_at U34038_at J03827_at U64863_at S85655_at L07758_at U19345_at U89336_cds4_at U79295_at HG311-HT311_at V00599_s_at
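31 Oracle Data Mining in the Life Sciences: Gene expression analysis. ABN Model Predictions: Lymphoid Leukemia (LL) vs. Myeloid Leukemia (ML). Confusion matrix (rows = actual, columns = predicted): actual LL: 19 predicted LL, 1 predicted ML; actual ML: 2 predicted LL, 12 predicted ML. Test set accuracy: 91.2%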
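The reported test set accuracy can be checked from the four counts on this slide: correct predictions are the diagonal cells (19 and 12) out of 34 test cases. A minimal sketch, where the dictionary layout and function name are just one convenient representation:

```python
# Confusion matrix from the slide: outer key = actual class, inner key = predicted.
confusion = {
    "LL": {"LL": 19, "ML": 1},
    "ML": {"LL": 2, "ML": 12},
}


def accuracy(matrix):
    """Fraction of cases where the predicted class equals the actual class."""
    correct = sum(matrix[label][label] for label in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


print(f"{accuracy(confusion):.1%}")  # prints 91.2%
```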
32 Oracle Data Mining in the Life Sciences Clinical treatment outcome analysis Problem 2 Is it possible to classify treatments that are most effective in causing improvement in clinical patients suffering from a given disease? Solution Use Attribute Importance to rank the treatment factors Use Association Rules to establish correlations between treatment and outcome Source: Walter Reed Medical Center, Dr. Carolyn Hamm, presentation at Oracle Life Sciences User Group Meeting, June 2004
33 Oracle Data Mining in the Life Sciences: Clinical treatment outcome analysis. Factors associated with positive diabetes outcomes: 1. DRUG_TYPE 2. COMPLETE_HISTORY_RECORDED (Scorecard) 3. NUM_HOSPITAL_ADMISSIONS 4. GENDER 5. NUM_VISITS_TO_PROVIDER 6. INSURANCE_TYPE 7. BLOOD_PRESSURE_GOAL 8. WEIGHT_GOAL 9. LDL_GOAL 10. PROVIDER_TYPE. Source: Walter Reed Medical Center, Dr. Carolyn Hamm, presentation at Oracle Life Sciences User Group Meeting, June 2004
34 Oracle Data Mining in the Life Sciences: Clinical treatment outcome analysis. Sample Association Rules (columns: If; Then OUTCOME; Percentage of Cases). If NUM_HOSPITAL_ADMISSIONS=0 then NO_IMPROVEMENT. If NUM_VISITS_TO_PROVIDER>5 then IMPROVEMENT. If NUM_HOSPITAL_ADMISSIONS=0 and NUM_VISITS_TO_PROVIDER>5 then IMPROVEMENT. If DRUG_GROUP=2 and NUM_HOSPITAL_ADMISSIONS=0 then NO_IMPROVEMENT. If DRUG_GROUP=2 and NUM_VISITS_TO_PROVIDER>5 then NO_IMPROVEMENT. If NUM_HOSPITAL_ADMISSIONS=0 and GENDER=FEMALE then NO_IMPROVEMENT. If COMPLETE_HISTORY_RECORDED=NO then NO_IMPROVEMENT. If NUM_HOSPITAL_ADMISSIONS=0 and COMPLETE_HISTORY_RECORDED=Yes then IMPROVEMENT. Source: Walter Reed Medical Center, Dr. Carolyn Hamm, presentation at Oracle Life Sciences User Group Meeting, June 2004
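To show how an "If ... Then OUTCOME" rule of this kind is scored, the sketch below computes support and confidence for one of the slide's rules over a handful of records. The four records are toy data invented for this example, not Walter Reed data, and `rule_stats` is a hypothetical helper rather than an Oracle Data Mining call.

```python
def rule_stats(records, antecedent, outcome):
    """Return (support, confidence) for the rule: antecedent -> OUTCOME == outcome.

    support    = share of all records matching both antecedent and outcome
    confidence = share of antecedent-matching records that have the outcome
    """
    matching = [r for r in records if antecedent(r)]
    both = [r for r in matching if r["OUTCOME"] == outcome]
    support = len(both) / len(records)
    confidence = len(both) / len(matching) if matching else 0.0
    return support, confidence


# Toy records (invented): field names follow the slide's attribute names.
toy_records = [
    {"NUM_HOSPITAL_ADMISSIONS": 0, "NUM_VISITS_TO_PROVIDER": 6, "OUTCOME": "IMPROVEMENT"},
    {"NUM_HOSPITAL_ADMISSIONS": 0, "NUM_VISITS_TO_PROVIDER": 7, "OUTCOME": "IMPROVEMENT"},
    {"NUM_HOSPITAL_ADMISSIONS": 0, "NUM_VISITS_TO_PROVIDER": 8, "OUTCOME": "NO_IMPROVEMENT"},
    {"NUM_HOSPITAL_ADMISSIONS": 1, "NUM_VISITS_TO_PROVIDER": 2, "OUTCOME": "NO_IMPROVEMENT"},
]


def no_admissions_many_visits(record):
    """Antecedent: NUM_HOSPITAL_ADMISSIONS=0 and NUM_VISITS_TO_PROVIDER>5."""
    return record["NUM_HOSPITAL_ADMISSIONS"] == 0 and record["NUM_VISITS_TO_PROVIDER"] > 5


support, confidence = rule_stats(toy_records, no_admissions_many_visits, "IMPROVEMENT")
print(f"support={support:.2f} confidence={confidence:.2f}")
```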
35 Oracle Data Mining in the Life Sciences: Classification of Multiple Tumor Types. DNA microarray data: multiple examples of tumor tissue (public data from Broad Institute/MIT). We feed multiple cancer types data into the Oracle DB: 16,063 genes, 144 cancer patients and 10 samples per class. We mine the data using Support Vector Machines in Oracle Data Mining and create the confusion matrix. [Figure: confusion matrix, Actual\Predicted, columns BR PR LU CO LY BL ML UT LE RE PA OV MS BR; green = correct, red = errors.] Counts per actual class, left to right: BREAST-BR 1 1; PROSTATE-PR 1 1; LUNG-LU 1 2; COLON-CO 3; LYMPHOMA-LY 6; BLADDER-BL; MELANOMA-ML 1 1; UTERUS-UT 2; LEUKEMIA-LE 1 5; RENAL-RE 3; PANCREAS-PA 1 2; OVARY-OV 1 2; MESOTHELIOMA-MS 3; BRAIN-BR 4. % accuracy
36 Oracle Data Mining in the Life Sciences: Classification of Multiple Tumor Types. Multiple examples of 14 tumor types. Training set: 144 samples. Test set: 46 samples. Microarray gene expression profiles: 7,129 genes (features). Can we build a model to distinguish between multiple tumor types? Per-class counts (# train / # test): Breast (BR) 8/3; Prostate (PR) 8/2; Lung (LU) 8/3; Colorectal (CO) 8/5; Lymphoma (LY) 16/6; Bladder (BL) 8/3; Melanoma (ML) 8/2; Uterus (UT) 8/2; Leukemia (LE) 24/6; Renal (RE) 8/3; Pancreas (PA) 8/3; Ovary (OV) 8/3; Mesothelioma (MS) 8/3; Brain (BR) 16/4
37 Oracle Data Mining in the Life Sciences: Classification of Multiple Tumor Types. Workflow for the multi-tumor dataset (step: Oracle task). Read into RDBMS as table: SQLLDR. Data preparation (scaling): SQL query. Build SVM model (training, using the train tumor labels): ODM Model Build. Evaluate model on test set (using the test tumor labels): ODM Model Apply. Output: prediction results.
38 Oracle Data Mining in the Life Sciences: Classification of Multiple Tumor Types. The datasets were downloaded from the web site and stored in flat files before being loaded into the Oracle database. The data was loaded using SQLLDR to create a fact table with the following columns: sid (NUMBER), gene (VARCHAR2(30)), expr (NUMBER). Rescaling: the values were divided by a constant (10000) to make them into small numbers near 1 (to keep the dot products between all samples in the dataset inside the [-1, 1] range).
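The fact-table layout and the rescaling step can be sketched in a few lines of Python. This only mirrors the idea; on the slides the scaling is done with a SQL query inside the database. The `rescale` function, the sample ids, and the expression values below are hypothetical; the divisor 10000 and the (sid, gene, expr) layout come from the slide.

```python
SCALE_CONSTANT = 10000.0  # divisor stated on the slide


def rescale(fact_rows, scale=SCALE_CONSTANT):
    """Divide each expression value by a constant, keeping (sid, gene) unchanged.

    fact_rows: iterable of (sid, gene, expr) tuples, one row per sample and gene.
    """
    return [(sid, gene, expr / scale) for sid, gene, expr in fact_rows]


# Hypothetical rows in the (sid, gene, expr) fact-table layout.
rows = [
    (1, "V00594_s_at", 12500.0),
    (1, "D43950_at", 830.0),
    (2, "V00594_s_at", 9600.0),
]
for sid, gene, expr in rescale(rows):
    print(sid, gene, expr)
```

Storing one row per (sample, gene) pair keeps the table narrow regardless of how many genes are measured, which suits a dataset with thousands of features per sample.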
39 Oracle Data Mining in the Life Sciences: Classification of Multiple Tumor Types. The entire methodology is implemented in Oracle Database. The SVM model works with all 7,129 input features (genes) and does not require feature selection. The SVM model is relatively fast: 9 minutes training time on a 500 MHz Netra. The SVM is accurate for multi-tumor molecular classification: 78.25% accuracy, comparable to the published results in the Ramaswamy et al. PNAS 2001 paper, which also reported k-NN = 63% and Weighted Voting = 46% accuracy.
40 Oracle Data Mining in the Life Sciences: Classification of Multiple Tumor Types. Results: 78.25% accuracy. Oracle Data Mining's SVM models predict the tumor class in the multi-class tumor problem with 78.25% accuracy. [Figure: confusion matrix, Actual\Predicted, columns BR PR LU CO LY BL ML UT LE RE PA OV MS BR; green = correct, red = errors.] Counts per actual class, left to right: BREAST-BR 1 1; PROSTATE-PR 1 1; LUNG-LU 1 2; COLON-CO 3; LYMPHOMA-LY 6; BLADDER-BL 1 2; MELANOMA-ML 1 1; UTERUS-UT 2; LEUKEMIA-LE 1 5; RENAL-RE 3; PANCREAS-PA 1 2; OVARY-OV 1 2; MESOTHELIOMA-MS 3; BRAIN-BR 4
More informationStatistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
More informationMicroarray Data Mining: Puce a ADN
Microarray Data Mining: Puce a ADN Recent Developments Gregory Piatetsky-Shapiro KDnuggets EGC 2005, Paris 2005 KDnuggets EGC 2005 Role of Gene Expression Cell Nucleus Chromosome Gene expression Protein
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More informationHigh Productivity Data Processing Analytics Methods with Applications
High Productivity Data Processing Analytics Methods with Applications Dr. Ing. Morris Riedel et al. Adjunct Associate Professor School of Engineering and Natural Sciences, University of Iceland Research
More informationBig Data and Predictive Analytics: Fiserv Data Mining Case Study [CON8631] Data Warehouse and Big Data
Big Data and Predictive Analytics: Fiserv Data Mining Case Study [CON8631] Data Warehouse and Big Data Miguel Barrera - Director, Risk Analytics, Fiserv, Inc. Julia Minkowski - Risk Manager, Fiserv, Inc.
More informationName: Srinivasan Govindaraj Title: Big Data Predictive Analytics
Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Please note the following IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice
More informationOracle Data Mining. Concepts 11g Release 2 (11.2) E16808-07
Oracle Data Mining Concepts 11g Release 2 (11.2) E16808-07 June 2013 Oracle Data Mining Concepts, 11g Release 2 (11.2) E16808-07 Copyright 2005, 2013, Oracle and/or its affiliates. All rights reserved.
More informationData Mining III: Numeric Estimation
Data Mining III: Numeric Estimation Computer Science 105 Boston University David G. Sullivan, Ph.D. Review: Numeric Estimation Numeric estimation is like classification learning. it involves learning a
More informationHow To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
More informationData Mining. SPSS Clementine 12.0. 1. Clementine Overview. Spring 2010 Instructor: Dr. Masoud Yaghini. Clementine
Data Mining SPSS 12.0 1. Overview Spring 2010 Instructor: Dr. Masoud Yaghini Introduction Types of Models Interface Projects References Outline Introduction Introduction Three of the common data mining
More informationIntroduction. A. Bellaachia Page: 1
Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.
More informationUniversité de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr
Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr WEKA Gallirallus Zeland) australis : Endemic bird (New Characteristics Waikato university Weka is a collection
More informationGamma Distribution Fitting
Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics
More informationPredict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons
More informationHow to Build MicroStrategy Projects on Top of Big Data Sources in the Cloud
How to Build MicroStrategy Projects on Top of Big Data Sources in the Cloud Jochen Demuth, Director, Partner Engineering Use Cases for Big Data in the Cloud Four broad categories and their value Traditional
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical
More informationFigure 1. An embedded chart on a worksheet.
8. Excel Charts and Analysis ToolPak Charts, also known as graphs, have been an integral part of spreadsheets since the early days of Lotus 1-2-3. Charting features have improved significantly over the
More informationAnalysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk
Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:
More informationIBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA
CALIFORNIA STATE UNIVERSITY, LOS ANGELES INFORMATION TECHNOLOGY SERVICES IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA Summer 2013, Version 2.0 Table of Contents Introduction...2 Downloading the
More informationActive Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
More informationOracle Business Intelligence and Analytics Platform. SFOUG March 22, 2006. Shyam Varan Nath Oracle Corporation
Oracle Business Intelligence and Analytics Platform SFOUG March 22, 2006 Shyam Varan Nath Oracle Corporation 1 Agenda Introduction to Business Intelligence A brief look into Oracle Integrated BI platform
More informationOracle Data Mining. Concepts 11g Release 1 (11.1) B28129-04
Oracle Data Mining Concepts 11g Release 1 (11.1) B28129-04 May 2008 Oracle Data Mining Concepts, 11g Release 1 (11.1) B28129-04 Copyright 2005, 2008, Oracle. All rights reserved. The Programs (which include
More informationOracle Data Mining. Concepts 11g Release 1 (11.1) B28129-02
Oracle Data Mining Concepts 11g Release 1 (11.1) B28129-02 September 2007 Oracle Data Mining Concepts, 11g Release 1 (11.1) B28129-02 Copyright 2005, 2007, Oracle. All rights reserved. The Programs (which
More informationOnce saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.
1 Commands in JMP and Statcrunch Below are a set of commands in JMP and Statcrunch which facilitate a basic statistical analysis. The first part concerns commands in JMP, the second part is for analysis
More informationInstitute of Actuaries of India Subject CT3 Probability and Mathematical Statistics
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in
More informationMHI3000 Big Data Analytics for Health Care Final Project Report
MHI3000 Big Data Analytics for Health Care Final Project Report Zhongtian Fred Qiu (1002274530) http://gallery.azureml.net/details/81ddb2ab137046d4925584b5095ec7aa 1. Data pre-processing The data given
More informationPredictive Data modeling for health care: Comparative performance study of different prediction models
Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath hiremat.nitie@gmail.com National Institute of Industrial Engineering (NITIE) Vihar
More informationScalable Developments for Big Data Analytics in Remote Sensing
Scalable Developments for Big Data Analytics in Remote Sensing Federated Systems and Data Division Research Group High Productivity Data Processing Dr.-Ing. Morris Riedel et al. Research Group Leader,
More information