Semantic and Data Mining Technologies. Simon See, Ph.D.,

Similar documents
Exadata V2 + Oracle Data Mining 11g Release 2 Importing 3 rd Party (SAS) dm models

The Oracle Data Mining Machine Bundle: Zero to Predictive Analytics in Two Weeks Collaborate 15 IOUG

Anomaly and Fraud Detection with Oracle Data Mining 11g Release 2

Application of OASIS Integrated Collaboration Object Model (ICOM) with Oracle Database 11g Semantic Technologies

Oracle Data Mining In-Database Data Mining Made Easy!

Semantic Data Management. Xavier Lopez, Ph.D., Director, Spatial & Semantic Technologies

Statistical Analysis of Gene Expression Data With Oracle & R (- data mining)

OLSUG Workshop Oracle Data Mining

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.

Blazing BI: the Analytic Options to the Oracle Database. ODTUG Kscope 2013

SQL - the best analysis language for Big Data!

Graph Database Performance: An Oracle Perspective

Seamless Access from Oracle Database to Your Big Data

Predictive Analytics for Better Business Intelligence

Big Data Analytics with Oracle Advanced Analytics In-Database Option

Oracle Advanced Analytics 12c & SQLDEV/Oracle Data Miner 4.0 New Features

Put SPARQL in Your Code: Building Applications with Oracle Semantic Technologies. Xavier Lopez, Ph.D. Zhe Wu, Ph.D. Souripriya Das, Ph.D.

Mining Big Data with RDF Graph Technology:

Oracle Advanced Analytics Oracle R Enterprise & Oracle Data Mining

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

Oracle Spatial and Graph

How To Use An Orgode Database With A Graph Graph (Robert Kramer)

Predictive Analytics Powered by SAP HANA. Cary Bourgeois Principal Solution Advisor Platform and Analytics

Oracle's In-Database Statistical Functions

IBM SPSS Modeler 15 In-Database Mining Guide

The Data Mining Process

Starting Smart with Oracle Advanced Analytics

Oracle Data Miner (Extension of SQL Developer 4.0)

extreme Datamining mit Oracle R Enterprise

Data Analysis with Various Oracle Business Intelligence and Analytic Tools

Oracle Data Miner (Extension of SQL Developer 4.0)

Semantic Stored Procedures Programming Environment and performance analysis

Smart Cities require Geospatial Data Providing services to citizens, enterprises, visitors...

A collaborative platform for knowledge management

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

An In-Depth Look at In-Memory Predictive Analytics for Developers

Oracle Big Data SQL Architectural Deep Dive

The Semantic Web for Application Developers. Oracle New England Development Center Zhe Wu, Ph.D. 1

Data Mining with Oracle Database 11g Release 2

Getting Started with Oracle Data Miner 11g R2. Brendan Tierney

ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process

Fraud and Anomaly Detection Using Oracle Advanced Analytic Option 12c

Safe Harbor Statement

Data Integration Checklist

How to Build MicroStrategy Projects on Top of Big Data Sources in the Cloud

Oracle Data Mining Hands On Lab

Tax Fraud in Increasing

The basic data mining algorithms introduced may be enhanced in a number of ways.

Geospatial Platforms For Enabling Workflows

Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p.

Anomaly and Fraud Detection with Oracle Data Mining

KnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES

Oracle Data Mining 11g Release 2

Make Better Decisions Through Predictive Intelligence

IBM SPSS Modeler Premium

Big Data and Predictive Analytics: Fiserv Data Mining Case Study [CON8631] Data Warehouse and Big Data

Geospatial Technology Innovations and Convergence

Azure Machine Learning, SQL Data Mining and R

Ganzheitliches Datenmanagement

What Are They Thinking? With Oracle Application Express and Oracle Data Miner

How To Make Sense Of Data With Altilia

Geospatial Platforms For Enabling Workflows

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Quick start. A project with SpagoBI 3.x

Semantic Interoperability

Knowledge Discovery from patents using KMX Text Analytics

HOW TO DO A SMART DATA PROJECT

Data Mining Analytics for Business Intelligence and Decision Support

Design and Implementation of a Semantic Web Solution for Real-time Reservoir Management

KnowledgeSEEKER Marketing Edition

BUSINESS VALUE OF SEMANTIC TECHNOLOGY

Introduction to Ontologies

Improve Model Accuracy with Unstructured Data

Forecasting, Prediction Models, and Times Series Analysis with Oracle Business Intelligence and Analytics. Rittman Mead BI Forum 2013

R Tools Evaluation. A review by Global BI / Local & Regional Capabilities. Telefónica CCDO May 2015

Course DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Oracle Advanced Analytics - Option to Oracle Database: Oracle R Enterprise and Oracle Data Mining. Data Warehouse Global Leaders Winter 2013

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013

Fusion Applications Overview of Business Intelligence and Reporting components

Mike Maxey. Senior Director Product Marketing Greenplum A Division of EMC. Copyright 2011 EMC Corporation. All rights reserved.

E6895 Advanced Big Data Analytics Lecture 4:! Data Store

Oracle Data Mining. Concepts 11g Release 2 (11.2) E

Pentaho Data Mining Last Modified on January 22, 2007

IBM Cognos 8 Business Intelligence Analysis Discover the factors driving business performance

Big Data Analytics Nokia

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

<Insert Picture Here> Oracle SQL Developer 3.0: Overview and New Features

Sunnie Chung. Cleveland State University

Decoding the Big Data Deluge a Virtual Approach. Dan Luongo, Global Lead, Field Solution Engineering Data Virtualization Business Unit, Cisco

Transcription:

Semantic and Data Mining Technologies Simon See, Ph.D.,

Introduction to Semantic Web and Business Use Cases 2

Lots of Scientific Resources NAR 2009 over 1170 databases

Reuse, Recycling, Repurposing Paul writes workflows for identifying biological pathways implicated in resistance to Trypanosomiasis in cattle Paul meets Jo. Jo is investigating Whipworm in mouse. Jo reuses one of Paul s workflow without change. Jo identifies the biological pathways involved in sex dependence in the mouse model, believed to be involved in the ability of mice to expel the parasite. Previously a manual two year study by Jo had failed to do this.

Workflows Access to distributed and local resources Automation of data flow Iteration over data sets Interactive Agile software development Experimental protocols Declarative mashups?

The Data Playground 1) The Playground Panel with A) Data Objects that feed into BioMoby objects, B) BioMoby Services which can be executed through a right-click option and C) the record button for recording user activity. 2) The Taverna tree of Web Services which can be dragged into the Playground Panel and connected to Data Objects. 3) Semantic Discovery panel that shows compatible services based on input or output. 5) The Data Viewer panel for viewing data that features the Taverna data rendering capabilities. Andrew Gibson 4) The Data Editor panel for inputting Data.

Semantic Technology Stack Basic Technologies URI Uniform Resource Identifier RDF Resource Description Framework RDFS RDF Schema OWL Web ontology language SPARQL Protocol and Query Language 7 http://www.w3.org/2007/03/ layercake.svg

Extraction, Modeling, Reasoning & Discovery Workflow Transaction Systems Unstructured Content Transform & Edit Load, Query Applications & Tools & Inference Analysis Tools Entity Extraction & Transform RDF/OWL Data Management Ontology Engineering SQL & SPARQL Query RSS, email Categorization Other Data Formats Custom Scripting Inferencing BI Analytics Graph Visualization Social Network Analysis Semantic Rules Metadata Registry Security Faceted Search Semantic Indexing SPARQL Endpoint Versioning Data Sources Partner Tools Partner/Oracle Tools

Why Organizations use Oracle RDF Database Key Capabilities: Oracle database is the leading commercial database with native RDF/ OWL data management Load / Storage Scalable & secure platform for widerange of semantic applications Readily scales to ultra-large repositories (10s billions of triples) Choice of SQL or SPARQL query Leverages Oracle Partitioning. RAC supported Growing ecosystem of 3 rd party tools partners Query Native RDF graph data store Manages billions of triples Fast batch, bulk and incremental load SQL: SEM_Match SPARQL: via Jena plug-in Ontology assisted query of RDBMS data Forward chaining model Reasoning OWLprime, OWL 2 RL, RDFS User defined rule base 9

Architectural Overview User-def. OWLsubsets Incr. DML RDF/S Versioning: Workspaces Rulebases: OWL, RDF/S, user-defined Inferred RDF/OWL data SQLdev. PL/SQL Semantic Indexes RDF/OWL Oracle DB Security: fine-grained QUERY (SQL-based SPARQL) Query RDF/ OntologyOWL data assisted and Query of ontologies Enterprise Data Enterprise (Relational) data 3rd-Party Callouts INFER Tools NLP Info. Extractor: Calais, GATE Bulk-Load LO AD JDBC SQL Interface SQLplus Java API support Java Programs Topbraid Composer Visualizer (cytoscope) Programming Core Interface functionality SPARQL: Jena / Sesame RDF/OWL data and ontologies 3rd Party Tools Joseki / Sesame Reasoners: Pellet SPARQL Endpoints

Core Entities in Oracle Database Semantic Store Sem. Network Dictionary and scott herman scott data tables. New entities: models, rule bases, and rules indexes (entailments). OWL and RDFS rule bases are preloaded. SDO_RDF_TRIPLE_S A new Application object type for RDF. Tables Application Table Contains col R A1 of object type sdo_rdf_triple_s to allow loading and accessing RDF R triples, and storing ancillary values. Model A model holds an RDF A2 graph (set of S-P-O triples) and is associated with an sdo_rdf_triple_s column in an application table. R Rulebase A rulebase is a set of rules used for inferencing. An Entailments An entailment stores triples derived via inferencing. Oracle Database Rulebases & Vocabularies RDF / RDFS OWL subset Rule base m Models M1 X1 X2 Entailments M2 Xp Values Mn Triples Semantic Network (MDSYS)

Load / Query / Inference Oracle Database Load Bulk load Incremental load Application Tables scott R A1 A2 Rule base m Models M1 X1 X2 Entailments M2 scott Inference using OWL 2 RL, RDFS, etc. using User-defined rules RDF / RDFS OWL subset R herman Query SPARQL (direct or SQL-based) simple data access Rulebases & Vocabularies Xp R Values An Mn Triples Semantic Network (MDSYS)

Data Integration Platform in Health Informatics Enterprise Information Consumers (EICs) Access Patient Care Workforce Management Business Intelligence Clinical Analytics Design-Time Metadata Run-Time Metadata Model Virtual Deploy Relate Integration Server Model Physical (Semantic Knowledge base) Access LIS CIS HTB HIS 13

Vocabulary Management BACK Intelligent Topic Manager (ITM) Manage Edit Maintain Search Control Import Export Audit ITM Features Web User Interfaces Multilingual Connectors to text mining Collaborative maintenance Import / export Scalability API & Web Services Java J2EE LDAP Ontology model (OWL) FRONT Oracle Text 11g For text based search Terminology Taxonomy (RDF SKOS) Knowledge representation Catalog - Yellow pages (RDF) Oracle RDF 11g For graph based search Oracle Database 11g Ontology repository Semantic Portal

National Intelligence: Text Mining 2. Model 3. Structured Data Entity Extraction Engine Feature/ term/relation Extraction, categorization (Insight, Lymba, Calais, Gate) organization founder person Oracle founder 1. Unstructured Data (Text) Larry Ellison Oracle s founder Larry Ellison wins 2010 America s Cup Race Blogs, open source, newsfeed XML/OWL/N3 Ontologies + Rules Knowledge Base (RDF Store) Signal Intelligence, message traffic 10 s of billions of triples SPARQL/SQL Analyst Reports (Content Mgmt) Triple Structure: Subj Pred - Obj 4. Mining & Discovery Explore Browsing, Presentation, Reporting, Visualization, Query Tools (e.g. i2, Centrifuge, Visual Analytics) Analyst

Oracle Data Mining Algorithms Problem Classification Regression Anomaly Detection Attribute Importance Association Rules Clustering Feature Extraction Algorithm Applicability Logistic Regression (GLM) Decision Trees Naïve Bayes Support Vector Machine Multiple Regression (GLM) Support Vector Machine Classical statistical technique Popular / Rules / transparency Embedded app Wide / narrow data / text One Class SVM Minimum Description Length (MDL) A1 A2 A3 A4 A5 A6 A7 Apriori Hierarchical K-Means Hierarchical O-Cluster NMF F1 F2 F3 F4 Copyright 2010 Oracle Corporation Classical statistical technique Wide / narrow data / text Lack examples Attribute reduction Identify useful data Reduce data noise Market basket analysis Link analysis Product grouping Text mining Gene and protein analysis Text analysis Feature reduction

In-Database Data Mining Oracle Data Mining Traditional Analytics Data Import Results Data Mining Model Scoring Data Preparation and Transformation Savings Data Mining Model Building Model Scoring Data remains in the Database Embedded data preparation Data Prep & Transformation Data Extraction Model Scoring Embedded Data Prep Model Building Data Preparation Secs, Mins or Hours Hours, Days or Weeks Source Data SAS Work Area S A S SAS Process ing S A S Process Output Faster time for Data to Insights Lower TCO Eliminates Data Movement Data Duplication Maintains Security Target S A S Copyright 2010 Oracle Corporation Cutting edge machine learning algorithms inside the SQL kernel of Database SQL Most powerful language for data preparation and Data remains in the Database

11g Statistics & SQL Analytics Ranking functions Descriptive Statistics rank, dense_rank, cume_dist, percent_rank, ntile Window Aggregate functions (moving and cumulative) Avg, sum, min, max, count, variance, stddev, first_value, last_value Correlation, linear regression family, covariance Fitting of an ordinary-least-squares regression line to a set of number pairs. Frequently combined with the COVAR_POP, COVAR_SAMP, and CORR functions Enhanced with % statistics: chi squared, phi coefficient, Cramer's V, contingency coefficient, Cohen's kappa Hypothesis Testing Linear regression Pearson s correlation coefficients, Spearman's and Kendall's (both nonparametric). Cross Tabs Sum, avg, min, max, variance, stddev, count, ratio_to_report Statistical Aggregates Direct inter-row reference using offsets DBMS_STAT_FUNCS: summarizes numerical columns of a table and returns count, min, max, range, mean, median, stats_mode, variance, standard deviation, quantile values, +/- n sigma values, top/bottom 5 values Correlations Reporting Aggregate functions LAG/LEAD functions Statistics Student t-test, F-test, Binomial test, Wilcoxon Signed Ranks test, Chi-square, Mann Whitney test, Kolmogorov-Smirnov test, One-way ANOVA Distribution Fitting Kolmogorov-Smirnov Test, Anderson-Darling Test, Chi-Squared Test, Normal, Uniform, Weibull, Exponential Note: Statistics and SQL Analytics are included in Oracle Database Standard Edition Copyright 2010 Oracle Corporation

Integration with Oracle BI EE Copyright 2010 Oracle Corporation ODM provides likelihood of expense reporting fraud.and other important questions.

Copyright 2010 Oracle Corporation

Oracle s Partners for Semantic Technologies Integrated Tools and Solution Providers: Ontology Engineering Query Tool Interfaces Reasoners Applications Standards Sesame Joseki NLP Entity Extractors SI / Consulting 21

Some Oracle Database Semantics Customers Life Sciences Defense/ Intelligence Education Telecomm & Networking Hutchinson 3G Austria Clinical Medicine & Research Publishing Thomson Reuters 22

Summary: Oracle Database 11g Release 2 Oracle database is the leading commercial database with native support for W3C standards compliant RDF/OWL data store w/ comprehensive capabilities for. Reasoning and Discovery supporting std. ontologies persistent, native & 3rd party inference, and user-defined rules Scalability to evolve schemas dynamically and grow to 10 s billions of triples, incremental & parallel inference Data Integration to link structured & unstructured content, Loosely couple business silos Security to protect data on a need to know basis Integrated querying & manageability SPARQL & SQL for RDF/OWL, relational, XML, text, location, & multimedia data 23

An Analytical Database Changes Everything! Less data movement = faster analytics, and faster analytics = better BI throughout enterprise OLAP Statistical Functions Data Mining Copyright 2010 Oracle Corporation Predictive Analytics?x Text Mining

For More Information search.oracle.com Semantic Technologies or oracle.com

Additional Information Preview of the new Oracle Data Miner 11g R2 work flow New GUI Oracle Data Mining 11gR2 presentation at Oracle Open World 2009 Oracle Data Mining Blog Funny YouTube video that features Oracle Data Mining Oracle Data Mining on the Amazon Cloud Oracle Data Mining 11gR2 data sheet Oracle Data Mining 11gR2 white paper New TechCast (audio and video recording): ODM overview and several demos Fraud and Anomaly Detection using Oracle Data Mining 11g presentation Algorithm technical summary with links to Documentation Getting Started w/ ODM page w/ instructions to download Oracle Data Miner graphical user interface (GUI), ODM Step-by-Step Tutorial Demo datasets ODM Discussion Forum on OTN (great for posting questions/answers) ODM 11g Sample Code (examples of ODM SQL and Java APIs applied in several use cases; great for developers) Oracle s 50+ SQL based statistical functions (t-test, ANOVA, Pearson s, etc.) Oracle Data Mining Copyright 2010 Oracle Corporation

Copyright 2010 Oracle Corporation

Copyright purposes 2010 Oracle Corporation This presentation is for informational only and may not be incorporated into a contract or agreement.