Robert Korošec, Principal Sales Consultant, Oracle
In the Beginning... A data model with structure, integrity rules and operations; data defined independently of programs; a set-oriented, declarative language.
Commercial Database Systems Genealogy, 1970s-2002 (chart): UC Berkeley INGRES, Commercial INGRES, Britton-Lee, Tandem, Sybase, Microsoft, Illustra, ESVEL, Informix; IBM System R, SQL/DS, DB2 MVS, DB2 AS/400, DB2 UDB (milestones at 1979, 1990, 2002).
Oracle #1 RDBMS Vendor
#1 RDBMS Vendor in 1999: Oracle 40.3%, IBM 28.7%, Microsoft 10%, Informix 6%, Sybase 4%, others 11%. Source: Dataquest, May 2000
Oracle Corporation: the world's largest enterprise software vendor. $23.3 billion in revenue, FY09; 300,000 global customers (280,000 Oracle Database customers, 43,000 Oracle Applications customers, 80,000 Oracle Fusion Middleware customers); 84,000 employees, including 20,000 developers and 7,500 support personnel; 20,000 partners; 9,100 Independent Software Vendors (ISVs); operating in 145 countries.
Enterprise deals by industry: manufacturing, retail, communications, banking, utilities, insurance, others. Enterprise deals by product area: performance management, identity management, content management, middleware management, database systems management.
Oracle DB Stores All Your Data: complete, integrated. Relational: characters, numbers and dates. Text: text management and search. Documents & images: multimedia management. GIS: location and proximity searching. XML: integrated native XML database. Future data types.
Oracle Database Product Family
Express Edition: for non-Oracle developers, open source developers, new DBAs, students, non-Oracle ISVs, hardware vendors. Free. Uses 1 CPU, < 4 GB database size, 1 instance per CPU, up to 1 GB RAM. Support via free OTN community forum.
Standard Edition One: low-price option for SMB/LOB deployments and ISVs who need a supported Oracle database. $180/user (min. 5) or $5,800 per CPU. Up to 2 CPUs. Fee-based support available.
Standard Edition: full-featured database for SMBs with optional clustering support (up to 4 CPUs). $350/user or $17,500 per CPU. Single or clustered up to 4 CPUs. Fee-based support available.
Enterprise Edition: for large-scale enterprises that demand high-performance BI (ETL, DW, OLTP), security, scalability, availability, etc. $950/user or $47,500 per CPU. 4+ CPUs. Fee-based support available.
Continuous Innovation
Oracle RDBMS Architecture: an Oracle instance consists of the SGA (Shared Pool, Database Buffer Cache, Redo Log Buffer) and the background processes (PMON, SMON, DBWR, LGWR, CKPT, ARCH, RECO, SNPn, LCKn, Dnnn, Snnn, Pnnn). Server processes serve user processes. The Oracle database side consists of the parameter file, control files, datafiles and redo log files.
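These components can be inspected from the dynamic performance views. A minimal sketch (assuming a session with access to the V$ views on a running instance):

-- list the sessions belonging to background processes (PMON, SMON, DBWR, ...)
SELECT program
FROM   v$session
WHERE  type = 'BACKGROUND';

-- show the main SGA components and their current sizes
SELECT name, bytes
FROM   v$sgainfo;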
Database Trends 1. Grid & Cloud Computing
Oracle's Strategy/Solution: 1. Physical Consolidation; 2. Data Consolidation; 3. Application Platform & Access Consolidation; Business Systems (Application) Consolidation & Integration.
Scale Out as You Grow on Low-Cost Hardware: applications A-E run as a net workload on an Oracle shared instance across servers A-D; if utilization gets too high, capacity is increased by scaling out on demand. World-class clustering at all levels: database, middleware, storage. Scale out as workload increases; add/remove nodes on demand; pay-as-you-grow scale-out; lower upfront CapEx and ongoing OpEx. Green footprint: right-sized capacity planning; smaller, standard machines running at higher utilization; defer equipment procurement; take advantage of advances in hardware price/performance and energy efficiency.
Consolidation with Grid Computing: applications A-E on dedicated servers A-E each average under 20% utilization; consolidated as a net workload on a shared Oracle instance they take advantage of complementary workload peaks and average around 70% utilization, freeing capacity to deploy elsewhere. Virtualization and clustering enable consolidation: higher utilization rates and efficiency, lower CapEx & OpEx, a green footprint.
Economics of Cloud Computing (chart): capacity versus demand over time for a static data center, which carries unused resources, compared with a data center in the cloud, where capacity follows demand.
Source: AMR, Database Consolidation: reducing cost and complexity
Database Trends 2. Auditing and Security
Compliance: Legal, Regulatory and Industry Mandates Organizations today face a growing number of regulations that mandate the accuracy, protection and reliability of information
ZVOP-1: the Slovenian Personal Data Protection Act. Zakon o varstvu osebnih podatkov - ZVOP-1 (Uradni list RS, št. 86/04, 5 August 2004). Article 14, paragraph 2: when sensitive personal data are transferred over telecommunications networks, the data are considered adequately protected if they are transmitted using cryptographic methods and electronic signatures in such a way that their illegibility or unrecognizability during the transfer is ensured. Article 24, paragraph 5: protection measures must make it possible to establish subsequently when individual personal data were entered into a personal data filing system, used or otherwise processed, and who did so, for the period during which legal protection of the individual's rights on account of impermissible disclosure or processing of personal data is possible.
Oracle Database Security: Protect Data. Monitoring: Configuration Management, Audit Vault, Total Recall. Access Control: Database Vault, Label Security. Encryption and Masking: Advanced Security, Secure Backup, Data Masking.
Total Recall Option: Flashback Data Archive. Automatically stores all changes to selected tables in a Flashback Data Archive, kept in its own archive tables separate from the user tablespaces, with long-term retention over years. The archive cannot be modified, and old data is purged per the retention policy. View table contents as of any time using Flashback Query, e.g. SELECT * FROM orders AS OF TIMESTAMP TIMESTAMP '2004-12-31 00:00:00'. Uses: change tracking/long-term history, ILM, auditing, compliance.
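Setting up a Flashback Data Archive is a short piece of DDL. A minimal sketch (the archive, tablespace and table names are hypothetical):

-- create an archive with a multi-year retention policy
CREATE FLASHBACK ARCHIVE orders_fda
  TABLESPACE fda_ts
  RETENTION 5 YEAR;

-- start tracking history for a table
ALTER TABLE orders FLASHBACK ARCHIVE orders_fda;

-- later, query the table as of a past point in time
SELECT *
FROM   orders AS OF TIMESTAMP TIMESTAMP '2004-12-31 00:00:00';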
Database Trends 3. ILM - Information Lifecycle Management
Traditional Storage Approach: all data resides on a single high-performance storage tier at $72 per GB. Keeping all data on the active tier costs $972,000.
Information Lifecycle Management: reduce storage costs by tiering the data. 5% active data on the high-performance tier ($72 per GB): $49,800. 35% less-active data on the low-cost tier ($14 per GB): $67,700. 60% historical data on the read-only tier ($7 per GB): $58,000.
The Lifecycle of Data: vast amounts of data are retained for business and regulatory reasons, so the cost of retaining that data needs to be optimized. Data moves through a lifecycle from active (this month) to less active (this year) to historical and archive (previous years).
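In an Oracle database these tiers are typically implemented with partitioning: older partitions are moved to tablespaces on cheaper storage and eventually made read-only. A minimal sketch (table, partition and tablespace names are hypothetical, assuming a range-partitioned ORDERS table):

-- move a less-active partition to the low-cost tier
ALTER TABLE orders MOVE PARTITION orders_2008
  TABLESPACE low_cost_ts;

-- move a historical partition to the read-only tier
ALTER TABLE orders MOVE PARTITION orders_2004
  TABLESPACE read_only_ts;

-- once no further changes are expected, freeze the historical tier
ALTER TABLESPACE read_only_ts READ ONLY;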
Database Trends 4. XML & Unstructured Data Management
New in Oracle Database 11g Critical New Data Types RFID Data Types DICOM Medical Images 3D Spatial Images
Oracle Database 11g Release 2: Database File System (DBFS). A network file system interface for the database: file system calls from the Linux client are passed to the DBFS client, which also provides a shell interface, and a PL/SQL package implements the file operations (create, open, read, list, etc.). Files are stored as LOBs using SecureFiles, metadata is stored in tables, DBFS Links are supported, and access goes through OCI.
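Creating a DBFS file system uses a script that ships with the database and takes a tablespace name and a file system name. A minimal sketch (tablespace and file system names are hypothetical), run in SQL*Plus as the schema that will own the file system:

-- create the DBFS store; files become rows in SecureFile-LOB-backed tables
@?/rdbms/admin/dbfs_create_filesystem.sql dbfs_ts staging_fs

-- contents can then be listed from SQL...
SELECT pathname FROM dbfs_content;

-- ...or mounted on a Linux host with the dbfs_client tool and used
-- through ordinary file system calls.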
XML in the Database: XML is being used to manage mission-critical information, for interchange with external organizations and for Web Services. XML needs to be managed effectively and efficiently: the number and size of documents is increasing; reliability, scalability and availability; security; compliance; accurate and fast information location and retrieval.
11gR1: XMLIndex. A new universal index for binary and text-based XMLType storage models; addresses all known issues with the CTXXPath index; optimizes the most common classes of path expressions, including recursive, relative and lazy ('//') ones; accelerates path- and value-based predicates; fully type aware, optimizing numeric and date range predicates; fully namespace aware.
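Creating an XMLIndex is ordinary DDL. A minimal sketch (table, column and index names are hypothetical), optionally restricting the indexed paths:

CREATE INDEX po_xml_ix ON purchase_orders (doc)
  INDEXTYPE IS XDB.XMLINDEX
  PARAMETERS ('PATHS (INCLUDE (/PurchaseOrder/Reference
                               /PurchaseOrder/LineItems//*))');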
XQuery example:
SELECT XMLQuery(
  'for $i in ora:view("regions"),
       $j in ora:view("countries")
   where $i/row/region_id = $j/row/region_id
     and $i/row/region_name = "Asia"
   return $j'
  RETURNING CONTENT) AS asian_countries
FROM DUAL;

Result:
<ROW>
  <COUNTRY_ID>AU</COUNTRY_ID>
  <COUNTRY_NAME>Australia</COUNTRY_NAME>
  <REGION_ID>3</REGION_ID>
</ROW>...
Database Trends 5. Predictive Analytics
Competitive Advantage of BI & Analytics (by degree of intelligence, from Access & Reporting up to Analytics):
Standard reports: what happened?
Ad hoc reports: how many, how often, where?
Query/drill down: where exactly is the problem?
Alerts: what actions are needed?
Statistical analysis: why is this happening?
Forecasting/extrapolation: what if these trends continue?
Predictive modeling: what will happen next?
Optimization: what's the best that can happen?
Source: Competing on Analytics, by T. Davenport & J. Harris
Oracle Data Mining: Algorithms & Example Applications. Attribute importance: identify the most influential attributes for a target attribute, e.g. factors associated with high costs or with responding to an offer. Classification and prediction: predict the customers most likely to respond to a campaign or offer, or to incur the highest costs; target your best customers; develop customer profiles (illustrated by a decision tree splitting on attributes such as income, gender, age, marital status and household size to predict Buy = 0 or Buy = 1). Regression: predict a numeric value, e.g. a purchase amount, a cost, or the value of a home. A classification sketch follows.
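A classification model of this kind can be built and scored entirely in the database. A minimal sketch (table, column and model names are hypothetical), assuming a CUSTOMERS table with a CUST_ID case identifier and a BUY target column:

BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'buy_propensity',
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'customers',
    case_id_column_name => 'cust_id',
    target_column_name  => 'buy');
END;
/

-- score new records directly with SQL
SELECT cust_id,
       PREDICTION(buy_propensity USING *)             AS predicted_buy,
       PREDICTION_PROBABILITY(buy_propensity USING *) AS buy_probability
FROM   prospects;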
Oracle Data Mining: Algorithms & Example Applications (continued). Clustering: find naturally occurring groups, e.g. market segmentation, disease subgroups, or distinguishing normal from abnormal behavior. Association rules: find co-occurring items in a market basket, suggest product combinations, design better item placement on shelves. Feature extraction: reduce a large data set into representative new attributes, useful for clustering and text mining.
Database Trends 6. Semantic Web
Oracle Semantic Database: semantic aggregation and navigation of data. Manages relationships for massive collections of structured and unstructured data; powerful indexing for end-user discovery of related content; a rich platform for data integration, repurposing, quality control and classification; a tactical, non-invasive, iterative solution for strategic modernization; standards-based: SQL, XML, RDF, OWL, SPARQL, SKOS.
Oracle 11g RDF/OWL Graph Data Management. Storage & loading: native W3C RDF graph data store; fast bulk, batch and incremental load. Query: SQL via the SEM_MATCH graph-pattern table function; SPARQL supported via the Jena plug-in. Reasoning: RDF, OWL Prime and RDF++ semantic rules; forward-chaining inference model; user-defined rule bases. Scalability: scales to billions of triples; partitioning, RAC, Advanced Compression. Standards & interoperability: aligned with W3C specifications; supported by leading semantic tools; spans structured DBMS, unstructured, spatial, RSS, e-mail and document content.
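Graph patterns are queried from SQL through the SEM_MATCH table function mentioned above. A minimal sketch (model name, namespace and triple pattern are hypothetical), assuming triples have already been loaded into a semantic model called org_chart:

SELECT emp, mgr
FROM   TABLE(SEM_MATCH(
         '(?emp :reportsTo ?mgr)',
         SEM_MODELS('org_chart'),
         null,
         SEM_ALIASES(SEM_ALIAS('', 'http://example.org/org/')),
         null));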
Case Study: National Intelligence. Ontology engineering and modeling process: web resources, news, e-mail, RSS and content management systems feed information extraction (categorization, feature/term extraction), producing an RDF/OWL processed document collection; combined with OWL ontologies it forms a domain-specific knowledge base that the analyst explores through browsing, presentation, reporting, visualization and query.
Database Trends 7. Real-Time (In Memory Databases)
Oracle TimesTen In-Memory Database: applications connect either through the TimesTen client library over a client-server network, or by direct-linking the TimesTen libraries into the application process; the in-memory database is persisted through transaction logs and checkpoint files. An in-memory RDBMS: the entire database resides in memory; standard SQL with JDBC, ODBC, OCI, Pro*C and PL/SQL; compatible with Oracle Database. Exceptional performance: instantaneous response time and high throughput; embeddable. Persistent and durable: transactions with ACID properties. Real-time services: online, non-blocking operations; database change notification.
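Because TimesTen speaks standard SQL, working with it looks much like working with any Oracle database. A minimal sketch (the DSN, table and data are hypothetical), run in the ttIsql utility:

connect "dsn=cache_db1";

-- ordinary DDL and DML against the in-memory database
CREATE TABLE call_fwd (
  subscriber_id NUMBER PRIMARY KEY,
  fwd_number    VARCHAR2(20));

INSERT INTO call_fwd VALUES (1001, '+386 1 234 5678');

SELECT fwd_number FROM call_fwd WHERE subscriber_id = 1001;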
Significant Response Time Improvement: In-Memory Database Cache + Oracle Database. Chart of response times in microseconds for a sample application (Select Access Data, Select Base Data, Select New Dest, Insert Call Fwd, Update Subscriber, Update Location, Delete Call Fwd) before and after using In-Memory Database Cache, with responses dropping from several thousand microseconds on Oracle alone to a few hundred or less with the cache.
Server Memory at $500 per Gigabyte. Price of 1 gigabyte of RAM over time*: about $40,000 in 1986, $3,000 in 1996 and $500 in 2004. *Source: Kingston Technology; current prices are for Sun Fire 6800, HP Integrity rx8620 and IBM eServer pSeries 670 (based on 2-gigabyte units).
Database Trends 8. Cache Hierarchy
Semiconductor Cache Hierarchy: massive throughput and I/O through an innovative cache hierarchy. Database DRAM cache: 100 GB/sec. Flash cache: 50 GB/sec raw scan, 1 million I/Os per second. Disks: 21 GB/sec scan, 50,000 I/Os per second.
The Disk Random I/O Bottleneck: disk drives hold vast amounts of data but are limited to about 300 I/Os per second; flash technology holds much less data but can run tens of thousands of I/Os per second. The ideal solution: keep most data on disk for low cost, transparently move hot data to flash, and use flash cards instead of flash disks to avoid disk-controller limitations. Exadata storage uses flash cards with a high-bandwidth, low-latency interconnect.
In-Memory Parallel Execution How it works
In-Memory Parallel Queries: new QphH results at 1 TB TPC-H: Oracle 1,166,976; Exasol 1,018,321; ParAccel 315,842. One Sun Oracle Database Machine rack with 400 GB of DRAM usable for caching; Exadata Hybrid Columnar Compression enables 4 TB of data in DRAM. Database release 11.2 introduces parallel query processing on DRAM-cached data, harnessing the DRAM capacity of the entire database cluster for queries; this is the technology behind the world-record benchmark. Source: Transaction Processing Council, as of 9/14/2009: Oracle on HP BladeSystem c-class 128P RAC, 1,166,976 QphH@1000GB, $5.42/QphH@1000GB, available 12/1/09; Exasol on PRIMERGY RX300 S4, 1,018,321 QphH@1000GB, $1.18/QphH@1000GB, available 08/01/08; ParAccel on SunFire X4100, 315,842 QphH@1000GB, $4.57/QphH@1000GB, available 10/29/07.
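In-memory parallel execution in 11.2 is tied to automatic degree-of-parallelism management. A minimal sketch (assuming a DBA session on an 11.2 instance; the table name is hypothetical):

-- AUTO enables automatic DOP, statement queuing and in-memory parallel execution
ALTER SYSTEM SET parallel_degree_policy = AUTO;

-- subsequent parallel scans of suitably sized tables can then be satisfied
-- from buffer caches across the cluster instead of from disk
SELECT /*+ parallel */ SUM(amount_sold)
FROM   sales;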
Database Trends 9. Data Warehouse Databases & Appliances
Storage Bottlenecks: today, database performance is limited by storage. Storage systems limit data bandwidth from storage to servers: storage array internal bottlenecks, SAN bottlenecks, and random I/O bottlenecks due to physical disk speeds. Data bandwidth limits severely restrict performance for data warehousing; random I/O bottlenecks limit the performance of OLTP applications.
Sun Oracle Database Machine
Exadata Architecture
Exadata: Database Processing in Storage. New Exadata storage servers implement data-intensive processing in storage: row filtering based on the WHERE predicate, column filtering, join filtering, incremental backup filtering, storage indexing, scans on encrypted data, and data mining model scoring. A 10x reduction in data sent to the database servers is common. No application changes are needed; processing is automatic and transparent, even if a cell or disk fails during a query.
Traditional Scan Processing. Smart Scan example: a telco wants to identify customers that spend more than $200 on a single phone call: SELECT customer_name FROM calls WHERE amount > 200; The information about these premium customers occupies 2 MB in a 1-terabyte table. With traditional storage, all database intelligence resides in the database hosts, so a very large percentage of the data returned from storage is discarded by the database servers; the discarded data consumes valuable resources and impacts the performance of other workloads.
Exadata Smart Scan Processing: for SELECT customer_name FROM calls WHERE amount > 200; only the relevant column (customer_name) and the required rows (where amount > 200) are returned to the hosts. The CPU consumed by predicate evaluation is offloaded to Exadata; moving scan processing off the database host frees host CPU cycles and eliminates massive amounts of unproductive messaging. It returns the needle, not the entire haystack.
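Whether scans are actually being offloaded can be checked from the standard statistics views. A minimal sketch (assuming an Exadata environment and access to V$SYSSTAT); the statistic names are the documented cell I/O counters:

SELECT name, value
FROM   v$sysstat
WHERE  name IN (
         'cell physical IO bytes eligible for predicate offload',
         'cell physical IO interconnect bytes returned by smart scan');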
Simple Query Example: what were my sales yesterday? SELECT SUM(sales) FROM sales WHERE sale_date = DATE '2009-09-24' (illustrative table and column names). The optimizer chooses the partitions and indexes to access, the compressed blocks in those partitions and indexes are scanned, and the sales amounts for September 24 are retrieved and summed: 10 TB scanned, 1 GB returned to the servers.
Exadata Hybrid Columnar Compression: data is stored by column and then compressed. Query mode, for data warehousing: optimized for speed; a 10x compression ratio is typical, and scans improve proportionally. Archival mode, for infrequently accessed data: optimized to reduce space; 15x compression is typical, up to 50x for some data.
Exadata Hybrid Columnar Compression: how it works. Tables are organized into sets of a few thousand rows called compression units (CUs); within a compression unit, data is organized by column and then compressed. Column organization brings similar values close together, enhancing compression and reducing table size 4x to 40x. Most useful for data that is bulk loaded and queried, with light update activity.
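Choosing between the two modes is a table-level (or partition-level) DDL decision. A minimal sketch (table names are hypothetical; Hybrid Columnar Compression requires Exadata storage):

-- warehouse data that is still queried regularly
CREATE TABLE sales_reporting
  COMPRESS FOR QUERY HIGH
  AS SELECT * FROM sales;

-- rarely accessed history, optimized for space
CREATE TABLE sales_archive
  COMPRESS FOR ARCHIVE HIGH
  AS SELECT * FROM sales WHERE sale_date < DATE '2005-01-01';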
Exadata I/O Resource Management for mixed workload environments. With traditional storage, creating and managing shared storage is hampered by the inability to balance the work between users on the same database, or between multiple databases sharing the storage subsystem, so hardware isolation is the usual approach to ensuring separation. Exadata I/O resource management ensures that different users and tasks within a database are allocated the correct relative amount of I/O resources, for example: interactive 50% of I/O resources, reporting 30%, ETL 20%.
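Inside the database, such an allocation is expressed as a resource plan, which Exadata I/O Resource Management then uses to apportion I/O. A minimal sketch (plan and consumer-group names are hypothetical) using the DBMS_RESOURCE_MANAGER package:

BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.CREATE_PLAN('mixed_workload_plan', 'Interactive/reporting/ETL');
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP('interactive', 'OLTP sessions');
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP('reporting',   'Report sessions');
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP('etl',         'Batch loads');
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE('mixed_workload_plan', 'interactive',
      'Interactive: 50%', mgmt_p1 => 50);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE('mixed_workload_plan', 'reporting',
      'Reporting: 30%',   mgmt_p1 => 30);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE('mixed_workload_plan', 'etl',
      'ETL: 20%',         mgmt_p1 => 20);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE('mixed_workload_plan', 'OTHER_GROUPS',
      'Everything else',  mgmt_p2 => 100);
  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/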
Benefits Multiply
10. Q & A