Understanding the SOA Infra Database Mark Simpson Consultancy Director, Griffiths Waite
Your Speaker
Mark Simpson, Consultancy Director, Griffiths Waite
> 19 years Oracle development and architecture experience
> 1st UK Oracle ACE Director for SOA
> Book Author SOA and FMW
> 1st BPEL project 2005, 1st BAM 2006
> Regular Speaker
    UKOUG Technology Conference 10th Year
    Butler Group (Ovum) Master classes
    Gartner, IDC
    Oracle Open World x 3yrs
    SOA Symposium
> Oracle SOA Community Award Winner 2009, 2010 and 2011
    Global Fusion Middleware Innovation Award 2012
10 years of Speaking at UKOUG Tech
    2004 Oracle 9i Advanced Queuing for Business Integration
    2005 Delivering the Adaptive Enterprise with BPEL and BAM
    2006 Empowering the Business with Oracle Business Rules
    2006 BAM Closing the Loop of Process Improvement
    2007 SOA Customer Success Case Studies of Process Improvement
    2007 Oracle BPA and SOA Best Practices
    2008 Migrating Oracle Forms to SOA & ADF
    2009 SOA Design Best Practices
    2010 BPM and SOA Design Patterns
    2010 Key Success Factors for Fusion (Middleware) Projects
    2011 Enterprise Architecture and SOA Governance
    2011 SOA Case Study for Business Value
    2012 EDN and BAM
    2012 SOA Case Study Innovation Award Winner
    2013 Understanding SOA Infra Database
    2013 Whiteboard Overview of Fusion Middleware
    2013 Building a SOA Reference Architecture
Agenda
> Why do we need to understand SOA_Infra DB
> Understanding the Service Engines
> SOA DB Persistence
    BPEL
    Mediator
    Human Task
    BPM
    OSB
    EDN
> Tuning EM and SOA DB
> Understanding Purging
> Project Scenarios Lessons Learnt
SOA_INFRA is the Persistence Layer for SOA
> Oracle SOA Suite dehydration store captures the instance data from SOA composites
> Why should we care about SOA_INFRA?
    Enterprise Manager can become slow to navigate with large volumes; scripts can be quicker
    DB management is essential for smooth-running SOA engines; persistence is a large overhead
    Analysing data in SOA_INFRA helps identify unnoticed issues, find bottlenecks and stuck messages, and reconcile messages
> Would you build a Java app without consideration for the database layer?
    Bad design decisions can put a large weight on the database
    Helps you understand how the SOA engines work
> What can the impact be?
    EM Console unusable
    JTA transaction timeout needing to be raised
    Messages appearing to be stuck
    High Engine Faults requiring recovery
    Slow start-up times may be a tell-tale sign
Understand the Service Engines
[Architecture diagram: the 11g Service Infrastructure hosting the service engines (Mediator, BPEL, Human Workflow, Business Rules) for an SCA composite, with the SOA Composite Editor and Business Rules IDE, MDS, BAM, B2B, Oracle Service Bus, Policy Manager, application composers, web services and adapters on a common JCA-based connectivity infrastructure, the web-based SOA operations console, and SOA_INFRA]
Anatomy of a Composite
Driving Tables:
    COMPOSITE_INSTANCE
    CUBE_INSTANCE (BPEL)
    MEDIATOR_INSTANCE
    WFTASK (Human Workflow)
    BRDECISIONINSTANCE (Rules)
    BPM_CUBE_PROCESS
    REFERENCE_INSTANCE
Other useful tables
    XML_DOCUMENT - Stores process input/output messages and large XML variables (inc. dlv), linked to composites via INSTANCE_PAYLOAD [use the Java API to get the payload; well documented]
    CUBE_SCOPE - Stores the scope variables and objects
    AUDIT_TRAIL - Stores information for the EM Console flows in XML
    AUDIT_DETAILS - Additional audit details as defined by the audit levels (e.g. assigns)
    WORK_ITEM - Stores activities created by BPEL (rollback points); onmessage branches will result in one WORK_ITEM per branch
    WI_FAULT - Stores recoverable and non-recoverable BPEL faults
    DLV_MESSAGE - Deferred processing
    MEDIATOR_DEFERRED_MESSAGE - Stores parallel mediator messages
    MEDIATOR_CASE_INSTANCE - One row per routing rule
Other useful tables
    BPM_AUDIT_QUERY - Stores BPMN process input/output messages and activity inputs and outputs. Can grow fast; managed by audit levels
    BPM_CUBE_AUDIT_INSTANCE - Stores the information on each instance within the BPMN process
    BPM_CUBE_ACTIVITY - Static view of all activities in a deployed process
    AQ$EDN_EVENT_QUEUE_TABLE / AQ$EDN_OAOO_DELIVERY_TABLE - EDN messages
    WLI_QS_REPORT_DATA - Blob containing reporting information written from OSB, linked to OSB services via WLI_QS_REPORT_ATTRIBUTE
Some Important Columns
> Composite Distinguished Name (Composite_Dn)
    Format: domain/compositename!revision*label
    E.g. default/soaorderbooking!1.0*2006-07-09_12-23-10_112
    FYI - 2006-07-09_12-23-10_112 is the MDS label assigned to the composite during deployment
> Execution Context ID (ECID)
    Common through all linked composites; beware of inadvertently linking 100s of composites (e.g. looping through a file)
    Links OSB invocations into the flow
    Appears in JVM thread dumps; useful to pass to 3rd party systems
> CMPST_ID and CIKEY on Cube_instance
    CMPST_ID links a BPEL process to its containing composite
    CIKEY is the PK for a BPEL instance; useful in the EM BPEL Engine for faster access to a flow, or for Java error recovery utilities
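When working with Composite_Dn values pulled out of SOA_INFRA by script, it can help to split them into their parts. The following is a minimal sketch, assuming the DN always follows the domain/compositename!revision*label layout shown on this slide; the class and function names are hypothetical and not part of any Oracle API.

```python
from typing import NamedTuple, Optional


class CompositeDn(NamedTuple):
    """Parts of a Composite_Dn, named after the slide's format string."""
    domain: str
    composite: str
    revision: str
    label: Optional[str]  # None when the DN carries no "*label" suffix


def parse_composite_dn(dn: str) -> CompositeDn:
    """Split 'domain/compositename!revision*label' into its parts.

    Hypothetical helper: assumes '/', '!' and '*' appear only as separators.
    """
    domain, sep, rest = dn.partition("/")
    if not sep:
        raise ValueError(f"no '/' in composite DN: {dn!r}")
    composite, sep, rest = rest.partition("!")
    if not sep:
        raise ValueError(f"no '!' in composite DN: {dn!r}")
    revision, sep, label = rest.partition("*")
    return CompositeDn(domain, composite, revision, label if sep else None)


if __name__ == "__main__":
    parsed = parse_composite_dn(
        "default/soaorderbooking!1.0*2006-07-09_12-23-10_112"
    )
    print(parsed.composite, parsed.revision, parsed.label)
```

Grouping instance counts by the parsed composite and revision is one way to see which deployments dominate the dehydration store.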
Understand the Service Engines - Mediator
> Sequential Routing
    Mediator, Mediator_Case_Instance, Mediator_Case_Detail
> Parallel Routing
    Mediator, Mediator_Case_Instance, Mediator_Case_Detail, Mediator_Deferred_Message, DLV_Message
    Multiple Threads, Multiple Transactions
[Diagram labels: Invoker Thread; Audit Only; SOA INFRA; Deferred Message Processing and Audit; SOA INFRA; Locker Thread; Blocking Queue]
Mediator Database usage
> When a Mediator Service Engine is started, it registers itself in the database and gets a container id
    Used by Mediator to track which nodes are available
    Each node gets a (different) container id
    Note: we have seen problems when nodes get started concurrently, with old messages having stale container ids, resulting in stuck messages

    CONTAINER_ID                      RENEWAL_TIME
    --------------------------------  ---------------------------------------
    C224C100D91711E28F05691135CDF565  21-JUN-13 11.53.35.140000000 AM GB-EIRE
    9AEC5170D91711E2BF864990FAA7619F  21-JUN-13 11.53.58.312000000 AM GB-EIRE

> MEDIATOR_DEFERRED_MESSAGE_PAYLOAD gets populated on the initial thread and moves through the following states: 0 ready, 1 locked, 2 done, 3 faulted.
Understand the Service Engines - BPEL
> BPEL Process
    Cube_Scope is a key table
    Audit_Trail is written separately
    They may be written in separate transactions depending on the async audit policy
> Dehydration Points
    Common to see rollbacks to the previous point on failure
    Wait, Receive, Reply, Dehydrate, onmessage, onalarm, End
    Transaction=participate extends the transaction
Composite and Cube States
> A-team Blog
    https://blogs.oracle.com/ateamsoab2b/entry/list_of_all_states_from
> Addendum: Composite Instance States
    34 - Open and Faulted
    36 - Running with recovery required and unknown state
Work Item States
> Find recoverable activities
    select * from work_item where state = 1 and execution_type != 1
> STATE
    0: Inactive
    1: Open Active
    2: Open Suspended
    3: Open Pending Complete
    4: Open Faulted
    5: Closed Completed
    6: Closed Finalized
    7: Closed Pending Cancel
    8: Closed Cancelled
    9: Closed Faulted
    10: Closed Aborted
    11: Closed Compensated
    12: Closed Stale
    13: Open Pending Recovery
> Execution Type
    0: Normal
    1: Scheduled
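For scripts that report on WORK_ITEM rows outside the database, the state codes above can be carried as a lookup table. This is a small sketch only: the dictionaries copy the slide's values, and the function names are hypothetical. The recoverable test mirrors the slide's query (state = 1 and execution_type != 1) and makes no claim beyond it.

```python
# State and execution-type codes as listed on the slide.
WORK_ITEM_STATES = {
    0: "Inactive",
    1: "Open Active",
    2: "Open Suspended",
    3: "Open Pending Complete",
    4: "Open Faulted",
    5: "Closed Completed",
    6: "Closed Finalized",
    7: "Closed Pending Cancel",
    8: "Closed Cancelled",
    9: "Closed Faulted",
    10: "Closed Aborted",
    11: "Closed Compensated",
    12: "Closed Stale",
    13: "Open Pending Recovery",
}

EXECUTION_TYPES = {0: "Normal", 1: "Scheduled"}


def is_recoverable(state: int, execution_type: int) -> bool:
    """Same predicate as the slide's query: open active and not scheduled."""
    return state == 1 and execution_type != 1


def describe(state: int, execution_type: int) -> str:
    """Human-readable label; unknown codes are shown as-is."""
    state_name = WORK_ITEM_STATES.get(state, f"Unknown ({state})")
    type_name = EXECUTION_TYPES.get(execution_type, f"Unknown ({execution_type})")
    return f"{state_name} / {type_name}"


if __name__ == "__main__":
    for row in [(1, 0), (1, 1), (13, 0)]:
        print(row, describe(*row), "recoverable" if is_recoverable(*row) else "")
```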
Some Simple Examples to understand core tables
> Mediator
    Simple Sequential Rule
    Parallel Deferred Routing
    Synchronous Faults
    Asynchronous Faults
> BPEL
    BPEL Scopes and activities
    Mediator wiring to BPEL
    BPEL with a wait
    BPEL Faults
Useful Scripts monitor usage > Number of BPEL Instances per day.sql > Number of Cube Scopes per day.sql > Space Monitoring of Dehydration DB.sql > EDN_Monitoring.sql > Faulted Messages of a particular Error Msg.sql > AllFaults.sql > Composite Fault Recovery > Service Engine Recovery
Monitoring Cube Instance
> Instance Summary.SQL
Event Delivery Network (EDN)
> Log Messages
    http://prodsoa.mfl.co.uk/soa-infra/events/edn-db-log
    select * from edn_log_enabled;
    select count(1) from edn_log_message;
    Shows enqueuing and dequeuing from EDN
> AQ$_EDN_EVENT_QUEUE_TABLE_S
    1 EDN_EVENT_QUEUE EDN_JAVA_SUBSCRIBER
    3 EDN_EVENT_QUEUE EDN_SQL_SUBSCRIBER
Enterprise Manager Usage
> EM queries are heavy on the DB and often produce timeouts/stuck threads. EM is an application to tune for specific usage
    1. Educate support users on EM navigation and usage
    2. Disable fetching of instance metrics by default
    3. Restrict default search display to 0 minutes
    4. EM authenticates, loads targets and displays the page. Loading targets can be cached, e.g. set oracle.sysman.emas.discovery.wls.fmw_discovery_use_cached_results=true
    5. Increase Perm Size of the Admin Server JVM, e.g. MEM_PERM_SIZE_64BIT="-XX:PermSize=1024m"
    6. Decrease the frequency of DMS Application
    7. Disable mserver JMX notifications of state changes
    8. Ensure users use personal logins
Database Considerations
> Tune and Monitor WebLogic SOA Data Sources
    SOA Data Source: used for transaction processing
    SOA Local Tx Data Source: used for audit trail
> SOA DB Maintenance
    Purging: define a realistic retention period. Do not use for audit purposes; create an archive schema if required.
    Gather schema stats and rebuild indexes
> SOA DB Tuning (inc. EM Queries) - See DB tuning whitepaper, for example
    AWR reports for indexes (e.g. Composite_Sensor_Values)
    Specific tuning advice for large tables (e.g. hash partitions, Secure LOB etc.). Candidates include AUDIT_COUNTER, CUBE_INSTANCE, CUBE_SCOPE, MEDIATOR_CASE_INSTANCE, XML_DOCUMENT
    DB parameters, such as:
        session_cached_cursors (e.g. 50 -> 1000)
        sessions and processes (tune in accordance with data sources, e.g. 2000)
        sga_max_size & sga_target (e.g. 8GB -> 16GB)
        trace_enabled (e.g. switch off if DB performance is an issue)
Purging Strategies
> Loop Purge
    Inefficient, but allows state to be ignored. Good for small installations (< 100GB)
> Parallel Purge
    More efficient but has restrictions
> Table Recreation Scripts (TRS)
    Backs up and restores just the retained data; introduced in 11.1.1.7
> Partitioned Data
    Will greatly enhance the efficiency of purging in large environments (> 500GB)
> Truncation
    Not supported, but useful in local or development environments
Purging Approach and Tuning
> Use a combination of parallel purge for historic instances & loop purge for old running instances
> Purge routines first create temp tables with candidates for purge (e.g. ECID_PURGE, TEMP_CUBE_INSTANCE etc.)
    Experiment with batch sizes and monitor CPU and DB performance:
        batch_size => 10000
        max_runtime => 240
        min_creation_date / max_creation_date (limit this to periods to better monitor progress)
        retention_period (how much data to keep, must be >= max_creation_date)
        DOP => 4 (degree of parallelism)
        max_count => 1000000 (used to limit the records in the temp ecid_purge table)
    This will run for 4 hours. Increase DOP if CPU allows; decrease the batch size if the step of creating the temp tables is slow; raise the batch size if tracing shows this step happens very fast.
> Purge routines may have inefficient SQL in your environment, e.g. a SQL Profile needed to be added to help tune:
    INSERT INTO reference_instance_purge
    SELECT ID, ECID
    FROM REFERENCE_INSTANCE
    WHERE created_time BETWEEN '01-JAN-13 12.00.00.000000 AM' AND '31-JAN-13 12.00.00.000000 AM'
    AND ROWNUM <= 2293791
    AND composite_instance_id is null
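The advice to limit min/max creation dates to periods can be sketched as a small planning helper that splits a long backlog into month-sized purge windows and checks each one against the retention date. This is an illustration under stated assumptions: the function names are hypothetical, the windows are calendar months by choice, and the output is only a plan to feed into whatever purge call your environment uses.

```python
from datetime import date
from typing import Iterator, Tuple


def monthly_windows(start: date, end: date) -> Iterator[Tuple[date, date]]:
    """Yield (min_creation_date, max_creation_date) pairs covering [start, end),
    one per calendar month."""
    cur = start
    while cur < end:
        # First day of the month after `cur`, rolling the year over in December.
        if cur.month == 12:
            nxt = date(cur.year + 1, 1, 1)
        else:
            nxt = date(cur.year, cur.month + 1, 1)
        yield cur, min(nxt, end)
        cur = nxt


def check_window(max_creation_date: date, retention_period: date) -> None:
    """The slide's rule: retention_period must be >= max_creation_date."""
    if retention_period < max_creation_date:
        raise ValueError("retention_period must be >= max_creation_date")


if __name__ == "__main__":
    retention = date(2013, 6, 1)  # illustrative value only
    for lo, hi in monthly_windows(date(2013, 1, 1), date(2013, 3, 15)):
        check_window(hi, retention)
        print(f"min_creation_date={lo}  max_creation_date={hi}")
```

Running one window at a time makes it easier to compare row counts before and after each purge run.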
Purging - Identify Purgeable Instances
    select to_char(trunc(partition_date,'mm'),'yyyy-mm') Month_Year,
           decode(state, 1, '1 - Completed',
                         2, '2 - Running with faults',
                         3, '3 - Completed with faults',
                         16, '16 - Running with terminated',
                         17, '17 - Completed with terminated',
                         19, '19 - Completed with faults and terminated',
                         64, '64 -?') "State",
           count(distinct ecid)
    from composite_instance
    where ( bitand(state,127) = 1
         OR bitand(state,6) = 2
         OR bitand(state,16) = 16
         OR bitand(state,64) = 64
         OR state between 32 and 63
         OR state = 3
         OR state = 19 )
    group by to_char(trunc(partition_date,'mm'),'yyyy-mm'), state
    order by to_char(trunc(partition_date,'mm'),'yyyy-mm'), state asc;
This SQL will identify purgeable composites, but if a composite has any running mediator or cube instances below it, its ECID will be removed from the purge list.
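The state filter in that WHERE clause can be checked outside the database. The sketch below mirrors each condition term by term so that individual state values can be tried quickly; the function name is hypothetical, and no meaning is assigned to the bits beyond what the query itself tests.

```python
def matches_purge_filter(state: int) -> bool:
    """Python equivalent of the WHERE clause on composite_instance.state."""
    return (
        (state & 127) == 1        # bitand(state,127) = 1
        or (state & 6) == 2       # bitand(state,6) = 2
        or (state & 16) == 16     # bitand(state,16) = 16
        or (state & 64) == 64     # bitand(state,64) = 64
        or 32 <= state <= 63      # state between 32 and 63
        or state == 3
        or state == 19
    )


if __name__ == "__main__":
    for s in (0, 1, 2, 3, 4, 16, 17, 19, 34, 36, 64, 128):
        print(s, matches_purge_filter(s))
```

Note that the last two conditions are already covered by the bit tests (3 and 19 both satisfy bitand(state,6) = 2), so they add readability rather than rows.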
Distinct ECIDs - keep track of purging
> Purgeable Instances
    This SQL will identify purgeable composites, but if a composite has any running mediator or cube instances below it, its ECID will be removed from the purge list.
    select to_char(trunc(partition_date,'mm'),'yyyy-mm') Month_Year,
           decode(state, 1, '1 - Completed',
                         2, '2 - Running with faults',
                         3, '3 - Completed with faults',
                         16, '16 - Running with terminated',
                         17, '17 - Completed with terminated',
                         19, '19 - Completed with faults and terminated',
                         64, '64 -?',
                         state) "State",
           count(distinct ecid)
    from FMWC_SOAINFRA.composite_instance
    group by to_char(trunc(partition_date,'mm'),'yyyy-mm'), state
    order by to_char(trunc(partition_date,'mm'),'yyyy-mm'), state asc;
> Total Rows
    PurgeMonitoring-key_tables.sql
Purging Tips / Lessons Learnt
> Deleting smaller chunks of data in a single transaction is more efficient than deleting huge volumes together
> Temporary tables are created on each run, which is time consuming with large batches; rollback can be resource heavy if a purge is aborted
> Small and often can be run in the daytime; large parallel purges should be run out of hours
> ECIDs that are active over purge windows will not be removed
> Purge scripts will still leave some dangling references; see Oracle Support Note
> Do not run purging at the same time as heavy DB operations (e.g. RMAN validation)
> Don't forget to purge the OSB reporting schema
> Reporting will leave orphaned records (e.g. XML_PAYLOAD, INSTANCE_PAYLOAD)
Case Study - running on SOA 11.1.1.4
> 5 SOA partitions, 200+ composites
> 1.1 TB in size, 800m rows
> 1 million instances per day
> Limited usage of OSB
> Many instances appearing in Engine Recovery (Activity & Invoke)
> Business impact: lost messages, backlogs and delays
> High Application Support usage of Enterprise Manager
> Frequent stuck threads and restarts
> 4 month Rectification Project
> Parallel & Loop Purge. 11.1.1.6 purging scripts backported to support ignore_state. EDN purged.
> EM usage reduced through provision of direct SQL checks, a Java Recovery Utility and EM tuning
> A-Team assistance a few hours a week through the Rectification Project
> Mediator Engine tuning: parallel worker threads increased, BPEL invoker threads increased
> DB tuning: hash partition index added, Cube Scope tuned
> Composites refactored (70% reduction) & redesigned with transactions & threads in mind
> Stuck thread analysis standardised
Case Study - LOB Contention (ADDM Report)

     #  Description                                Active Sessions  Percent of Activity  Recommendations
     1  High Watermark Waits                       30.74            51.96                4
     2  "User I/O" wait Class                      9.79             16.55                0
     3  Top SQL Statements                         7.52             12.71                5
     4  Global Cache Messaging                     7.21             12.18                1
     5  Buffer Busy - Hot Objects                  6.91             11.68                1
     6  Commits and Rollbacks                      5.4              9.12                 2
     7  Top Segments by "User I/O" and "Cluster"   2.8              4.73                 2
     8  I/O Throughput                             2.37             4.01                 1
     9  Unusual "Other" Wait Event                 1.69             2.86                 3
    10  Global Cache Busy                          0.78             1.32                 1
Check for any Purge Patches
> We applied in 11.1.1.4
    Patch 12746784 (https://updates.oracle.com/arulink/patchdetails/process_form?patch_num=12746784): Clean up dangling references from the DLV_MESSAGE table after purge.
    Patch 13615085 (https://updates.oracle.com/arulink/patchdetails/process_form?patch_num=13615085): Purge script hanging; resolves a performance issue that essentially hangs the purge procedure.
    11.1.1.6 purge scripts backported to 11.1.1.4
> Check indexes on purge temp tables
    create index temp_cube_instance_idx on temp_cube_instance(cikey);
    create index dlv_message_cikey_idx1 on dlv_message(cikey);
> You may also need indexes on
    DOCUMENT_DLV_MSG_REF(DOCUMENT_ID)
    REFERENCE_INSTANCE(COMPOSITE_INSTANCE_ID)
    DLV_MESSAGE(RECEIVE_DATE, ECID)
SOA DB Lessons Learnt - Project Experiences
> Do not neglect SOA DB maintenance
    Use AWR and ADDM for tuning SQL and adding SQL Profiles
    Follow the advice in the DB tuning white paper: hash partition indexes, table partitioning etc.
    Ensure purging is considered pre Go Live (on by default in 12c); it is much harder retrospectively
> Consider DB impact in your design
    Think about auditing levels (Dev, Prod, Off) and understand the implications of Off
    Consider the number of composites you will produce (e.g. should a logging or monitoring service really be implemented as a composite?)
    Think about size of payloads, transaction times & thread usage (or DB bottlenecks appear)
> Manage Enterprise Manager usage
    Personal login credentials are essential
    Apply EM tuning and educate on how to use EM and supporting SQL queries
Thank You
Any Questions? Q & A
Monday Tuesday Wednesday
Twitter: mark_gw
Blog: griffithswaite.wordpress.com
Email: mark.simpson@griffiths-waite.co.uk