Exadata in the Retail Sector Jon Mead Managing Director - Rittman Mead Consulting
Agenda Introduction Business Problem Approach Design Considerations Observations Wins Summary Q&A What it is not...
Introductions Rittman Mead Consulting Oracle BI Consultancy Platinum Partner Long(est) running Oracle BI blog Annual BI Forum OBIEE Oracle Press book On-line retailer UK based and leading Internet based Jon Mead Managing Director/Co-Founder BI Consultant
Business Problem It s a simple case of supply and demand... A retail organisation needs to be able to make pricing and purchasing (supply) decisions based on a near real time view of what is happening in the its marketplace (demand).
Business Problem Competitive Advantage
Some examples in retail How many units did we sell last month? Do suppliers meet delivery targets? Are retail promotions attracting additional trade? What is the lifetime value of a customer? How is the credit crunch affecting our sales? What would happen if we sold a new product line based on a customer s likelihood to buy? How should we market, to whom and what product? T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
Business Problem Very fast changing market Key Drivers Optimise product line Optimise pricing Optimise stock levels Coupled with Fraud analysis Customer tracking Order tracking Maximise profit Maximise customer satisfaction Turn data into information quicker than their competitors
Business Problem Current Planned Data Up to a monthly lag on certain data Near real time Queries Could take up to 24 hours Near instant response
What would be needed Ability to process large volumes of data Volume of data - 1000 s of transactions a minute at peak periods Complexity of the transformation - Order lifecycle Ability to report on large volumes of data Year on year reporting Cross fact analysis Ad-hoc queries
What would be needed Fast Data Processing
What would be needed Fast Data Processing Querying
Architecture Change Data Capture Asynchronous Oracle Warehouse Builder SCD Framework Oracle Reference Data Warehouse Architecture Staging ODS Performance layer OBIEE Dashboards Answers
CDC Staging ODS Exadata Performance OBIEE
Design considerations Do we change the design for Exadata? Best practices Performance Where is processing done Do we need a Data Warehouse? Data quality Tracking dimensional change Combining sources in OBIEE with high volumes of data can lead to performance problems WHATEVER the data source Preconceptions Don t use indexes Use HASH partitioning
Observations (1) It seems as the memory-intensive processes such as SQL analytics perform roughly the same However Exadata does have a lot of RAM so PGA should be large IO intensive operations are lightning fast. A pivot operation on 80M rows completed in a < 10s
Observations (2) Parallelism and hence performance works best with partition-wise operations HASH partition on business keys for dimensions Would have been good to HASH the fact on the same key for partition-wise joins on same RAC node - Reduce interconnect traffic - SCD may prevent this anyway RANGE partitioning may reduce performance benefits of HASH partitioning, however it make archival easier Indexes Still useful when a specific rows required, e.g. tree-walking with CONNECT BY PRIOR Not required for bulk operations
Observations (3) Advanced compression Big gain was disk space Sequences Must increase size of Sequence pool Common to all RAC instances Patching There is still some complexity in the patching process Not a magic wand It is still possible to produce badly designed star schemas It is still possible to produce badly designed OBIEE repositories It is still possible to run a query requesting every cell of data in the database It can still run slow
Wins - the numbers... ETL 10 facts; 15 dimensions; 300 tables; 157 OWB mappings - Daily load: 1 hour 30 minutes - Daily load : approximately 130 million rows pushed through staging area and into the ODS and the Data Warehouse One of the facts is refreshed with approximately 100 million rows and only takes 10 minutes Reporting Reports could be run over multiple fact over multiple years Report that took 24 hours to run now completes in 15 minutes
Summary Benefits Organisation now has near real time supply and demand data Ad-hoc reporting capability Reduced turnaround of generating reports from 2 to 3 days of spreadsheet crunching to instant available dashboard One source of truth (no spreadsheets) Made possible by Extreme performance in the ETL layer Extreme performance in the Reporting layer
Questions