Value Realization at Johnson Controls using SAP HANA smart data integration
  Steve Carpenter - Johnson Controls
  Ryan Champlin - SAP
Agenda
  What is SAP HANA smart data integration?
  Use cases for Smart Data Integration at Johnson Controls
  Value realized at Johnson Controls using Smart Data Integration
SAP HANA smart data integration and smart data quality
Simplified landscape, accelerated performance and an open framework
  [Diagram: traditional landscape compared with the SAP HANA in-memory platform]
  Traditional: separate ETL, replication and SDA; mix of potentially stale and current data
  SAP HANA in-memory platform: HANA EIM services (SDI, SDQ); integrated ETL, replication and SDA; current data
  Sources: multiple data sources, on premise and cloud (Op RDBMS, other sources)
SAP HANA SDI/SDQ Overview
  SDI: smart data integration
    Batch extraction and target-based CDC (HANA)
    Automated delta extraction -> real-time push of changes
    Data distribution to target (HANA -> Target)
    Advanced data transformation capabilities
    Custom source/target development through adapter SDK (Java)
    Fully integrated HANA development experience (Studio/WebIDE, Extends SDA, Cockpit, Lifecycle Management)
  SDQ: smart data quality
    Global address cleansing with support for over 240 countries
    Geocode enrichment for countries around the globe
    Matching and deduplication
    Data quality metadata to give insight and clarity into the data quality changes applied
SAP HANA smart data integration: Data Replication Overview
  WebIDE-based replication editor
    Choose one or many tables
    Initial load only or with ongoing real-time replication
  Available Configuration
    Target Columns: customize the target table to your needs
    Filter: limit what's being replicated
    Replication Behavior: change the default replication behavior
      Identify changed records for consuming applications, e.g. SAP BW, SAP Data Services, etc.
      Create a history table
    Logical Partitions: decrease initial load times
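
One way to picture how logical partitions can shorten an initial load is to split a numeric key range into non-overlapping chunks and load each chunk with its own filter. The sketch below is only an illustration of that idea, not the way the replication editor implements it; the column name DOC_ID and both function names are hypothetical.

```python
from typing import List, Tuple


def partition_ranges(min_key: int, max_key: int, parts: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [min_key, max_key] into contiguous, non-overlapping chunks."""
    if parts < 1 or max_key < min_key:
        raise ValueError("need parts >= 1 and max_key >= min_key")
    total = max_key - min_key + 1
    parts = min(parts, total)  # never create empty partitions
    base, extra = divmod(total, parts)
    ranges = []
    lo = min_key
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        hi = lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def partition_filters(column: str, ranges: List[Tuple[int, int]]) -> List[str]:
    """Render one filter predicate per partition (the column name is hypothetical)."""
    return [f"{column} BETWEEN {lo} AND {hi}" for lo, hi in ranges]


RANGES = partition_ranges(1, 10, 3)
FILTERS = partition_filters("DOC_ID", RANGES)
for predicate in FILTERS:
    print(predicate)
```

Each predicate could then drive its own load, so several chunks of a large table move at the same time instead of one after another.
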
SAP HANA smart data integration: Data Integration
  Batch data integration
    HANA Studio or HANA Web-based Development Workbench design-time UI
    Design simple or complex data flows
  Available Transformations
    Basic SQL: Aggregation, Filter, Join, Sort and Union
    Advanced SQL: Case, Lookup, Pivot and Unpivot
    Data Lifecycle: Date Generation, History Preserving, Map Operation, Row Generation and Table Comparison
    Data Quality: Cleanse, Match and Geocode
    Code Execution: AFL, Procedure and R Script
SAP HANA smart data integration
  Real-time push to replicate sources
    Sources with change data capture (CDC) capability: HANA, Teradata, ECC, DB2, MS SQL Server, Oracle
    Transaction-based: reads DBMS transaction logs and pushes committed changes on the tables to be replicated
    Trigger-based: for each table to be replicated, a DB trigger queues record changes to be pushed
    Guaranteed delivery for real-time push if the replication stream is disrupted or HANA temporarily goes down
  Batch pull mode for all types of sources
    Supported SDI adapters (sources): SAP ECC, DB2, Oracle, MS SQL Server, File, OData, Teradata, HANA, Sybase ASE
  Distributes data in real time or batch
    Supported targets from HANA: Sybase ASE, File, HANA, Teradata, DB2, Oracle, MS SQL Server
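
The trigger-based mode and the delivery guarantee can be pictured with a small in-memory model: a trigger appends every change to a queue, and a change leaves the queue only after the target has accepted it. This is a simplified sketch under that assumption, not SDI's actual mechanism; ChangeQueue, FlakyTarget and the ORDERS table are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Change = Tuple[int, str, str, Dict[str, object]]  # (sequence, table, operation, row)


class ChangeQueue:
    """In-memory stand-in for the queue a trigger writes to on each row change."""

    def __init__(self) -> None:
        self._pending: Deque[Change] = deque()
        self._next_seq = 1

    def capture(self, table: str, op: str, row: Dict[str, object]) -> None:
        """What a trigger would do: append the change in the order it happened."""
        self._pending.append((self._next_seq, table, op, dict(row)))
        self._next_seq += 1

    def push(self, deliver: Callable[[Change], None]) -> int:
        """Push changes in order; a change is dequeued only after delivery succeeds."""
        sent = 0
        while self._pending:
            change = self._pending[0]
            try:
                deliver(change)
            except ConnectionError:
                break  # target unavailable: keep the change queued for the next attempt
            self._pending.popleft()
            sent += 1
        return sent

    def backlog(self) -> int:
        return len(self._pending)


class FlakyTarget:
    """Target that rejects one delivery to simulate a temporary outage."""

    def __init__(self, fail_on_seq: int) -> None:
        self.applied: List[Change] = []
        self._fail_on_seq = fail_on_seq
        self._already_failed = False

    def apply(self, change: Change) -> None:
        if change[0] == self._fail_on_seq and not self._already_failed:
            self._already_failed = True
            raise ConnectionError("target temporarily down")
        self.applied.append(change)


queue = ChangeQueue()
target = FlakyTarget(fail_on_seq=2)
for op, row in [("INSERT", {"id": 1}), ("UPDATE", {"id": 1}), ("DELETE", {"id": 1})]:
    queue.capture("ORDERS", op, row)

first_attempt = queue.push(target.apply)   # stops at the simulated outage
second_attempt = queue.push(target.apply)  # resumes from the first undelivered change
```

Because the queue is drained strictly in order, the target sees the changes in their original sequence even after the interruption.
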
Product Differentiation for Data Integration
  SAP LT Replication Server
    Real-time replication
    SAP sources primarily
    Impact to source systems
    NW stack
  SAP Sybase Rep Server
    Real-time replication
    Non-SAP sources
    No transformations
  SAP HANA smart data integration
    Real-time replication (log-based)
    Batch
    SAP and non-SAP sources
    Data transformations
    Data quality
    Natively built in HANA
  SAP Data Services
    Batch
    Data transformation
    Data quality
    SAP and non-SAP sources
  CEP/SDS
    Real-time
    Event streams
Wherever they live, work or travel, people all over the world are touched by Johnson Controls products and services every day.
Johnson Controls Overview* (*prior to 2015 results)
  $42.8 billion: Johnson Controls achieved record full-year revenues in 2014.
  170,000 employees serving customers in 150+ countries
  [Chart: revenue in $ billions, years 04 through 13]
Johnson Controls IT Landscape
Enterprise Data COE Charter
  Identify & standardize the use and governance of information in support of the overall global business strategy
    Enterprise Master Data Hubs (Key Hubs: Supplier, Customer, Direct Materials, Indirect Materials, Employees, Finance, Reference): SAP MDG
  Develop and maintain controls on data quality, interoperability, and sources to effectively manage corporate risk
    High Velocity Data Cleansing & Migration: SAP IS, DS, BOA Accelerators
    Decommissioning & Archiving System Data: SAP ILM
    Ongoing Data Quality Operations: SAP IS
  Collaborate w/business to define maintenance processes and teams: SAP MDG
  Develop methods to ensure consistent application and use of analytics
    Enterprise Data Warehouse (Key Tenets: Performance, Enterprise Model, Data Federation, Agility, Reactive & Predictive, Compliance Forensic Discovery): SAP HANA, SAP HANA EIM, Hadoop, Legacy Data Warehouses
EIM Logical Architecture
EIM Conversion Flow & Risks
  [Diagram labels: SAP, SAP Copy, EBS, EBS Copy, MFGPRO, MAPICS, LAWSON, OTHERS, FLAT FILES, SQOOP, HDFS, HIVE LAND, HIVE TRXFRM, BODS, BOA, SAP Ref, TRANSFORM, LOAD, SAP Target]
  Problem Statement
    Master data is typically cut over 4-6 weeks prior to business go-live (no concerns)
    Large transactional tables (~100MM+ records) require special handling during cutover
  Goals
    Extract data from the source in a timely manner
    Analyze and POC alternatives to current techniques and tools (SDI, SLT, SRS, extractors, ABAP Data Flows, Golden Gate) targeting large data tables
SDI POC Overview
  Business-Relevant Tables (1016)
    Master, Transaction, Configuration Tables
    Views (1); Transparent (925), Pool (86), Cluster (4) Tables
  ECC Instances Size Analysis
    Automotive Experience     16TB
    Building Efficiency        8TB
    Power Solutions NA         4TB
    Power Solutions EU         4TB
  Log Switching Analysis
    [Charts: hourly log switching for AE, BE and PS, hours 00 through 22]
  ECC Instances Table Analysis
    Record Count              AE     BE   PSNA   PSEU
    < 50,000                 842    835    880    851
    50,000 to < 100,000       24     18     25     16
    100,000 to < 500,000      40     38     37     53
    500,000 to < 1 mil        17     15     14     15
    1 mil to < 10 mil         54     56     37     50
    10 mil to < 100 mil       18     37     15     26
    100 mil to 500 mil        12     11      5      4
    500 mil to 1 bil           2      2      3      0
    >= 1 bil                   7      4      0      1
    Total Tables            1016   1016   1016   1016
    > 50,000                 150    163    111    149
    > 1 Million               93    110     60     81
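
The record-count analysis can be re-derived from the bucket counts on this slide. The sketch below, with hypothetical helper names, assigns a record count to its bucket and recomputes the totals and the two summary rows. Working through the numbers, the row labeled "> 50,000" matches the count of tables with 100,000 or more records, while "> 1 Million" matches the buckets from 1 million upward.

```python
from bisect import bisect_right

# Lower bound of each record-count bucket, in the order shown on the slide.
LOWER_BOUNDS = [0, 50_000, 100_000, 500_000, 1_000_000,
                10_000_000, 100_000_000, 500_000_000, 1_000_000_000]

# Tables per bucket for each ECC instance, copied from the slide.
COUNTS = {
    "AE":   [842, 24, 40, 17, 54, 18, 12, 2, 7],
    "BE":   [835, 18, 38, 15, 56, 37, 11, 2, 4],
    "PSNA": [880, 25, 37, 14, 37, 15, 5, 3, 0],
    "PSEU": [851, 16, 53, 15, 50, 26, 4, 0, 1],
}


def bucket_index(record_count: int) -> int:
    """Return the position of the bucket a table with this many records falls into."""
    if record_count < 0:
        raise ValueError("record_count must be non-negative")
    return bisect_right(LOWER_BOUNDS, record_count) - 1


def tables_at_or_above(system: str, threshold: int) -> int:
    """Count tables in all buckets whose lower bound is at least the threshold."""
    return sum(count for bound, count in zip(LOWER_BOUNDS, COUNTS[system])
               if bound >= threshold)


for system, counts in COUNTS.items():
    print(system, sum(counts),
          tables_at_or_above(system, 100_000),
          tables_at_or_above(system, 1_000_000))
```
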
SDI POC Results
  Performance testing on pre-production environment (application and database tiers co-located)
  Baseline Tests x2 (B1, B2)
    No tables being replicated to HANA
    No supplemental logging enabled
  94 & 47 Tables Test with SDI (T94, T47)
    Replication to HANA
    Supplemental logging enabled
  100 users running for 100m (5m ramp-up, 90m steady state, and 5m ramp-down)
  Executed using 4 load generators located in Holland (MI, USA), Burscheid (DE) x2, and Monterrey (MX)
  Log switch rate of ~21 mins, log size ~660 MB
    Test   Response Time   CPU      Memory   IO Resp. Time (ms/op)   IO/sec   Rows      Transactions
    B1     3.7%            0.7%     0.01%    0.85                    30.52    442,824   37,289
    B2     17.7%           6.4%     0.01%    0.81                    29.62    453,459   37,718
    T47    8.8%            1.7%     0.01%    1.85*                   162.22   456,851   38,036
    T94    Lowest          Lowest   Lowest   1.62*                   188.56   451,213   38,154
    *Storage response < 8 ms/op is desired
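
To put the I/O figures from the results table in proportion, the following sketch compares each SDI run with the average of the two baselines and checks the storage response times against the 8 ms/op target stated on the slide. The dictionary layout and function name are assumptions made for illustration; only the numbers come from the table.

```python
# IO figures copied from the results table; keys are the test labels on the slide.
RESULTS = {
    "B1":  {"io_resp_ms": 0.85, "io_per_sec": 30.52, "transactions": 37_289},
    "B2":  {"io_resp_ms": 0.81, "io_per_sec": 29.62, "transactions": 37_718},
    "T47": {"io_resp_ms": 1.85, "io_per_sec": 162.22, "transactions": 38_036},
    "T94": {"io_resp_ms": 1.62, "io_per_sec": 188.56, "transactions": 38_154},
}
IO_RESP_TARGET_MS = 8.0  # "Storage response < 8 ms/op is desired"

# Average IO rate of the two baseline runs (no replication, no supplemental logging).
BASELINE_IO = (RESULTS["B1"]["io_per_sec"] + RESULTS["B2"]["io_per_sec"]) / 2


def io_multiple(test: str) -> float:
    """IO/sec of a test run expressed as a multiple of the baseline average."""
    return RESULTS[test]["io_per_sec"] / BASELINE_IO


WITHIN_TARGET = {name: r["io_resp_ms"] < IO_RESP_TARGET_MS for name, r in RESULTS.items()}

for name in ("T47", "T94"):
    print(f"{name}: {io_multiple(name):.1f}x baseline IO/sec, "
          f"{RESULTS[name]['io_resp_ms']} ms/op, within target: {WITHIN_TARGET[name]}")
```

Read this way, the replicated runs issue several times more I/O operations per second than the baselines, yet storage response time stays well under the stated target and transaction counts stay in the same range.
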
Revised Data Ingestion Framework
  Incremental extraction of all data from all business-relevant tables from each source system
  Hadoop (HDFS/HIVE): Lowest Cost Storage
    Data Warehouse, Data Migrations, Master Data Hubs, Data Scientists
  SDI-Based Framework for Oracle, HANA, SQL Server
  Sqoop-Based Framework for non-traditional databases
    System                                                      Current Tool                      Future Tool
    Oracle DBs (SAP, Oracle EBS)                                Hadoop Sqoop, SAP Data Services   SAP SDI (December)
    MS SQL Server DBs (iscala, FourthShift, Concorde, etc...)   Hadoop Sqoop                      SAP SDI (2016)
    Progress DBs (MFGPRO/QAD, Symix)                            Hadoop Sqoop, SAP Data Services   Hadoop Sqoop
    AS400 DBs (MAPICS, Lawson, MACPAC)                          Hadoop Sqoop                      Hadoop Sqoop
  Ingestion Process: Start -> Get Metadata -> Create Hive Table -> Ingest -> Update Hive Table -> End
    Update HBASE with ingestion stats
  Reconciliation
    Update HBASE with reconciliation results
  IIDP
    Update BOA with IDP results
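
The ingestion process on this slide reads as a fixed sequence of steps followed by stats recording and reconciliation. The sketch below walks one table through that sequence using plain dictionaries in place of Hive and HBase. It is a toy model of the step order only; run_ingestion, the "orders" table and the dictionary layouts are hypothetical and do not reflect the actual Johnson Controls framework.

```python
from typing import Dict, List

Row = Dict[str, object]


def run_ingestion(table: str, source_rows: List[Row],
                  hive: Dict[str, dict], stats: Dict[str, dict]) -> List[str]:
    """Walk one table through the ingestion steps and return the steps executed."""
    steps = ["start"]

    # Get Metadata: derive the column list from the source rows.
    columns = sorted({name for row in source_rows for name in row})
    steps.append("get_metadata")

    # Create Hive Table: only when the table does not exist yet.
    if table not in hive:
        hive[table] = {"columns": columns, "rows": []}
    steps.append("create_hive_table")

    # Ingest: copy the source rows into a staging list.
    staged = [dict(row) for row in source_rows]
    steps.append("ingest")

    # Update Hive Table: append the staged rows and note how many were loaded.
    rows_before = len(hive[table]["rows"])
    hive[table]["rows"].extend(staged)
    loaded = len(hive[table]["rows"]) - rows_before
    steps.append("update_hive_table")

    # Record ingestion stats (a dict stands in for the HBase stats table).
    stats[table] = {"source_rows": len(source_rows), "loaded_rows": loaded}
    steps.append("record_ingestion_stats")

    # Reconciliation: the load is consistent when both counts match.
    stats[table]["reconciled"] = stats[table]["source_rows"] == stats[table]["loaded_rows"]
    steps.append("reconcile")

    steps.append("end")
    return steps


HIVE: Dict[str, dict] = {}
STATS: Dict[str, dict] = {}
STEPS = run_ingestion("orders",
                      [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 12.5}],
                      HIVE, STATS)
print(" -> ".join(STEPS))
print(STATS["orders"])
```
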
Summary
  SAP HANA Smart Data Integration
    HANA-native data integration and data quality
    Loading/distributing data into/out of HANA
    Consumable data quality services for applications
  Value Realization at Johnson Controls
    Bottlenecks around data ingestion removed; data arrives in real time
    Slight impact to ERP systems, but benefits outweigh costs
    Next Steps: SDI implementation (December for SAP ECC)
    Next Next Steps: SDI Transformation Evaluation (2016)
Summary
  Steve Carpenter
    Director, Enterprise Data Center of Excellence
    Steve.Carpenter@jci.com
    +14145247582
  Ryan Champlin
    Sr. Director of Engineering, SAP Enterprise Information Management & HANA
    ryan.champlin@sap.com
    608-793-7360
Further Readings and Links
  Roadmaps
    Customer Facing Roadmap Site (and how to find EIM Products): http://service.sap.com/saproadmaps
    NOTE: Go to: Database and Technology -> Enterprise Information Management
  Product Availability Matrix (platforms, connectivity, etc.)
    SAP HANA Smart Data Integration: https://service.sap.com/~sapidb/012002523100019024322014E
  Documentation
    EIM Products: http://help.sap.com/pcat_datamgmt
    SAP HANA smart data integration & smart data quality: http://help.sap.com/hana_options_eim/
  BEST EIM Resource Links: http://scn.sap.com/docs/doc-48657
  SAP HANA smart data integration and smart data quality videos
    Overview: https://www.youtube.com/watch?v=efuljakmbak&list=plkzo92owknvwq_prea3cxlqjn_v3w0eh5
    NOTE: After selecting the above link, many additional videos on specific capabilities appear on the right on YouTube. A few that might be interesting:
    Real-time replication from ECC: https://www.youtube.com/watch?v=ecgsb0y7_hi&index=6&list=plkzo92owknvwq_prea3cxlqjn_v3w0Eh5
    Data Cleansing: https://www.youtube.com/watch?v=ntuhhpnzmao&index=24&list=plkzo92owknvwq_prea3cxlqjn_v3W0Eh5
    Geocoding: https://www.youtube.com/watch?v=3njyfnabpja&list=plkzo92owknvwq_prea3cxlqjn_v3w0eh5&index=31