Marco Lehmann, Technical Sales Professional
Integrating Netezza into your existing IT landscape
2011 IBM Corporation
Agenda
- How to integrate your existing data into the Netezza appliance?
- 4 Steps for creating a trusted Warehouse
- Enabling Netezza for multiple site disaster recovery
How to integrate your existing data into the Netezza appliance?
Information Management
How will you integrate your silos of information with this appliance?
[Diagram labels: Front line / BI applications; OLAP; InfoSphere Information Server; IBM Netezza; Information integration / data quality / ETL / real-time; Source systems, data marts, silos; Common metadata]
Key hurdles in creating & deploying a Warehouse and Business Intelligence environment
Source: Defining Business Analytics and Its Impact on Organizational Decision-Making, February 2009, ComputerWorld
Information Server integrates siloed data with Netezza for trusted decisions
[Diagram labels: Network; Integration; Data Quality; Netezza Data Warehousing Solutions; Channels; Divisional customer; Billing; Billing analysis; Buyers]
Integration, ETL and Data Quality capabilities:
- Advanced Discovery
- Reusable Mapping & Blueprinting
- Comprehensive Data Lineage
- Application Connectivity Packs
- Integrated Change Data Capture
- Scalable transformation & data delivery
- Source/Target analysis
- Enterprise class scalability
- Flexible and accurate matching with Probabilistic matching engine
- Customizable out-of-box integrated rule sets
3 Reasons why you need Information Server + Netezza
1. Lower total cost of ownership
  - Reusable components for scalable development efficiencies
  - Support for automated comprehensive data analysis
  - High performance parallel engine for rapid iterations
  - Deliver self service for business users by increasing transparency in data understanding and provenance
2. Accelerate deployment & time to value
  - Highly visual & automated development environment
  - Best practices & methodologies ensure project success
  - Exhaustive pre-built connectivity
  - User centric tooling for Business and IT Collaboration
3. Increase LOB trust & confidence
  - Automated data quality monitoring and cleansing
  - Consistent understanding of enterprise vocabulary with business term definition and documentation
  - Know where information comes from with data lineage
4 Steps for creating a trusted Warehouse.
So how does Information Server with Netezza actually work?
Understand > Cleanse > Transform & Deliver
Understand the Process to build a Data Warehouse
Increase project success rates:
- Initiate warehouse integration projects from trusted blueprints
  - Reusable IBM best practices and methodology
  - Reference architectures
  - Tunable for your environment
  - Task management
- IBM expertise built right into an automated data warehouse template for guiding your project
  - Vision, Execution, Completion
Understand Data Sources, Lineage and Business Terms
Speed time to deployment
- Industry's most advanced data discovery capabilities accelerate technical discovery by 10x
  - Column analysis
  - Fully automated primary-foreign key discovery
  - Cross-source overlap analysis
  - Cross-source transformation discovery
  - Prototyping of data consolidation
- Modeling tools and industry data models accelerate modeling of the data warehouse
Lower development, enhancement and audit costs
- Achieve consistent understanding of enterprise business vocabulary
- Transparency into data lineage and change management enables self service and faster, more cost effective updates
  - Cross-tool impact analysis
  - Business and technical data lineage
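As a rough illustration of what automated primary-foreign key discovery involves, the sketch below flags a column pair as a candidate when the parent column is unique and contains every value of the child column. This is an assumption about the general technique, not a description of how Information Server implements it; the table and column names are hypothetical sample data.

```python
from itertools import product


def is_unique(values):
    """A column can act as a primary key only if it has no nulls or duplicates."""
    return None not in values and len(set(values)) == len(values)


def candidate_foreign_keys(tables):
    """Return (child_table.column, parent_table.column) pairs where the parent
    column is unique and contains every non-null value of the child column."""
    columns = [
        (table, column, values)
        for table, cols in tables.items()
        for column, values in cols.items()
    ]
    found = []
    for (ct, cc, cv), (pt, pc, pv) in product(columns, repeat=2):
        if ct == pt:
            continue  # only look across tables
        child_values = {v for v in cv if v is not None}
        if child_values and is_unique(pv) and child_values <= set(pv):
            found.append((f"{ct}.{cc}", f"{pt}.{pc}"))
    return found


# Hypothetical sample data standing in for two source tables.
tables = {
    "customer": {"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]},
    "billing": {"invoice": [10, 11, 12, 13], "customer_id": [1, 1, 3, 2]},
}
print(candidate_foreign_keys(tables))
```

A candidate found this way is only a hypothesis: small tables produce accidental inclusions, so a real discovery tool would also weigh data types, value distributions and column names.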
Workflow
1. Start from consistent blueprint, leveraging best practices
3. Create or modify data model
5. Identify where KPIs exist, their relationships and level of quality
7. Report & govern on metadata assets
2, 4, 6 (in unspecified order):
- Create new transformation rules & document
- Document KPIs and associated business terms
- Assess & monitor your data quality over time
Cleanse Data going into the Warehouse
Increase trust in your data warehouse information:
- Best-in-class data cleansing capabilities help load the data warehouse with clean data upfront
  - Name and address cleansing
  - Global name recognition
  - Probabilistic and deterministic algorithms
- Ongoing data quality monitoring
  - Monitor data quality and data relationship quality with customized rules
  - Exception management for reviewing and managing the data exception lifecycle
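The difference between deterministic and probabilistic matching can be sketched as follows. The deterministic rule requires every field to agree after normalization; the probabilistic score sums per-field log-likelihood weights in the style of Fellegi-Sunter record linkage. The field list, the m/u probabilities and the sample records are all assumptions made up for this illustration, not product defaults.

```python
import math

# Hypothetical per-field probabilities (assumptions, not product defaults):
# m = P(field agrees | same entity), u = P(field agrees | different entities).
FIELDS = {
    "name": (0.95, 0.01),
    "city": (0.90, 0.10),
    "zip": (0.85, 0.05),
}


def normalize(value):
    """Lower-case and collapse whitespace so trivial differences do not count."""
    return " ".join(value.lower().split())


def deterministic_match(a, b):
    """Match only if every field is identical after normalization."""
    return all(normalize(a[f]) == normalize(b[f]) for f in FIELDS)


def probabilistic_score(a, b):
    """Sum log-likelihood weights: positive for agreement, negative otherwise."""
    score = 0.0
    for field, (m, u) in FIELDS.items():
        if normalize(a[field]) == normalize(b[field]):
            score += math.log2(m / u)
        else:
            score += math.log2((1 - m) / (1 - u))
    return score


a = {"name": "Marta  Keller", "city": "Frankfurt", "zip": "60311"}
b = {"name": "marta keller", "city": "Frankfurt", "zip": "60313"}

print(deterministic_match(a, b))      # zip differs, so the strict rule rejects the pair
print(probabilistic_score(a, b) > 0)  # name and city agreement outweigh the zip mismatch
```

In practice the score would be compared against two thresholds, with pairs in between routed to manual review.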
Transform and Deliver data into your Netezza in a timely fashion
Deliver even faster time to value:
- Increase developer efficiency
  - Top down design
  - Highly visual dev environment
  - Enhanced collaboration through design asset reuse
  - Iterate quickly and often
- High performance delivery options with flexible deployments
  - Support for multiple delivery styles: ETL, ELT, Change Data Capture, SOA integration etc.
  - High performance, parallel engine
- Rapidly integrates your existing and future environments
  - Exhaustive pre-built connectivity
  - Pre-integrated with Netezza
  - Build once and scale with your hardware requirements
So Why Information Server?
InfoSphere Information Server: integrating and transforming data and content to deliver accurate, consistent, timely and complete information to your Netezza Warehouse appliance
The Difference:
- Information Server is better
  - Single integration platform with unified metadata
  - True parallel framework: design once, deploy anywhere
- Information Server is faster
  - Automated Data Discovery
  - Data integration planning & methodologies
- Information Server does not introduce risk
  - The power of one vendor, one team
  - Long history of proven warehouse deployments with Netezza
- Information Server is extremely cost effective
  - Lower cost, simplified packaging & deployment
  - Supports warehouse growth
Enabling Netezza for multiple site disaster recovery and high availability
Common Requirements for Replication and Data Distribution
- High availability and continuous access to DW and BI applications
- Disaster recovery solution
- Scalable infrastructure supporting:
  - Growing user population
  - Higher levels of concurrency
  - Wider geographic coverage for distributed users
- Data transformation between heterogeneous systems and data models (ODS, OLAP, OLTP, EDW)
IBM Netezza Replication Phase 1 Architecture
Geographically wide asynchronous replication focused on:
- Disaster recovery
- Reporting scalability
Replication method: SQL statement-level replication
- Replication of load files
- SQL statement replay for DML (inserts, updates, deletes), DDL and DCL
The advantages of this approach:
- Low bandwidth requirements
- Minimal to no performance impact on production queries
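The idea of asynchronous statement-level replication can be sketched in a few lines: the master records each SQL statement it executes, and a target later replays the recorded statements in order. The sketch uses SQLite purely as a stand-in database, and the Master and Target classes are hypothetical names for illustration; it covers statement replay only, not the replication of load files, and is not the Netezza implementation.

```python
import sqlite3


class Master:
    """Executes SQL locally and records each statement for later replay."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.log = []  # ordered list of (sql, params); stands in for the transport

    def execute(self, sql, params=()):
        self.db.execute(sql, params)
        self.db.commit()
        self.log.append((sql, params))


class Target:
    """Replays logged statements in order, remembering how far it has got."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.applied = 0  # index of the next log entry to replay

    def catch_up(self, log):
        for sql, params in log[self.applied:]:
            self.db.execute(sql, params)
            self.applied += 1
        self.db.commit()


master, target = Master(), Target()
master.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")  # DDL
master.execute("INSERT INTO sales VALUES (?, ?)", (1, 10.0))                # DML
master.execute("INSERT INTO sales VALUES (?, ?)", (2, 25.5))
master.execute("UPDATE sales SET amount = ? WHERE id = ?", (12.0, 1))
master.execute("DELETE FROM sales WHERE id = ?", (2,))

# The target lags until it replays the log: replication is asynchronous.
target.catch_up(master.log)
print(target.db.execute("SELECT id, amount FROM sales").fetchall())
```

Only the statement text and its parameters cross the wire, which is why this style of replication needs little bandwidth compared with shipping changed data blocks.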
Phase 1 Architecture: Geographically Wide, Asynchronous Replication
[Diagram labels: ETL System; File loads; Trickle feed updates; Master; two Targets; BI & EDW Applications at each node; a PTS at each node; LAN; WAN. PTS = Persistent Transport System]
Persistent Transport System (PTS)
- External server collocated with every node in the replication cluster
- PTS has three major purposes:
  - Move data/files from one node to another
  - Send control messages from one node to another
  - Act as a persistent store for recovery from failures
- Automatic copy of data and sync from master to target
- PTS management software distributed to the server directly by the Netezza host
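The three purposes listed above amount to a store-and-forward pattern: data and control messages are written to durable storage before they are sent, and they stay there until the receiver acknowledges them. The sketch below illustrates that pattern with a file-backed outbox; the PersistentOutbox class, file names and message format are hypothetical and say nothing about how the PTS itself is built.

```python
import json
import os
import tempfile


class PersistentOutbox:
    """File-backed outbox: messages survive a restart until they are acknowledged."""

    def __init__(self, directory):
        os.makedirs(directory, exist_ok=True)
        self.log_path = os.path.join(directory, "outbox.log")
        self.ack_path = os.path.join(directory, "acked")

    def put(self, kind, payload):
        """Append a data or control message and flush it to disk."""
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"kind": kind, "payload": payload}) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def _acked(self):
        if not os.path.exists(self.ack_path):
            return 0
        with open(self.ack_path, encoding="utf-8") as f:
            return int(f.read())

    def pending(self):
        """Messages not yet acknowledged, as (sequence_number, message) pairs."""
        if not os.path.exists(self.log_path):
            return []
        with open(self.log_path, encoding="utf-8") as f:
            messages = [json.loads(line) for line in f]
        return list(enumerate(messages, start=1))[self._acked():]

    def ack(self, seq):
        """Record that everything up to and including seq has been received."""
        with open(self.ack_path, "w", encoding="utf-8") as f:
            f.write(str(seq))


directory = tempfile.mkdtemp()
outbox = PersistentOutbox(directory)
outbox.put("data", "load_0001.dat")
outbox.put("control", "apply load_0001.dat")
outbox.put("data", "load_0002.dat")
outbox.ack(1)  # the target confirmed the first message only

# Simulate a restart: a fresh object over the same directory sees what is left.
recovered = PersistentOutbox(directory)
print([(seq, m["kind"]) for seq, m in recovered.pending()])
```

After the simulated restart, only the two unacknowledged messages are pending, so the sender can resume where it left off instead of starting over.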
Phase 1: Additional Features and Considerations
- Supported platforms: Netezza TwinFin (now known as IBM Netezza 1000)
- Remote site initialization using Truck mode
  - Using full backup from the master to initialize targets
- nzsql command line interface for management and monitoring
- Replication granularity: a single database
- Manual selection of new master in case of failure
- Client-based load balancing across replication cluster
- WAN bandwidth dictated by latency and load requirements
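Two of the points above, client-based load balancing and manual selection of a new master, can be pictured with a small client-side router. This is a minimal sketch under assumptions: the ReplicationClient class and the host names are hypothetical, reads are spread round-robin over all nodes, and promotion happens only when an operator calls it explicitly.

```python
from itertools import cycle


class ReplicationClient:
    """Routes writes to the master and spreads reads over all available nodes."""

    def __init__(self, master, targets):
        self.master = master
        self.targets = list(targets)
        self._reset()

    def _reset(self):
        # Rebuild the round-robin iterator whenever cluster membership changes.
        self._readers = cycle([self.master] + self.targets)

    def node_for_write(self):
        return self.master

    def node_for_read(self):
        return next(self._readers)

    def promote(self, new_master):
        """Manual failover: an operator picks the new master; the old one is dropped."""
        if new_master not in self.targets:
            raise ValueError(f"{new_master} is not a known target")
        self.targets.remove(new_master)
        self.master = new_master
        self._reset()


# Hypothetical host names.
client = ReplicationClient("nz-master", ["nz-target-1", "nz-target-2"])
print([client.node_for_read() for _ in range(4)])

client.promote("nz-target-1")  # operator decision after a master failure
print(client.node_for_write())
print([client.node_for_read() for _ in range(3)])
```

Because replication is asynchronous, a read routed to a target may briefly return older data than the master holds; a client that needs the latest state would read from the master.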
Phase 1 Use Case: Disaster Recovery
- ETL and micro-batch loads into production data warehouse
- Remote site used for disaster recovery; optionally for test & dev
- Failover to the remote data center is a manually controlled process
- Users able to access data during maintenance windows
- During a temporary power outage no automated fail-over is required
Beyond Phase 1: The Future for Replication and Distribution
- Bi-directional replication
  - Support multiple writers (masters), in separate databases with no conflict resolution
- Local High Availability Cluster
- Queryable archive and uniform data access
  - Provide DR and archival facilities while maintaining access to the data
  - Replication between IBM Netezza 1000 and IBM Netezza High Capacity Appliance
- Data transformation between heterogeneous systems and data models (ODS, OLAP, OLTP, EDW) into and from IBM Netezza
  - Utilizing IBM's InfoSphere Change Data Capture (CDC)
- Improved and integrated user interface
IBM Netezza Roadshow on December 1, 2011 at KochWerk in Frankfurt am Main.
ibm.com/software/de/data/netezza/
Marco Lehmann
marco.lehmann@de.ibm.com
Phone: +4915115162301