Informatica Proactive Monitoring for PowerCenter Operations Identify and Stop Potential Problems that Put Your Data Integration Projects at Risk WHITE PAPER
This document contains Confidential, Proprietary and Trade Secret Information ("Confidential Information") of Informatica Corporation and may not be copied, distributed, duplicated, or otherwise reproduced in any manner without the prior written consent of Informatica.

While every attempt has been made to ensure that the information in this document is accurate and complete, some typographical errors or technical inaccuracies may exist. Informatica does not accept responsibility for any kind of loss resulting from the use of information contained in this document. The information contained in this document is subject to change without notice.

The incorporation of the product attributes discussed in these materials into any release or upgrade of any Informatica software product, as well as the timing of any such release or upgrade, is at the sole discretion of Informatica.

Protected by one or more of the following U.S. Patents: 6,032,158; 5,794,246; 6,014,670; 6,339,775; 6,044,374; 6,208,990; 6,208,990; 6,850,947; 6,895,471; or by the following pending U.S. Patents: 09/644,280; 10/966,046; 10/727,700.

This edition published August 2012
Table of Contents

Executive Summary
Pre-Empting Threats to Deployments
Architecture and Components
Real-Time Alerts and Dashboard
Additional Responders
Preconfigured Monitoring and Alerting Rules
Editing and Creating Rules
Other Features
Conclusion
Appendix
Executive Summary

With more than 10,000 implementations of PowerCenter across thousands of customers, Informatica has collected information about the challenges and complexities of deployments. The explosive growth of data within organizations and the increasing demands for timely delivery of that data have given rise to a number of variables in how customers approach their projects. For example:

- How are mappings made and environments configured?
- When are sessions/workflows scheduled?
- Which sessions/workflows run concurrently or dependently?
- Which best practices are adopted, and how are they enforced?
- Which infrastructure resources are used, and which platforms are run?
- What is the state of upstream and downstream systems/schemas/databases/applications?
- What are the size and skill of technical teams?

These variables have introduced threats to PowerCenter deployments, including:

- Failed jobs
- Operational downtime
- Extreme work hours
- Lost momentum on new projects because of time spent fixing emergencies
- Erroneous reports
- Dependent failures with systems, people, processes
- Missed service-level agreements (SLAs)
- Decreased developer efficiency
- Slower remediation
- Threats to security
- Violations of governance and best practices policies
- Damage to internal reputation and reduced budgets

To combat these threats, IT organizations have taken steps such as adding teams of full-time employees or contractors, monitoring workflows, developing customized monitoring scripts, or some combination of these. Still, the vast majority of customers report that growing demands to manage and move increasingly large and fresh data sets continue to increase the prevalence of issues. The resulting complexities have added delays in identifying and resolving issues, compelling IT organizations to shift their strategies from reactive resolution to proactive prevention.
Informatica has examined these challenges and worked directly with customers to develop software that improves the uptime and reliability of deployments. Informatica Proactive Monitoring for PowerCenter Operations provides an automated early warning system to protect data integration projects from potential problems, long before they can pose any threat. It augments and accelerates Informatica PowerCenter deployments by identifying data integration processes that have failed or are in danger of failing and immediately alerting the IT organization to them.

Informatica Proactive Monitoring for PowerCenter Operations ensures the uptime and reliability of PowerCenter in production. For development and testing environments, Informatica Proactive Monitoring for PowerCenter Governance focuses on identifying deviations from best development practices. (For more information on that solution, visit Informatica.com and navigate to the products section.)

This white paper discusses the development of the operations offering, its architecture and components, and some details on the alerting rules accompanying it.
Pre-Empting Threats to Deployments

With Informatica Proactive Monitoring for PowerCenter Operations, IT organizations reduce the risk of data integration process failures and the resulting downstream impact to systems, data warehouses, and reporting activities. When an operations center or developers can proactively monitor and receive alerts on data integration issues, they dramatically increase their entire organization's productivity. This empowers an IT organization to become more agile and responsive, with the flexibility to quickly change how and whom to alert on different activities. And they can improve processes and deliver on service-level agreements by using patterns, trends, and deviations from the norm to drive adjustments over time.

Proactive Monitoring for PowerCenter Operations includes a set of prebuilt alerting rules and templates to enable an IT organization to quickly create and modify rules to monitor PowerCenter workflows and sessions, as well as to correlate performance with different environmental variables. The software watches workflows and sessions actively and with historical insight. It monitors and correlates workflows with such environmental variables as CPU, memory, and table space. Alerts are delivered to the appropriate people as soon as issues arise, mitigating the risk to dependent applications, reports, and systems.

Informatica Proactive Monitoring for PowerCenter Operations fully integrates into Informatica PowerCenter and offers the following benefits:

Reduces risk of data integration process failures. Proactive Monitoring for PowerCenter Operations minimizes the impact of poorly performing data integration processes on downstream systems and the whole business. It enables an IT organization to handle issues before they become problems: there is no waiting until jobs fail or downstream impacts are identified. Alerts can be sent proactively to an email address, a Web dashboard, or a mobile phone. Alerts can even kick off business processes or update systems.

Enhances IT's productivity, agility, and responsiveness. Proactive Monitoring for PowerCenter Operations empowers an IT organization to respond quickly and nimbly to potential problems. The software lightens the load of the entire team, allowing developers, administrators, and architects to monitor what's important to them by setting up and modifying alerts, simple or complex, themselves, without additional help. It's easy to change who is alerted on different activities and how people are alerted. Alerting rule templates can be modified using drop-down lists, and new rules can be created using wizards. Easy-to-use, self-service functionality enables an IT team to become more agile and adept at monitoring and improving processes.

Streamlines data integration processes and delivers on service-level agreements. An IT organization can monitor and correlate workflows, sessions, and environment variables to find patterns, trends, and deviations from the norm that aren't easily detected by current systems. With this insight, timely adjustments can be made to avoid problems and deliver on service-level agreements.
Architecture and Components

Informatica Proactive Monitoring for PowerCenter Operations employs the pre-emptive functions of Informatica RulePoint to monitor Informatica PowerCenter data integration deployments. Figure 1 provides a baseline overview of the system's sources and outcomes, with its numbered components described below.

[Figure 1 diagram, "Solution Architecture". Labeled elements: PowerCenter SDK; PowerCenter Repository; Workflow Controls; Workflow Statistics; Monitor & Alert; Real-time Workflow and Session Data; Proactive Monitoring for PowerCenter Operations (Source Feeds, Rules/Templates, Watchlists, Alerts); Alerts (Dashboards, e-mail, DBs, other systems, etc.); Environment & Correlation Enrichment (other monitors, etc.); OS (CPU, Memory, Disk); DB (Tablespace, Listener). Components are numbered 1 to 5.]

Figure 1. Proactive Monitoring for PowerCenter Operations uses RulePoint to monitor PowerCenter deployments.

1. The software typically runs on a separate installation along with its own repository (Oracle, SQL Server, DB2, MySQL), which is optimized for real-time data and alerting. Through a Web browser (Internet Explorer or Firefox), you can access and manage data sources, users, rule writing, alert definitions, and more.

2. Alerts can be delivered to any number of destinations, including email, additional databases, systems, and dashboards. The Real-Time Alert Manager (RTAM) is a Web-based, user-specific dashboard that receives, organizes, and prioritizes alerts. These alerts and the dashboard make up a key feature set of the software and are detailed in the next section.

3. One key prebuilt data source is the PowerCenter repository, which provides a host of metadata on the results of finished workflows. From here, a user can derive statistics about the performance of sessions and workflows, and use those statistics as a baseline against which to measure jobs recently finished and currently running (data from the PowerCenter SDK).

4. Through the PowerCenter SDK, Informatica provides prebuilt connectors that deliver real-time information about sessions and workflows currently running. This information identifies activities that may be headed for trouble before they cause any damage. Through the API, the software can start, stop, and reschedule workflows for jobs running in the PowerCenter scheduler.

5. The software comes with adapters to track a number of PowerCenter environmental variables, including CPU, memory, and database consumption. This information can be used to enrich alerts for faster troubleshooting, or alerts can be set on these variables specifically. Your IT team can also build additional listeners or tap into network monitoring tools to enrich the content.
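The enrichment described in item 5 can be pictured with a short Python sketch. It is an illustration only, not the product's implementation: the class names, the `check_session` function, and the 1.5x slowdown factor are all assumptions made for this example. It compares a running session with its historical average (item 3) and, if the session is slow, attaches the CPU and memory readings of the node it runs on.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SessionSample:
    """One real-time reading for a running session (hypothetical shape)."""
    session: str
    node: str
    elapsed_min: float


@dataclass
class NodeMetrics:
    """Environment readings for one node (hypothetical shape)."""
    cpu_pct: float
    memory_pct: float


@dataclass
class Alert:
    subject: str
    details: dict = field(default_factory=dict)


def check_session(
    sample: SessionSample,
    avg_elapsed_min: float,
    node_metrics: Dict[str, NodeMetrics],
    slow_factor: float = 1.5,  # assumed threshold, not a product default
) -> Optional[Alert]:
    """Return an enriched Alert if the session is slow, otherwise None."""
    if sample.elapsed_min <= avg_elapsed_min * slow_factor:
        return None
    alert = Alert(
        subject=f"Session {sample.session} is running slowly",
        details={
            "elapsed_min": sample.elapsed_min,
            "avg_elapsed_min": avg_elapsed_min,
        },
    )
    # Enrich the alert with environment data when it is available.
    metrics = node_metrics.get(sample.node)
    if metrics is not None:
        alert.details["cpu_pct"] = metrics.cpu_pct
        alert.details["memory_pct"] = metrics.memory_pct
    return alert
```

An operator who receives such an alert sees the node's resource usage alongside the slow session, which is the faster-troubleshooting benefit the text describes.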
Real-Time Alerts and Dashboard

With Informatica Proactive Monitoring for PowerCenter Operations, real-time PowerCenter operation alerts can be intelligently distributed through multiple channels, including email, a prebuilt Web application, a database, or a customized end point (see Figure 2). Alerts may contain embedded links to other systems so that immediate action can be taken to resolve problems.

Figure 2. Real-time PowerCenter operation alerts, such as a workflow exceeding a 12-minute SLA, are distributed through multiple channels.
Built into the software's interface is the Real-Time Alert Manager (RTAM), an easy-to-use, Web-based, real-time alert dashboard that organizes and prioritizes the operation alerts. From this dashboard, which requires no customization, an IT team can respond to an alert immediately by remediating the issue directly, forwarding the case for resolution, or creating a trouble ticket. RTAM uses its own repository and is optimized for delivering and displaying alerts to a wide variety of users. It is modeled to operate in a familiar manner, similar to popular email clients. Figure 3 illustrates the RTAM, with its numbered components described below.

Figure 3. The Real-Time Alert Manager dashboard organizes and prioritizes the operation alerts.

1. RTAM supports access controls and manages unique users. Each user can be exposed to only a small subset of alerts or to all of them, as in the case of user pcmonitor in Figure 3. Alerts can be delivered to role-based user groups as well. User authentication can be plugged into LDAP or managed with the software's authentication module. Users can also set preferences such as the color assigned to each alert priority.

2. To make it easier to manage and act on pertinent information, alerts can be organized into channels and folders, which can be easily customized and dynamically generated as part of a response. In Figure 3, we see channels for design time, run time, and live run time alerts.

3. The alert list pane contains a list of alerts, typically ordered from newest to oldest. In addition to a text search field, the filtering capabilities include alert subject, created date, age, and priority.

4. The alert detail pane provides additional information that can range from very simple details to richly formatted HTML. A response can be configured to contain information about the alert and furnish any level of additional content. From here, your IT team can reprioritize or delete alerts, forward them to other users, or take a variety of other actions. Actions can include sending an email, creating a trouble ticket, or just about any other customization.
In addition to the RTAM dashboard, Informatica Proactive Monitoring for PowerCenter Operations provides rule-activation dashboards that display the rules activated over various timeframes. These are displayed in RulePoint, the software's interface for overall administration (see Figure 4).

Figure 4. The Real-Time Alert Manager dashboard shows alert activations over time and by category.
Additional Responders

RTAM is only one of the alert delivery points supplied with Informatica Proactive Monitoring for PowerCenter Operations. Through minor configuration, the software offers a variety of other ways to deliver alerts to additional destinations. Several prebuilt supplementary responders can be made available using the SDK (see Figure 5).

Figure 5. Additional responder services can be enabled using Web-based prebuilt options.

Prebuilt responders include:

- Email service. Configured by pointing to a mail server; it delivers alerts with content similar to that of the RTAM dashboard. Email can be text or HTML.
- Event publisher, event recorder, and event transformer. Used to create and process alerts for additional internal operations. For example, instead of an alert being sent to someone, it might be sent and held in the system for further processing. This enables the creation of multiple simple alerts that can be used to watch for composite events.
- File output. Writes to a flat file that can easily be opened in Excel or imported into a database table.
- HTTP service. Enables a rule to be written to fire an alert with contents via HTTP.
- Instant messaging. Delivers alerts to a Jabber IM client or to an XMPP server.
- SQL responder. Allows any alert and supporting information to be inserted via SQL into a database table for use with an organization's own reporting/dashboard tools.
- Web service responder. Delivers a formatted SOAP message with any alert information.
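To make the SQL responder idea concrete, the following Python sketch writes an alert into a database table that a reporting tool could then query. It is a minimal illustration under stated assumptions, not the product's responder: the `alerts` table layout, the `record_alert` function, and the use of SQLite are all choices made for this example.

```python
import sqlite3
from datetime import datetime, timezone


def record_alert(conn: sqlite3.Connection, subject: str, priority: int, body: str) -> int:
    """Insert one alert row and return its generated id.

    The table layout is hypothetical; a real deployment would use
    whatever schema the organization's reporting tools expect.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS alerts ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " created_utc TEXT NOT NULL,"
        " subject TEXT NOT NULL,"
        " priority INTEGER NOT NULL,"
        " body TEXT)"
    )
    cur = conn.execute(
        "INSERT INTO alerts (created_utc, subject, priority, body)"
        " VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), subject, priority, body),
    )
    conn.commit()
    return cur.lastrowid


# Example: an in-memory database stands in for the reporting database.
conn = sqlite3.connect(":memory:")
row_id = record_alert(
    conn,
    "Workflow wf_daily_load exceeded SLA",
    1,
    "Elapsed 14 min; SLA 12 min",
)
```

Parameterized queries (the `?` placeholders) keep alert text from being interpreted as SQL, which matters because alert bodies may contain arbitrary content.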
Preconfigured Monitoring and Alerting Rules

There are more than 25 prepackaged rules and templates included with Informatica Proactive Monitoring for PowerCenter Operations (see Appendix for a complete list). Templates are prebuilt rules that are easily customizable. Rules range from monitoring workflows that are currently running to identifying problems with finished workflows. A rule automatically and proactively identifies issues and then delivers an alert to the appropriate group (e.g., managers, admins, developers) or to a single recipient.

Prebuilt alerting rules and templates include:

- Workflow/session was successful but zero (0) records loaded
- Rejected records in session
- Session/workflow failures
- SLA violations: workflow/folder/repository level
- Significant increase in workflow/session elapsed times compared to recent averages
- Significant decrease in applied row count
- Too many concurrent workflows
- Too many errors in the same workflow within a few minutes
- Workflows missing schedules
- Ping domain/integration service/repository service
- Integration service node with high CPU or memory usage
- CPU or memory usage for running process on integration service node exceeds threshold value

In addition, prebuilt alerting rules and templates include:

Correlate session changes to workflow failures. This rule watches two different activity types: developer work and operational data integration processes. When a workflow failure occurs, this rule also checks to see if there were recent changes to the workflow. An alert notifies the appropriate people that the issue could be related to a recent change, helping them diagnose the root cause and remediate faster.

Identify and restart zombie session. A zombie process is one that seems to be living but doesn't actually do anything productive. For example, a session might have been running for an hour but for the past 15 minutes has processed very few or no rows. This is a flexible rule that users can easily edit to specify what a low watermark is (how many minutes go by processing how few rows) to not only generate an alert but also automatically restart the workflow.

Repository service database space exceeded threshold value. Together with the two integration service node rules for CPU and memory usage listed above, this is an example of a rule that monitors environmental variables such as CPU, memory, and table space. These rules can provide significant help and early warning of the risk of process failures.

License allows limitless rules. The license allows organizations to develop any number of new alerting rules as long as they pertain to monitoring the health of their PowerCenter environment and jobs. The easy-to-use Web interface (along with ample documentation) helps users leverage packaged prebuilt sources or specify new ones and then write alerting rules using any of the rule-writing modes.

Editing and Creating Rules

Informatica Proactive Monitoring for PowerCenter Operations gives IT organizations significant control not only to edit and tune existing alerting rules but also to create new ones to best fit their environments. There are three rule-writing modes in which to edit or create rules.

Template Mode

The template mode is the simplest mode and is used for tuning or making changes to existing rules. Any rule written in wizard or advanced mode can be turned into a template, which exposes only the variables that a user wants to alter. For example, a template can allow a user to easily change the parameters of an alert triggered when a session runs 10 percent longer than the average of the last 10 sessions (see Figure 6). This mode hides any complex logic and makes it very simple to perform quick changes or create additional rules from this template.

Figure 6. Template mode is the simplest mode and is used for tuning or making changes to existing rules.
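The template example above, an alert when a session runs 10 percent longer than the average of the last 10 sessions, reduces to a small calculation with two tunable parameters. The Python sketch below shows that calculation; the function name and signature are hypothetical and simply mirror the two variables such a template would expose.

```python
from statistics import mean
from typing import Sequence


def exceeds_recent_average(
    current_min: float,
    history_min: Sequence[float],
    pct_over: float = 10.0,  # template variable: allowed percentage over the average
    window: int = 10,        # template variable: number of previous runs to average
) -> bool:
    """Return True if the current run is more than pct_over percent
    longer than the mean of the last `window` runs."""
    recent = history_min[-window:]
    if not recent:
        # No history yet, so there is no baseline to compare against.
        return False
    threshold = mean(recent) * (1 + pct_over / 100.0)
    return current_min > threshold
```

With ten previous runs of 10 minutes each, the threshold is about 11 minutes, so a run of 11.5 minutes would trigger the alert and a run of 10.5 minutes would not. Changing `pct_over` or `window` is the kind of adjustment a template lets a user make without touching the underlying logic.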
Wizard Mode

The wizard mode walks users step by step through creating a new rule. For example, a user can name and describe the rule, determine which topic or topics to use (data sources), which conditions to set (rules and patterns), and then how to respond when particular patterns are identified (see Figure 7).

Figure 7. Wizard mode walks users step by step through creating a new rule.
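The three parts of a rule named above (topic, condition, response) can be sketched in Python using the zombie session rule described earlier as the example. This is an illustration under assumptions, not the wizard's output: the `RowCountSample` type, the default thresholds, and the `respond` callback are hypothetical, and in the product the restart would go through the PowerCenter API rather than a Python function.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RowCountSample:
    """Topic data: one reading from a running session (hypothetical shape)."""
    minute: int      # minutes since the session started
    rows_total: int  # cumulative rows processed so far


def is_zombie(
    samples: List[RowCountSample],
    quiet_minutes: int = 15,  # low-watermark window, user-tunable
    min_rows: int = 100,      # fewer rows than this in the window is "too few"
) -> bool:
    """Condition: fewer than min_rows processed during the last quiet_minutes."""
    if not samples:
        return False
    latest = samples[-1]
    if latest.minute < quiet_minutes:
        return False  # the session has not run long enough to judge
    window_start = latest.minute - quiet_minutes
    # Use the last sample taken at or before the start of the window.
    baseline = [s for s in samples if s.minute <= window_start]
    if not baseline:
        return False
    rows_in_window = latest.rows_total - baseline[-1].rows_total
    return rows_in_window < min_rows


def run_rule(samples: List[RowCountSample], respond: Callable[[str], None]) -> bool:
    """Response: call `respond` (for example, alert and restart) when the condition holds."""
    if is_zombie(samples):
        respond("zombie session detected")
        return True
    return False
```

Separating the condition from the response mirrors the wizard's steps: the same condition can be paired with an email, a dashboard alert, or an automated restart.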
Advanced Mode

The advanced mode enables users to create a rule manually using the Detect and Respond Query Language (DRQL). (For more information on DRQL, please consult product technical documentation.) Figure 8 illustrates an example of a rule created with DRQL that identifies illegal commands from a watch list and generates alerts.

Figure 8. Advanced mode enables users to create a rule manually using DRQL.
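The DRQL rule in Figure 8 is not reproduced here, and DRQL syntax is not shown in this paper. The logic it describes, matching incoming commands against a watch list and generating alerts, can nevertheless be sketched in Python. Everything in this sketch is an assumption made for illustration: the `WatchList` class, the `find_illegal_commands` function, and the choice to match on the first word of each command.

```python
from typing import Iterable, List


class WatchList:
    """A named, mutable collection that a rule looks up at evaluation time."""

    def __init__(self, name: str, items: Iterable[str] = ()) -> None:
        self.name = name
        self._items = {item.lower() for item in items}

    def add(self, item: str) -> None:
        self._items.add(item.lower())

    def remove(self, item: str) -> None:
        self._items.discard(item.lower())

    def __contains__(self, item: str) -> bool:
        return item.lower() in self._items


def find_illegal_commands(commands: Iterable[str], watch_list: WatchList) -> List[str]:
    """Return one alert message per command whose first word is on the watch list."""
    alerts = []
    for command in commands:
        parts = command.split()
        if parts and parts[0] in watch_list:
            alerts.append(
                f"Illegal command on watch list '{watch_list.name}': {command}"
            )
    return alerts
```

Because the rule only consults the list when it runs, adding or removing an entry changes what the rule detects without any edit to the rule itself.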
Other Features

In addition to real-time PowerCenter operation alerts, the RTAM dashboard, the creation of monitoring rules, and the application of rule templates, Informatica Proactive Monitoring for PowerCenter Operations includes other robust features.

Watch Lists

Watch lists are named collections of information that can be referenced within a rule and dynamically changed by rules. For example, if you want to watch for keywords that can represent security threats, you can have the rule reference a watch list; when the keywords change, you update the list and don't need to update the rule.

Services

Services enable users to configure additional source services along with analytic services and responder services. Data sources from the PowerCenter repository and SDK are considered source services.

Audit Trail

This important feature stores all executed alerts in a table that can be easily audited using existing reporting tools or Microsoft Excel.

Administration

The product includes an administration console through which users can be authenticated (if not through an LDAP system), user policies can be managed, and a host of information (including logs) can be found.
Conclusion

The explosive growth in data and the subsequently increasing demands for its timely delivery in the enterprise have resulted in wide variability in how organizations approach their data integration projects. This variability riddles deployments with errors, inefficiencies, and security threats whose remedies, in turn, only exacerbate the existing issues by increasing system complexity. As a result, demand has grown for a pre-emptive approach to defusing threats before they erupt into full-blown issues.

Using the automated early warning features of Informatica Proactive Monitoring for PowerCenter Operations, IT organizations can drastically reduce the risk of data integration process failures and the resulting downstream impact to systems, data warehouses, and reporting activities. When operations centers or developers proactively monitor and receive alerts on data integration issues, they can dramatically increase their entire organization's productivity. The software offers the following benefits:

- Reduces risk of data integration process failures
- Enhances IT's productivity, agility, and responsiveness
- Streamlines data integration and delivers on SLAs

Informatica Proactive Monitoring for PowerCenter Operations empowers IT organizations to become more agile and responsive, with the flexibility to quickly change how and whom to alert on different activities. And by using patterns, trends, and deviations from the norm to drive adjustments over time, they can greatly improve deployments and deliver on service-level agreements.

About Informatica

Informatica Corporation (NASDAQ: INFA) is the world's number one independent provider of data integration software. Organizations around the world rely on Informatica for maximizing return on data to drive their top business imperatives. Worldwide, over 4,630 enterprises depend on Informatica to fully leverage their information assets residing on-premise, in the Cloud, and across social networks.
Appendix

This is the shipping list of rules and templates as of April 2012. There may be additions to the actual current package as well as a host of advanced rules (such as Daily Alert Reporting). To get an up-to-date list of alerting rules, email cepsales@informatica.com.

- Process Running on Integration Service Node is with more than X percent CPU usage
- CPU Usage of Node running Integration Service is more than X percent
- Memory Usage of Node running Integration Service is more than X percent
- Process Running on Integration Service Node is having more than X percent memory usage
- PowerCenter Scheduled Workflow did not start X minutes after scheduled time
- PowerCenter Scheduled Workflow missed schedule by X minutes due to running status
- PowerCenter Session Elapsed Time X percent greater than recent average
- PowerCenter Session Failed after Session Saved within X minutes
- PowerCenter Session Loaded Rows X percent lesser than recent average
- PowerCenter Session Running Time greater than X percent of average of previous X runs
- PowerCenter Session Exceeds Repository SLA by X minutes
- PowerCenter Skewed Session to Workflow elapsed time ratio less than X
- PowerCenter Concurrent Workflows exceeds X (most often 2)
- PowerCenter 3 errors in same workflow within X minutes
- PowerCenter Workflow Elapsed Time greater than X percent of recent average of X runs
- PowerCenter Workflow Load Mobile records running more than X minutes
- PowerCenter Workflow Running Time greater than Proactive Monitoring SLA by X minutes
- PowerCenter Workflow Running Time greater by X percent than recent X averages
- PowerCenter Workflow Running Time greater than Repository SLA by X minutes
- PowerCenter Zombie session alert rule for X minutes
- Repository Service Database Table Space usage is more than X percent
- PowerCenter Session Exceeds Proactive Monitoring SLA by X minutes
Worldwide Headquarters, 100 Cardinal Way, Redwood City, CA 94063, USA
phone: 650.385.5000 fax: 650.385.5500 toll-free in the US: 1.800.653.3871
informatica.com linkedin.com/company/informatica twitter.com/informaticacorp

© 2012 Informatica Corporation. All rights reserved. Printed in the U.S.A. Informatica, the Informatica logo, and The Data Integration Company are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners. IN09_0812_02008