MONITORING A WEBCENTER CONTENT DEPLOYMENT WITH ENTERPRISE MANAGER

Andrew Bennett, TEAM Informatics, Inc.

Why We Monitor

During any software implementation, there comes a time when someone asks how mission critical the new system will be. Regardless of the answer, a number of key discussions always need to be had. Typically, these discussions concern backup and recovery time, high availability, and disaster recovery. What is interesting about this process (and rightly so) is that before a single user has logged on to the new system, we are already thinking about worst-case scenarios and how best to mitigate damage.

Unfortunately, ours is not a forgiving business; for every minute a system is down (or operating below expectations), someone, somewhere is asking why. Additionally, the person asking why is probably being negatively affected and will start escalating their concerns (either to a manager or, worse, to a public forum). The end result is that the people who designed and built the system are called into a very uncomfortable meeting to provide an explanation. Without metrics or other empirical evidence, the actual cause may not be apparent, which can result in guesswork, painful sifting through logs (looking for a needle in a haystack), or outright finger pointing to reassign blame.

As implementers, we bear witness to these events more than we would care to admit (even if this has only happened once, it is one too many). The depressing part is that 90% of the time, the causes of these unexpected issues should have been identified well before they reached the point of failure. The reason they are not is a lack of monitoring. While planning for worst-case scenarios is a necessary part of software implementation, active prevention is often an afterthought. There is a saying: an ounce of prevention is worth a pound of cure. In the IT world, truer words are seldom spoken.
Understanding a system's vital signs and knowing how to interpret them is key to running a stable, high-functioning software environment. This is not always a straightforward task; much of the time, monitoring only the obvious things (disk space, memory usage, CPU, etc.) is not enough. So many internal and external influences affect our systems that it can be hard to keep track of them all. What is of utmost importance, though, is to evaluate the system as a whole first and then break it down into its parts. With a clear understanding of each component that makes up the complete system, gauges and high water marks can be put into place to monitor activity and performance. The goal is twofold:

1. Issues are addressed before they become apparent to the user community.
2. In the much rarer case (with monitoring enabled) that something does go wrong, diagnosis is already well underway, and answers will be more forthcoming and backed by real-time data.

What to Monitor

The WebCenter Content system contains a number of different parts and links to other systems, and we need to monitor all of these in order to have a comprehensive view of the health of any deployment. The most important components are:

- Operating System
- Application Server (WebLogic Server)
- Java Virtual Machine (JRockit, HotSpot, or other supported JRE)
- Database
- User Directory
- File System

Each of these is examined in more detail below.

Operating System

Enterprise Manager has limited capabilities in this case unless the system monitoring plug-in for hosts is installed. When it is enabled and configured, Enterprise Manager can monitor OS logs, files and directories (for growth and permissions issues), CPU and I/O utilization, etc. Other important capabilities are configuration management and other host management options. If the underlying OS is Oracle Linux, the available management pack can also provide provisioning, patching, and other centralized configuration management tools.
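
To make the idea of a high water mark concrete, here is a minimal Python sketch that classifies file system usage against a warning level and a critical level. It does not use any Enterprise Manager API; the names `HighWaterMark`, `classify_usage`, and `check_path`, and the 80% and 90% levels, are assumptions chosen for illustration only.

```python
import shutil
from dataclasses import dataclass


@dataclass(frozen=True)
class HighWaterMark:
    """Warning and critical levels as percentages of capacity (assumed values)."""
    warning_pct: float = 80.0
    critical_pct: float = 90.0


def classify_usage(used_bytes: int, total_bytes: int, mark: HighWaterMark) -> str:
    """Return 'ok', 'warning', or 'critical' for a used/total pair."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    used_pct = 100.0 * used_bytes / total_bytes
    if used_pct >= mark.critical_pct:
        return "critical"
    if used_pct >= mark.warning_pct:
        return "warning"
    return "ok"


def check_path(path: str, mark: HighWaterMark = HighWaterMark()) -> str:
    """Classify the file system that holds `path`, using shutil.disk_usage."""
    usage = shutil.disk_usage(path)
    return classify_usage(usage.used, usage.total, mark)


if __name__ == "__main__":
    # "." is a placeholder; point this at the directory you care about.
    print(check_path("."))
```

Keeping the classification separate from the measurement means the same two-level check could be reused for other gauges, such as memory or CPU utilization, if those values are collected some other way.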
Application Server

Enterprise Manager 11g is installed by default whenever a new domain is created for Oracle Fusion Middleware. The initial configuration is bare-bones, but the monitoring capabilities of EM with regard to WebLogic Server are powerful. Within a domain running WebCenter managed servers, key metrics include:

- Cluster, server, and machine health
- Access and monitor logs
- Transaction tracing across containers and to other systems such as the database and LDAP
- Configuration management
- JVM statistics (see below)
- Database statistics (see below)

JVM

EM monitors JVM statistics as part of the Application Server monitoring (as the JRE is part of the App Server deployment). There is a wide range of measurements available, but in our experience the most important are:

- Heap size and percentage used
- Garbage collection frequency and statistics
- Thread info and statistics
- Out of memory errors

Database

The database manages metadata indexing and (if used) full-text indexing as well. A number of system counters are also managed in tables. The easiest integration for EM is the Oracle database, but Enterprise Manager also has integrations available for other databases (MS SQL, DB2, and Sybase). EM offers many aspects of database health monitoring, and a full discussion of them would be long and detailed. For our purposes, the most important aspects to measure are:

- CPU and memory usage on the database systems
- Tablespace and log file usage
- SQL monitoring to discover long-running or poorly crafted queries
- Real-time performance diagnostics
- Health of connection pools

User Directory / LDAP

Again, the simplest integration is with Oracle tools, but packs are also available for MS Active Directory. Because user experience is often tied to authentication and authorization delays, monitoring these connections is very important and can yield valuable tuning and diagnostic data. WebCenter no longer has the concept of internally managed users, so all users are now authenticated and authorized against directories, and external directory servers are always recommended for enterprise deployments. Important measures include:

- Health and availability of directory server(s)
- CPU utilization
- Statistics on failed and successful authentications / authorizations (can also be used as a flag to trigger security analysis)
- Average response time (very important for user experience)
- Transaction analysis

File System

Oracle has management packs available for its own servers and storage, but there are also plug-ins available for NetApp Filers, EMC Celerra, Symmetrix and Clariion, HP StorageWorks, IBM storage, NEC Storage, Pillar Axiom, and Veritas Cluster Server and Storage Foundation. Data capacity and storage performance are vital to the successful running and performance of WebCenter systems. The most important measures are:

- Storage system status and health
- Filesystem utilization (with triggers for high utilization)
- Critical event monitoring
- NFS/CIFS calls per second and average throughput
- Physical disk and LUN statistics
Additional Areas to Monitor

One area not covered in the above lists is network devices. EM also has plug-ins available for devices such as the F5 BIG-IP system and Brocade ServerIron appliances. Important measures to complete the broad picture of system and application health and performance include:

- Network device status
- Switch activity (again, may be a trigger for security as well as performance)
- Pool connections for load balancing
- Traffic statistics

Monitoring Tools Available for Oracle WebCenter Content

Several graphical user interfaces are available out of the box to monitor and manage Oracle WebCenter Content: Oracle WebCenter Content Admin Server, Oracle WebLogic Server Administration Console, and Enterprise Manager Fusion Middleware Control.

Oracle WebCenter Content Admin Server is a collection of web pages that enable you to configure system-wide settings, view Content Server logs, and view and configure system audit information.

Oracle WebLogic Server Administration Console can be used for a myriad of configuration, management, and monitoring tasks. These tasks include starting and stopping the managed servers, creating and managing users, configuring and deploying Java EE applications, viewing server and domain log files, and monitoring the managed servers, deployments, and application performance.

Enterprise Manager Fusion Middleware Control is used to monitor and administer a farm. A farm is a collection of components usually consisting of an Oracle instance and/or an Oracle WebLogic Server domain. An Oracle instance contains one or more system components, such as Oracle Web Cache, Oracle HTTP Server, or Oracle Internet Directory. An Oracle WebLogic Server domain contains one Administration Server and one or more Managed Servers. Fusion Middleware Control is deployed to the Administration Server, and the Managed Servers contain components such as Oracle WebCenter Content.

Fusion Middleware Control can monitor a variety of performance metrics for Oracle WebCenter Content. The default metrics shown on the content page are:
Active Threads: The number of active threads. If the number of active threads gets too high, it can cause slowness when accessing data. It is beneficial to monitor this metric in order to determine the range in which WebCenter Content operates effectively.

Active Database Connections: The number of active database connections made by the WebCenter Content Server instance. Where possible, keep the number of database connections at a minimum. The default is 15 and should be acceptable; however, under heavy loads this may need to be changed. It is beneficial to monitor this metric to ensure it is correct and/or high enough for the environment.

Search Queries Cached: The number of search queries cached (rows). Caching queries can improve search performance by reducing search time, but it increases memory usage. It is beneficial to monitor this metric to ensure there is a good balance between search performance and memory usage. It may also help determine whether more memory is needed.

Hit to Miss Ratio: The hit to miss ratio for the number of search queries performed.

Documents in GenWWW State: The number of documents waiting to be converted to web-viewable format. It is beneficial to monitor this metric to ensure that Inbound Refinery (IBR) is functioning properly (if used), that there are no network issues between WebCenter Content and IBR, and/or that WebCenter Content is not having issues copying files to the weblayout directory (if IBR is not used).

Documents Waiting to be Indexed in Done State: The number of documents waiting to be indexed. It is beneficial to monitor this metric to ensure the indexer, whether it is database full-text, database metadata, or a third-party indexer, is functioning as it should.

Average Requests per Sec: The average number of services requested per second. It is beneficial to monitor this metric to see whether the number increases, to determine whether more memory is needed, and to see whether an increase coincides with or is due to an increase in active threads.

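
Several of the descriptions above recommend watching a metric long enough to learn the range in which WebCenter Content operates effectively. One simple way to turn a history of samples into such a range is sketched below in Python. This is an illustration only, not how Enterprise Manager computes anything: the function names `operating_range` and `outside_range`, the choice of mean plus or minus `k` sample standard deviations, and the sample values are all assumptions.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def operating_range(samples: Sequence[float], k: float = 2.0) -> Tuple[float, float]:
    """Return (low, high) as mean +/- k sample standard deviations, floored at 0."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    centre = mean(samples)
    spread = k * stdev(samples)
    return (max(0.0, centre - spread), centre + spread)


def outside_range(value: float, bounds: Tuple[float, float]) -> bool:
    """True when value falls below the low bound or above the high bound."""
    low, high = bounds
    return value < low or value > high


if __name__ == "__main__":
    # Hypothetical active-thread counts noted down during normal operation.
    history = [10, 12, 11, 13, 12, 11, 10, 12]
    bounds = operating_range(history)
    for reading in (12, 25):
        status = "outside" if outside_range(reading, bounds) else "within"
        print(f"{reading} active threads is {status} the observed range")
```

A mean and standard deviation approach assumes reasonably steady load; for metrics with strong daily cycles, a separate range per time of day may be more useful.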
To view interactive charts of these metrics, go to the Performance Summary page within EM. The Performance Summary page contains a set of performance charts that show the values of the seven default performance metrics that are specific to WebCenter Content. Other metrics can be added to or removed from the Performance Summary page, and the charts can be customized to help isolate potential performance issues by correlating and comparing specific metric data.
Below is a screenshot of the Performance Summary screen.

The chart above does not show much activity at this time. During normal business hours, the charts may show higher numbers than in the example above and will probably show some spikes. Monitoring these metrics over several days or weeks should give a good baseline for how WebCenter Content should perform. To correlate metric data from day to day or week to week, overlay the previous day's performance information on top of the current day's data. This helps identify patterns and potential problems when the data varies from the normal ranges.

Below is a screenshot of when the entire farm is healthy, with all charts showing green.
Below is a screenshot of when an instance is down or not functioning properly. The servers or instances that are down or not functioning properly are depicted in red.
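
The day-over-day overlay described for the Performance Summary charts can also be approximated numerically if metric values are recorded outside of EM. The Python sketch below compares two days of hourly values and lists the hours where the current day differs from the previous day by more than a tolerance. The function name `deviating_hours`, the hour-keyed dictionaries, the 50% tolerance, and the sample numbers are all hypothetical; none of them come from Enterprise Manager.

```python
from typing import Dict, List


def deviating_hours(
    previous: Dict[int, float],
    current: Dict[int, float],
    tolerance_pct: float = 50.0,
) -> List[int]:
    """Return hours where current differs from previous by more than tolerance_pct."""
    flagged = []
    # Only hours present on both days can be compared.
    for hour in sorted(previous.keys() & current.keys()):
        before, now = previous[hour], current[hour]
        if before == 0:
            # A percentage change against zero is undefined; flag any activity.
            if now != 0:
                flagged.append(hour)
            continue
        change_pct = abs(now - before) / abs(before) * 100.0
        if change_pct > tolerance_pct:
            flagged.append(hour)
    return flagged


if __name__ == "__main__":
    # Hypothetical average requests per second, keyed by hour of day.
    yesterday = {9: 20.0, 10: 22.0, 11: 21.0}
    today = {9: 21.0, 10: 45.0, 11: 20.0, 12: 5.0}
    print(deviating_hours(yesterday, today))
```

The same comparison works week to week by passing the values from the same weekday of the previous week as `previous`.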