IBM SmartCloud Analytics - Log Analysis Anomaly App Version 1.2
IBM SmartCloud Analytics - Log Analysis Anomaly App Version 1.2
Note Before using this information and the product it supports, read the information in Notices, on page 7. Edition notice This edition applies to IBM SmartCloud Analytics - Log Analysis and to all subsequent releases and modifications until otherwise indicated in new editions. References in content to IBM products, software, programs, services or associated technologies do not imply that they will be available in all countries in which IBM operates. Content, including any plans contained in content, may change at any time at IBM's sole discretion, based on market opportunities or other factors, and is not intended to be a commitment to future content, including product or feature availability, in any way. Statements regarding IBM's future direction or intent are subject to change or withdrawal without notice and represent goals and objectives only. Please refer to the developerworks terms of use for more information. Copyright IBM Corporation 2014. US Government Users Restricted Rights Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents Chapter 1. About this publication.... 1 Audience............... 1 Publications.............. 1 Accessing terminology online........ 1 Accessibility.............. 1 Tivoli technical training........... 1 Providing feedback............ 2 Conventions used in this publication...... 2 Typeface conventions.......... 2 Chapter 2. Anomaly app........ 3 Configuring the Anomaly app training..... 3 Running model training.......... 4 Appendix. Notices.......... 7 Trademarks............... 8 Copyright IBM Corp. 2014 iii
iv IBM SmartCloud Analytics - Log Analysis: Anomaly App
Chapter 1. About this publication This guide contains information about how to use IBM SmartCloud Analytics - Log Analysis. Audience Publications This publication is for users of the IBM SmartCloud Analytics - Log Analysis product. This section provides information about the IBM SmartCloud Analytics - Log Analysis publications. It describes how to access and order publications. Accessing terminology online The IBM Terminology Web site consolidates the terminology from IBM product libraries in one convenient location. You can access the Terminology Web site at the following Web address: http://www.ibm.com/software/globalization/terminology. Accessibility Accessibility features help users with a physical disability, such as restricted mobility or limited vision, to use software products successfully. In this release, the IBM SmartCloud Analytics - Log Analysis user interface does not meet all accessibility requirements. Tivoli technical training Accessibility features This information center, and its related publications, are accessibility-enabled. To meet this requirement the user documentation in this information center is provided in HTML and PDF format and descriptive text is provided for all documentation images. Related accessibility information You can view the publications for IBM SmartCloud Analytics - Log Analysis in Adobe Portable Document Format (PDF) using the Adobe Reader. IBM and accessibility For more information about the commitment that IBM has to accessibility, see the IBM Human Ability and Accessibility Center. The IBM Human Ability and Accessibility Center is at the following web address: http://www.ibm.com/able (opens in a new browser window or tab) For Tivoli technical training information, refer to the following IBM Tivoli Education Web site at http://www.ibm.com/software/tivoli/education. Copyright IBM Corp. 2014 1
Providing feedback We appreciate your comments and ask you to submit your feedback to the IBM SmartCloud Analytics - Log Analysis community. Conventions used in this publication This publication uses several conventions for special terms and actions, operating system-dependent commands and paths, and margin graphics. Typeface conventions This publication uses the following typeface conventions: Bold Italic v v v v v Lowercase commands and mixed case commands that are otherwise difficult to distinguish from surrounding text Interface controls (check boxes, push buttons, radio buttons, spin buttons, fields, folders, icons, list boxes, items inside list boxes, multicolumn lists, containers, menu choices, menu names, tabs, property sheets), labels (such as Tip:, and Operating system considerations:) Keywords and parameters in text Citations (examples: titles of publications, diskettes, and CDs Words defined in text (example: a nonswitched line is called a point-to-point line) v Emphasis of words and letters (words as words example: "Use the word that to introduce a restrictive clause."; letters as letters example: "The LUN address must start with the letter L.") v New terms in text (except in a definition list): a view is a frame in a workspace that contains data. v Variables and values you must provide:... where myname represents... Monospace v Examples and code examples v File names, programming keywords, and other elements that are difficult to distinguish from surrounding text v Message text and prompts addressed to the user v Text that the user must type v Values for arguments or command options 2 IBM SmartCloud Analytics - Log Analysis: Anomaly App
Chapter 2. Anomaly app Configure the anomaly app so that IBM SmartCloud Analytics - Log Analysis users can use it to help them to identify and investigate anomalies in the log data. Before IBM SmartCloud Analytics - Log Analysis users can use the app, you must configure the Anomaly app for your installation of IBM SmartCloud Analytics - Log Analysis After you configure the app, you must contact a data scientist. The data scientist runs the data model training. When the data scientist runs the training, the app creates a baseline of historical data. This baseline is modeled as a histogram that is based on the data that is ingested by IBM SmartCloud Analytics - Log Analysis during the time period that is required by the model. The app supports two statistical models for data analysis, the mean-variance model and the auto-regression model. After the app is configured and the model training is complete, IBM SmartCloud Analytics - Log Analysis users can use the app to identify anomalous events. The app uses historical data to detect deviations from normal system behavior. This behavior is derived from time-stamped text data from logs or events that are indexed in IBM SmartCloud Analytics - Log Analysis. It does not use structured performance measurement data. To open the app, log in to IBM SmartCloud Analytics - Log Analysis. After the configuration and model training are complete, the app is displayed under Custom apps. The app displays a histogram for each data source. The app calculates the anomalies for each data source type by comparing current data to the baseline that the data scientist created. If all the data sources for a specified data source type contain anomalous events, the model name is displayed in red. If all the trained models for a data source declare that the current data to be anomalous, the model name is displayed in yellow. If all the trained models for a data source declare that the current is not anomalous, the name is displayed in green. Configuring the Anomaly app training Before you can use the Anomaly app, you must configure it so that it works with your installation of IBM SmartCloud Analytics - Log Analysis. Before you begin v Extract the anomalyapptraining.tar file from the <HOME>/AppFramework/CLIApps/ directory to a suitable directory, for example <AnomalyAppTraining>. v To use this app, you must install R and a required Java add-on, rjava version 0.95. R is a software environment for statistical computing. To install R and the required add-on, complete the following steps: 1. Create a folder that serves as the installation folder, referred to here as <R>. 2. Download the following packages to the new folder: R-3.0.2.tar.gz from http://cran.rstudio.com/src/base/r-3/r-3.0.2.tar.gz Copyright IBM Corp. 2014 3
rjava_0.9-5.tar.gz from http://www.rforge.net/src/contrib/rjava_0.9-5.tar.gz 3. Copy the installr.sh script that you extracted to the <AnomalyAppTraining> folder in the first prerequisite to the <R> directory. The installr.sh file is located in the <AnomalyAppTraining>/bin directory. 4. To add the location of your IBM SmartCloud Analytics - Log Analysis installation to the installr.sh script, edit the UNITY_HOME parameter. 5. Run the installr.sh command from the <R> directory. The command creates a folder that is called <R>/R-3.0.2 referred to here as <R_HOME>. R requires specific prerequisite packages. For more information, see: v http://cran.r-project.org/doc/manuals/r-admin.html#simple-compilation v http://cran.r-project.org/doc/manuals/r-admin.html#essential-and-usefulother-programs-under-a-unix_002dalike The installr.sh script assumes that the prerequisites for R, except for the X11 and Redline packages, are already installed. If these prerequisite packages are not installed, an error message is displayed during the installation that lists the missing packages. If this situation arises, install the missing packages, delete the <R_HOME> directory on your installation, and start the installation again. Procedure Open the env.sh script in the <AnomalyAppTraining>/bin directory and add values for the following variables: <UNITY_HOME> Add the path to the directory where IBM SmartCloud Analytics - Log Analysis is installed. For example, /home/user/ibm/loganalyticsworkgroup. <R_HOME> Add the path for the <R_HOME> directory that you created as a prerequisite. Results Running model training After you configure the training, a data scientist must run the model training. After you configure the Anomaly app, a data scientist runs the model training. Before you begin Before the data scientist can run the model training, the IBM SmartCloud Analytics - Log Analysis administrator or user must configure the training. See Configuring the Anomaly app training on page 3. To complete the procedure, the reader needs both the domain and statistical knowledge that is provided by a specialized data scientist. The procedure is only for use by data scientists. Procedure 1. In the AnomalyAppTraining/config/trainingFiles folder, create one folder for each source type that you want to use with the app. Use the name of the 4 IBM SmartCloud Analytics - Log Analysis: Anomaly App
source type as the name of the folder. For example, the AnomalyAppTraining/ config/trainingfiles/wassystemout folder is created by default for the WASSystemOut source type. 2. In the source type folder, create one or more training files for the corresponding source type. Use the sample files that are stored in the AnomalyAppTraining/ config/trainingfiles folder as a reference. Specify the following values in the training file: Filter expression Use the training file to define a filter expression that is applied to the log data stream. The training file changes the log data stream in to an event stream. For example: severity=e, Node=MyMachine Time window Specify a time window during which events from the log data stream are counted for the data source. Model to be used Select either a mean variance or regression analysis-based statistical model. Model-specific training parameter Specify the parameters for the chosen statistical model. For example, if you use a mean variance-based model, you must specify a value for the Acceptable_P_Value parameter that used to create the mean variance model. 3. Run the model training. To run the training file, run the run.sh command that is in the AnomalyAppTraining/bin/ folder. The command contains two optional parameters that you can specify: - logsources=<comma_seperated_list_of_data_sources> Use this parameter to list specific data sources. You use this parameter if you want to run the app on the listed data sources. If you do not enter a value, the app runs on all data sources. - overwrite= true or false If you want to train a data source that is already trained, set this parameter to true. The already trained data source is trained again. The default value is false. Results The app uses the model training to create a historical base line. When the IBM SmartCloud Analytics - Log Analysis users runs the app from the UI, the app compares the current data against the data that is generated by the model training to identify anomalies. The outcome of the training is stored in the summaryfile.txt file in the AnomalyAppTraining directory. Chapter 2. Anomaly app 5
6 IBM SmartCloud Analytics - Log Analysis: Anomaly App
Appendix. Notices This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-ibm product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing IBM Corporation North Castle Drive Armonk, NY 10504-1785 U.S.A. For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to: Intellectual Property Licensing Legal and Intellectual Property Law IBM Japan, Ltd. 1623-14, Shimotsuruma, Yamato-shi Kanagawa 242-8502 Japan The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement might not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. Any references in this information to non-ibm Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk. Copyright IBM Corp. 2014 7
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact: IBM Corporation 2Z4A/101 11400 Burnet Road Austin, TX 78758 U.S.A. Such information may be available, subject to appropriate terms and conditions, including in some cases payment of a fee. The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or any equivalent agreement between us. Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurement may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment. Information concerning non-ibm products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-ibm products. Questions on the capabilities of non-ibm products should be addressed to the suppliers of those products. All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only. This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. If you are viewing this information in softcopy form, the photographs and color illustrations might not be displayed. Trademarks IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at Copyright and trademark information at www.ibm.com/legal/copytrade.shtml. 8 IBM SmartCloud Analytics - Log Analysis: Anomaly App
Adobe, Acrobat, PostScript and all Adobe-based trademarks are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, other countries, or both. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Linux is a trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. Other product and service names might be trademarks of IBM or other companies. Appendix. Notices 9
10 IBM SmartCloud Analytics - Log Analysis: Anomaly App
Product Number: Printed in USA