Sawmill Log Analyzer Best Practices






This document describes best practices for the Sawmill universal log analysis software.

Log File Management

Log files are your company's asset and are irreplaceable. We strongly recommend that you retain your historical log files for as long as possible. If space is a concern, consider compressing your log files. Log files typically compress very well, shrinking by anywhere from 90% to 98% of their original size. Sawmill can natively read compressed files, so there is no need to spend time decompressing files before processing them with Sawmill. Compressed log files also reduce bandwidth if you read log files remotely, or move files between remote servers.

The most important reason to keep log files is to be able to rebuild your Sawmill database in the event of a catastrophic failure. If you retain your historical log files, recovering a database is always possible, and rebuilding it is a simple operation. If you choose to follow only one of these best practices and guidelines, choose to retain your historical log data. Historical logs are cheap insurance (and typically use very little disk space when compressed), and having them will save you from data loss if you ever need to recover a database from the ground up.

Backup Your Database

We strongly recommend that if you build larger datasets, you create a backup plan for your Sawmill databases. A backup of your database will save you many hours (or even days) of reprocessing logs and rebuilding your database in the event of database corruption or catastrophic failure. Though database corruptions and failures are infrequent, they can happen, and if they do you will be glad to have a backup of your Sawmill databases so that you can be back up and running quickly. The Sawmill internal database is a flat-file database and can be backed up by almost any commonly available backup software.
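Because the internal database is a flat-file database, a backup can be as simple as archiving its directory while no build or update is running. The sketch below illustrates the idea with Python's standard library; the directory paths are stand-ins, not real Sawmill defaults, so point db_dir at your actual database location (it varies by installation). Archiving with gzip also demonstrates the compression savings discussed above.

```python
import os
import tarfile
import time

def backup_sawmill_database(db_dir, backup_dir):
    """Archive the Sawmill internal (flat-file) database directory
    into a timestamped .tar.gz. Schedule this only for times when
    no database build or update is in progress."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = os.path.join(backup_dir, "sawmill-db-%s.tar.gz" % stamp)
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(db_dir, arcname=os.path.basename(db_dir))
    return archive

# Demo with a stand-in directory; "demo-db" and "backups" are
# hypothetical paths for illustration only.
os.makedirs("demo-db", exist_ok=True)
with open(os.path.join("demo-db", "main.table"), "w") as f:
    f.write("flat-file table contents\n" * 1000)
os.makedirs("backups", exist_ok=True)
archive = backup_sawmill_database("demo-db", "backups")
print(archive, os.path.getsize(archive), "bytes")
```

Note how small the archive is relative to the 25,000-byte source file: repetitive log-like data is exactly the kind of content that compresses by 90% or more.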
It's important that you schedule backups for times when the Sawmill database is not actively updating. If you use an external database such as MySQL, Oracle, or MS SQL, check with your database administrator about what options exist for backing up and restoring your Sawmill database. Both the Sawmill internal database and external databases created by Sawmill can also be exported to plain text files manually and then compressed, and later decompressed and reloaded manually, if no better backup facility is available.

Sawmill Scheduled Tasks

Sawmill's built-in scheduler is powerful and flexible, and allows you to set up and run many types of operations unattended. With this power, it is possible to quickly run into trouble and overwhelm your server if you set up too many concurrent tasks. We

Sawmill Log Analyzer Best Practices!! Page 3 of 6 recommend that you do not schedule multiple database builds or multiple database updates concurrently. Sawmill tasks use a large amount of resources (memory, processor, disk space and disk access) on your server when you are processing and building large datasets. To avoid issues it s best to schedule large operations consecutively, even on systems with multiple processors. Disable Antivirus Active Scanning of the Sawmill Installation Directory Active-scanning antivirus software (and automatic disk defragmentation software, or indexing software), which runs continually and monitors all files on the disk, can interfere with Sawmill's operation. This software may also incorrectly conclude that Sawmill's database files contain viruses (they don't), causing it to modify or delete the files, resulting in database corruption. This can cause log processing, database building and report generations to fail. This type of software also considerably slows Sawmill processing and reporting down, in very extreme cases, making it as much as 20 times slower. So, you must disable scanning of the directory where Sawmill is installed, or Sawmill databases are located in order for Sawmill to operate properly. It is not necessary to disable the activescanning software entirely; just configure active scanning so it excludes the Sawmill installation directory and subdirectories and files. Most antivirus products can be configured to exclude scanning of certain directories. Best Practices For Setting Up Your Server Operating System and CPU Platform System requirements Sawmill will run on any platform, but we recommend x64 (64-bit) Linux. You can use any distribution, but Red Hat Enterprise Linux is a good choice for maximum speed, compatibility, and stability. On other distributions, it may be necessary to build Sawmill from source code. 
Other x64 operating systems are also reasonable choices, including x64 Windows, x64 Mac OS, x64 FreeBSD, and x64 Solaris. CPU architectures other than x64 will work, but overall we see better performance with x64 than with SPARC and other RISC architectures. 32-bit architectures are not recommended for any large dataset: the address space limitations of 32-bit operating systems can cause errors in Sawmill when processing large datasets. We consider a large installation of Sawmill to be a dataset of 10 GB or more. For datasets greater than 10 GB we strongly recommend a 64-bit system, and it is best to have a dedicated Sawmill server. Sawmill can run on a virtual system, but performance can be considerably slower than on a physical server. We occasionally see errors with Sawmill running in a virtual environment, so physical hardware is always recommended, especially for larger datasets.

Disk and Space

You will need between 200% and 400% of the size of your uncompressed log data to store the Sawmill database. Databases tend toward the high side (400%) on 64-bit systems, especially when tracking a very large number of numerical fields (more than ten or so). For instance, to report on 1 terabyte (TB) of log data in a single profile, you would need up to 4 TB of disk space for the database. This is the total data in the database, not the daily data added: if you have 1 TB of log data per day and want to track 30 days, that is a 30 TB dataset, which requires between 60 TB and 120 TB of disk space.

IMPORTANT: If your profile uses a database filter which sorts the main table, e.g. a "sessions" analysis (most web server log profiles do this by default), there is a stage of the database build where the entire main table of the database is copied and sorted. During this period, the database will use up to twice the disk space; after the build, disk usage drops back down to the final size. So for the 1 TB example above, if there is a "sessions" analysis or another sorting database filter (such as a "concurrent streams" analysis), the build will temporarily use between 4 TB and 8 TB of space before returning to 2-4 TB at the end.

If you are using a separate SQL database server, you will need space on that server to accommodate the databases; the remainder of Sawmill will fit in a smaller space, so 1 GB should be sufficient for the Sawmill installation itself.

Sawmill uses the disk intensively during database building and report generation, so for best performance use a fast disk. Ideally, use a RAID 10 array of fast disks. RAID 5 or RAID 6 will hurt performance significantly (about 2x slower than RAID 10 for database builds) and is not recommended. Write buffering on the RAID controller should be turned on if possible, as it provides an additional 2x performance gain for database builds.
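The 200%-400% rule of thumb above, plus the temporary doubling during a sorting build, can be captured in a small helper. This is an estimate only, based purely on the figures in this document:

```python
def database_disk_estimate(log_data_gb, sorting_filter=True):
    """Estimate Sawmill database disk usage for one profile.

    Returns (low_gb, high_gb, peak_gb): steady-state low and high
    estimates (200% and 400% of uncompressed log data), and the
    peak during a build if a sorting database filter (e.g. a
    "sessions" analysis) temporarily doubles usage."""
    low = log_data_gb * 2.0
    high = log_data_gb * 4.0
    peak = high * 2.0 if sorting_filter else high
    return low, high, peak

# The 1 TB/day, 30-day example from the text (expressed in GB):
low, high, peak = database_disk_estimate(30 * 1024)
print("steady state %d-%d GB, build peak up to %d GB" % (low, high, peak))
```

For the single 1 TB profile discussed earlier, this gives 2-4 TB steady state and a 4-8 TB peak, matching the figures in the text.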
Network mounts will usually work for storage of the Sawmill database, but are not recommended for performance reasons. We sometimes see errors, apparently due to locking and synchronization issues, with network mounts.

Memory

On the Sawmill server, we recommend 2 GB of RAM per core for large datasets.

Processor(s)

To estimate the amount of processing power you need, start with the assumption that Sawmill Enterprise processes 2,000 log lines per second per processor core on Intel or AMD processors, or 1,000 lines per second per core on SPARC or other processors. Note: this is a conservative assumption; Sawmill can be much faster than this on some datasets, reaching speeds of 10,000-20,000 lines per second per core in some cases. However, for sizing

Sawmill Log Analyzer Best Practices!! Page 5 of 6 your processor, it is best to use a conservative estimate to ensure that the specified system is sufficient. Compute the number of lines in your daily dataset, 200 bytes per line is a good estimate. This will tell you how many seconds Sawmill will require to build the database. Convert that to hours, if it is more than six hours you will need more than one processor. You should have enough processors that when you divide the number of hours by the number of processors, it is less than 6. For example: 50 Gigabytes (GB) of uncompressed log data per day divide by 200 ~268 million lines of log data divide by 2000 ~134 million seconds divide by 3600 ~37 hours divide by 6 6 processors The use of six hours is based upon the assumption that you don't want to spend more than six hours per night updating your database to add the latest data. A six hour nightly build time is a good starting point. It provides some flexibility to modify or tune the database and filters that can slow down processing and keep within the processing time available each day. The dataset above could be processed in 9 hours using four processors; if a 9 hour nightly build time is acceptable. Use Internal Database For Best Performance The internal database is optimized for performance and database size for Sawmill s specific usage, and generally outperforms an external SQL server on equivalent hardware. However, an enterprise-scale SQL server, especially a cluster, may provide higher performance than the internal database, if the SQL server hardware is much faster than the Sawmill server. When using MS SQL, performance of ODBC imports are sometimes very slow, on the order of 100 rows per second. For best performance, configure the profile to use BULK_INSERT to load data into Sawmill, which can run 10,000 rows per second or faster. 
When using an external SQL database, Sawmill cannot use its efficient representation of unique fields (such as visitors or sessions) in cross-reference groups. This can make cross-reference groups very large and slow with a high volume of data. For best performance, remove unique fields from cross-reference groups (and from the associated reports), or remove the cross-reference groups entirely. See below for more information on cross-reference groups.
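The sizing rules above (disk at 200%-400% of log data, a conservative 2,000 lines per second per core against a six-hour nightly build window, and 2 GB of RAM per core) can be combined into one planning sketch. Note that this rounds cores up for safety, so the 50 GB/day worked example (about 6.2 core-windows) yields 7 cores here, where the text rounds to 6:

```python
import math

def size_sawmill_server(daily_log_gb, retention_days,
                        lines_per_sec_per_core=2000,
                        bytes_per_line=200,
                        build_window_hours=6):
    """Rough capacity plan from this document's rules of thumb."""
    dataset_gb = daily_log_gb * retention_days
    daily_lines = daily_log_gb * 1024 ** 3 / bytes_per_line
    build_hours_one_core = daily_lines / lines_per_sec_per_core / 3600.0
    # Round up: the text's example rounds ~6.2 down to 6 processors.
    cores = max(1, math.ceil(build_hours_one_core / build_window_hours))
    return {
        "disk_gb": (dataset_gb * 2.0, dataset_gb * 4.0),  # 200%-400%
        "build_hours_one_core": build_hours_one_core,
        "cores": cores,
        "ram_gb": cores * 2,  # 2 GB RAM per core
    }

# The 50 GB/day example, retained for 30 days (~37 hours on one core):
plan = size_sawmill_server(50, 30)
print(plan)
```

Under these assumptions, a 50 GB/day, 30-day profile calls for roughly 3-6 TB of database disk, 7 cores, and 14 GB of RAM.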

Profile Tuning For High Volume

A default profile created in Sawmill is optimized for maximum report generation speed. These settings can be very problematic for very large datasets (more than 200 million lines of log data), requiring very large amounts of time, memory, and disk space to build, and should be modified before any large import begins. It is best to start by turning off most of the database indices (all fields are indexed by default), in Config->Database Fields, and most of the cross-references (in Config->Cross-reference Groups). Turning off a cross-reference group will make the database build faster, but will make the corresponding report much slower (minutes instead of seconds). Turning off indices will make the database build faster and the database smaller, but will make filtering on the corresponding fields slower. The date/time cross-reference group should generally be left enabled, as it is used in nearly every report and is not very expensive.

Field complexity must also be managed for large datasets. Any normalized (non-numerical) field with more than about 10 million unique values can become a performance liability, and should be simplified or removed if possible. Fields can be simplified using a Log Filter, for instance by setting the field value to a single constant if the field is not needed. Database fields can be removed completely using the command-line remove_database_field action.

During a large database build, it is good to monitor the web interface progress display for the build (click Show Reports on the profile) to see whether any performance warnings appear. These warnings also appear in the Task Log.

Single monolithic databases can also be a performance issue. Whenever possible, very large datasets should be segmented into separate profiles, each with its own database: for instance, one profile per server, one profile per month, or one profile per region.
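Before a large import, it can be worth checking a sample of your logs for high-cardinality fields like those described above. The sketch below is a generic illustration, not a Sawmill feature: it splits lines on whitespace (adjust the delimiter for your log format) and counts unique values per field position, so you can see which fields might approach the ~10 million unique value danger zone on the full dataset:

```python
from collections import defaultdict

def field_cardinality(lines, delimiter=" "):
    """Count unique values per delimited field position in a sample
    of log lines, to spot fields that may need simplifying or
    removing before a large database build."""
    unique = defaultdict(set)
    for line in lines:
        for i, value in enumerate(line.rstrip("\n").split(delimiter)):
            unique[i].add(value)
    return {i: len(vals) for i, vals in unique.items()}

# A tiny illustrative sample of web-server-like log lines:
sample = [
    "10.0.0.1 GET /index.html 200",
    "10.0.0.2 GET /about.html 200",
    "10.0.0.1 POST /login 302",
]
counts = field_cardinality(sample)
print(counts)  # unique value count per field position
```

On a real sample you would extrapolate: a field whose unique count grows roughly linearly with the number of lines sampled (such as a URL with unique query strings) is a likely candidate for simplification with a Log Filter.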
Pre-Sales Support Strongly Recommended For Large Datasets

If you are considering Sawmill for a very large dataset (more than 200 million lines of data), we recommend that you contact Sawmill technical support in advance for expert guidance with your profile configuration. There is no charge for pre-sales technical consultations, and it is very likely to improve your initial experience of Sawmill. Contact support@sawmill.net for pre-sales support.

Document v2013-04-03-1