Log Management: Best Practices for Security and Compliance


The Essentials Series

Introduction to Realtime Publishers

by Don Jones, Series Editor

For several years now, Realtime has produced dozens and dozens of high-quality books that just happen to be delivered in electronic format at no cost to you, the reader. We've made this unique publishing model work through the generous support and cooperation of our sponsors, who agree to bear each book's production expenses for the benefit of our readers.

Although we've always offered our publications to you for free, don't think for a moment that quality is anything less than our top priority. My job is to make sure that our books are as good as, and in most cases better than, any printed book that would cost you $40 or more. Our electronic publishing model offers several advantages over printed books: You receive chapters literally as fast as our authors produce them (hence the "realtime" aspect of our model), and we can update chapters to reflect the latest changes in technology.

I want to point out that our books are by no means paid advertisements or white papers. We're an independent publishing company, and an important aspect of my job is to make sure that our authors are free to voice their expertise and opinions without reservation or restriction. We maintain complete editorial control of our publications, and I'm proud that we've produced so many quality books over the past years.

I want to extend an invitation to visit us at http://nexus.realtimepublishers.com, especially if you've received this publication from a friend or colleague. We have a wide variety of additional books on a range of topics, and you're sure to find something that's of interest to you, and it won't cost you a thing. We hope you'll continue to come to Realtime for your educational needs far into the future. Until then, enjoy.

Don Jones

Contents

Introduction to Realtime Publishers
Article 1: The Importance of Log Management to Your Security and Compliance Practices
    Understanding Log Files
    Log Forwarding
    Log File Uses
    Compliance
    Health and Troubleshooting
    Centralization
    Integrity
    Reporting and Alerting
    Conclusion
Article 2: How to Leverage Your Logs to Secure Your Environment
    Scenario 1: The Security Incident
    Scenario 2: A Visit from the Auditor
    Scenario 3: It's Been Going on for Weeks
    Additional Things to Consider
    Conclusion
Article 3: Best Practices for Log File Management (Compliance, Security, Troubleshooting)
    Architecting the Infrastructure
    Extending Centralization Beyond Servers
    Log File Retention
    Estimate Storage Requirements
    Optimizing Bandwidth
    Leverage the Logs
    The Database
    Conclusion

Copyright Statement

© 2011 Realtime Publishers. All rights reserved. This site contains materials that have been created, developed, or commissioned by, and published with the permission of, Realtime Publishers (the "Materials"), and this site and any such Materials are protected by international copyright and trademark laws.

THE MATERIALS ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. The Materials are subject to change without notice and do not represent a commitment on the part of Realtime Publishers or its Web site sponsors. In no event shall Realtime Publishers or its Web site sponsors be held liable for technical or editorial errors or omissions contained in the Materials, including without limitation, for any direct, indirect, incidental, special, exemplary or consequential damages whatsoever resulting from the use of any information contained in the Materials.

The Materials (including but not limited to the text, images, audio, and/or video) may not be copied, reproduced, republished, uploaded, posted, transmitted, or distributed in any way, in whole or in part, except that one copy may be downloaded for your personal, non-commercial use on a single computer. In connection with such use, you may not modify or obscure any copyright or other proprietary notice.

The Materials may contain trademarks, service marks, and logos that are the property of third parties. You are not permitted to use these trademarks, service marks, or logos without prior written consent of such third parties.

Realtime Publishers and the Realtime Publishers logo are registered in the US Patent & Trademark Office. All other product or service names are the property of their respective owners. If you have any questions about these terms, or if you would like information about licensing materials from Realtime Publishers, please contact us via e-mail at info@realtimepublishers.com.

Article 1: The Importance of Log Management to Your Security and Compliance Practices

Virtually all information technology systems, applications, and appliances that an enterprise deploys share a common thread, no matter what type of operating system (OS) they run or what application they host: to one degree or another, details regarding the operations they perform are captured in log files. The log files that systems and applications create can contain a vast wealth of information about the health and daily activity of the infrastructure. However, these logs are generally local to the system or application that generates them. This highly distributed creation and storage of log files creates significant challenges when an enterprise wants to leverage the logs in a way that benefits it from a security and compliance perspective.

This series will focus on the benefits of centralizing logs and on best practices for leveraging them for troubleshooting, handling incident response, and maintaining compliance with existing and new regulations. To best leverage log files, you must develop an effective strategy for centralizing the collection of logs and for determining the types of systems from which logs should be collected.

Understanding Log Files

The first step to leveraging log files is to develop a general understanding of the types of data recorded by the various OSs and applications that reside within the typical enterprise information technology infrastructure. OSs of all types will log system and application activity as well as authentication and configuration changes. Each event that is logged, no matter what the actual log storage mechanism is, will contain the date and time the event was created and, when possible, the account that performed the action. Log files are created in several formats, from flat text files and formats that adhere to standards such as the W3C log format for Web servers, to formats that are completely proprietary.

Log file locations also vary greatly depending on the OS or application. Unix/Linux OSs have a standard /var/log directory that most applications adhere to. Microsoft Windows has moved toward an XML-based event log system; however, third-party vendors may not leverage the event log system, choosing instead to store text logs in different locations. In fact, some Microsoft services don't leverage the built-in log structure: Internet Information Server (IIS) and Windows Firewall with Advanced Security utilize text files. In most cases, the degree of detail that logs capture is also configurable and ranges in scope from nothing to everything. By default, most OSs and applications will log detail somewhere in the middle, and vendors will only recommend logging everything for very short periods of time for troubleshooting purposes.

Log Forwarding

Modern operating systems, appliances, and network equipment all contain built-in functionality to forward event logs to another location, either for the purposes of centralization or simply to archive logs on other systems. Microsoft operating systems, for example, now contain a built-in log forwarding mechanism that is subscription based: one system subscribes to the events of another. For non-Windows systems, network equipment, printers, and appliances, syslog is the established standard for centralizing events.
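To make the forwarding mechanics concrete, here is a minimal sketch of an application shipping its events to a central collector over the syslog protocol. The collector hostname is a placeholder, and in practice an OS-level agent or the built-in forwarder would typically play this role rather than application code.

    import logging
    import logging.handlers

    # Ship events to a central syslog collector as they are created.
    # "collector.example.com" is a placeholder for your log server;
    # port 514/UDP is the traditional syslog transport.
    handler = logging.handlers.SysLogHandler(address=("collector.example.com", 514))
    handler.setFormatter(logging.Formatter("myapp: %(levelname)s %(message)s"))

    logger = logging.getLogger("myapp")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)

    # Each call is forwarded immediately, rather than written only to a
    # local file that an attacker could later alter.
    logger.info("User %s authenticated from %s", "jsmith", "10.0.0.5")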

Log File Uses

Log files can be used for multiple purposes; the most common use is by support staff to troubleshoot system, application, or configuration issues. Enterprises that limit the use of log files to troubleshooting are failing to take advantage of the additional benefits logs can provide. Often, these businesses cannot take full advantage because they have not invested in log centralization; without an effective centralization strategy, properly leveraging the data in log files is a very labor-intensive process.

For example, log files can be a critical component when investigating a security incident. The files will likely contain the information needed to answer the required questions of who, what, where, and when. Taken one step further, centralization of logs enables events from different systems to be analyzed in one place. In the case of a security incident, multiple systems might be compromised, and there may be common indicators in the logs that one could look for. This valuable insight can be used to detect which systems were impacted or where the actor or malware went once inside the infrastructure. Without centralized logs, investigators would have to check the logs on each system, which could be a very labor-intensive process.

Note: The next article will discuss in greater detail the benefits of centralized logging and how tools can be leveraged to make detection of security incidents easier.

Compliance

Additionally, log files can be leveraged for maintaining regulatory compliance. Many organizations must comply with regulations like the Sarbanes-Oxley (SOX) Act, which requires auditing of activities like the provisioning of user accounts or access to financial systems. A number of industries are beginning to fall under additional regulatory requirements both within the United States and internationally, such as SOX, the Health Insurance Portability and Accountability Act (HIPAA), and Dodd-Frank; internationally, versions of Sarbanes-Oxley have been implemented in Japan and the European Union. This ever-growing list of regulations and their increased scope further necessitates that enterprises leverage centralized log collection as a tool to help maintain their compliance.

Logs play a key role in maintaining compliance because they provide supporting evidence during an audit. The auditor may take a sample of a particular activity (user provisioning, for example) and will require evidence of how those accounts were provisioned to verify that established processes and procedures were followed. The lack of centralized logging, or of tools to extract the right data for an auditor, could result in wasted time collecting the right information (provided the individual systems still have it) or, worst case, discovering that the required data has been overwritten, tampered with, or lost.

Health and Troubleshooting

Log files also contain a vast wealth of information about the activity and health of the IT infrastructure. Logs capture data points like system uptime, resource utilization, and user activity, and they often contain the information necessary to avoid a service disruption. However, employing logs to avoid service disruption requires monitoring those logs. For example, consider a server with storage problems, either running out of available disk space or a hard drive that's failing. Systems will log these events, in many cases well before an actual failure. Given that a common practice among systems administrators is to review logs only when there is a real problem, the events may go unnoticed until a complete failure has occurred. If logs had been centralized and tools deployed to alert on these types of issues, corrective action could have been taken in a controlled manner to avoid unexpected service disruptions.

The detection of overall health problems also includes considerations less catastrophic than hardware failure but potentially just as important. It may be a situation where a service is degraded and users notice it but don't notify IT that they are seeing a problem. In this case, the issue may go unnoticed by IT for an extended period of time. At some point, IT is made aware, and a review of the logs indicates that the problem has been going on for weeks. If only those events had been forwarded to a central location, they could have been detected and corrective action taken.
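As a small illustration of the kind of monitoring described above, the sketch below watches a stream of centralized events for early storage warnings. The event dictionary shape is an assumption, and event ID 2013 (the Windows "disk is at or near capacity" warning) is used only as an illustrative indicator.

    # Watch centralized events for early-warning health indicators and
    # alert before a warning becomes an outage. The event shape and the
    # warning ID are illustrative assumptions.
    DISK_WARNING_IDS = {2013}  # "disk is at or near capacity"

    def alert(message: str) -> None:
        # In practice, page the on-call staff or open a ticket.
        print("ALERT:", message)

    def check_event(event: dict) -> None:
        if event.get("event_id") in DISK_WARNING_IDS:
            alert(f"Low disk space on {event['host']}: {event['message']}")

    check_event({"host": "fileserver01", "event_id": 2013,
                 "message": "The C: disk is at or near capacity."})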

Centralization

Centralized log collection provides a tremendous amount of benefit, but there are also challenges that need to be overcome. The first and most important is ensuring that the amount of available storage is sufficient to hold what will be collected. Failure to plan for sufficient storage will nullify many of the benefits that centralization affords, so it is critical that significant planning and analysis be performed before embarking on centralizing logs.

One of the several factors to consider when planning centralized log collection is determining which systems to collect from. Services should be prioritized based on the sensitivity of the data they hold. Those that contain authentication, financial, personnel, or company-proprietary data should be considered the most important; centralizing log collection from these systems benefits both security and compliance monitoring. From there, the remaining services can be prioritized based on their relative importance to the most critical systems.

From a security perspective, one area that is often overlooked is users' workstations. If resources permit, collect workstation logs, because these are generally the first systems to be compromised. The centralized collection of workstation logs, coupled with the ability to analyze what's collected, enables more rapid detection of a compromised system. This can help prevent the compromise of a single system from becoming a more widespread security incident. Workstation logging can also be used by support organizations to more proactively detect application and configuration issues within the environment. This helps to mitigate widespread user disruption due to issues caused by common activities like patch or application deployment.

Integrity

Another challenge of centralized logging pertains to the integrity of what is collected. One should assume that any log file can be tampered with or modified; this is one of the most common activities performed by malware and hackers to hide or mask their presence on a system. There are a couple of ways this can be addressed. The first is to make sure log events are streamed in real time to the central repository as they are created. Solutions that provide this type of functionality eliminate the risk of an attacker compromising a system and then deleting the local logs to hide their activity. However, real-time streaming of logs can negatively impact network performance in very large environments due to the sheer volume of traffic it could generate. This can be mitigated by a properly architected log infrastructure, which will be discussed in more detail in the third article in this series.

Another way tampering can be detected is to leverage products that verify the integrity of collected events. This is generally done by using hashing algorithms to compare the source event with the one received by the collector. If the hashes of the source and destination logs match, one can be assured the log wasn't tampered with.
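A minimal sketch of that hash comparison follows. It assumes the forwarding agent computes a digest when the event is emitted and the collector recomputes it on receipt; real products wrap this in an authenticated transport, but the core idea is the same.

    import hashlib

    def digest(event_bytes: bytes) -> str:
        # SHA-256 is one common choice of hashing algorithm.
        return hashlib.sha256(event_bytes).hexdigest()

    # On the source system, just before forwarding:
    event = b'2011-06-01T12:00:00Z host=web01 id=4625 msg="Failed logon"'
    sent_digest = digest(event)

    # On the collector, after receipt:
    received_event = event  # as delivered over the wire
    if digest(received_event) == sent_digest:
        print("Digests match; event accepted as untampered.")
    else:
        print("Digest mismatch; possible tampering, flag for review.")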

Reporting and Alerting

Although centralization provides a single repository for all log data, there is still a major challenge in turning the collected data into useful, actionable information. Depending on how much flexibility a system or application provides to control what is logged, a significant portion of the data may be of little or no use in support of compliance or security monitoring. In many cases, this extra data creates noise that makes finding the important bits equivalent to searching for a needle in a haystack.

The noise that logs contain also translates directly into wasted labor because it extends the amount of time required to identify and transform the collected data into actionable information. This is also true in the case of an audit, where resources are wasted trying to identify the events the auditor requires. Noise also creates opportunities for critical information to be overlooked or analyzed incorrectly. Therefore, when selecting a tool for log centralization, one must evaluate the tool's ability to transform the collected data into useful information by eliminating the noise.

Conclusion

With the discussion of the importance of log forwarding and centralization concluded, the next article dives deeper into the various ways that centralized logging can be fully utilized. Included are real-world scenarios where log centralization, coupled with appropriate tools to analyze, report, and alert on the collected events, enables an enterprise to better manage its information technology infrastructure.

Article 2: How to Leverage Your Logs to Secure Your Environment

The first article in this series discussed the importance of log files and began to make a case for centralizing log collection. This article will bring more attention to centralization and present more detailed examples of the benefits it provides with respect to security, compliance, and troubleshooting. To accomplish this, three real-world scenarios will be presented. Each will compare and contrast the difference in response when logs have been centralized versus when they haven't. Alerting and reporting will also be highlighted in these scenarios, as they play a critical role in driving people to action and increasing efficiency.

Scenario 1: The Security Incident

The information security term that has gained the most media attention recently is Advanced Persistent Threat (APT). Although the term has been applied to various types of attacks, the simplest way to describe an APT is a security breach that uses multiple attack techniques to circumvent common security controls like firewalls and antivirus software and that goes undetected for extended periods of time. In this example scenario, imagine that one or more systems have been compromised by an APT-style attack that is attempting to gain access to sensitive information on other systems. At some point during this attack, a system is identified as compromised, and the investigation reveals event log entries that would not be generated under normal operating conditions. These events can now be considered indicators of the APT activity and are looked for during subsequent investigations of other systems.

At this point, one of two things will take place, depending on whether log files are centralized. Without centralization, information security personnel or systems administrators decide that every system should be examined to see whether the indicator events appear anywhere else. Depending on the size of the enterprise, this will be a very time-consuming process. Although there may be scripts or other tools available to perform the analysis, there remains the requirement to touch every system.

In contrast, if event logs from these systems had been centralized, determining the scope of the attack would be as simple as searching for the indicator event on the log collector. Taking this one step further, the event collector may have the capability to send an alert if the event is collected from any system. Action can then be taken immediately, like pulling the system off the network to prevent the attack from spreading to other systems. A final benefit of event log centralization is the fact that all log entries contain timestamps indicating when the event occurred. In this scenario, the timestamps could be used to determine how long the APT has been active and which systems have been impacted, as well as to identify the first system that was compromised. With all of this information at hand, IT security personnel and systems administrators can gain a solid understanding of how the APT was able to get into the infrastructure. They can then take steps to prevent it from happening again.

An important consideration in this case is the assumption that the compromised system was actually forwarding its logs. One common mistake when developing a strategy for log centralization is a failure to consider forwarding workstation logs because the overhead of doing so is deemed too expensive. In large enterprises, collecting workstation logs could generate terabytes of data that could be very expensive to store and could generate a substantial amount of WAN traffic. There are ways this can be mitigated and, going back to the APT scenario, it is very likely that this type of attack would start on a workstation. The bad guys also know that enterprises often don't forward events from workstations and use this to their advantage. They will compromise workstations and then launch attacks from there in ways that appear on target systems as normal activity. Executed this way, malware is capable of operating for extended periods of time completely undetected.

Note: The third article in this series offers suggestions as to how a balance can be achieved between the need to collect workstation logs and the resources required to accomplish it.
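Before moving on, here is a minimal sketch of the centralized triage this scenario describes: one query over the collector's store replaces logging in to every system. The SQLite backend, the "events" schema, and the indicator event ID (7045, a Windows service installation, is one plausible indicator) are assumptions for illustration.

    import sqlite3

    INDICATOR_EVENT_ID = 7045  # illustrative indicator of compromise

    conn = sqlite3.connect("central_logs.db")
    rows = conn.execute(
        "SELECT timestamp, host, message FROM events "
        "WHERE event_id = ? ORDER BY timestamp",
        (INDICATOR_EVENT_ID,),
    ).fetchall()

    for timestamp, host, message in rows:
        print(f"{timestamp} {host}: {message}")

    # The earliest timestamp suggests the first system compromised; the
    # distinct hosts define the scope of the incident.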

Scenario 2: A Visit from the Auditor

The first article in this series touched on the benefits of log centralization for maintaining compliance. In this example scenario, suppose an enterprise must be Sarbanes-Oxley (SOX) compliant. Maintaining this compliance requires an annual, in-person audit in which systems administrators have to demonstrate that they have the appropriate controls in place to log access to a critical financial system. The enterprise has developed and documented processes and controls for the request, approval, and creation of user accounts and for the delegation of rights to users who have the ability to access financial data. The auditor is provided with a sample set of existing accounts, from which they identify several that will be the subjects of the audit. The audit requires systems administrators to provide evidence that individuals with proper authorization created the accounts and permitted them to access the financial system in question. There are several aspects of this scenario where the required evidence would not exist in log files, so for the purposes of this discussion, those will be ignored.

First, the auditor requests the list of accounts that were created during a particular time period. Although there are a couple of ways this list could be provided, one of them is from the centralized logs, because events for the creation of a user are recorded. Second, to provide evidence that each account was provisioned by authorized individuals, the centralized logs are critical. The events captured when a user is created would come from the authentication service, whereas the event recording that a user was granted access to the financial system may come from the financial system itself. It's also likely that multiple systems comprise both the authentication and financial infrastructures. For example, if Microsoft Active Directory (AD) is used for authentication, one of the domain controllers will have the account creation event. Without log centralization, the logs of each domain controller would have to be searched independently to find the event. With the logs centralized, a single search can be performed for the user in question that would return both the account provisioning event and the rights delegation event, even though they occurred on completely different systems. Of course, these log entries would also include the identity of the individuals who performed the actions as well as when the actions were performed. This is the evidence required by the auditor, indicating that the account was provisioned and granted access by someone who is authorized to do so.

It's important to also point out that the scenario could be reversed to detect the provisioning and delegation of an account by someone who is NOT authorized to do so but who, for whatever reason, had the rights to do so. In this case, log centralization could be leveraged to alert people to the act so that appropriate action could be taken to revoke the rights and prevent unauthorized creation and delegation from happening again. This may become one of the most critical aspects of successfully passing the audit in this scenario, because the unauthorized actions would have been detected, documented, and reverted before the audit took place.
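A minimal sketch of that single search follows, again against an assumed collector database. The Windows security event IDs shown (4720 for account creation, 4728 for addition to a security-enabled global group) are common examples; confirm the exact IDs and schema for your environment.

    import sqlite3

    # Pull audit evidence for one account from the centralized store:
    # who created it, who granted it access, and when. The schema and
    # backend are assumptions for illustration.
    conn = sqlite3.connect("central_logs.db")
    rows = conn.execute(
        "SELECT timestamp, host, actor, event_id FROM events "
        "WHERE event_id IN (4720, 4728) AND target_account = ? "
        "ORDER BY timestamp",
        ("jsmith",),
    ).fetchall()

    for timestamp, host, actor, event_id in rows:
        action = "created account" if event_id == 4720 else "granted group access"
        print(f"{timestamp} {actor} {action} on {host}")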

Scenario 3: It's Been Going on for Weeks

The first two scenarios addressed the benefits of log centralization for security and compliance. This third and final scenario will examine the benefits of log centralization for troubleshooting and proactive problem resolution. Suppose every user in the enterprise relies on a Web-based timecard application to account for their time worked. For the past several weeks, a portion of the users have experienced slowness when trying to update their timecards, but they didn't question or report it to IT because they thought it was normal.

It turns out that one of the Web servers participating in the cluster that provides the Web interface has a configuration problem that has been recorded in the Web logs; however, because those logs aren't centralized or reviewed, the problem went undetected. The support staff had encountered issues with the application in the past and had developed a custom monitoring script scheduled to run on a daily basis. Some time later, the scheduled task stopped executing, and because the staff had forgotten the script was in place, monitoring silently ceased. Additionally, the individuals who support the application use it with the same frequency as all the other users; they just happened to have been using a server that didn't have the problem.

The rationale for centralizing the Web logs of these servers becomes obvious. If Web log centralization had been in place with a tool that monitors and alerts on Web service events, the support staff would have been made aware of the problem right away. They were instead relying on a long-forgotten scheduled task that was intended to notify them if there was a problem. Expanding on that aspect, the failure of the scheduled task itself could have been detected: events indicating that the task failed to start, or multiple failed login attempts from the task's account after its password expired, would have triggered an alert that could have been acted upon. Instead, IT was relying on users to detect and report the problem.
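One way to close that gap is a heartbeat check against the centralized logs: alert when an expected periodic event stops arriving. The sketch below assumes the monitoring script writes a "run completed" entry (with an epoch-seconds timestamp) to the central store each day; the names, schema, and backend are all illustrative.

    import sqlite3
    import time

    # Alert if the daily monitoring script's heartbeat event has not
    # arrived recently, so a broken watcher cannot fail silently.
    HEARTBEAT_WINDOW_SECONDS = 26 * 3600  # one day, with some slack

    conn = sqlite3.connect("central_logs.db")
    row = conn.execute(
        "SELECT MAX(timestamp) FROM events "
        "WHERE source = 'timecard-monitor' "
        "AND message LIKE 'run completed%'"
    ).fetchone()

    last_run = row[0] or 0  # epoch seconds; 0 if no heartbeat ever seen
    if time.time() - last_run > HEARTBEAT_WINDOW_SECONDS:
        print("ALERT: timecard monitoring script has not reported in; "
              "the watcher itself may be broken.")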

Additional Things to Consider

All three of these scenarios highlight the benefits of centralizing log collection, but there are two critical aspects that also need to be considered. The first is the real-time streaming of events to the collector. In all three scenarios, real-time streaming is important because one has to assume that log files on a given system can be modified. This is of particular importance for the security incident scenario, because malware like the APT has the capability to avoid detection by modifying the local event logs. Real-time streaming of events avoids this risk because the events are sent to the collector as they are created on the systems, before malware has any opportunity to delete them. In fact, a review of an affected system may not show some of the events that were captured by the central collector. This too can be considered an indicator of malicious activity and, depending on the tool, can be leveraged to alert IT staff.

Real-time streaming also applies to the auditing and troubleshooting scenarios. In addition to the scenario mentioned earlier, another example is a complete system failure. If this were to happen and the local logs on the failed system were unrecoverable, it would be impossible to recover events required for an audit or to assess what happened on the system right before the failure. In these cases, real-time streaming behaves much like a flight data recorder on an airplane, capturing events right up to a crash, except that here it's not a hardened physical enclosure that keeps the data safe; it's the fact that events are sent to an external system in real time. Real-time streaming also prevents data loss in more benign situations, such as a log reaching its maximum size and overwriting the oldest events when it rolls over.

The other aspect that needs to be mentioned, which augments real-time streaming, is the need to ensure the integrity of the events that are forwarded. In order for the forwarded events to be trusted, the collector must have the ability to validate that each event it receives has not been tampered with. The most common way to accomplish this is by creating a hash of the event on the source system before it is forwarded. Then, once the event is received by the collector, the same hashing algorithm is applied. If the hashes of the source and centralized events are the same, one can be confident that the log was not tampered with.

Conclusion

The example scenarios outlined earlier demonstrate the need for log centralization as well as the benefits it provides; however, those benefits can only be realized through the implementation of a well-planned and well-designed infrastructure. Furthermore, in order to take advantage of what is collected, the right tools, providing alerting and reporting, must also be selected. The next article examines the criteria for gathering requirements and implementing a log centralization strategy.

Article 3: Best Practices for Log File Management (Compliance, Security, Troubleshooting)

The final article in this series moves beyond the details of specific logs and the scenarios in which they can be used to a discussion of best practices for implementing and leveraging centralized log management.

Architecting the Infrastructure

In order to develop an effective centralized log management strategy, the first task is the development of requirements for what will be collected, from which systems, and for how long logs will be retained. To determine which systems logs will be collected from, the simplest approach is to break systems into tiers based on the service they provide. For example, systems that hold critical business, financial, and authentication data would be required to have their logs centralized, whereas systems that perform less critical business tasks could be excluded. One of the best methods for breaking systems into tiers is to examine regulatory requirements, as these may identify systems that are audited. Another valuable source for determining whether logs should be centralized is the company's disaster recovery plan. If a disaster recovery plan has already been developed, it's very likely that all of the systems that comprise the infrastructure have already had their criticality assessed so that, in the event of a disaster, the most critical systems are restored first. Those that rank the highest in the disaster recovery plan should have their logs centralized. Figure 1 provides an example of a three-tier model, with Tier 1 containing the most important systems for which to centralize logging.

Tier  Role
----  ----
1     Network Infrastructure
1     Financial Systems
1     Personally Identifiable Information (PII)
1     Identity/Authentication Systems
1     DMZ Systems (Internet Accessible)
2     Management Systems (Patch, Configuration, Etc.)
2     Non-Business-Critical Systems
3     Workstations
3     Development/Test Systems

Figure 1: Three-tier role categorization.

Extending Centralization Beyond Servers

Of course, log centralization should not be limited to just servers. Both the network and security infrastructures should be required to have their logs centralized. These devices and appliances not only contain valuable information regarding the overall health of the infrastructure, their logs will also be some of the first to be examined during a security incident. Having the logs centralized will enable the incident response team to track an incident throughout the enterprise.

Note: Which systems will have their logs centralized is the first thing evaluated when gathering requirements. It is intentionally mentioned before any other aspect because it is the required first step in determining what the ultimate centralization solution will look like, and it will drive all subsequent requirements. Under ideal circumstances, the log centralization infrastructure will be architected to support what has been identified for collection, as opposed to limiting what can be collected based on predetermined tools or storage limitations.

Log File Retention

The next area for requirements gathering is a determination of how long logs will be retained. In conjunction with that, one must identify the type of access to the centralized logs required throughout the retention period. For example, there may be a requirement to retain logs for 7 years, but immediate access to log data may only be required for 1 year. In these situations, the centralization architecture can be designed to include an archival process in which the most recent year's worth of data is readily available, and data from years 2 through 7 is archived and thus requires slightly more effort to access. The benefit of implementing an archiving strategy is that the retention requirements are met while the cost of meeting them is reduced. An example of a centralization strategy that leverages archiving would be one where the last year's worth of data is stored in a database that offers rapid access, and everything beyond the one-year period is stored as flat files that can be compressed or kept on less expensive hardware.

Another way to gain efficiency with log storage is to set retention periods per system based on the role that the system serves. Regulatory requirements may dictate extended retention periods for certain systems, while for other systems there may be no value in retaining the logs beyond a much shorter period of time. The previous article mentioned the value of centralizing workstation logs, citing the potential benefit of detecting security incidents; however, workstations may be the first category of devices eliminated from log centralization due to the increased storage requirements they would impose. This is a prime example of establishing retention periods based on role, because the usefulness of centralized workstation logs may only be something like 90 days or less. Given the benefit that centralizing workstation logs provides, it is advantageous to be able to adjust the retention period as a means of controlling cost, rather than eliminating the collection altogether.

Estimate Storage Requirements

The last requirement to be collected is an estimate of the log sizes themselves. This will likely be the most challenging part of the requirements-gathering process because the size of the logs on a particular system is directly related to the services it provides and the applications that are installed. To help with this determination, vendors will often have estimates of log sizes based on conditions such as how many users access the system. Beyond that, it's really just a matter of performing the legwork to gather data on existing log sizes and, where possible, projecting growth. Although this may be the most challenging task, it is also the most important, because an improperly sized centralized log infrastructure can render the collection useless, especially if there isn't sufficient storage space to collect and retain what has been identified.
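To make the sizing exercise concrete, here is a worked back-of-the-envelope estimate; every figure in it is an illustrative placeholder and should be replaced with measurements from your own environment or vendor guidance.

    # Storage estimate for the online (rapid-access) tier.
    AVG_EVENT_BYTES = 500              # average size of one stored event
    EVENTS_PER_SYSTEM_PER_DAY = 20_000
    SYSTEM_COUNT = 200                 # systems in scope for centralization
    RETENTION_DAYS = 365               # online retention requirement

    daily_bytes = AVG_EVENT_BYTES * EVENTS_PER_SYSTEM_PER_DAY * SYSTEM_COUNT
    total_bytes = daily_bytes * RETENTION_DAYS

    print(f"Daily ingest: {daily_bytes / 1e9:.1f} GB")            # 2.0 GB
    print(f"One-year online store: {total_bytes / 1e12:.2f} TB")  # 0.73 TB
    # Add headroom for growth, indexes, and burst activity before
    # sizing the compressed archive tier.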

One way to make log centralization as efficient as possible is to limit what is forwarded to specific events. Microsoft and many third-party Web sites provide detailed lists of events and explanations of what generates them. These resources can be utilized to identify the specific event IDs that need to be collected. A good example for AD domain controllers is to only centralize events that pertain to authentication attempts (both successful and unsuccessful), changes to group memberships, and the creation or deletion of user accounts. Taking this approach can dramatically reduce the number of events forwarded to the collector, thereby reducing network and storage requirements.

There is, however, a word of caution with this approach. A significant amount of analysis must be performed to ensure that enough events are collected to satisfy security and audit requirements. If great care is not taken, it's likely that some will be missed, and the collection of forwarded events will not paint a complete picture of system activity. If resources permit, it is much more advantageous to forward all events and rely on the centralization tool to perform the required filtering. This setup avoids the circumstance where a required event was never forwarded and is therefore missing during an analysis.
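A minimal sketch of such an allow-list filter follows. The Windows security event IDs shown are commonly cited ones for logons, group changes, and account lifecycle, but the exact set should be validated against your audit requirements before anything is excluded, and the event shape here is an assumption.

    # Forward only an allow-list of domain controller events; everything
    # else stays in the local log.
    FORWARD_IDS = {
        4624, 4625,  # successful / failed logon
        4728, 4729,  # member added to / removed from a global group
        4720, 4726,  # user account created / deleted
    }

    def should_forward(event: dict) -> bool:
        return event.get("event_id") in FORWARD_IDS

    events = [
        {"event_id": 4624, "message": "An account was successfully logged on."},
        {"event_id": 5145, "message": "A network share object was accessed."},
    ]
    to_send = [e for e in events if should_forward(e)]
    print(f"Forwarding {len(to_send)} of {len(events)} events")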

Optimizing Bandwidth

Once the requirements have been collected, the next step is to determine the mechanism that will perform the centralization and the impact it will have on WAN bandwidth. Much of the design will be dictated by how geographically dispersed the company is. Companies with locations spread throughout the country or the globe will likely want to implement a tiered approach to centralization. A tiered architecture uses regional collectors, which then consolidate everything to a central location. This approach provides the benefit of collecting logs closer to the actual clients, thereby reducing WAN traffic while still centralizing the collection of all logs. The number of tiers will be largely dependent on the network topology and the number of clients at each site. Figures 2 and 3 depict a flat architecture and a tiered model with regional collectors that forward to a central collector. The main takeaway is that the log centralization architecture can be designed to minimize WAN traffic while at the same time centralizing all the logs.

Figure 2: Single-tier architecture (network equipment, firewalls, workstations, and servers all forward directly to the central collector).

Figure 3: Two-tier architecture (sources forward to West, Central, and East regional collectors, which consolidate to a central collector).

Leverage the Logs

Once the architecture is in place and systems are forwarding their events, the next step is to leverage the information that is collected. There are multiple ways this can be accomplished. The first is to establish criteria for automated alerts when specific events occur. The previous articles in this series provide examples of real-time alerts, but those were limited in scope. Additional alerts to consider include multiple failed login attempts or an account being locked out multiple times over a short period of time. This type of activity may be something benign, like someone forgetting to update a scheduled task or service account, but it can also be an indication of malicious activity. Multiple failed login attempts also provide a good example of tailoring an alert based on a threshold of occurrences over a period of time: a single failed login attempt likely does not warrant immediate notification, but if the same account fails multiple times within a short window, an alert is warranted. The ability to set thresholds on the number of events before alerting prevents false positives and improves the overall effectiveness of the alerting process. Provisioning of accounts, or accounts being added to highly sensitive groups, is another place systems administrators can employ real-time alerts. In addition, infrastructure health should be configured for real-time alerting; events like systems running low on available disk space or services that failed to start are good candidates.
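A minimal sketch of that threshold logic follows; the window, threshold, and account name are illustrative values to tune for your environment.

    from collections import defaultdict, deque

    # A single failed login is noise; several for one account inside a
    # short window warrants an alert.
    WINDOW_SECONDS = 300
    THRESHOLD = 5

    recent_failures = defaultdict(deque)

    def record_failed_login(account, timestamp):
        window = recent_failures[account]
        window.append(timestamp)
        # Drop failures older than the window.
        while window and timestamp - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) >= THRESHOLD:
            print(f"ALERT: {len(window)} failed logins for {account} "
                  f"within {WINDOW_SECONDS // 60} minutes")
            window.clear()  # avoid re-alerting on the same burst

    # Simulate a burst of failures 30 seconds apart.
    for t in range(0, 300, 30):
        record_failed_login("svc-backup", 1_000_000 + t)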

Ideally, each alert should be configured to notify the appropriate individuals, and the alert they receive should contain actionable information. Part of this process might require the development of processes and procedures to be followed when the alerts are received. Real-time alerts should also be set up for network infrastructure events, like excessive firewall denies on a particular port or events captured by intrusion detection systems. As mentioned previously, these alerts, if implemented properly, will become key to preventing widespread security incidents and compliance issues.

Tools that centralize log collection will often have reporting features that should also be leveraged. Reports capable of creating heat maps of events over a period of time can aid in the diagnosis of a problem in a particular region. For example, consider a circumstance where a patch deployed to a particular set of systems in a given region breaks a critical service. A report that displays the failed-start event for the broken service on that set of systems could aid in correlating the patch deployment with the failed service. Reporting could also be leveraged to collect metrics on uptime status for systems. For example, all Microsoft Windows systems write events to the log with respect to uptime and boot time; these collected events could be leveraged to generate uptime or last-reboot reports for systems across the enterprise.

The Database

Some of the centralization products available also use a database to store all of the forwarded events. This can be very advantageous because it gives systems administrators the flexibility to extend the usefulness of the collected data beyond the tools and interface the vendor has provided. There may be circumstances where the data needs to be analyzed in ways the vendor's tools don't provide; with some knowledge of the underlying database, queries can be written to answer very specific needs.

Being able to query the database can also play a key role in optimizing the retention and archival process. It may be that only certain events need to be retained for an extended period of time, and with the ability to interact directly with the database, a systems administrator could extract and store those specific events on another external system. This would not only reduce the storage requirements of the collector but also eliminate unnecessary events from the archival process. Direct database access would also allow custom dissemination of the data, whether via a service like SQL Server Reporting Services or by other custom means.
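As a minimal sketch of that kind of direct access, the following extracts only long-retention events into a compressed flat file for the archive tier. The SQLite backend, schema, and event IDs are assumptions for illustration; a commercial product may use SQL Server, Oracle, or a proprietary store with a different layout.

    import gzip
    import sqlite3

    # Account lifecycle events, for example, may need 7-year retention
    # while everything else can expire much sooner.
    LONG_RETENTION_IDS = (4720, 4726)

    conn = sqlite3.connect("central_logs.db")
    rows = conn.execute(
        "SELECT timestamp, host, event_id, message FROM events "
        "WHERE event_id IN (?, ?)",
        LONG_RETENTION_IDS,
    )

    # Write the selected events to inexpensive compressed storage.
    with gzip.open("archive_account_events.log.gz", "wt") as archive:
        for timestamp, host, event_id, message in rows:
            archive.write(f"{timestamp}\t{host}\t{event_id}\t{message}\n")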

Conclusion

This series has examined the importance of the data stored in log files and the benefits of centralizing collection so that logs can be leveraged for incident response, compliance management, and troubleshooting. Real-world examples were used to demonstrate the benefits that centralization provides. Finally, a strategy was outlined and best practices were identified for collecting, alerting, and reporting on the centralized events. Together, these tasks result in an enterprise solution that enables systems administrators, information security staff, and compliance personnel to operate more efficiently and effectively, because data that was once highly distributed is now centralized and readily accessible.