Integrating QRadar with Hadoop
A White Paper

Ben Wuest
Research & Integration Architect
benwuest@ca.ibm.com
Security Intelligence, Security Systems, IBM
April 16th, 2014

Contents

Overview
Background Reading
Pre-requisite Software
Data Format
Custom Properties
Configuring the Data Link
    QRadar to Hadoop Data Flow Configuration
        JSON Enablement for QRadar
        QRadar Routing Rules and Forwarding Destinations
            Routing Rules
            Forwarding Destinations
        Flume Receiver/Agent Configuration
            Agent Definition
            Source Definition
            Channel Definition
            Sink Definition
    Hadoop to QRadar Data Flow Configuration
        API Services Role Configuration
        Authorized Service Configuration
Deployment Considerations
    Data Storage Configuration
        Deployment Example
    Multi-Agent Flume Configurations
Forwarding Profiles
    Technical Details
    Editing the Forwarding Profile
        Header Attributes
        Name and Custom Property Configuration
        Pre-amble
        ISO Time Formatting
        Field Properties
Working with Insights in QRadar Correlation Engine
Big Insights Data Access
    JAQL and QRadar Data
    Big Sheets and QRadar Data
Use Cases
    Establishing Baseline
    Advanced Persistent Threat Detection
        Beaconing
        Data Leakage
    Domain Attribution
QRadar Big Insights RPM
Appendix A: JSON Data Fields

Overview

The world of Security Intelligence is evolving. In today's security picture, organizations are looking to identify linkages and patterns in their organizational data, and this data includes more than just cyber data. This type of deep analysis requires the offline processing and data flows that a Hadoop environment enables. Integrating the QRadar Security Information and Event Management (SIEM) system with a Hadoop environment provides a framework for performing these types of analyses. The integration (outlined in Fig 1.0) includes simple connectors that allow normalized and enriched data to flow from QRadar to a Hadoop-based platform, and insights to flow back. These insights can then be considered in the advanced real-time correlation engine of the QRadar SIEM.

Fig 1.0: Integration Overview

In this integration, analytics are performed on the Big Data platform. These analytics combine traditional cyber data sources with non-traditional feeds such as social media, external threat feeds, domain registration data, and web site feeds. In this paper you will find details on the interoperable data format, data link configuration, deployment considerations, advanced profile manipulation, and example use cases to consider when integrating the QRadar SIEM with a Hadoop environment.

Background Reading

A good understanding of QRadar and Hadoop is important to the information provided in this document. Please refer to the IBM documentation on QRadar and to your Hadoop-specific implementation documentation. The open source project sites are also great sources of information.

Pre-requisite Software

The software pre-requisites for this document are:

a. IBM QRadar SIEM v7.2 Maintenance Release 1, patched to v7.2 Maintenance Release 1 Patch 3 (with Interim Fix 1); and
b. IBM InfoSphere Big Insights 2.1.0.0 (for any Big Insights specific applications); and
c. Apache Flume 1.3 or higher (1.4 is preferred, but some Hadoop flavors do not currently support this version).

Data Format

Central to the integration is a JSON data format, which is the format exported from QRadar on the data link (over TCP syslog). This is a simple JSON key-value pair format containing all of the normalized and enriched fields that are extracted by QRadar. An example event data record is shown below:

{"category": "Misc POST Request", "credibility": "5", "devtimeepoch": "1391648638000",
 "devtimeiso": "2014-02-05T21:03:58.000-04:00", "dst": "192.168.18.13", "dstiploc": "other",
 "dstnetname": "Net-10-172-192.Net_192_168_0_0", "dstport": "80", "dstpostnatport": "0",
 "dstprenatport": "0", "eventdescription": "An HTTP POST request was issued but there is no available s-action information.",
 "eventname": "Misc POST Request", "hasIdentity": "false", "hasoffense": "false",
 "highlevelcategory": "Access", "isotimeformat": "yyyy-MM-dd'T'HH:mm:ss.SSSZ",
 "logsource": "BlueCoat", "logsourcegroup": "Other", "logsourcetype": "Bluecoat SG Appliance",
 "lowlevelcategory": "Misc Network Communication Event", "name": "json_default_profile",
 "payload": "<182>Feb 05 21:03:58 10.1.1.2 \"[05/Feb/2014 21:03:58-0500]\" 1 bluecoat.proxysg.test.com REYESA - - OBSERVED \"none\" - 0 - POST - http 216.155.194.147 80 / - - \"Mozilla/5.0\" 192.168.18.13 0 687 - \n",
 "protocolid": "255", "relevance": "6", "sev": "3", "src": "10.1.1.2", "srciploc": "other",
 "srcnetname": "Net-10-172-192.Net_10_0_0_0", "srcport": "0", "srcpostnatport": "0",
 "srcprenatport": "0", "starttimeepoch": "1391630555106", "starttimeiso": "2014-02-05T16:02:35.106-04:00",
 "storagetimeepoch": "1391630555106", "storagetimeiso": "2014-02-05T16:02:35.106-04:00",
 "type": "Event", "usrname": "REYESA", "version": "1.0"}

For a complete record of all the values available in this format, see Appendix A.
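Because each record is a flat JSON object, it is straightforward to work with in any language. Below is a minimal, illustrative Python sketch (not part of the integration itself) that tallies records by log source type, assuming the records are available one JSON object per line, as the Flume setup described later in this paper lands them; the input file name is a hypothetical local sample.

#!/usr/bin/env python
# Illustrative only: tally QRadar JSON records by log source type.
# Assumes one JSON object per line, as landed by the Flume setup below.
import json
from collections import Counter

counts = Counter()
with open("qradar-events.json") as fh:   # hypothetical local sample file
    for line in fh:
        try:
            rec = json.loads(line)
        except ValueError:
            continue                     # skip truncated or partial records
        counts[rec.get("logsourcetype", "unknown")] += 1

for source_type, n in counts.most_common():
    print("%-40s %d" % (source_type, n))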

Custom Properties

In addition to the values defined in Appendix A, the data format supports Custom Properties defined in QRadar. Only Custom Properties that have the option "optimize parsing for rules, reports, and searches" enabled will be included in the data that is sent to Hadoop. For more information on defining Custom Properties, please refer to the QRadar documentation.

When designing analytics, it is important to decide whether or not to use a Custom Property. It may be more valuable to perform the further parsing and analysis in the offline processing of the Hadoop cluster. Each field must be looked at on a case-by-case basis.

Configuring the Data Link

The data link between QRadar and a Hadoop environment is established through configuration on both sides. One configuration dictates how data is sent from QRadar to Hadoop; the other allows the QRadar Platform API to consume insights back from the Big Data platform.

QRadar to Hadoop Data Flow Configuration

Configuring the data flow from QRadar to Hadoop involves the following components:

a. JSON forwarding enablement; and
b. QRadar Routing Rules and Forwarding Destinations; and
c. Flume receiver configuration (Hadoop cluster).

JSON Enablement for QRadar

By default, QRadar v7.2 MR1 Patch 3 does not expose the JSON option. To enable it, the following line must be added to both of these files on the QRadar Console of the given deployment:

1. /opt/qradar/conf/nva.conf
2. /store/configservices/staging/globalconfig/nva.conf

Line to add:

FORWARDING_DESTINATION_JSON_FORMAT_ENABLE=true

No deployment is necessary after this activity is complete; the system will pick up the change. However, if you have browser windows open on the Routing Rules or Forwarding Destinations pages, you must close and re-open them for the change to be visible.
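As a convenience, this edit can be scripted. The following is a minimal sketch, run as root on the Console, that appends the flag to the two files listed above only if it is not already present:

for f in /opt/qradar/conf/nva.conf \
         /store/configservices/staging/globalconfig/nva.conf; do
  grep -q '^FORWARDING_DESTINATION_JSON_FORMAT_ENABLE=' "$f" || \
    echo 'FORWARDING_DESTINATION_JSON_FORMAT_ENABLE=true' >> "$f"
done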

QRadar Routing Rules and Forwarding Destinations

Routing Rules and Forwarding Destinations allow data to flow out of QRadar. This section goes over the basics of defining them. Both can be found in the QRadar Admin tab (see Fig 2.0).

Fig 2.0: Admin Panel

Routing Rules

Routing Rules allow the user to filter data from the QRadar system and direct it to a given destination. Routing Rules developed for forwarding to a Hadoop environment must have the following properties:

a. They must have a mode of Offline.
b. In the routing forward options, they should be directed to a Forwarding Destination that has been set up for your Big Data cluster.

Forwarding Destinations

Forwarding Destinations in QRadar allow the user to specify a destination for data, along with a format, port, and protocol. Forwarding Destinations for a Hadoop-based cluster require the following properties:

a. The Event Format must be set to JSON; and
b. The Protocol must be TCP; and
c. The Destination Address and Port must be those of the corresponding Flume receiver on the Hadoop cluster (a quick connectivity check is shown below); and
d. The check box for prefixing a syslog header if it is missing or invalid should be checked.
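Before configuring Flume, it can be useful to confirm that JSON events actually arrive at the destination host and port. One illustrative check, assuming the example port 5555 used later in this paper, is to listen with netcat on the Hadoop side while the routing rule is active (depending on your netcat variant, nc -l -p 5555 may be required instead):

nc -lk 5555

Syslog-framed JSON records should appear on the terminal as events are forwarded.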

Flume Receiver/Agent Configuration

Apache Flume (http://flume.apache.org) is currently the recommended approach for receiving data from QRadar into a Hadoop Big Data cluster. The general approach is to develop a series of sources, channels, and sinks that write data to HDFS in the Hadoop cluster. For larger deployments, a complex network of Flume agents on a set of dedicated hardware outside the Big Data cluster is generally required; this configuration depends on how the data is being stored in HDFS and how much data is being forwarded. We will touch upon this in the Deployment Example section later in this document.

This section shows how a single Flume receiver is configured. For more in-depth details on how Flume receivers can be configured, and their accompanying options, please refer to the Flume documentation. As indicated at the beginning of this paper, Flume 1.4 is recommended for the following descriptions.

A basic Flume receiver/agent consists of sources, channels, and sinks. The Flume User Guide (http://flume.apache.org/FlumeUserGuide.html) documents more complex configurations. Flume agents are defined in a configuration file and are started with the following command:

flume-ng agent -n $agent_name -c conf -f $agent.conf

In the above command, $agent_name is the name of the Flume agent and $agent.conf is the full path to the configuration file that specifies the configuration of the receiver. A full configuration file is quite large, so we will dissect the components of a sample agent configuration file here:

a. Agent Definition
b. Source Definition
c. Channel Definition
d. Sink Definition

The following sections go into basic detail on these definitions; for a more in-depth understanding, please consult the Flume documentation.

Agent Definition

Below is the agent definition excerpt from a basic Flume configuration. It defines that the agent QRadarAgent will have:

a. A source, qradar; and
b. A channel, FileChannel; and
c. A sink, HDFS; and
d. The qradar source and the HDFS sink will both use the channel FileChannel.

This is the basic architecture of our Flume agent.

# Syslog TCP source from QRadar
# QRadar source, channel, and sink
QRadarAgent.sources = qradar
QRadarAgent.channels = FileChannel
QRadarAgent.sinks = HDFS

# Bind the QRadar source and the HDFS sink to the file channel
QRadarAgent.sources.qradar.channels = FileChannel
QRadarAgent.sinks.HDFS.channel = FileChannel

Source Definition

The source for this example receiver defines a syslogtcp receiver listening on port 5555 on bigdatahost.com. Recalling the section on Forwarding Destination configuration in QRadar, this host and port must match what is specified in the Forwarding Destination. The eventSize setting is important; if it is too small, data will be truncated. If the QRadar cluster is collecting events with larger payloads, it is recommended that eventSize be adjusted accordingly.

# Configuration for the QRadar single-port source
QRadarAgent.sources.qradar.type = syslogtcp
QRadarAgent.sources.qradar.port = 5555
QRadarAgent.sources.qradar.portHeader = port
QRadarAgent.sources.qradar.host = bigdatahost.com
QRadarAgent.sources.qradar.eventSize = 25000

# Flume timestamp interceptor (used for time-based HDFS paths)
QRadarAgent.sources.qradar.interceptors = i3
QRadarAgent.sources.qradar.interceptors.i3.type = TIMESTAMP

Channel Definition

A channel can be on disk or in memory. For moving data from QRadar to a Hadoop-based cluster, it is recommended that a file channel be deployed. Below is a sample file-based channel from our agent configuration file. It specifies the type file, along with a checkpoint directory and a data directory. For the best performance, it is recommended that these directories be on different disks.

# Each channel's type is defined.
QRadarAgent.channels.FileChannel.type = file
QRadarAgent.channels.FileChannel.checkpointDir = $CheckPointDir
QRadarAgent.channels.FileChannel.dataDirs = $DataDir
QRadarAgent.channels.FileChannel.transactionCapacity = 500000
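Putting the pieces together, the sample agent described in these sections could be started as follows. The configuration file path here is hypothetical, and the console logger option is optional but useful when first testing the link:

flume-ng agent -n QRadarAgent -c conf -f /etc/flume/qradar-agent.conf \
  -Dflume.root.logger=INFO,console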

Sink Definition

The sink definition below defines an HDFS sink that stores data in compressed form. The path and rollSize settings control how the data is laid out in HDFS; this document touches on the various ways to land data in HDFS in the Deployment Example section. Because of the heaviness (size) of the JSON format, it is recommended that some form of compression be used when landing the data in the Hadoop cluster.

# HDFS sink properties
QRadarAgent.sinks.HDFS.type = hdfs
QRadarAgent.sinks.HDFS.hdfs.writeFormat = Text
QRadarAgent.sinks.HDFS.hdfs.codeC = cmx
QRadarAgent.sinks.HDFS.hdfs.fileType = CompressedStream
QRadarAgent.sinks.HDFS.hdfs.rollInterval = 0
QRadarAgent.sinks.HDFS.hdfs.idleTimeout = 120
QRadarAgent.sinks.HDFS.hdfs.batchSize = 50000
QRadarAgent.sinks.HDFS.hdfs.txnEventMax = 50000
# Effectively do not roll based on file size (threshold ~4 GB)
QRadarAgent.sinks.HDFS.hdfs.rollSize = 4194304000
# Do not roll based on number of events
QRadarAgent.sinks.HDFS.hdfs.rollCount = 0
QRadarAgent.sinks.HDFS.hdfs.filePrefix = host-%y%m%d%H00
QRadarAgent.sinks.HDFS.hdfs.fileSuffix = .cmx
QRadarAgent.sinks.HDFS.hdfs.path = hdfs://host:9000/%y-%m-%d/%H00

Hadoop to QRadar Data Flow Configuration

The QRadar platform accepts data from Hadoop through its Platform Reference Set API, which provides an interface for manipulating the reference data stored in the QRadar cluster. Documentation for the reference data API can be found by logging in to the QRadar Console as admin and accessing: https://<consoleip>/restapi/doc. The data returned to QRadar, from a big security point of view, is information gleaned from the offline batch analytics performed on the Hadoop platform.

In order to use the API, a role and an authorized service must be configured on the QRadar platform.

API Services Role Configuration

To configure an API role, access the Users icon in the User Management section of the Admin panel. Create a new role, APIRole, that has the API and Reference Data API options checked (see Fig 3.0). Perform a deploy after this.

Fig 3.0: API Role Configuration

Authorized Service Configuration

An authorized service is required to authorize applications (in this case a Hadoop workflow or the Big Insights publishing application) to publish data to QRadar through the QRadar Reference Set Platform API. In the QRadar Admin panel, access the Authorized Services section. Here you will need to add an authorized service, API Service, with a user role of the API role that was previously created, and set an expiry (the expiry is dependent on your use case).

Fig 4.0: Authorized Service Creation
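As an illustration, once the authorized service token exists, an external process can add a value to a reference set with a single HTTP call. The sketch below is hypothetical: the set name riskyusers, the value, and the token are placeholders, and the exact endpoint path and required headers vary by QRadar version, so consult https://<consoleip>/restapi/doc for the authoritative form:

curl -k -X POST \
  -H "SEC: <authorized-service-token>" \
  "https://<consoleip>/restapi/api/reference_data/sets/riskyusers?value=jsmith"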

Deployment Considerations

In this section, we discuss a number of deployment considerations when connecting a QRadar cluster to a Hadoop-based cluster. Things to consider:

a. Data storage in HDFS
b. Multi-agent Flume configuration

Data Storage Configuration

As this document has shown, a Flume sink guides how data is landed in the Big Data cluster. There are a number of parameters on a Flume sink that control:

a. The maximum size of data files in HDFS; and
b. How the directory structure is formed (date parameters).

To make use of this configuration, one has to develop a custom architecture of Flume agents, routing rules, and forwarding destinations. The routing rules define what data is sent from which Event Processors, and the Flume agents define how the data is landed in HDFS. The best way to describe this is through an example.

Deployment Example

In this example we will use a QRadar deployment consisting of a Console and three (3) Event Processors. In this fictitious scenario, the three EPs are all collecting Windows and Blue Coat logs, and the data will be stored in HDFS separated by device group type. This entails (abbreviated agent excerpts follow this list):

a. Defining two (2) Flume agents, one for each group type (Windows and Blue Coat in this example); and
b. Defining two (2) Forwarding Destinations, one per group type, to route data to the corresponding Flume agent; and
c. Defining two (2) routing rules on each Event Processor (EP), one per group (a total of six routing rules), to route the data to the appropriate destination.
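The excerpt below sketches how the two agents in this example might differ. It is abbreviated to the lines that change; the agent names, ports, and paths are hypothetical, and each agent would still need the full source, channel, and sink definitions shown earlier:

# Agent receiving the Windows destinations
WindowsAgent.sources.qradar.port = 5555
WindowsAgent.sinks.HDFS.hdfs.path = hdfs://host:9000/windows/%y-%m-%d/%H00

# Agent receiving the Blue Coat destinations
BlueCoatAgent.sources.qradar.port = 5556
BlueCoatAgent.sinks.HDFS.hdfs.path = hdfs://host:9000/bluecoat/%y-%m-%d/%H00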

Multi-Agent Flume Configurations

For large QRadar deployments, the sheer amount of data to be transferred from a QRadar cluster to a Hadoop environment requires a series of Flume relay machines to route the data to HDFS. Some QRadar deployments can consume over 100,000 events per second (eps), and having all the Event Processors in the QRadar cluster send all their data to the Hadoop cluster's console is simply not practical. In these situations, a series of Flume agents needs to be deployed on dedicated hardware outside the Big Data cluster to collect, relay, and sink the data appropriately to the cluster.

Forwarding Profiles

As shown earlier in configuring the data link, Forwarding Destinations contain the directions for the type of data to send, where to send it, and the protocol to use. Behind the scenes, every Forwarding Destination has a forwarding profile. These profiles direct what data is sent from the overall set of available attributes (see Appendix A), along with some properties governing how it is sent. This allows for some highly configurable data feeds from QRadar. Currently, configuring a profile must be done with shell access to the QRadar Console.

Technical Details

In QRadar v7.2 MR1 Patch 3, a forwarding profile for each destination is initialized in the /opt/qradar/conf/ directory. These files appear named as forwardingprofile_json.xml.1, forwardingprofile_json.xml.2, and so on. The number at the end of the file name corresponds to the internal id of the destination in the corresponding Postgres table, selectivefwd_destination. To find the file for a given destination, this table has to be queried by name. For example, if the destination is named mydestination, the corresponding database query would be:

select id from selectivefwd_destination where name = 'mydestination';

If the above query returned 2 as the id, then the forwarding profile for that destination would reside in /opt/qradar/conf/forwardingprofile_json.xml.2.
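As an illustration, the lookup can be run on the Console with psql. The database and user names below are typical of a QRadar Console, but verify them on your own system:

psql -U qradar qradar -c \
  "select id from selectivefwd_destination where name = 'mydestination';"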

Because this is a backend modification, you must first disable the corresponding destination before editing its forwarding profile. Once you are finished, enable the destination and the forwarding profile changes should be picked up.

Editing the Forwarding Profile

The XML forwarding profile allows you to configure:

a. The name of the forwarding profile; and
b. Whether custom properties are sent (currently all or none); and
c. A preamble for each event (provided for syslog receivers that are particular about the format of the syslog header); and
d. The time format for any ISO time field; and
e. Properties for any of the available fields to send.

Header Attributes

The header attributes (a-d above) are the attributes on a profile that apply to the profile as a whole.

Name and Custom Property Configuration

Below is an excerpt from a forwarding profile that shows how the profile name and the custom property settings can be controlled.

<profile name="myprofile" enabled="true" version="1.0" includeallcustomproperties="true">
</profile>

Pre-amble

The preamble is controlled by its own XML element (see below). By default the preamble is set to a priority code 01 and the string "hostname", which simulates a host name. The preamble is static and exists to support syslog receivers that may require additional formatting.

<preamble data="<01>- hostname "></preamble>

ISO Time Formatting

A number of fields are formatted in ISO time. The profile allows a global change to the time formatting (see below). This should only be changed when absolutely necessary, because the system is optimized to send the time formats in the default form. Another important point: if the ISO-formatted date/time fields are not required, it is recommended that these fields be disabled to improve the overall performance of the data link.

<attribute tag="isotimeformat" enabled="true" name="isotimeformat" defaultvalue="yyyy-MM-dd'T'HH:mm:ss.SSSZ" enabledefaultvalue="false">
</attribute>

Field Properties

For each field, the following options are configurable in the forwarding profile (see the sample field element below):

a. Enabled / disabled (send or don't send); and
b. The name of the field to use in the JSON feed (the tag corresponds to the available attribute, while the name attribute controls how it is named); and
c. The default value to use for the field when it is null; and
d. Whether the default value is enabled (enabledefaultvalue); this controls whether to send the default value when one is not present.

<attribute tag="category" enabled="true" name="category" defaultvalue="" enabledefaultvalue="false">
</attribute>

Working with Insights in QRadar Correlation Engine

The data that is returned to QRadar takes the form of reference data. Reference data can be used within the QRadar correlation engine to identify new alerts based on these insights. This section walks through a small demonstration.

In this example, the Hadoop environment has already published a set of user names into a reference set in the QRadar system. This data can be made useful by writing a correlation rule that looks for events containing those user names. The response for these rules can be to adjust the severity of the events or to report an incident, depending on the severity level attributed to the users in this list.

To create an event rule based on a reference set, you can use the built-in tests in the Rule Wizard that apply to reference data. You can access these in Test Group: Event Property Tests. The example below (Fig 5.0) shows adding the Event Property Test: "when any of these event properties are contained in any of these reference set(s)".

Fig 5.0: Rule Wizard with Reference Set

This test can be combined with other tests or left alone. With this single test, the rule will simply perform the actions specified for the given rule when an event enters the system containing a user name in the riskyusers reference set. If the Hadoop environment updates the set over time, the rule will dynamically pick up these changes. In this fashion, analytics can run constantly in an offline fashion, driving insights into the QRadar SIEM, and correlation rules will dynamically adapt to the new data. For more information on the QRadar correlation engine, please consult the QRadar documentation.

Big Insights Data Access

For Big Insights Hadoop environments, there are a couple of easy ways to access the data that has been received from QRadar. This section looks at two simple methods, using Jaql and Big Sheets. More information on these Big Insights features can be found in the Big Insights documentation.

JAQL and QRadar Data

Jaql is primarily a query language for JavaScript Object Notation (JSON), but it supports more than just JSON. This makes it a nice fit for the data coming from a QRadar system. The script below is a very simple example of how you can open a QRadar data file in Big Insights.

jsonLines = fn(location) lines(location,
  inoptions = { converter: "com.ibm.jaql.io.hadoop.converter.FromJsonTextConverter" });
events = read(jsonLines("<FULL HDFS FILE PATH>"));
events -> top 1;
quit;

The above script can be executed with the following command from the prompt (logged in as biadmin):

/opt/ibm/biginsights/jaql/bin/jaqlshell -b sample.jaql

This should output the first event in the file you have specified.

Big Sheets and QRadar Data

Big Sheets, within Big Insights, can also be used to access data received from the QRadar system. To view any file in Big Sheets, click on the file in the file browser, then click the Sheets radio button on the right (see Fig 6.0 below).

Fig 6.0: Big Sheets Access

At this point the data will show up as one column, where you should be able to see each JSON record in its entirety. To split the records into their appropriate columns, click on the pencil next to the text "Line Reader" on the left and select the JSON Object Robust Reader (see Fig 7.0 below).

Fig 7.0: JSON Object Robust Reader

At this point the data is in a spreadsheet format that you can save as a Master Workbook, on which you can perform any operation available in the Big Sheets toolbox.

Use Cases

To this point, this paper has talked technically about integrating QRadar with a Hadoop environment. There are a number of use cases surrounding this integration. They all boil down to the need to bring non-traditional (with respect to cyber) information into the security picture. This section describes example use cases for:

a. Establishing a baseline
b. Advanced Persistent Threat detection (beaconing, leakage)
c. Domain attribution

These use cases are meant to serve as examples of what can be accomplished in Hadoop. Central to these examples is whether they provide value to the security picture of a given enterprise. A considerable amount of planning is involved in identifying the data sources and analytics that support the identified security use cases. In addition, every use case will have an accompanying data workflow and model.

Establishing Baseline

A classic use case for working with QRadar data in Hadoop involves baselining behavior. The Hadoop cluster provides a container for long-term information storage, and with that, trends can be detected over long periods. These trends can be established using long-running MapReduce jobs to understand things like:

a. User behavior; and
b. Network activity on identified assets; and
c. Domain attribution.

These are just a few examples. Because of the diverse programming capabilities of a Hadoop-based platform, the data scientist has the ability to apply various clustering and machine learning techniques to the normalized cyber data received from QRadar.

Advanced Persistent Threat Detection

Advanced Persistent Threats (APTs) are complicated to detect. There isn't a single formula for detecting every APT out there, but the Hadoop platform provides a vehicle for analyzing data points over long periods of time to understand patterns in behavior. A couple of classic examples of this are beaconing and data leakage.

Beaconing

The classic example of beaconing is analyzing traffic over time to identify local sources that talk to remote hosts periodically (hourly, daily, etc.) with a very small amount of bytes (less than 512 bytes). These attributes are common for command and control domains: the malware is calling home periodically. These small communications can be missed in a real-time system, but using offline batch analytics they can be identified. A sketch of this heuristic follows.
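The sketch below is a minimal, illustrative Python implementation of this heuristic over QRadar flow records, assuming the records are available as JSON lines with the field names from the example record and Appendix A. In a real deployment this logic would typically run as a MapReduce job over the data landed in HDFS, not as a single-machine script, and the thresholds are arbitrary starting points.

#!/usr/bin/env python
# Illustrative only: flag src/dst pairs that communicate at regular
# intervals with consistently small transfers (< 512 bytes), reading
# QRadar flow records as JSON lines from stdin.
import json
import sys
from collections import defaultdict

SMALL_TRANSFER = 512   # bytes, per the use case above
MIN_OBSERVATIONS = 10  # require a sustained pattern before flagging
JITTER = 60            # tolerated deviation between contact intervals, seconds

contacts = defaultdict(list)  # (src, dst) -> contact times in seconds

for line in sys.stdin:
    try:
        rec = json.loads(line)
    except ValueError:
        continue  # skip truncated records
    if rec.get("type") != "Flow":
        continue
    total = int(rec.get("srcbytes") or 0) + int(rec.get("dstbytes") or 0)
    if total >= SMALL_TRANSFER:
        continue
    ts = rec.get("firstpackettimeepoch")
    if ts is None:
        continue
    contacts[(rec.get("src"), rec.get("dst"))].append(int(ts) // 1000)

for (src, dst), times in contacts.items():
    if len(times) < MIN_OBSERVATIONS:
        continue
    times.sort()
    gaps = [b - a for a, b in zip(times, times[1:])]
    mean = sum(gaps) / float(len(gaps))
    if mean > 0 and all(abs(g - mean) <= JITTER for g in gaps):
        print("possible beacon: %s -> %s roughly every %d seconds"
              % (src, dst, mean))

Data leakage detection, described next, is analogous: instead of counting small periodic contacts, one would aggregate outbound bytes per local source and flag consistent low-volume transmission to the same remote destination over time.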

Data Leakage

Another case, similar to beaconing, is data leakage. Instead of data being transferred out of the organization in bulk, data is trickled out consistently from one local source. This can be identified by looking for consistent transmissions from a local source to a remote destination over time.

Domain Attribution

The domain attribution use case is the workflow that is deployed in the Big Insights specific sample RPM. The workflow for this use case is outlined in the diagram below. It sets up a base for correlating data from QRadar with external domain registration data feeds. The end result is to feed insights back to the QRadar system on identified risky domains, IPs, and users, which can be utilized by the correlation engine and other places in the Security Intelligence Platform.

Fig 8.0: Domain Attribution Workflow

In this use case, domain data is extracted from various sources, and domain registration data for these domains is looked up using an external service. Risk models can be developed on this data, looking at attributes like:

a. Domain age; and
b. Correlation between domain data and known black lists; and
c. How often data on this domain is changing.

Once risky domains are identified from the traffic flow, they are cross-referenced with the immediate browsing history to extract a set of risky users and IPs. The Insights Data Publisher (installed with the Big Insights RPM) is then called upon to publish this data back to QRadar.

QRadar Big Insights RPM

This document has detailed all the manual steps for setting up the communication channel between QRadar and a Hadoop-based system. There is an RPM available for Big Insights clusters that will automatically perform the following actions:

a. Set up one Flume channel between the QRadar cluster and Big Insights (for complex clusters it is recommended that this step be skipped and the architecture in this document be followed); and
b. Set up and install the Big Insights publishing application with all the required credentials; and
c. Install a sample workflow for domain registry analysis (see the readme in the RPM for more details).

Please see the documentation accompanying the QRadar Big Insights RPM for details on installation and configuration.

Appendix A: JSON Data Fields

NAME                  ENABLED  TYPE    DEFAULT VALUE  DEFAULT ENABLED
category              TRUE     COMMON                 FALSE
protocolid            TRUE     COMMON                 FALSE
sev                   TRUE     COMMON                 FALSE
src                   TRUE     COMMON  0.0.0.0        FALSE
dst                   TRUE     COMMON  0.0.0.0        FALSE
srcport               TRUE     COMMON                 FALSE
dstport               TRUE     COMMON                 FALSE
usrname               TRUE     COMMON                 FALSE
relevance             TRUE     COMMON                 FALSE
credibility           TRUE     COMMON                 FALSE
storagetimeepoch      TRUE     COMMON                 FALSE
storagetimeiso        TRUE     COMMON                 FALSE
devtimeepoch          TRUE     EVENT                  FALSE
devtimeiso            TRUE     EVENT                  FALSE
srcprenat             TRUE     EVENT   0.0.0.0        FALSE
dstprenat             TRUE     EVENT   0.0.0.0        FALSE
srcpostnat            TRUE     EVENT   0.0.0.0        FALSE
dstpostnat            TRUE     EVENT   0.0.0.0        FALSE
srcmac                TRUE     EVENT   0:0:0:0:0:0    FALSE
dstmac                TRUE     EVENT   0:0:0:0:0:0    FALSE
srcprenatport         TRUE     EVENT                  FALSE
dstprenatport         TRUE     EVENT                  FALSE
srcpostnatport        TRUE     EVENT                  FALSE
dstpostnatport        TRUE     EVENT                  FALSE
identsrc              TRUE     EVENT   0.0.0.0        FALSE
identhostname         TRUE     EVENT                  FALSE
identusername         TRUE     EVENT                  FALSE
identnetbios          TRUE     EVENT                  FALSE
identgrpname          TRUE     EVENT                  FALSE
identmac              TRUE     EVENT   0:0:0:0:0:0    FALSE
hasidentity           TRUE     EVENT   false          FALSE
payload               TRUE     EVENT                  FALSE
firstpackettimeepoch  TRUE     FLOW                   FALSE
firstpackettimeiso    TRUE     FLOW                   FALSE
flowtype              TRUE     FLOW                   FALSE
cmpappid              TRUE     FLOW                   FALSE
appid                 TRUE     FLOW                   FALSE
srcasnlist            TRUE     FLOW                   FALSE
dstasnlist            TRUE     FLOW                   FALSE
srcbytes              TRUE     FLOW                   FALSE
totalsrcbytes         TRUE     FLOW                   FALSE
dstbytes              TRUE     FLOW                   FALSE
totaldstbytes         TRUE     FLOW                   FALSE
srcpackets            TRUE     FLOW                   FALSE
totalsrcpackets       TRUE     FLOW                   FALSE
dstpackets            TRUE     FLOW                   FALSE
totaldstpackets       TRUE     FLOW                   FALSE
srctos                TRUE     FLOW                   FALSE
dsttos                TRUE     FLOW                   FALSE
inputiflist           TRUE     FLOW                   FALSE
outputiflist          TRUE     FLOW                   FALSE
flowintidlist         TRUE     FLOW                   FALSE
asymetric             TRUE     FLOW                   FALSE
srcports              TRUE     FLOW                   FALSE
srcips                TRUE     FLOW    0.0.0.0        FALSE
dstports              TRUE     FLOW                   FALSE
dstips                TRUE     FLOW    0.0.0.0        FALSE
flowcnt               TRUE     FLOW                   FALSE
srciploc              TRUE     COMMON  other          FALSE
dstiploc              TRUE     COMMON  other          FALSE
eventname             TRUE     EVENT                  FALSE
lowlevelcategory      TRUE     EVENT                  FALSE
highlevelcategory     TRUE     EVENT                  FALSE
eventdescription      TRUE     EVENT                  FALSE
srcassetname          TRUE     EVENT                  FALSE
dstassetname          TRUE     EVENT                  FALSE
protocolname          TRUE     EVENT                  FALSE
logsource             TRUE     EVENT                  FALSE
srcnetname            TRUE     EVENT                  FALSE
dstnetname            TRUE     EVENT                  FALSE
direction             TRUE     FLOW                   FALSE
bias                  TRUE     FLOW                   FALSE
sourcedscp            TRUE     FLOW                   FALSE
sourceprecedence      TRUE     FLOW                   FALSE
destdscp              TRUE     FLOW                   FALSE
destprecedence        TRUE     FLOW                   FALSE
icmpcode              TRUE     FLOW                   FALSE
icmptype              TRUE     FLOW                   FALSE
sourcetcpflags        TRUE     FLOW                   FALSE
applicationname       TRUE     FLOW                   FALSE