
Classifying malware using network traffic analysis
Or how to learn Redis, git, tshark and Python in 4 hours

Alexandre Dulaunoy
January 18, 2016

Problem Statement

- We have more than 5000 pcap files generated per day, one for each malware execution in a sandbox
- We need to classify the malware into various sets [1]
- The project needs to be done in less than a day, and the code shared with another team via GitHub

[1] Classification parameters are defined by the analyst.

File Format and Filename

...
0580c82f6f90b75fcf81fd3ac779ae84.pcap
05a0f4f7a72f04bda62e3a6c92970f6e.pcap
05b4a945e5f1f7675c19b74748fd30d1.pcap
05b57374486ce8a5ce33d3b7d6c9ba48.pcap
05bbddc8edac3615754f93139cf11674.pcap
05bf1ff78685b5de06b0417da01443a9.pcap
05c3bccc1abab5c698efa0dfec2fd3a4.pcap
...

<MD5 hash of the malware>.pcap

MD5 values of malware samples are used by A/V vendors, security researchers and analysts [2].

[2] https://www.virustotal.com/ as an example

MapReduce and Network Forensics

- MapReduce is an old concept in computer science:
  - the map stage performs isolated computation on independent problems
  - the reduce stage combines the computation results
- Network forensic computations can easily be expressed in map and reduce steps: parsing, filtering, counting, sorting, aggregating, anonymizing, shuffling...
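
To make the pattern concrete, here is a minimal sketch (not from the original deck) that maps tshark over each capture independently and reduces the per-file results into one tally; it assumes Python 3.7+, tshark on the PATH, and the ./pcap/ directory used later in the deck:

import glob
import subprocess
from collections import Counter
from multiprocessing import Pool

def map_pcap(filename):
    # map step: count destination IPs in one pcap, independently of the others
    out = subprocess.run(
        ["tshark", "-Tfields", "-e", "ip.dst", "-r", filename],
        capture_output=True, text=True).stdout
    return Counter(line for line in out.splitlines() if line)

if __name__ == "__main__":
    with Pool() as pool:
        partial_counts = pool.map(map_pcap, glob.glob("./pcap/*.pcap"))
    # reduce step: merge the per-file counters into a single tally
    total = sum(partial_counts, Counter())
    print(total.most_common(10))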

Processing and Reading pcap files

ls -1 | parallel --gnu "tcpdump -s0 -A -n -r {1}"

Nice for processing the files but...

- How do we combine the results?
- How do we extract the classification parameters? (e.g. sed, awk, regexp?)

tshark

tshark -G fields

- Wireshark supports a wide range of dissectors
- tshark allows using those dissectors from the command line

tshark -E header=yes -E separator=, -Tfields -e ip.dst -r my
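
The command above is truncated in the transcription; an illustrative completion, using one of the capture names from the filename slide:

tshark -E header=yes -E separator=, -Tfields -e ip.dst -r 0580c82f6f90b75fcf81fd3ac779ae84.pcap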

Concurrent Network Forensic Processing

- To allow concurrent processing, a non-blocking data store is required
- To allow flexibility, a schema-free data store is required
- To allow fast processing, you need to scale horizontally and to know the cost of querying the data store
- To allow streaming processing, the write cost versus the read cost should be equivalent

Redis: a key-value/tuple store

- Redis is a key store written in C with an extended set of data types like lists, sets, ranked sets, hashes, queues
- Redis usually runs in memory, with persistence achieved by regularly saving to disk
- The Redis API is simple (telnet-like) and supported by a multitude of programming languages

http://www.redis.io/

Redis: installation

Download Redis 3.0.6 (stable version)

tar xvfz redis-3.0.6.tar.gz
cd redis-3.0.6
make

Keys

- Keys are free text values (up to 2^31 bytes); newlines are not allowed
- Short keys are usually better (to save memory)
- Naming conventions are commonly used, like key parts separated by colons (e.g. dns:<md5>)

Value and data types

- binary-safe strings
- lists of binary-safe strings
- sets of binary-safe strings
- hashes (dictionary-like)
- pub/sub channels

Running redis and talking to redis...

screen
cd ./src/ && ./redis-server

In a new screen session (ctrl-a c):

redis-cli DBSIZE

Commands available on all keys

These commands are available on all keys regardless of their type:

- TYPE [key]: gives you the type of the key (from string to hash)
- EXISTS [key]: does the key exist in the current database?
- RENAME [old new]
- RENAMENX [old new]
- DEL [key]
- RANDOMKEY: returns a random key
- TTL [key]: returns the number of seconds before expiration
- EXPIRE [key ttl] or EXPIREAT [key ts]
- KEYS [pattern]: returns all keys matching a pattern (use with care!)
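
A quick illustration of a few of these, sketched with the redis-py client that appears later in the deck (assumes a local Redis on the default port; 'demo' is a hypothetical key):

import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)
r.set('demo', 'hello')
print(r.type('demo'))    # b'string'
print(r.exists('demo'))  # 1: the key exists in the current database
r.expire('demo', 30)     # the key will expire in 30 seconds
print(r.ttl('demo'))     # seconds left before expiration, <= 30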

Commands available for the string type

- SET [key] [value]
- GET [key]
- MGET [key1] [key2] [key3]
- MSET [key1] [valueofkey1] ...
- INCR [key], INCRBY [key] [value]  (! string interpreted as integer)
- DECR [key], DECRBY [key] [value]  (! string interpreted as integer)
- APPEND [key] [value]

Commands available for the set type

- SADD [key] [member]: adds a member to the set named key
- SMEMBERS [key]: returns the members of a set
- SREM [key] [member]: removes a member from the set named key
- SCARD [key]: returns the cardinality of a set
- SUNION [key ...]: returns the union of all the sets
- SINTER [key ...]: returns the intersection of all the sets
- SDIFF [key ...]: returns the difference of all the sets
- S...STORE [destkey key ...]: same as above but stores the result
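
Sets are the workhorse of the classification that follows. A small sketch with redis-py (the server: key prefix and header values are hypothetical; the sample MD5s come from the filename slide):

import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)
# one set of observed HTTP server headers per malware sample
r.sadd('server:0580c82f6f90b75fcf81fd3ac779ae84', 'Apache', 'nginx')
r.sadd('server:05a0f4f7a72f04bda62e3a6c92970f6e', 'Apache')
print(r.scard('server:0580c82f6f90b75fcf81fd3ac779ae84'))  # 2
# samples sharing a server header cluster together:
print(r.sinter('server:0580c82f6f90b75fcf81fd3ac779ae84',
               'server:05a0f4f7a72f04bda62e3a6c92970f6e'))  # {b'Apache'}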

Commands available for the list type

- RPUSH / LPUSH [key] [value]
- LLEN [key]
- LRANGE [key] [start] [end]
- LTRIM [key] [start] [end]
- LSET [key] [index] [value]
- LREM [key] [count] [value]

Sorting

SORT [key]
SORT [key] LIMIT 0 4

Commands available for the sorted set type

- ZADD [key] [score] [member]
- ZCARD [key]
- ZSCORE [key] [member]
- ZRANK [key] [member]: get the rank of a member, from the bottom
- ZREVRANK [key] [member]: get the rank of a member, from the top

Atomic commands

- GETSET [key] [newvalue]: sets newvalue and returns the previous value
- (M)SETNX [key] [newvalue]: sets newvalue except if the key exists (useful for locking)

MSETNX is very useful to update a large set of objects without race conditions.
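
For instance, SETNX can act as a crude lock around a critical section; a sketch assuming a single Redis instance and a hypothetical lock:import key:

import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)
if r.setnx('lock:import', 1):        # only one worker wins the lock
    try:
        pass  # critical section: e.g. update several related keys
    finally:
        r.delete('lock:import')      # release so other workers can proceed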

Database commands

- SELECT [0-15]: selects a database (default is 0)
- MOVE [key] [db]: moves a key to another database
- FLUSHDB: deletes all the keys in the current database
- FLUSHALL: deletes all the keys in all the databases
- SAVE / BGSAVE: saves the database to disk (directly or in the background)
- DBSIZE
- MONITOR: shows what's going on against your Redis datastore (check also redis-stat)

Redis from shell?

ret=$(redis-cli SADD dns:${md5} ${rdata})
num=$(redis-cli SCARD dns:${md5})

Why not Python?

import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)
r.set('foo', 'bar')
r.get('foo')

How do you integrate it?

ls -1 ./pcap/*.pcap | parallel --gnu "cat {1} | tshark -E header=yes -E separator=, -Tfields -e http.server -r - | python import.py -f {1}"

The pcap data flows to tshark on stdin, while -f {1} passes the filename so the script can recover the malware MD5.

Code needs to be shared?

Structure and Sharing

As you start to work on a project that you are willing to share, one approach is to start with a common directory structure for the software to be shared:

/PcapClassifier/bin/  -> where to store the code
               /etc/  -> where to store configuration
               /data/
               README.md

import.py (first iteration)

# Need to catch the MD5 filename of the malware (from the pcap filename)
import argparse
import sys

argparser = argparse.ArgumentParser(description='Pcap classifier')
argparser.add_argument('-f', action='append', help='Filename')
args = argparser.parse_args()

if args.f is not None:
    filename = args.f
    for line in sys.stdin:
        print(line.split(",")[0])
else:
    argparser.print_help()

Storing the code and its revision

Initialize a git repository (one time):

cd PcapClassifier
git init

Add a file to the index (one time):

git add ./bin/import.py

Commit a file (when you do changes):

git commit ./bin/import.py
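
Two optional refinements (standard git usage, not shown on the slide): -m supplies the commit message inline, and git log checks the recorded history:

git commit -m "Add first iteration of import.py" ./bin/import.py
git log --oneline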

import.py (second iteration)

# Need to know and store the MD5 filename processed
import argparse
import sys
import redis

argparser = argparse.ArgumentParser(description='Pcap classifier')
argparser.add_argument('-f', action='append', help='Filename')
args = argparser.parse_args()
r = redis.StrictRedis(host='localhost', port=6379, db=0)

if args.f is not None:
    md5 = args.f[0].split(".")[0]
    r.sadd('processed', md5)
    for line in sys.stdin:
        print(line.split(",")[0])
else:
    argparser.print_help()

Redis data structures

Now we have a set of the files processed (holding the MD5 hash of each malware):

{processed} = {abcdef..., deadbeef..., 0101aba...}

A "type" set stores the field types decoded from the pcap (e.g. http.user_agent):

{type} = {'http.user_agent', 'http.server'}

A unique set stores all the values seen per type:

{e:http.user_agent} = {'Mozilla...', 'Curl...'}
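
Once import.py has populated these sets, they can be inspected from Python (a sketch assuming a local Redis and at least one processed capture):

import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)
print(r.scard('processed'))      # number of pcap files imported so far
print(r.smembers('type'))        # the tshark fields seen, e.g. b'http.server'
print(r.scard('e:http.server'))  # distinct server headers across all samples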

import.py (third iteration)

import argparse
import sys
import redis

argparser = argparse.ArgumentParser(description='Pcap classifier')
argparser.add_argument('-f', action='append', help='Filename')
args = argparser.parse_args()
r = redis.StrictRedis(host='localhost', port=6379, db=0)

if args.f is not None:
    md5 = args.f[0].split(".")[0]
    r.sadd('processed', md5)
    lnumber = 0
    fields = None
    for line in sys.stdin:
        if lnumber == 0:
            # header line: the field names selected with tshark -e
            fields = line.rstrip().split(",")
            for field in fields:
                r.sadd('type', field)
        else:
            elements = line.rstrip().split(",")
            i = 0
            for element in elements:
                try:
                    r.sadd('e:' + fields[i], element)
                except IndexError:
                    print("Empty fields")
                i = i + 1
        lnumber = lnumber + 1
else:
    argparser.print_help()

How to link a type value to an MD5 malware hash?

Now we have a set of values per type:

{e:http.user_agent} = {'Mozilla...', 'Curl...'}

Using MD5 to hash each type value, we can create a set of malware samples for each value:

{v:MD5hex('Mozilla...')} = {'MD5 of malware', ...}
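
A sketch of this linking step with redis-py and hashlib (the user agent value is hypothetical; the sample MD5 comes from the filename slide):

import hashlib
import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)
value = 'Mozilla/5.0'  # hypothetical observed user agent
vhash = hashlib.md5(value.encode('utf-8')).hexdigest()
# record that this sample exhibited this value
r.sadd('v:' + vhash, '0580c82f6f90b75fcf81fd3ac779ae84')
# all samples sharing this user agent:
print(r.smembers('v:' + vhash))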

import.py (fourth iteration)

import argparse
import sys
import redis
import hashlib

argparser = argparse.ArgumentParser(description='Pcap classifier')
argparser.add_argument('-f', action='append', help='Filename')
args = argparser.parse_args()
r = redis.StrictRedis(host='localhost', port=6379, db=0)

if args.f is not None:
    md5 = args.f[0].split(".")[0]
    r.sadd('processed', md5)
    lnumber = 0
    fields = None
    for line in sys.stdin:
        if lnumber == 0:
            # print(line.split(",")[0])
            fields = line.rstrip().split(",")
            for field in fields:
                r.sadd('type', field)
        else:
            elements = line.rstrip().split(",")
            i = 0
            for element in elements:
                try:
                    r.sadd('e:' + fields[i], element)
                    ehash = hashlib.md5()
                    ehash.update(element.encode('utf-8'))
                    ehhex = ehash.hexdigest()
                    if element != "":
                        r.sadd('v:' + ehhex, md5)
                except IndexError:
                    print("Empty fields")
                i = i + 1
        lnumber = lnumber + 1
else:
    argparser.print_help()

graph.py - exporting graph from Redis

import redis
import networkx as nx
import hashlib

r = redis.StrictRedis(host='localhost', port=6379, db=0)
g = nx.Graph()

for malware in r.smembers('processed'):
    g.add_node(malware)
for fieldtype in r.smembers('type'):
    g.add_node(fieldtype)
    for v in r.smembers('e:' + fieldtype.decode('utf-8')):
        g.add_node(v)
        ehash = hashlib.md5()
        ehash.update(v)
        ehhex = ehash.hexdigest()
        for m in r.smembers('v:' + ehhex):
            print(m)
            g.add_edge(v, m)

nx.write_gexf(g, "/tmp/out.gexf")
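
Once the GEXF file exists, a possible follow-up (not in the original deck) is to read it back and list connected components, which gives a first-cut grouping of samples that share network behaviour:

import networkx as nx

# reload the exported graph; samples that share any network value
# end up in the same connected component
g = nx.read_gexf("/tmp/out.gexf")
for component in nx.connected_components(g):
    print(len(component), sorted(component)[:5])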

graph.py - exporting graph from Redis

[Final slide: the exported graph rendered as a visualization; image not reproduced in this transcription.]