netflow-indexer Documentation Release 0.1.28 Justin Azoff May 02, 2012
CONTENTS

1 Installation
    1.1 Install prerequisites
    1.2 Install netflow-indexer
2 Configuration
    2.1 Example Configuration files
    2.2 Cron
    2.3 Daily compaction
3 Example Session
    3.1 Indexing data
    3.2 Searching the index
    3.3 Specifying output columns
    3.4 Dumping data
4 API
    4.1 Searching with the API
    4.2 Example
    4.3 File metadata
    4.4 Searching with pynfdump
5 Indices and tables
Index
Netflow-indexer is a program that uses Xapian to index the flat-file databases used by nfdump or flow-tools.
CHAPTER ONE
INSTALLATION

1.1 Install prerequisites

netflow-indexer uses the Python Xapian bindings. The IPy module is used for some subnet calculations to support CIDR searching.

On Debian you can install all the dependencies using:

    # apt-get install python-pip python-xapian xapian-tools python-ipy

1.2 Install netflow-indexer

I recommend installing netflow-indexer into a virtual environment:

    # pip install -s -E /usr/local/python_env/ netflowindexer-0.1.9.tar.gz

Then modify your path or source the activation script:

    # PATH=/usr/local/python_env/bin/:$PATH
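After installing the prerequisites, a quick check that the modules can be found may save some debugging later. The following is only a sketch: it is written for a current Python 3 interpreter, the helper name missing_modules is hypothetical, and "xapian" and "IPy" are assumed to be the import names provided by the python-xapian and python-ipy packages.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that this interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the Debian packages python-xapian and python-ipy.
    missing = missing_modules(["xapian", "IPy"])
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All prerequisites found.")
```

Run it with the interpreter from the environment that will run netflow-indexer, since a module installed for one interpreter is not necessarily visible to another.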
CHAPTER TWO
CONFIGURATION

Netflow-indexer uses a small configuration file that sets up the type of indexer to use and the location of the files on disk. It has the following settings:

indexer - the type of indexer to use: nfdump, nfdump_full, flowtools, or flowtools_full
dbpath - the path to save the indexes to
fileglob - the shell glob that will expand to the flow data files for the current hour
allfileglob - the shell glob that will expand to all flow data files
pathregex - a regular expression or simple string used to extract metadata from flow file paths

2.1 Example Configuration files

2.1.1 nfdump using full indexing (recommended)

    [nfi]
    indexer = nfdump_full
    dbpath = /data/nfdump_xap
    flowpath = /data/nfsen/profiles/live/podium
    fileglob = %(flowpath)s/nfcapd.%(year)s%(month)s%(day)s*
    allfileglob = %(flowpath)s/nfcapd.*
    pathregex = /profiles/:profile/:source/nfcapd

2.1.2 nfdump

    [nfi]
    indexer = nfdump
    dbpath = /data/nfdump_xap
    flowpath = /data/nfsen/profiles/live/podium
    fileglob = %(flowpath)s/nfcapd.%(year)s%(month)s%(day)s*
    allfileglob = %(flowpath)s/nfcapd.*
    pathregex = /profiles/:profile/:source/nfcapd

2.1.3 flow-tools
    [nfi]
    indexer = flowtools
    dbpath = /usr/local/var/db/flows/nfi
    flowpath = /usr/local/var/db/flows/packeteer
    fileglob = %(flowpath)s/%(year)s/%(year)s-%(month)s/%(year)s-%(month)s-%(day)s/ft-v05.%(year)s-%(month
    allfileglob = %(flowpath)s/*/*/*/ft-v05.*

2.2 Cron

Netflow-indexer should be run from cron 5 minutes after every hour when using the nfdump indexer, and every 5 minutes when using the nfdump_full indexer:

    MAILTO=root
    PATH=/usr/local/python_env/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
    45 0 * * * cd /data/nfdump_xap/ && ./daily_compact > /dev/null
    */5 * * * * sleep 30;netflow-index-update /data/nfdump_xap/nfdump.ini
    55 0 * * * netflow-index-cleanup /data/nfdump_xap/nfdump.ini -d

2.3 Daily compaction

Xapian allows you to compact an index for read-only use. Compaction reduces disk usage and improves search speed. Daily compaction is a work in progress.

2.3.1 examples/daily_compact.sh

    #!/bin/sh
    # Compact the database for the day that was current 60 minutes ago.
    DAY=$(date +"%Y%m%d" -d "60 minutes ago")
    ./xap_compact ${DAY}.db

2.3.2 examples/xap_compact.sh

    #!/bin/sh
    orig="$1"
    tmp=tmp_$$.db
    tmp2=tmp2_$$.db
    # Skip databases that have already been compacted.
    if [ -e $orig/.compacted ] ; then
        exit 0
    fi
    xapian-compact -F $orig $tmp && mv $orig $tmp2 && mv $tmp $orig && rm -rf $tmp2 && touch $orig/.compa
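The %(name)s placeholders in the example configuration files of section 2.1 use the same syntax as the interpolation in Python's standard configuration parser. The sketch below shows how such a fileglob could expand for one day. It rests on two assumptions not stated in this manual: that netflow-indexer expands the placeholders this way, and that it supplies year, month and day itself at run time. The helper name expand_fileglob is hypothetical, and the sketch targets Python 3.

```python
import configparser

# Copy of the nfdump_full example from section 2.1.1 (pathregex omitted).
EXAMPLE_INI = """
[nfi]
indexer = nfdump_full
dbpath = /data/nfdump_xap
flowpath = /data/nfsen/profiles/live/podium
fileglob = %(flowpath)s/nfcapd.%(year)s%(month)s%(day)s*
allfileglob = %(flowpath)s/nfcapd.*
"""


def expand_fileglob(ini_text, year, month, day):
    """Expand the fileglob setting for one date (all arguments are strings)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # flowpath comes from the [nfi] section; the date parts are passed in.
    date_vars = {"year": year, "month": month, "day": day}
    return parser.get("nfi", "fileglob", vars=date_vars)


if __name__ == "__main__":
    print(expand_fileglob(EXAMPLE_INI, "2012", "05", "01"))
```

The result is still a shell glob; passing it to glob.glob() would list the matching flow files for that day.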
CHAPTER THREE
EXAMPLE SESSION

3.1 Indexing data

Tell the netflow indexer to index the current netflow files. For this example I deleted today's index so it can be re-created:

    netflow@nf:~$ netflow-index-update /data/nfdump_xap/nfdump.ini
    read /data/nfsen/profiles/live/podium/nfcapd.201205010000 in 2.4 seconds. 64501 ips.
    read /data/nfsen/profiles/live/podium/nfcapd.201205010005 in 2.5 seconds. 70830 ips.
    read /data/nfsen/profiles/live/podium/nfcapd.201205010010 in 3.8 seconds. 120925 ips.
    read /data/nfsen/profiles/live/podium/nfcapd.201205010015 in 2.7 seconds. 65676 ips.
    ...
    read /data/nfsen/profiles/live/podium/nfcapd.201205010240 in 1.3 seconds. 54040 ips.
    read /data/nfsen/profiles/live/podium/nfcapd.201205010245 in 1.3 seconds. 52391 ips.
    read /data/nfsen/profiles/live/podium/nfcapd.201205010250 in 1.2 seconds. 49993 ips.
    read /data/nfsen/profiles/live/podium/nfcapd.201205010255 in 1.2 seconds. 52161 ips.
    Flush took 7.4 seconds.
    ...
    read /data/nfsen/profiles/live/podium/nfcapd.201205011615 in 7.4 seconds. 159399 ips.
    read /data/nfsen/profiles/live/podium/nfcapd.201205011620 in 7.1 seconds. 155225 ips.
    read /data/nfsen/profiles/live/podium/nfcapd.201205011625 in 5.7 seconds. 110510 ips.
    Flush took 28.9 seconds.

Running the indexer when more data is available does an incremental update:

    netflow@nf:~$ netflow-index-update /data/nfdump_xap/nfdump.ini
    read /data/nfsen/profiles/live/podium/nfcapd.201205011630 in 3.7 seconds. 110257 ips.
    read /data/nfsen/profiles/live/podium/nfcapd.201205011635 in 3.7 seconds. 116742 ips.
    read /data/nfsen/profiles/live/podium/nfcapd.201205011640 in 4.2 seconds. 107927 ips.
    Flush took 7.0 seconds.
    netflow@nf:~$ netflow-index-update /data/nfdump_xap/nfdump.ini
    netflow@nf:~$

When indexing for the first time, use the --full-index or -f option to index all the data.
By default netflow-indexer only tries indexing files that match fileglob:

    netflow-index-update /data/nfdump_xap/nfdump.ini --full-index

3.2 Searching the index

Search the index for 2011-04-19:
    # 59.124.163.60 is an address that just scanned us
    remote@nf:~$ time netflow-index-search /data/nfdump_xap/nfdump.ini /data/nfdump_xap/20110419.db 59.12
    2011-04-19 05:35:00
    2011-04-19 05:40:00
    2011-04-19 05:45:00
    2011-04-19 05:50:00
    2011-04-19 05:55:00
    2011-04-19 06:00:00
    2011-04-19 06:05:00
    2011-04-19 07:40:00
    2011-04-19 07:45:00
    2011-04-19 07:50:00
    2011-04-19 07:55:00
    real 0m0.072s

This output shows that the address was present in the index in 11 five-minute chunks. Searching the 30-day index takes only slightly longer and returns the same results:

    remote@nf:~$ netflow-index-search-all /data/nfdump_xap/nfdump.ini 59.124.163.60

Searching for an IP that does not exist in the index is very fast:

    remote@nf:~$ time netflow-index-search-all /data/nfdump_xap/nfdump.ini 9.254.9.254
    real 0m0.097s

3.3 Specifying output columns

netflow-index-search and netflow-index-search-all support a -c option that selects which columns are output. By default only time is output. The other built-in field is filename. Additional fields are made available by the pathregex configuration option. Columns can be selected in two ways:

    -c time -c filename

or:

    -c time,filename

3.4 Dumping data

netflow-index-search and netflow-index-search-all support a -d option which automatically runs the appropriate netflow tool for you:

    remote@nf:~$ time netflow-index-search-all /data/nfdump_xap/nfdump.ini 59.124.163.60 -d | head
    2011-04-19 05:38:36.468 1.696 TCP 59.124.163.60:39432 -> 123.123.2.245:22 4
    2011-04-19 05:38:36.468 1.776 TCP 59.124.163.60:50920 -> 123.123.2.246:22 4
    2011-04-19 05:38:36.468 1.428 TCP 123.123.2.245:22 -> 59.124.163.60:39432 4
    2011-04-19 05:38:36.472 0.828 TCP 59.124.163.60:36167 -> 123.123.2.247:22 3
    ...

You can also use the -f option to pass an additional filter:

    remote@nf:~$ netflow-index-search-all /data/nfdump_xap/nfdump.ini 59.124.163.60 -d -f 'not port 22'
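The timeslot list printed in section 3.2 is easier to read once consecutive five-minute slots are merged into ranges. The following post-processing sketch is not part of netflow-indexer. It assumes the default output (only the time column, one "YYYY-MM-DD HH:MM:SS" value per line), and the helper name group_timeslots is hypothetical.

```python
from datetime import datetime, timedelta

SLOT = timedelta(minutes=5)  # width of one nfcapd timeslot in the examples


def group_timeslots(lines, slot=SLOT):
    """Merge consecutive timeslot lines into (first, last) datetime pairs."""
    times = sorted(
        datetime.strptime(line.strip(), "%Y-%m-%d %H:%M:%S")
        for line in lines
        if line.strip()
    )
    ranges = []
    for t in times:
        # Extend the current range when this slot directly follows
        # (or repeats) the previous one; otherwise start a new range.
        if ranges and t - ranges[-1][1] <= slot:
            ranges[-1][1] = t
        else:
            ranges.append([t, t])
    return [(first, last) for first, last in ranges]


if __name__ == "__main__":
    import sys

    # Example: netflow-index-search-all nfdump.ini 59.124.163.60 | python this_script.py
    for first, last in group_timeslots(sys.stdin):
        print("%s to %s" % (first, last))
```

Applied to the eleven timeslots shown in section 3.2, this yields two ranges: 05:35 to 06:05 and 07:40 to 07:55.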
CHAPTER FOUR
API

4.1 Searching with the API

class netflowindexer.main.Searcher(ini_file)
    Create a new searcher instance. Call with the path to the ini file.

    list_databases()
        Return a list of database files in the dbpath directory.

    search(database, ips, dump=False, filter=None, mode=None)
        Search a specific database file.

        Parameters
            database - the full path to a database file.
            ips - a list of IP addresses to search for.
            dump - if True, dump the full netflow records; otherwise just the seen timeslots.
            filter - optional additional netflow search filter to be used when dump=True.
            mode - set to pipe to have nfdump list pipe-delimited records.

    search_all(ips, dump=False, filter=None, mode=None)
        Search all database files. Takes the same parameters as search().

4.2 Example

The Searcher class can be used to search for records:

    >>> from netflowindexer import Searcher
    >>> s = Searcher("/spare/tmp/netflow/nfdump.ini")
    >>> print s.list_databases()
    ['/spare/tmp/netflow/20110408.db']
    >>> for record in s.search_all(['8.8.8.8']):
    ...     print record
    2011-04-08 15:00:00
    2011-04-08 15:05:00
    2011-04-08 15:10:00
    2011-04-08 15:15:00
    2011-04-08 15:20:00
    ...
    >>> for record in s.search_all(['8.8.8.8'], dump=True):
    ...     print record
    2011-04-08 14:59:32.696 0.000 UDP 111.222.121.54:53241 -> 8.8.8.8:53 2
    2011-04-08 14:59:32.708 0.028 UDP 8.8.8.8:53 -> 111.222.121.54:53241 2
    2011-04-08 14:59:38.416 0.000 UDP 111.222.121.127:51528 -> 8.8.8.8:53 1
    2011-04-08 14:59:38.396 0.000 UDP 8.8.8.8:53 -> 111.222.121.127:51528 1
    2011-04-08 14:59:38.400 0.000 UDP 111.222.121.127:60043 -> 8.8.8.8:53 1
    2011-04-08 14:59:38.368 0.000 UDP 8.8.8.8:53 -> 111.222.121.127:60043 1
    2011-04-08 14:59:41.516 0.000 UDP 111.222.121.54:60128 -> 8.8.8.8:53 1
    2011-04-08 14:59:41.516 0.000 UDP 111.222.121.54:63357 -> 8.8.8.8:53 1

4.3 File metadata

Search results are actually objects. str() returns just the time of the matching flow records, but other fields are available:

    >>> for record in s.search_all(['8.8.8.8']):
    ...     print repr(record)
    SearchResult(filename=/spare/tmp/netflow/nfcapd.201104081500, time=2011-04-08 15:00:00, profile=tmp)
    SearchResult(filename=/spare/tmp/netflow/nfcapd.201104081505, time=2011-04-08 15:05:00, profile=tmp)
    SearchResult(filename=/spare/tmp/netflow/nfcapd.201104081510, time=2011-04-08 15:10:00, profile=tmp)
    >>> for record in s.search_all(['8.8.8.8']):
    ...     print record.time, record.profile
    2011-04-08 15:00:00 tmp
    2011-04-08 15:05:00 tmp
    2011-04-08 15:10:00 tmp

These field extractions are done via the pathregex configuration option.

4.4 Searching with pynfdump

pynfdump [1] is another module I have written. You can easily use netflow-indexer with pynfdump:

    >>> from netflowindexer import Searcher
    >>> import pynfdump
    >>> d=pynfdump.dumper()
    >>> s = Searcher("/spare/tmp/netflow/nfdump.ini")
    >>> records = s.search_all(["8.8.8.8"], dump=True, filter='dst port 53', mode='pipe')
    >>> for rec in d.parse_search(records):
    ...     print rec['dstip'], rec['bytes']
    8.8.8.8 138
    8.8.8.8 77
    8.8.8.8 77
    8.8.8.8 85
    8.8.8.8 86
    8.8.8.8 85
    8.8.8.8 86
    8.8.8.8 86
    8.8.8.8 55
    8.8.8.8 55
    8.8.8.8 68

The above example used netflowindexer to find all flows to 8.8.8.8, then used nfdump to filter them by dst port 53, and finally handed the results off to pynfdump for parsing.

[1] http://packages.python.org/pynfdump/
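Section 4.3 says the extra SearchResult fields come from the pathregex option, and the example configurations use the simple-string form /profiles/:profile/:source/nfcapd. One plausible reading is that each :name placeholder captures one path component. The sketch below illustrates that reading only. It is not netflow-indexer's actual implementation, and the helper names compile_path_pattern and extract_metadata are hypothetical.

```python
import re


def compile_path_pattern(pattern):
    """Turn '/profiles/:profile/:source/nfcapd' into a regex with named groups."""
    # With one capture group, re.split alternates literal text and field names.
    parts = re.split(r":([A-Za-z_][A-Za-z0-9_]*)", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)          # literal path text
        else:
            regex += "(?P<%s>[^/]+)" % part   # one path component per field
    return re.compile(regex)


def extract_metadata(pattern, path):
    """Return a dict of the named fields found in path, or {} if no match."""
    match = compile_path_pattern(pattern).search(path)
    return match.groupdict() if match else {}


if __name__ == "__main__":
    print(extract_metadata(
        "/profiles/:profile/:source/nfcapd",
        "/data/nfsen/profiles/live/podium/nfcapd.201205010000",
    ))
```

Under this reading, the flow path from section 2.1.1 would give profile "live" and source "podium", which could then be selected as output columns with -c.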
CHAPTER FIVE
INDICES AND TABLES

genindex
search
INDEX

L
    list_databases() (netflowindexer.main.Searcher method)

S
    search() (netflowindexer.main.Searcher method)
    search_all() (netflowindexer.main.Searcher method)
    Searcher (class in netflowindexer.main)