NetStore: An Efficient Storage Infrastructure for Network Forensics and Monitoring


Paul Giura and Nasir Memon
Polytechnic Institute of NYU, Six MetroTech Center, Brooklyn, NY

Abstract. With the increasing sophistication of attacks, there is a need for network security monitoring systems that store and examine very large amounts of historical network flow data. An efficient storage infrastructure should provide both high insertion rates and fast data access. Traditional row-oriented Relational Database Management Systems (RDBMS) provide satisfactory query performance for network flow data collected only over a period of several hours. In many cases, such as the detection of sophisticated coordinated attacks, it is crucial to query days, weeks or even months' worth of disk resident historical data rapidly. For such monitoring and forensics queries, row-oriented databases become I/O bound due to long disk access times. Furthermore, their data insertion rate is proportional to the number of indexes used, and query processing time is increased when it is necessary to load unused attributes along with the used ones. To overcome these problems we propose a new column-oriented storage infrastructure for network flow records, called NetStore. NetStore is aware of network data semantics and access patterns, and benefits from the simple column-oriented layout without the need to meet general purpose RDBMS requirements. The prototype implementation of NetStore can potentially achieve more than ten times query speedup and ninety times smaller storage size compared to traditional row-stores, while it performs better than existing open source column-stores for network flow data.

1 Introduction

Traditionally, intrusion detection systems were designed to detect and flag malicious or suspicious activity in real time. However, such systems are increasingly providing the ability to identify the root cause of a security breach.
This may involve checking a suspected host's past network activity, looking up any services run by a host, protocols used, the connection records to other hosts that may or may not be compromised, etc. This requires flexible and fast access to historical network flow data. In this paper we present the design, implementation details and evaluation of a column-oriented storage infrastructure called NetStore, designed to store and analyze very large amounts of network flow data. Throughout this paper we refer to a flow as a unidirectional data stream between two endpoints, to a flow record as a quantitative description of a flow, and to a flow ID as the key that uniquely identifies a flow. In our research the flow ID is

S. Jha, R. Sommer, and C. Kreibich (Eds.): RAID 2010, LNCS 6307, © Springer-Verlag Berlin Heidelberg 2010

Fig. 1. Flow traffic distribution for one day and one month. In a typical day the busiest time interval is 1PM - 2PM with 4,381,876 flows, and the slowest time interval is 5AM - 6AM with 978,888 flows. For a typical month we noticed the slowdown on weekends and the peak traffic on weekdays. Days marked with * correspond to a break week.

composed of five attributes: source IP, source port, destination IP, destination port and protocol. We assume that each flow record has an associated start time and end time representing the time interval when the flow was active in the network.

Challenges. Network flow data can grow very large in the number of records and storage footprint. Figure 1 shows the network flow distribution of traffic captured from edge routers in a moderate sized campus network. This network, with about 3,000 hosts, commonly reaches up to 1,300 flows/second, an average of 53 million flows daily and roughly 1.7 billion flows in a month. We consider records with an average size of 200 bytes. Besides Cisco NetFlow data [18] there may be other specific information that a sensor can capture from the network, such as the IP, transport and application header information. Hence, in this example, the storage requirement is roughly 10 GB of data per day, which adds up to at least 310 GB per month. When working with large amounts of disk resident data, the main challenge is no longer to ensure the necessary storage space, but to minimize the time it takes to process and access the data. An efficient storage and querying infrastructure for network records has to cope with two main technical challenges: keep the insertion rate high, and provide fast access to the desired flow records. When using a traditional row-oriented Relational Database Management System (RDBMS), the relevant flow attributes are inserted as a row into a table as they are captured from the network, and are indexed using various techniques [6].
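The storage requirement above follows directly from the stated traffic figures. A minimal back-of-envelope check in Python, using only the numbers quoted in the text (53 million flows/day, 200 bytes per record):

```python
# Back-of-envelope check of the storage figures quoted in the text.
flows_per_day = 53_000_000   # average daily flows in the campus network
bytes_per_record = 200       # average flow record size

daily_gb = flows_per_day * bytes_per_record / 10**9
monthly_gb = daily_gb * 31   # a 31-day month

print(f"{daily_gb:.1f} GB/day")      # ~10.6 GB/day, i.e. "roughly 10 GB"
print(f"{monthly_gb:.0f} GB/month")  # ~329 GB/month, i.e. "at least 310 GB"
```

The "at least 310 GB" figure corresponds to rounding the daily requirement down to 10 GB before multiplying by 31 days.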
On the one hand, such a system has to establish a trade-off between the desired insertion rate and the storage and processing overhead incurred by auxiliary indexing data structures. On the other hand, enabling indexing for more attributes ultimately improves query performance, but also increases storage requirements and decreases insertion rates. At query time, all the columns of the table have to be loaded in memory even if only a subset of the attributes is relevant for the query, adding a significant I/O penalty to the overall query processing time by loading unused columns.

When querying disk resident data, an important problem to overcome is the I/O bottleneck caused by large disk-to-memory data transfers. One potential solution would be to load only data that is relevant to the query. For example, to answer the query What is the list of all IPs that contacted IP X between dates d1 and d2?, the system should load only the source and destination IPs, as well as the timestamps, of the flows that fall between dates d1 and d2. The I/O time can also be decreased if the accessed data is compressed, since less data traverses the disk-memory boundary. Further, the overall query response time can be improved if data is processed in compressed format, by saving decompression time. Finally, since the system has to insert records at line speed, all the preprocessing algorithms used should add negligible overhead while writing to disk. The above requirements can be met quite well by utilizing a column-oriented database as described below.

Column Store. The basic idea of column orientation is to store the data by columns rather than by rows, where each column holds data for a single attribute of the flow and is stored sequentially on disk. Such a strategy makes the system I/O efficient for read queries, since only the attributes required by a query need to be read from disk. The performance benefits of column partitioning were previously analyzed in [9, 2], and some of the ideas were confirmed by results in the database academic research community [16, 1, 21] as well as in industry [19, 11, 10, 3]. However, most commercial and open-source column stores were conceived to follow general purpose RDBMS requirements; they do not fully use the semantics of the data carried and do not take advantage of the specific types and data access patterns of network forensic and monitoring queries.
In this paper we present the design, implementation details and evaluation of NetStore, a column-oriented storage infrastructure for network records that, unlike the other systems, is intended to provide good performance for network flow data.

Contribution. The key contributions in this paper include the following:

- A simple and efficient column-oriented design of NetStore, a network flow historical storage system that enables quick access to large amounts of data for monitoring and forensic analysis.
- Efficient compression methods and selection strategies that facilitate the best compression for network flow data and permit accessing and querying data in compressed format.
- Implementation and deployment of NetStore using commodity hardware and open source software, as well as analysis and comparison with other open source storage systems currently used in practice.

The rest of this paper is organized as follows: we present related work in Section 2, and our system architecture and the details of each component in Section 3. Experimental results and evaluation are presented in Section 4 and we conclude in Section 5.

2 Related Work

The problem of discovering network security incidents has received significant attention over the past years. Most of the work done has focused on near-real-time security event detection, by improving existing security mechanisms that monitor traffic at a network perimeter and block known attacks, detect suspicious network behavior such as network scans, or detect malicious binary transfers [12, 14]. Other systems, such as Tribeca [17] and Gigascope [4], use stream databases and process network data as it arrives, but do not store the data for retroactive analysis. There has been some work done to store network flow records using a traditional RDBMS such as PostgreSQL [6]. Using this approach, when a NIDS triggers an alarm, the database system builds indexes and materialized views for the attributes that are the subject of the alarm, and these could potentially be used by forensics queries in the investigation of the alarm. The system works reasonably well for small networks and is able to help forensic analysis for events that happened over the last few hours. However, queries for traffic spanning more than a few hours become I/O bound, and the auxiliary data used to speed up the queries slows down the record insertion process. Therefore, such a solution is not feasible for medium to large networks, and not even for small networks in the future, if we consider the accelerated growth of Internet traffic. Additionally, a time window of several hours is not a realistic assumption when trying to detect the behavior of a complex botnet engaged in stealthy malicious activity over prolonged periods of time. In the database community, many researchers have proposed the physical organization of database storage by columns in order to cope with the poor read query performance of traditional row-based RDBMS [16, 21, 11, 15, 3]. As shown in [16, 2, 9, 8], a column store provides many times better performance than a row store for read intensive workloads.
In [21] the focus is on optimizing the cache-RAM access time by decompressing data in the cache rather than in the RAM. This system assumes the working columns are RAM resident, and shows a performance penalty if data has to be read from the disk and processed in the same run. The solution in [16] relies on processing parallelism by partitioning data into sets of columns, called projections, indexed and sorted together, independently of other projections. This layout has the benefit of rapid loading of the attributes belonging to the same projection and referred to by the same query, without the use of an auxiliary data structure for tuple reconstruction. However, when attributes from different projections are accessed, the tuple reconstruction process adds significant overhead to the data access pattern. The system presented in [15] emphasizes the use of an auxiliary metadata layer on top of the column partitioning, which is shown to be an efficient alternative to the indexing approach. However, the metadata overhead is sizable and the design does not take into account the correlation between various attributes. Finally, in [9] the authors present several factors that should be considered when one has to decide between a column store and a row store for a read intensive workload. The relatively large number of network flow attributes and the workloads

with a predominant set of queries with large selectivity and few predicates favor the use of a column store system for historical network flow record storage. NetStore is a column-oriented storage infrastructure that shares some features with the other systems, and is designed to provide the best performance for large amounts of disk resident network flow records. It avoids tuple reconstruction overhead by keeping, at all times, the same order of elements in all columns. It provides fast data insertion and quick querying by dynamically choosing the most suitable compression method available and by using a simple and efficient design with a negligible metadata layer overhead.

3 Architecture

In this section we describe the architecture and the key components of NetStore. We first present the characteristics of network data and the query types that guide our design. We then describe the technical design details: how the data is partitioned into columns, how columns are partitioned into segments, what compression methods are used and how a compression method is selected for each segment. We finally present the metadata associated with each segment, the index nodes, and the internal IPs inverted index structure, as well as the basic set of operators.

3.1 Network Flow Data

Network flow records and the queries made on them show some special characteristics compared to other time sequential data, and we tried to apply this knowledge as early as possible in the design of the system. First, flow attributes tend to exhibit temporal clustering, that is, the range of values is small within short time intervals. Second, the attributes of flows with the same source IP and destination IP tend to have the same values (e.g. port numbers, protocols, packet sizes, etc.). Third, columns of some attributes can be efficiently encoded when partitioned into time based segments that are encoded independently.
Finally, most attributes that are of interest for monitoring and forensics can be encoded using basic integer data types. The record insertion operation is represented by bulk loads of time sequential data that will not be updated after writing. Having the attributes stored in the same order across the columns makes the join operation trivial when attributes from more than one column are used together. Network data analysis does not require fast random access on all the attributes. Most of the monitoring queries need fast sequential access to a large number of records and the ability to aggregate and summarize the data over a time window. Forensic queries access specific predictable attributes, but collected over longer periods of time. To observe their specific characteristics we first compiled a comprehensive list of forensic and monitoring queries used in practice in various scenarios [5]. Based on the data access pattern, we identified five types among the initial list: Spot queries (S) that target a single key (usually an IP address or port number)

and return a list with the values associated with that key. Range queries (R) that return a list with results for multiple keys (usually attributes corresponding to the IPs of a subnet). Aggregation queries (A) that aggregate the data for the entire network and return the result of the aggregation (e.g. traffic sent out for the network). Spot Aggregation queries (SA) that aggregate the values found for one key into a single value. Range Aggregation queries (RA) that aggregate data for multiple keys into a single value. Examples of these types of queries expressed in plain words:

(S) What applications are observed on host X between dates d1 and d2?
(R) What is the list of destination IPs that have source IPs in a subnet between dates d1 and d2?
(A) What is the total number of connections for the entire network between dates d1 and d2?
(SA) What is the number of bytes that host X sent between dates d1 and d2?
(RA) What is the number of hosts that each of the hosts in a subnet contacted between dates d1 and d2?

3.2 Column Oriented Storage

Columns. In NetStore, flow records with n attributes are stored in a logical table with n columns and an increasing number of rows (tuples), one for each flow record. The values of each attribute are stored in one column and have the same data type. By default, almost all of the values of a column are not sorted. Having the data sorted in a column might help achieve better compression and faster retrieval, but changing the initial order of the elements requires the use of an auxiliary data structure for tuple reconstruction at query time. We investigated several techniques to ease tuple reconstruction, and all methods added much more overhead at query time than the benefit of better compression and faster data access. Therefore, we decided to maintain the same order of elements across columns to avoid any tuple reconstruction penalty when querying.
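The positional alignment described above is what makes joins trivial: because every column keeps rows in the same order, tuple i can be reassembled by reading position i from each needed column, with no row IDs or join machinery. A minimal sketch (the column names and values here are illustrative, not NetStore's actual layout):

```python
# Hypothetical columnar layout: every column stores values in the same
# row order, so position i across columns forms one flow record.
columns = {
    "sourceip": ["10.0.0.1", "10.0.0.2", "10.0.0.1"],
    "destip":   ["8.8.8.8",  "8.8.4.4",  "1.1.1.1"],
    "destport": [53, 53, 443],
}

def tuple_at(i, wanted):
    """Reconstruct a (partial) flow record by reading position i
    from each requested column -- no join or row-ID lookup needed."""
    return {c: columns[c][i] for c in wanted}

print(tuple_at(2, ["sourceip", "destport"]))
# {'sourceip': '10.0.0.1', 'destport': 443}
```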
However, since we can afford one column to be sorted without the need for any reconstruction auxiliary data, we choose to first sort only one column and partially sort the rest of the columns. We call the first sorted column the anchor column. Note that after sorting, given our storage architecture, each segment can still be processed independently. The main purpose of the anchor column choosing algorithm is to select the ordering that facilitates the best compression and fast data access. Network flow data exhibits strong correlation between several attributes, and we exploit this characteristic by keeping the strongly correlated columns in consecutive sorting order as much as possible, for better compression results. Additionally, based on the data access pattern of previous queries, columns are arranged by taking into account the probability of each column being accessed by future queries. The columns with higher probabilities are arranged at the beginning of the sorting order. As such, we maintain the counting probabilities associated with each of the columns, given by the formula P(c_i) = a_i / t, where c_i is the i-th column, a_i is the number of queries that accessed c_i, and t is the total number of queries.
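The column-ordering heuristic above can be sketched as follows. This is a minimal illustration of P(c_i) = a_i / t, assuming a query log represented as sets of accessed columns; the names here are illustrative, not NetStore's actual API:

```python
from collections import Counter

# Hypothetical log of past queries: each entry is the set of columns touched.
query_log = [
    {"sourceip", "destip"},
    {"sourceip", "destport"},
    {"sourceip"},
    {"destip"},
]

t = len(query_log)                                        # total queries
access_counts = Counter(col for q in query_log for col in q)  # a_i per column

def access_probability(col):
    """P(c_i) = a_i / t"""
    return access_counts[col] / t

# Sort columns so the most frequently accessed one comes first;
# the head of this order is the anchor column.
order = sorted(access_counts, key=access_probability, reverse=True)
print(order[0])  # 'sourceip' -- highest P(c_i), chosen as anchor
```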

Segments. Each column is further partitioned into fixed sets of values called segments. Segment partitioning enables physical storage and processing at a smaller granularity than simple column based partitioning. These design decisions provide more flexibility for compression strategies and data access. At query time only the used segments will be read from disk and processed, based on the information collected from the segments' metadata structures, called index nodes. Each segment has associated a unique identifier called the segment ID. For each column, a segment ID represents an auto-incremented number, started at the installation of the system. The segment sizes are dependent on the hardware configuration and can be set in such a way as to make the most of available main memory. For better control over the data structures used, the segments have the same number of values across all the columns. In this way there is no need to store a record ID for each value of a segment, and this is one major difference compared to some existing column stores [11]. As we will show in Section 4, the performance of the system is related to the segment size used. The larger the segment size, the better the compression performance and query processing times. However, we notice that record insertion speed decreases as segment size increases, so there is a trade-off between the query performance desired and the insertion speed needed. Most of the columns store segments in compressed format and, in a later section, we present the compression algorithms used. Column segmentation design is an important difference compared to traditional row oriented systems that process data a tuple at a time, whereas NetStore processes data a segment at a time, which translates to many tuples at a time. Figure 3 shows the processing steps for the three processing phases: buffering, segmenting and query processing.

Fig. 2. NetStore main components: Processing Engine and Column-Store.
Fig. 3. NetStore processing phases: buffering, segmenting and query processing.

Column Index. For each column we store the metadata associated with each of the segments in an index node corresponding to the segment. The set of all index nodes for the segments of a column represents the column index. The information in each index node includes statistics about the data and different features that are used in the decision about the compression method to use and optimal data

access, as well as the time interval associated with the segment in the format [min start time, max end time]. Figure 4 presents an intuitive representation of the columns, segments and index for each column. Each column index is implemented using a time interval tree. Every query is relative to a time window T. At query time, the index of every column accessed is looked up and only the segments whose time interval overlaps window T are considered for processing. In the next step, the statistics on segment values are checked to decide if the segment should be loaded in memory and decompressed. This two-phase index processing helps filter out unused data early in query processing, similar to what is done in [15]. Note that the index nodes do not hold data values, but statistics about the segments such as the minimum and maximum values, the time interval of the segment, the compression method used, the number of distinct values, etc. Therefore, index usage adds negligible storage and processing overhead. From the list of initial queries we observed that the column for the source IP attribute is most frequently accessed. Therefore, we chose this column as our first sorted anchor column, and used it as a clustered index for each source IP segment. However, for workloads where the predominant query types are spot queries targeting a specific column other than the anchor column, the use of indexes for values inside the column segments is beneficial, at a cost of increased storage and a slowdown in insertion rate. Thus, this situation can be acceptable for slow networks where the insertion rate requirements are not too high. When the insertion rate is high, then it is best not to use any index but to rely on the metadata from the index nodes.

Internal IPs Index. Besides the column index, NetStore maintains another indexing data structure for the network internal IP addresses, called the Internal IPs index.
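The two-phase filtering described above can be sketched as follows. `IndexNode` here is a hypothetical stand-in for NetStore's per-segment metadata, holding only the fields the sketch needs (time interval and value range):

```python
from dataclasses import dataclass

@dataclass
class IndexNode:
    seg_id: int
    min_start: int   # segment time interval [min start time, max end time]
    max_end: int
    min_val: int     # value statistics used in the second phase
    max_val: int

def candidate_segments(index, t1, t2, value=None):
    """Phase 1: keep segments whose time interval overlaps [t1, t2].
    Phase 2: drop segments whose [min_val, max_val] cannot contain `value`."""
    segs = [n for n in index if n.min_start <= t2 and n.max_end >= t1]
    if value is not None:
        segs = [n for n in segs if n.min_val <= value <= n.max_val]
    return [n.seg_id for n in segs]

index = [IndexNode(0, 0, 100, 10, 50), IndexNode(1, 90, 200, 60, 90)]
print(candidate_segments(index, 95, 150, value=70))  # [1]
```

Segment 0 overlaps the window but its value range [10, 50] cannot contain 70, so it is filtered out without ever being read from disk, which is the point of keeping statistics in the index nodes.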
Essentially the IPs index is an inverted index for the internal IPs. That is, for each internal IP address the index stores in a list the absolute positions where the IP address occurs in the column, sourceIP or destIP, as if the column were not partitioned into segments. Figure 5 shows an intuitive representation of the IPs index. For each internal IP address the positions list represents an array of increasing integer values that are compressed and stored on disk on a daily basis. Because IP addresses tend to occur in consecutive positions in a column, we chose to compress the positions list by applying run-length encoding on the differences between adjacent values.

3.3 Compression

Each of the segments in NetStore is compressed independently. We observed that segments within a column did not have the same distribution, due to the temporal variation of network activity in working hours, days, nights, weekends, breaks, etc. Hence segments of the same column were best compressed using different methods. We explored different compression methods. We investigated methods that allow data processing in compressed format and do not need decompression of all the segment values if only one value is requested. We also looked at methods
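The delta-plus-RLE scheme for the positions lists can be sketched as below: a sorted positions list is turned into differences between adjacent values, and runs of equal deltas (the common case when an IP occurs in consecutive positions) are collapsed into (delta, count) pairs. Function names are illustrative:

```python
def compress_positions(positions):
    """[100, 101, 102, 500] -> deltas [100, 1, 1, 398] -> [(100,1), (1,2), (398,1)]"""
    deltas = [positions[0]] + [b - a for a, b in zip(positions, positions[1:])]
    runs = []
    for d in deltas:
        if runs and runs[-1][0] == d:
            runs[-1][1] += 1          # extend the current run of equal deltas
        else:
            runs.append([d, 1])       # start a new run
    return [tuple(r) for r in runs]

def decompress_positions(runs):
    pos, out = 0, []
    for delta, count in runs:
        for _ in range(count):
            pos += delta
            out.append(pos)
    return out

pts = [100, 101, 102, 500]
print(compress_positions(pts))  # [(100, 1), (1, 2), (398, 1)]
assert decompress_positions(compress_positions(pts)) == pts
```

A run of k consecutive positions compresses to a single (1, k-ish) pair regardless of k, which is why this encoding suits clustered IP occurrences well.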

Fig. 4. Schematic representation of columns, segments, index nodes and column indexes
Fig. 5. Intuitive representation of the IPs inverted index

that provide fast decompression and a reasonable compression ratio and speed. The decision on which compression algorithm to use is made automatically for each segment, and is based on the data features of the segment, such as data type, the number of distinct values, the range of the values and the number of switches between adjacent values. We tested a wide range of compression methods, including some we designed for this purpose and some currently used by similar systems [1, 16, 21, 11], with variations where needed. Below we list the techniques that emerged effective based on our experimentation:

Run-Length Encoding (RLE): is used for segments that have few distinct repetitive values. If value v appears consecutively r times, and r > 1, we compress it as the pair (v, r). It provides fast compression as well as the ability to process data in compressed format.

Variable Byte Encoding: is a byte-oriented encoding method used for positive integers. It uses a variable number of bytes to encode each integer value as follows: if value < 128 use one byte (set highest bit to 0), for value < 16,384 use 2 bytes (first byte has highest bit set to 1 and second to 0) and so on. This method can be used in conjunction with RLE for both values and runs. It provides a reasonable compression ratio and good decompression speed, allowing the decompression of only the requested value without the need to decompress the whole segment.

Dictionary Encoding: is used for columns with few distinct values, and sometimes before RLE is applied (e.g. to encode the protocol attribute).

Frame Of Reference: considers the interval bounded by the minimum and maximum values as the frame of reference for the values to be compressed [7]. We use it to compress non-empty timestamp attributes within a segment (e.g. start time, end time, etc.)
that are integer values representing the number of seconds since the epoch. Typically the time difference between the minimum and maximum timestamp values in a segment is less than a few hours, therefore the encoding of the difference is possible using 2-byte short values instead of 4-byte integers. It allows processing data in compressed format by decompressing each timestamp value individually, without the need to decompress the whole segment.
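Three of the value-based encodings above can be sketched compactly. These are minimal illustrations consistent with the descriptions in the text, not NetStore's actual implementations:

```python
import struct

# Run-Length Encoding: consecutive repeats of v collapse to a (v, run) pair.
def rle(values):
    out = []
    for v in values:
        if out and out[-1][0] == v:
            out[-1][1] += 1
        else:
            out.append([v, 1])
    return [tuple(p) for p in out]

# Variable Byte Encoding: 7 data bits per byte; as described in the text,
# the last byte has its highest bit 0, earlier (continuation) bytes have it 1.
def vbyte(value):
    parts = [value & 0x7F]          # final byte, high bit 0
    value >>= 7
    while value:
        parts.append((value & 0x7F) | 0x80)  # continuation byte, high bit 1
        value >>= 7
    return bytes(reversed(parts))

# Frame Of Reference: store the segment minimum once, then 2-byte offsets.
def frame_of_reference(timestamps):
    base = min(timestamps)
    offsets = [t - base for t in timestamps]
    assert max(offsets) < 2**16     # holds when the segment spans < ~18 hours
    return base, struct.pack(f"{len(offsets)}H", *offsets)

print(rle([6, 6, 6, 17]))                # [(6, 3), (17, 1)]
print(len(vbyte(127)), len(vbyte(128)))  # 1 2
```

Note how `vbyte` matches the boundaries in the text: values below 128 fit in one byte, values below 16,384 in two; and each frame-of-reference offset can be read back individually from the packed array without touching its neighbors.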

Generic Compression: we use the DEFLATE algorithm from the zlib library, which is a variation of LZ77 [20]. This method provides compression at the binary level, and does not allow values to be individually accessed unless the whole segment is decompressed. It is chosen if it enables faster data insertion and access than the value-based methods presented earlier.

No Compression: is listed as a compression method since it represents the base case for our compression selection algorithm.

Method Selection. The selection of a compression method is done based on the statistics collected in one pass over the data of each segment. As mentioned earlier, the two major requirements of our system are to keep record insertion rates high and to provide fast data access. Data compression does not always provide better insertion and better query performance compared to no compression, so we developed a model to decide when compression is suitable and, if so, what method to choose. Essentially, we compute a score for each candidate compression method and select the one that has the best score. More formally, we assume we have k + 1 compression methods m_0, m_1, ..., m_k, with m_0 being the No Compression method. We then compute the insertion time, as the time to compress and write to disk, and the access time, as the time to read from disk and decompress, as functions of each compression method. For value-based compression methods, we estimate the compression, write, read and decompression times based on the statistics collected for each segment. For the generic compression we estimate the parameters based on the average results obtained when processing sample segments. For each segment we evaluate:

insertion(m_i) = c(m_i) + w(m_i), i = 1, ..., k
access(m_i) = r(m_i) + d(m_i), i = 1, ..., k

As the base case for each method evaluation we consider the No Compression method.
We take I_0 to represent the time to insert an uncompressed segment, which is just the writing time since no time is spent on compression, and similarly A_0 to represent the time to access the segment, which is just the time to read the segment from disk since there is no decompression. Formally, following the above equations we have:

insertion(m_0) = w(m_0) = I_0 and access(m_0) = r(m_0) = A_0

We then choose as candidate compression methods only those m_i for which we have both:

insertion(m_i) < I_0 and access(m_i) < A_0

Next, among the candidate compression methods we choose the one that provides the lowest access time. Note that we primarily consider the access time as the main differentiating factor, not the insertion time. The disk read is the most frequent and time consuming operation, and it is many times slower than a disk write of the same size file on commodity hard drives. Additionally, insertion time can be improved by bulk loading or by other means that take into account that the network traffic rate is not steady and varies greatly over time,
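The selection rule above can be sketched as follows: estimate insertion(m) = c(m) + w(m) and access(m) = r(m) + d(m) per segment, keep only methods beating the uncompressed baseline on both, then take the lowest access time. The cost numbers here are illustrative placeholders, not measurements:

```python
def select_method(costs):
    """costs: {name: (compress, write, read, decompress)} times per segment.
    Entry 'none' is the uncompressed baseline m0 (compress = decompress = 0)."""
    c0, w0, r0, d0 = costs["none"]
    I0, A0 = c0 + w0, r0 + d0
    # Candidates must beat the baseline on BOTH insertion and access time.
    candidates = {
        name: r + d
        for name, (c, w, r, d) in costs.items()
        if name != "none" and c + w < I0 and r + d < A0
    }
    if not candidates:
        return "none"
    return min(candidates, key=candidates.get)  # lowest access time wins

costs = {
    "none": (0, 100, 200, 0),   # large uncompressed segment: slow I/O
    "rle":  (5, 20, 40, 5),     # small output: fast write and read
    "zlib": (30, 25, 45, 60),   # smaller still, but slow decompression
}
print(select_method(costs))  # 'rle'
```

With these placeholder costs both methods beat the baseline, and RLE wins on access time (45 vs. 105), mirroring the text's preference for access time as the deciding factor.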

whereas the access mechanism should provide the same level of performance at all times. The model presented above does not take into account whether the data can be processed in compressed format; the assumption is that decompression is necessary at all times. However, for a more accurate compression method selection we should include in the access time equation the probability of a query processing the data in compressed format. Since forensic and monitoring queries are usually predictable, we can assume, without affecting the generality of our system, that we have a total number of t queries, each query q_j having a probability of occurrence p_j, with sum_{j=1}^{t} p_j = 1. We consider the probability of a segment s being processed in compressed format as the probability of occurrence of the queries that process the segment in compressed format. Let CF be the set of all the queries that process s in compressed format; we then get:

P(s) = sum_{q_j in CF} p_j, where CF = {q_j | q_j processes s in compressed format}

Now, a more accurate access time equation can be rewritten taking into account the possibility of not decompressing the segment for each access:

access(m_i) = r(m_i) + d(m_i) * (1 - P(s)), i = 1, ..., k    (1)

Note that the compression selection model can accommodate any compression method, not only the ones mentioned in this paper, and is also valid in the cases when the probability of processing the data in compressed format is 0.

3.4 Query Processing

Figure 3 illustrates the NetStore data flow, from network flow record insertion to query result output. Data is written only once, in bulk, and read many times for processing. NetStore does not support transaction processing queries such as record updates or deletes; it is suitable for analytical queries in general and for network forensics and monitoring queries in particular.

Data Insertion. Network data is processed in several phases before being delivered to permanent storage.
First, raw flow data is collected from the network sensors and is then preprocessed. Preprocessing includes the buffering and segmenting phases. Each flow is identified by a flow ID represented by the 5-tuple [sourceip, sourceport, destip, destport, protocol]. In the buffering phase, raw network flow information is collected until the buffer is filled. The flow records in the buffer are aggregated and then sorted. As mentioned in Section 3.3, the purpose of sorting is twofold: better compression and faster data access. All the columns are sorted following the sorting order determined based on access probabilities and correlation between columns using the first sorted column as anchor.

In the segmenting phase, all the columns are partitioned into segments; that is, once the number of flow records reaches the buffer capacity, the column data in the buffer is considered a full segment and is processed. Each of the segments is then compressed using the appropriate compression method based on the data it carries. Information about the compression method used, together with statistics about the data, is stored in the index node associated with the segment. Note that once the segments are created, the statistics collection and compression of each segment is done independently of the rest of the segments in the same column or in other columns. By doing so, the system takes advantage of the increasing number of cores in a machine and provides good record insertion rates in multi-threaded environments. After preprocessing, all the data is sent to permanent storage. As monitoring queries tend to access the most recent data, some data is also kept in memory for a predefined length of time. NetStore uses a small active window of size W, and all requests from queries accessing data in the time interval [NOW − W, NOW] are served from memory, where NOW represents the actual time of the query.

Query Execution. For flexibility NetStore supports limited SQL syntax and implements a basic set of segment operators related to the query types presented in Section 3.1. Each SQL query statement is translated into a statement in terms of the basic set of segment operators. Below we briefly present each general operator:

filter_segs(d1, d2): Returns the set of segment IDs of the segments that overlap with the time interval [d1, d2]. This operator is used by all queries.
filter_atts(segIDs, pred1(att1), ..., predk(attk)): Returns the list of pairs (segID, pos_list), where pos_list represents the intersection of the attribute position lists in the corresponding segment with ID segID for which attribute att_i satisfies predicate pred_i, with i = 1, ..., k.

aggregate(segIDs, pred1(att1), ..., predk(attk)): Returns the result of aggregating values of attribute att_k by att_{k−1} by ... by att_1 that satisfy their corresponding predicates pred_k, ..., pred_1 in the segments with IDs in segIDs. The aggregation can be summation, counting, min or max.

The queries considered in Section 3.1 can all be expressed in terms of the above operators. For example, the query What is the number of unique hosts that each of the hosts in the network contacted in the interval [d1, d2]? can be expressed as: aggregate(filter_segs(d1, d2), sourceip = /16, destip). After the operator filter_segs is applied, only the sourceip and destip segments that overlap with the time interval [d1, d2] are considered for processing, and their corresponding index nodes are read from disk. Since this is a range aggregation query, all the considered segments will be loaded and processed. If we consider the query What is the number of unique hosts that host X contacted in the interval [d1, d2]?, it can be expressed as: aggregate(filter_segs(d1, d2), sourceip = X, destip). For this query the number of relevant segments can be reduced even more by discarding the ones that do not overlap with the time interval [d1, d2], as well as the ones that don't hold the value X for sourceip, by checking the corresponding index node statistics. If the value X represents the IP address of an internal node, then the internal IPs index will be used to retrieve all the positions where the value X occurs in the sourceip column. Then a count operation is performed over all the unique destip addresses corresponding to those positions. Note that by using the internal IPs index, the data of the sourceip column is not touched. The only information loaded in memory is the positions list of IP X, as well as the segments in column destip that correspond to those positions.

4 Evaluation

In this section we present an evaluation of NetStore. We designed and implemented NetStore using the Java programming language on the FreeBSD 7.2-RELEASE platform. For all the experiments we used a single machine with 6 GB DDR2 RAM, two quad-core 2.3 GHz CPUs, and a 1 TB SATA 7200 rpm disk in a RAID-Z configuration. We consider this machine representative of what a medium-scale enterprise would use as a storage server for network flow records. For the experiments we used the network flow data captured over a 24-hour period of one weekday at our campus border router. The size of the raw text file data was about 8 GB, containing 62,397,593 network flow records. For our experiments we considered only 12 attributes for each network flow record, that is, only the ones that were meaningful for the queries presented in this paper. Table 1 shows the attributes used, as well as the type and size of each attribute. We compared NetStore's performance with two open source RDBMSs: a row-store, PostgreSQL [13], and a column-store, LucidDB [11]. We chose PostgreSQL over other open source systems because we intended to follow the example in [6], which uses it for similar tasks.
Additionally, we intended to make use of its partial index support for internal IPs, which other systems don't offer, in order to compare the performance of our inverted IPs index. We chose LucidDB as the column-store to compare with because it is, to the best of our knowledge, the only stable open source column-store that yields good performance for disk-resident data and provides reasonable insertion speed. We chose only data captured over one day, with size slightly larger than the available memory, because we wanted to maintain reasonable running times for the other systems that we compared NetStore to. These systems become very slow for larger data sets, and the performance gap compared to NetStore increases with the size of the data.

4.1 Parameters

Figure 6 shows the influence that the segment size has on the insertion rate. We observe that the insertion rate drops as the segment size increases. This trend is expected and is caused by the delay in the preprocessing phase, mostly because of the larger segment array sorting. As Figure 7 shows, the segment size also affects the compression ratio of each segment: the larger the segment size, the larger the compression ratio achieved. But a high compression ratio is not a critical requirement. The size of the segments is more critically related to the available memory, the desired insertion rate for the network, and the number of attributes used for each record. We set the insertion rate goal at 10,000 records/second, and for this goal we set a segment size of 2 million records given the above hardware specification and record sizes.

Table 1. NetStore flow attributes

Column      Type   Bytes
sourceip    int    4
destip      int    4
sourceport  short  2
destport    short  2
protocol    byte   1
starttime   short  2
endtime     short  2
tcpsyns     byte   1
tcpacks     byte   1
tcpfins     byte   1
tcprsts     byte   1
numbytes    int    4

Table 2 shows the insertion performance of NetStore. The numbers presented are computed based on average bytes per record and average packets per record, given the insertion rate of 10,000 records/second. When installed on a machine with the above specification, NetStore can keep up with traffic rates of up to 1.5 Gbit/s for the current experimental implementation. For a constant memory size, this rate decreases with an increase in segment size and with an increase in the number of attributes per flow record.

Table 2. NetStore properties and network rates supported, based on 24 hours of flow records data and the 12 attributes

Property                        Value          Unit
records insertion rate          10,000         records/second
number of records               62,397,594     records
number of bytes transported     1.17           Terabytes
bytes transported per record    20,...         Bytes/record
bit rate supported              1.54           Gbit/s
number of packets transported   2,028,392,356  packets
packets transported per record  ...            packets/record
packet rate supported           325,...        packets/second

Fig. 6. Insertion rate for different segment sizes. Fig. 7. Compression ratio with and without aggregation.
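The supported rates in Table 2 follow from simple arithmetic over the totals. A sketch (assuming decimal terabytes, i.e. 10^12 bytes; the paper's exact convention is not stated and some table entries are truncated, so the derived bit rate only approximates the reported 1.54 Gbit/s):

```python
# Derive NetStore's supported network rates from Table 2's totals.
records = 62_397_594
total_bytes = 1.17e12            # "1.17 Terabytes", decimal assumed
total_packets = 2_028_392_356
insertion_rate = 10_000          # records/second

bytes_per_record = total_bytes / records          # ~18,750 B/record
packets_per_record = total_packets / records      # ~32.5 packets/record

bit_rate = bytes_per_record * 8 * insertion_rate   # bits/second
packet_rate = packets_per_record * insertion_rate  # packets/second

print(f"{bit_rate / 1e9:.2f} Gbit/s")    # ~1.5 Gbit/s
print(f"{packet_rate:,.0f} packets/s")   # ~325,000 packets/s
```

The packet rate lands almost exactly on the table's 325,xxx packets/second figure, which supports the simple records/second × per-record-average derivation.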

4.2 Queries

Having described the NetStore architecture and its design details, in this section we consider the queries described in [5], but taking into account data collected over the 24 hours for the internal network /16. We consider both the queries and the methodology in [5] meaningful for how an investigator would perform security analysis on network flow data. We assume all the flow attributes used are inserted into a table flow, and we use standard SQL to describe all our examples.

Scanning. A scanning attack refers to the activity of sending a large number of TCP SYN packets to a wide range of IP addresses. Based on the received answer, the attacker can determine if a particular vulnerable service is running on the victim's host. As such, we want to identify any TCP SYN scanning activity initiated by an external host, with no TCP ACK or TCP FIN flags set, targeted against a large number of internal IP destinations (larger than a preset limit). We use the following range aggregation query (Q1):

SELECT sourceip, destport, count(distinct destip), starttime
FROM flow
WHERE sourceip <> /16 AND destip = /16
  AND protocol = tcp AND tcpsyns = 1 AND tcpacks = 0 AND tcpfins = 0
GROUP BY sourceip
HAVING count(distinct destip) > limit;

External IP address was found scanning starting at time t1. We check if there were any valid responses after time t1 from the internal hosts, where no packet had the TCP RST flag set, using the following query (Q2):

SELECT sourceip, sourceport, destip
FROM flow
WHERE starttime > t1 AND sourceip = /16 AND destip =
  AND protocol = tcp AND tcprsts = 0;

Worm Infected Hosts. An internal host with the IP address was discovered to have responded to a scan initiated by a host infected with the Conficker worm, and we want to check if the internal host is compromised.
Typically, after a host is infected, the worm copies itself into memory and begins propagating to random IP addresses across a network by exploiting the same vulnerability. The worm opens a random port and starts scanning random IPs on port 445. We use the following query to check the internal host (Q3):

SELECT sourceip, destport, count(distinct destip)
FROM flow
WHERE starttime > t1 AND sourceip = AND destport = 445;
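For intuition, a query like Q1 can be sketched in terms of the segment operators described earlier (filter_segs followed by an aggregation). The toy in-memory segment layout, column names, and threshold below are illustrative assumptions, not NetStore's actual storage format:

```python
# Toy model of evaluating a Q1-style scan query with NetStore-like
# segment operators: filter segments by time, then aggregate distinct
# destination IPs per source over SYN-only flows.
def filter_segs(segments, d1, d2):
    """Return IDs of segments whose time range overlaps [d1, d2]."""
    return [sid for sid, s in segments.items()
            if s["t_min"] <= d2 and s["t_max"] >= d1]

def scan_aggregate(segments, seg_ids, limit):
    """Count distinct destips per sourceip for SYN-only flows."""
    targets = {}
    for sid in seg_ids:
        s = segments[sid]
        for i in range(len(s["sourceip"])):
            if s["tcpsyns"][i] == 1 and s["tcpacks"][i] == 0:
                targets.setdefault(s["sourceip"][i], set()).add(s["destip"][i])
    return {src: len(d) for src, d in targets.items() if len(d) > limit}

segments = {
    0: {"t_min": 0, "t_max": 99,
        "sourceip": ["1.2.3.4", "1.2.3.4", "1.2.3.4", "5.6.7.8"],
        "destip":   ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.1"],
        "tcpsyns":  [1, 1, 1, 1],
        "tcpacks":  [0, 0, 0, 1]},
    1: {"t_min": 100, "t_max": 199, "sourceip": [], "destip": [],
        "tcpsyns": [], "tcpacks": []},
}
hits = scan_aggregate(segments, filter_segs(segments, 0, 50), limit=2)
# 1.2.3.4 contacted 3 distinct hosts with bare SYNs and is flagged;
# segment 1 is never touched because its time range does not overlap.
```

The key point mirrors the paper's design: segment 1 is skipped entirely based on its index-node time range, before any column data is read.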

SYN Flooding. This is a network-based denial-of-service attack in which the attacker sends an unusually large number of SYN requests, over a threshold t, to a specific target within a small time window W. To detect such an attack we filter all the incoming traffic and count the number of flows with the TCP SYN bit set and no TCP ACK or TCP FIN, for all the internal hosts. We use the following query (Q4):

SELECT destip, count(distinct sourceip), starttime
FROM flow
WHERE starttime > NOW - W AND destip = /16
  AND protocol = tcp AND tcpsyns = 1 AND tcpacks = 0 AND tcpfins = 0
GROUP BY destip
HAVING count(distinct sourceip) > t;

Network Statistics. Besides security analysis, network statistics and performance monitoring is another important use of network flow data. To get this information we use aggregation queries over all collected data, both incoming and outgoing, over a large time window. The aggregation operation can be a summation of bytes or packets, a count of unique hosts contacted, or some other meaningful aggregate statistic. For example, we use the following simple aggregation query to find the number of bytes transported in the last 24 hours (Q5):

SELECT sum(numbytes)
FROM flow
WHERE starttime > NOW - 24h;

General Queries. The sample queries described above are complex and belong to more than one basic type described in Section 3.1. However, each of them can be separated into several basic types such that the result of one query becomes the input for the next one. We built a more general set of queries starting from the ones described above by varying the parameters in such a way as to achieve different levels of data selectivity, from low to high. Then, for each type, we report the average performance over all the queries of that type. Figure 8 shows the average running times of the selected queries for increasing segment sizes. We observe that for S type queries that don't use the IPs index (e.g. for attributes other than internal sourceip or destip), the performance decreases when the segment size increases. This is an expected result, since for larger segments there is more unused data loaded as part of the segment where the spotted value resides. When using the IPs index, the performance benefit comes from skipping the irrelevant segments whose positions are not found in the positions list. However, for internal busy servers that have corresponding flow records in all the segments, all corresponding attribute segments have to be read, but not the IPs segments. This is an advantage, since an IP segment is in general several times larger than the other attributes' segments. Hence, except for spot queries that use non-indexed attributes, queries tend to be faster for larger segment sizes.

4.3 Compression

Our goal in using compression is not to achieve the best compression ratio nor the best compression or decompression speed, but to obtain the highest record insertion rate and the best query performance. We evaluated our compression selection model by comparing performance when using a single method for all the segments in a column with performance when using the compression selection algorithm for each segment. To select the method for a column, we first compressed all the segments of the column with each of the six methods presented. We then measured the access performance for the column compressed with each method. Finally, we selected as the compression method of a column the method that provided the best access times for the majority of its segments. For the variable segment compression, we activated the method selection mechanism for all columns and then inserted the data, compressing each segment based on the statistics of its own data rather than of the entire column. In both cases we did not change anything in the statistics collection process, since all the statistics were used in the query process for both approaches. We obtained on average a 10 to 15 percent improvement per query using the segment-based compression method selection model, with no penalty for the insertion rate. We consider the overall performance of the compression method selection model satisfactory; its true value resides in the framework implementation, limited only by the individual methods used, not by the general model design. If the data changes and other compression methods are more efficient for the new data, only the compression algorithm and the operators that work on this compressed data need to be changed, with the overall architecture remaining the same. Some commercial systems [19] apply, on top of the value-based compressed columns, another layer of general binary compression for increased performance.
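To illustrate the trade-off between value-based and generic binary compression (this is a synthetic sketch, not the paper's experiment or its six methods), consider run-length encoding a sorted, low-cardinality column versus compressing it with zlib, and stacking the two:

```python
# Compare value-based (run-length) compression of a sorted column
# with generic binary compression (zlib), and the two combined.
import zlib

# A sorted protocol-like column: long runs of a few distinct values.
column = bytes([6] * 500_000 + [17] * 300_000 + [1] * 200_000)

def rle_encode(data: bytes) -> bytes:
    """Value-based compression: (value, run_length) pairs."""
    out = bytearray()
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        out += bytes([data[i]]) + (j - i).to_bytes(4, "big")
        i = j
    return bytes(out)

rle = rle_encode(column)          # 3 runs -> 15 bytes
binary = zlib.compress(column)    # generic binary compression
stacked = zlib.compress(rle)      # binary layer on top of value-based

# RLE is tiny AND queryable in compressed form (operators can work on
# runs directly); the zlib output must be inflated before any access.
print(len(column), len(rle), len(binary), len(stacked))
```

This mirrors the paper's observation: an extra generic layer may shrink the data further, but it forces decompression on every access, so it hurts query time even when it helps the compression ratio.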
We investigated the same possibility and compared four different approaches to compression on top of the implemented column-oriented architecture: no compression, value-based compression only, binary compression only, and value-based plus binary compression on top. For the no-compression case, we processed the data using the same indexing structure and column-oriented layout but with compression disabled for all the segments. For binary compression only, we compressed each segment using the generic binary compression. In the value-based compression case we compressed all the segments with the dynamic selection mechanism enabled, and for the last approach we applied another layer of generic compression on top of the already value-based compressed segments. The results of our experiment for the four cases are shown in Figure 9. We can see that compression is a determining factor in the performance metrics. Using value-based compression achieves the best average running time for the queries, while the uncompressed-segments scenario yields the worst performance. We also see that adding another compression layer helps neither query performance nor insertion rate, even though it provides a better compression ratio. However, the general compression method can be used for data aging, to compress and archive older data that is not actively used. Figure 7 shows the compression performance for different segment sizes and how flow aggregation affects the storage footprint. As expected, compression performance is better for larger segment sizes in both cases, with and without aggregation. That is the case because of the compression methods used: the larger the segment, the longer the runs for columns with few distinct values, and the smaller the dictionary size for each segment. The overall compression ratio of raw network flow data for the segment size of 2 million records is 4.5 with no aggregation and 8.4 with aggregation enabled. Note that the size of the compressed data also includes the size of both indexing structures: the column indexes and the IPs index.

Fig. 8. Average query times for different segment sizes and different query types. Fig. 9. Average query times for the compression strategies implemented.

4.4 Comparison with Other Systems

For the comparison we used the same data and performed a system-specific tuning of each system's parameters. To maintain the insertion rate above our target of 10,000 records/second we created three indexes each for PostgreSQL and LucidDB: one clustered index on starttime and two un-clustered indexes, one on the sourceip and one on the destip attribute. Although we believe we chose good values for the other tuning parameters, we cannot guarantee they are optimal, and we only present the performance we observed. We show the performance using the data and the example queries presented in Section 4.2. Table 3 shows the relative performance of NetStore compared to PostgreSQL for the same data. Since our main goal is to improve disk-resident data access, we ran each query once on each system to minimize the use of cached data. The numbers presented show how many times NetStore is better. To maintain a fair overall comparison we created a PostgreSQL table for each column of NetStore. As mentioned in [2], row-stores with columnar design provide better performance for queries that access a small number of columns, such as the sample queries in Section 4.2.

Table 3. Relative performance of NetStore versus columns-only PostgreSQL and LucidDB for query running times and total storage needed

                    Q1   Q2   Q3   Q4   Q5   Storage
Postgres/NetStore
LucidDB/NetStore

We observe that NetStore clearly outperforms PostgreSQL for all the query types, providing the best results for queries accessing more attributes (e.g. Q1 and Q4), even though PostgreSQL uses 90 times more disk space, including all the auxiliary data. The poor PostgreSQL performance can be explained by the absence of more clustered indexes, the lack of compression, and the unnecessary tuple overhead. Table 3 also shows the relative performance compared to LucidDB. We observe that the performance gap is not of the same order of magnitude as that for PostgreSQL, even when more attributes are accessed. However, NetStore clearly performs better while storing about 6 times less data. The performance penalty of LucidDB can be explained by the lack of a column segmentation design and by the early materialization in the processing phase that is specific to general-purpose column-stores. However, we noticed that LucidDB achieves a significant performance improvement for subsequent runs of the same query by efficiently using memory-resident data.

5 Conclusion and Future Work

With the growth of network traffic, there is an increasing demand for solutions to better manage and take advantage of the wealth of network flow information recorded for monitoring and forensic investigations. The problem is no longer the availability and the storage capacity of the data, but the ability to quickly extract the relevant information about potential malicious activities that can affect network security and resources. In this paper we have presented the design, implementation and evaluation of a novel working architecture, called NetStore, that is useful in network monitoring tasks and assists in network forensics investigations. The simple column-oriented design of NetStore helps reduce query processing time by spending less time on disk I/O and loading only needed data.
The column partitioning facilitates the use of efficient compression methods for network flow attributes that allow data processing in compressed format, thereby boosting query runtime performance. NetStore clearly outperforms existing row-based DBMSs and provides better results than general-purpose column-oriented systems because of simple design decisions tailored to network flow records. Experiments show that NetStore can provide more than ten times faster query response compared to other storage systems while maintaining a much smaller storage size. In future work we seek to explore the use of NetStore for new types of time-sequential data, such as host log analysis, and the possibility of releasing it as an open source system.

References

1. Abadi, D., Madden, S., Ferreira, M.: Integrating compression and execution in column-oriented database systems. In: SIGMOD 2006: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data. ACM, New York (2006)
2. Abadi, D.J., Madden, S.R., Hachem, N.: Column-stores vs. row-stores: how different are they really? In: SIGMOD 2008: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. ACM, New York (2008)
3. Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A., Gruber, R.E.: Bigtable: a distributed storage system for structured data. In: Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2006 (2006)
4. Cranor, C., Johnson, T., Spataschek, O., Shkapenyuk, V.: Gigascope: a stream database for network applications. In: SIGMOD 2003: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. ACM, New York (2003)
5. Gates, C., Collins, M., Duggan, M., Kompanek, A., Thomas, M.: More netflow tools for performance and security. In: LISA 2004: Proceedings of the 18th USENIX Conference on System Administration. USENIX Association, Berkeley (2004)
6. Geambasu, R., Bragin, T., Jung, J., Balazinska, M.: On-demand view materialization and indexing for network forensic analysis. In: NETB 2007: Proceedings of the 3rd USENIX International Workshop on Networking Meets Databases. USENIX Association, Berkeley (2007)
7. Goldstein, J., Ramakrishnan, R., Shaft, U.: Compressing relations and indexes. In: Proceedings of the IEEE International Conference on Data Engineering (1998)
8. Halverson, A., Beckmann, J.L., Naughton, J.F., DeWitt, D.J.: A comparison of C-Store and row-store in a common framework. Technical Report TR1570, University of Wisconsin-Madison (2006)
9. Holloway, A.L., DeWitt, D.J.: Read-optimized databases, in depth. Proc. VLDB Endow. 1(1) (2008)
10. Infobright Inc.: Infobright
11. LucidEra: LucidDB
12. Paxson, V.: Bro: a system for detecting network intruders in real-time. Computer Networks (1998)
13. PostgreSQL
14. Roesch, M.: Snort - lightweight intrusion detection for networks. In: LISA 1999: Proceedings of the 13th USENIX Conference on System Administration. USENIX Association, Berkeley (1999)
15. Ślęzak, D., Wróblewski, J., Eastwood, V., Synak, P.: Brighthouse: an analytic data warehouse for ad-hoc queries. Proc. VLDB Endow. 1(2) (2008)
16. Stonebraker, M., Abadi, D.J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O'Neil, E., O'Neil, P., Rasin, A., Tran, N., Zdonik, S.: C-Store: a column-oriented DBMS. In: VLDB 2005: Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment (2005)
17. Sullivan, M., Heybey, A.: Tribeca: a system for managing large databases of network traffic. In: USENIX (1998)
18. Cisco Systems: Cisco IOS NetFlow
19. Vertica Systems: Vertica
20. Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Transactions on Information Theory 23 (1977)
21. Zukowski, M., Boncz, P.A., Nes, N., Héman, S.: MonetDB/X100 - a DBMS in the CPU cache. IEEE Data Eng. Bull. 28(2) (2005)


More information

Wireshark Developer and User Conference

Wireshark Developer and User Conference Wireshark Developer and User Conference Using NetFlow to Analyze Your Network June 15 th, 2011 Christopher J. White Manager Applica6ons and Analy6cs, Cascade Riverbed Technology cwhite@riverbed.com SHARKFEST

More information

Windows Server Performance Monitoring

Windows Server Performance Monitoring Spot server problems before they are noticed The system s really slow today! How often have you heard that? Finding the solution isn t so easy. The obvious questions to ask are why is it running slowly

More information

Hardware Configuration Guide

Hardware Configuration Guide Hardware Configuration Guide Contents Contents... 1 Annotation... 1 Factors to consider... 2 Machine Count... 2 Data Size... 2 Data Size Total... 2 Daily Backup Data Size... 2 Unique Data Percentage...

More information

Virtuoso and Database Scalability

Virtuoso and Database Scalability Virtuoso and Database Scalability By Orri Erling Table of Contents Abstract Metrics Results Transaction Throughput Initializing 40 warehouses Serial Read Test Conditions Analysis Working Set Effect of

More information

VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5

VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5 Performance Study VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5 VMware VirtualCenter uses a database to store metadata on the state of a VMware Infrastructure environment.

More information

Performance Modeling and Analysis of a Database Server with Write-Heavy Workload

Performance Modeling and Analysis of a Database Server with Write-Heavy Workload Performance Modeling and Analysis of a Database Server with Write-Heavy Workload Manfred Dellkrantz, Maria Kihl 2, and Anders Robertsson Department of Automatic Control, Lund University 2 Department of

More information

Oracle Database In-Memory The Next Big Thing

Oracle Database In-Memory The Next Big Thing Oracle Database In-Memory The Next Big Thing Maria Colgan Master Product Manager #DBIM12c Why is Oracle do this Oracle Database In-Memory Goals Real Time Analytics Accelerate Mixed Workload OLTP No Changes

More information

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011 SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications Jürgen Primsch, SAP AG July 2011 Why In-Memory? Information at the Speed of Thought Imagine access to business data,

More information

8. 網路流量管理 Network Traffic Management

8. 網路流量管理 Network Traffic Management 8. 網路流量管理 Network Traffic Management Measurement vs. Metrics end-to-end performance topology, configuration, routing, link properties state active measurements active routes active topology link bit error

More information

Energy Efficient MapReduce

Energy Efficient MapReduce Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing

More information

Fact Sheet In-Memory Analysis

Fact Sheet In-Memory Analysis Fact Sheet In-Memory Analysis 1 Copyright Yellowfin International 2010 Contents In Memory Overview...3 Benefits...3 Agile development & rapid delivery...3 Data types supported by the In-Memory Database...4

More information

co Characterizing and Tracing Packet Floods Using Cisco R

co Characterizing and Tracing Packet Floods Using Cisco R co Characterizing and Tracing Packet Floods Using Cisco R Table of Contents Characterizing and Tracing Packet Floods Using Cisco Routers...1 Introduction...1 Before You Begin...1 Conventions...1 Prerequisites...1

More information

Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.

Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc. Oracle BI EE Implementation on Netezza Prepared by SureShot Strategies, Inc. The goal of this paper is to give an insight to Netezza architecture and implementation experience to strategize Oracle BI EE

More information

low-level storage structures e.g. partitions underpinning the warehouse logical table structures

low-level storage structures e.g. partitions underpinning the warehouse logical table structures DATA WAREHOUSE PHYSICAL DESIGN The physical design of a data warehouse specifies the: low-level storage structures e.g. partitions underpinning the warehouse logical table structures low-level structures

More information

Similarity Search in a Very Large Scale Using Hadoop and HBase

Similarity Search in a Very Large Scale Using Hadoop and HBase Similarity Search in a Very Large Scale Using Hadoop and HBase Stanislav Barton, Vlastislav Dohnal, Philippe Rigaux LAMSADE - Universite Paris Dauphine, France Internet Memory Foundation, Paris, France

More information

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct

More information

AlienVault Unified Security Management (USM) 4.x-5.x. Deployment Planning Guide

AlienVault Unified Security Management (USM) 4.x-5.x. Deployment Planning Guide AlienVault Unified Security Management (USM) 4.x-5.x Deployment Planning Guide USM 4.x-5.x Deployment Planning Guide, rev. 1 Copyright AlienVault, Inc. All rights reserved. The AlienVault Logo, AlienVault,

More information

Whitepaper: performance of SqlBulkCopy

Whitepaper: performance of SqlBulkCopy We SOLVE COMPLEX PROBLEMS of DATA MODELING and DEVELOP TOOLS and solutions to let business perform best through data analysis Whitepaper: performance of SqlBulkCopy This whitepaper provides an analysis

More information

Beyond Monitoring Root-Cause Analysis

Beyond Monitoring Root-Cause Analysis WHITE PAPER With the introduction of NetFlow and similar flow-based technologies, solutions based on flow-based data have become the most popular methods of network monitoring. While effective, flow-based

More information

We will give some overview of firewalls. Figure 1 explains the position of a firewall. Figure 1: A Firewall

We will give some overview of firewalls. Figure 1 explains the position of a firewall. Figure 1: A Firewall Chapter 10 Firewall Firewalls are devices used to protect a local network from network based security threats while at the same time affording access to the wide area network and the internet. Basically,

More information

Network forensics 101 Network monitoring with Netflow, nfsen + nfdump

Network forensics 101 Network monitoring with Netflow, nfsen + nfdump Network forensics 101 Network monitoring with Netflow, nfsen + nfdump www.enisa.europa.eu Agenda Intro to netflow Metrics Toolbox (Nfsen + Nfdump) Demo www.enisa.europa.eu 2 What is Netflow Netflow = Netflow

More information

Monitoring System Status

Monitoring System Status CHAPTER 14 This chapter describes how to monitor the health and activities of the system. It covers these topics: About Logged Information, page 14-121 Event Logging, page 14-122 Monitoring Performance,

More information

Large-Scale TCP Packet Flow Analysis for Common Protocols Using Apache Hadoop

Large-Scale TCP Packet Flow Analysis for Common Protocols Using Apache Hadoop Large-Scale TCP Packet Flow Analysis for Common Protocols Using Apache Hadoop R. David Idol Department of Computer Science University of North Carolina at Chapel Hill david.idol@unc.edu http://www.cs.unc.edu/~mxrider

More information

Emerald. Network Collector Version 4.0. Emerald Management Suite IEA Software, Inc.

Emerald. Network Collector Version 4.0. Emerald Management Suite IEA Software, Inc. Emerald Network Collector Version 4.0 Emerald Management Suite IEA Software, Inc. Table Of Contents Purpose... 3 Overview... 3 Modules... 3 Installation... 3 Configuration... 3 Filter Definitions... 4

More information

Integrating Apache Spark with an Enterprise Data Warehouse

Integrating Apache Spark with an Enterprise Data Warehouse Integrating Apache Spark with an Enterprise Warehouse Dr. Michael Wurst, IBM Corporation Architect Spark/R/Python base Integration, In-base Analytics Dr. Toni Bollinger, IBM Corporation Senior Software

More information

Using Synology SSD Technology to Enhance System Performance Synology Inc.

Using Synology SSD Technology to Enhance System Performance Synology Inc. Using Synology SSD Technology to Enhance System Performance Synology Inc. Synology_WP_ 20121112 Table of Contents Chapter 1: Enterprise Challenges and SSD Cache as Solution Enterprise Challenges... 3 SSD

More information

Introducing the Microsoft IIS deployment guide

Introducing the Microsoft IIS deployment guide Deployment Guide Deploying Microsoft Internet Information Services with the BIG-IP System Introducing the Microsoft IIS deployment guide F5 s BIG-IP system can increase the existing benefits of deploying

More information

Performance Characteristics of VMFS and RDM VMware ESX Server 3.0.1

Performance Characteristics of VMFS and RDM VMware ESX Server 3.0.1 Performance Study Performance Characteristics of and RDM VMware ESX Server 3.0.1 VMware ESX Server offers three choices for managing disk access in a virtual machine VMware Virtual Machine File System

More information

AlienVault. Unified Security Management (USM) 5.x Policy Management Fundamentals

AlienVault. Unified Security Management (USM) 5.x Policy Management Fundamentals AlienVault Unified Security Management (USM) 5.x Policy Management Fundamentals USM 5.x Policy Management Fundamentals Copyright 2015 AlienVault, Inc. All rights reserved. The AlienVault Logo, AlienVault,

More information

Using the HP Vertica Analytics Platform to Manage Massive Volumes of Smart Meter Data

Using the HP Vertica Analytics Platform to Manage Massive Volumes of Smart Meter Data Technical white paper Using the HP Vertica Analytics Platform to Manage Massive Volumes of Smart Meter Data The Internet of Things is expected to connect billions of sensors that continuously gather data

More information

Architectures for Big Data Analytics A database perspective

Architectures for Big Data Analytics A database perspective Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum

More information

Amadeus SAS Specialists Prove Fusion iomemory a Superior Analysis Accelerator

Amadeus SAS Specialists Prove Fusion iomemory a Superior Analysis Accelerator WHITE PAPER Amadeus SAS Specialists Prove Fusion iomemory a Superior Analysis Accelerator 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com SAS 9 Preferred Implementation Partner tests a single Fusion

More information

Architecture Overview

Architecture Overview Architecture Overview Design Fundamentals The networks discussed in this paper have some common design fundamentals, including segmentation into modules, which enables network traffic to be isolated and

More information

INCREASE NETWORK VISIBILITY AND REDUCE SECURITY THREATS WITH IMC FLOW ANALYSIS TOOLS

INCREASE NETWORK VISIBILITY AND REDUCE SECURITY THREATS WITH IMC FLOW ANALYSIS TOOLS WHITE PAPER INCREASE NETWORK VISIBILITY AND REDUCE SECURITY THREATS WITH IMC FLOW ANALYSIS TOOLS Network administrators and security teams can gain valuable insight into network health in real-time by

More information

EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES

EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES ABSTRACT EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES Tyler Cossentine and Ramon Lawrence Department of Computer Science, University of British Columbia Okanagan Kelowna, BC, Canada tcossentine@gmail.com

More information

Application of Netflow logs in Analysis and Detection of DDoS Attacks

Application of Netflow logs in Analysis and Detection of DDoS Attacks International Journal of Computer and Internet Security. ISSN 0974-2247 Volume 8, Number 1 (2016), pp. 1-8 International Research Publication House http://www.irphouse.com Application of Netflow logs in

More information

Scaling 10Gb/s Clustering at Wire-Speed

Scaling 10Gb/s Clustering at Wire-Speed Scaling 10Gb/s Clustering at Wire-Speed InfiniBand offers cost-effective wire-speed scaling with deterministic performance Mellanox Technologies Inc. 2900 Stender Way, Santa Clara, CA 95054 Tel: 408-970-3400

More information

D1.2 Network Load Balancing

D1.2 Network Load Balancing D1. Network Load Balancing Ronald van der Pol, Freek Dijkstra, Igor Idziejczak, and Mark Meijerink SARA Computing and Networking Services, Science Park 11, 9 XG Amsterdam, The Netherlands June ronald.vanderpol@sara.nl,freek.dijkstra@sara.nl,

More information

Network Intrusion Detection Systems. Beyond packet filtering

Network Intrusion Detection Systems. Beyond packet filtering Network Intrusion Detection Systems Beyond packet filtering Goal of NIDS Detect attacks as they happen: Real-time monitoring of networks Provide information about attacks that have succeeded: Forensic

More information

Security Event Management. February 7, 2007 (Revision 5)

Security Event Management. February 7, 2007 (Revision 5) Security Event Management February 7, 2007 (Revision 5) Table of Contents TABLE OF CONTENTS... 2 INTRODUCTION... 3 CRITICAL EVENT DETECTION... 3 LOG ANALYSIS, REPORTING AND STORAGE... 7 LOWER TOTAL COST

More information

nfdump and NfSen 18 th Annual FIRST Conference June 25-30, 2006 Baltimore Peter Haag 2006 SWITCH

nfdump and NfSen 18 th Annual FIRST Conference June 25-30, 2006 Baltimore Peter Haag 2006 SWITCH 18 th Annual FIRST Conference June 25-30, 2006 Baltimore Peter Haag 2006 SWITCH Some operational questions, popping up now and then: Do you see this peek on port 445 as well? What caused this peek on your

More information

Adaptive Flow Aggregation - A New Solution for Robust Flow Monitoring under Security Attacks

Adaptive Flow Aggregation - A New Solution for Robust Flow Monitoring under Security Attacks Adaptive Flow Aggregation - A New Solution for Robust Flow Monitoring under Security Attacks Yan Hu Dept. of Information Engineering Chinese University of Hong Kong Email: yhu@ie.cuhk.edu.hk D. M. Chiu

More information

Internet Firewall CSIS 4222. Packet Filtering. Internet Firewall. Examples. Spring 2011 CSIS 4222. net15 1. Routers can implement packet filtering

Internet Firewall CSIS 4222. Packet Filtering. Internet Firewall. Examples. Spring 2011 CSIS 4222. net15 1. Routers can implement packet filtering Internet Firewall CSIS 4222 A combination of hardware and software that isolates an organization s internal network from the Internet at large Ch 27: Internet Routing Ch 30: Packet filtering & firewalls

More information

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage White Paper Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage A Benchmark Report August 211 Background Objectivity/DB uses a powerful distributed processing architecture to manage

More information

EMC Unified Storage for Microsoft SQL Server 2008

EMC Unified Storage for Microsoft SQL Server 2008 EMC Unified Storage for Microsoft SQL Server 2008 Enabled by EMC CLARiiON and EMC FAST Cache Reference Copyright 2010 EMC Corporation. All rights reserved. Published October, 2010 EMC believes the information

More information

Agenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance.

Agenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance. Agenda Enterprise Performance Factors Overall Enterprise Performance Factors Best Practice for generic Enterprise Best Practice for 3-tiers Enterprise Hardware Load Balancer Basic Unix Tuning Performance

More information

DATA WAREHOUSING II. CS121: Introduction to Relational Database Systems Fall 2015 Lecture 23

DATA WAREHOUSING II. CS121: Introduction to Relational Database Systems Fall 2015 Lecture 23 DATA WAREHOUSING II CS121: Introduction to Relational Database Systems Fall 2015 Lecture 23 Last Time: Data Warehousing 2 Last time introduced the topic of decision support systems (DSS) and data warehousing

More information

Distributed storage for structured data

Distributed storage for structured data Distributed storage for structured data Dennis Kafura CS5204 Operating Systems 1 Overview Goals scalability petabytes of data thousands of machines applicability to Google applications Google Analytics

More information

and reporting Slavko Gajin slavko.gajin@rcub.bg.ac.rs

and reporting Slavko Gajin slavko.gajin@rcub.bg.ac.rs ICmyNet.Flow: NetFlow based traffic investigation, analysis, and reporting Slavko Gajin slavko.gajin@rcub.bg.ac.rs AMRES Academic Network of Serbia RCUB - Belgrade University Computer Center ETF Faculty

More information

Network Monitoring On Large Networks. Yao Chuan Han (TWCERT/CC) james@cert.org.tw

Network Monitoring On Large Networks. Yao Chuan Han (TWCERT/CC) james@cert.org.tw Network Monitoring On Large Networks Yao Chuan Han (TWCERT/CC) james@cert.org.tw 1 Introduction Related Studies Overview SNMP-based Monitoring Tools Packet-Sniffing Monitoring Tools Flow-based Monitoring

More information

Exercise 7 Network Forensics

Exercise 7 Network Forensics Exercise 7 Network Forensics What Will You Learn? The network forensics exercise is aimed at introducing you to the post-mortem analysis of pcap file dumps and Cisco netflow logs. In particular you will:

More information

NoDB: Efficient Query Execution on Raw Data Files

NoDB: Efficient Query Execution on Raw Data Files NoDB: Efficient Query Execution on Raw Data Files Ioannis Alagiannis Renata Borovica Miguel Branco Stratos Idreos Anastasia Ailamaki EPFL, Switzerland {ioannis.alagiannis, renata.borovica, miguel.branco,

More information

Network Security Monitoring and Behavior Analysis Pavel Čeleda, Petr Velan, Tomáš Jirsík

Network Security Monitoring and Behavior Analysis Pavel Čeleda, Petr Velan, Tomáš Jirsík Network Security Monitoring and Behavior Analysis Pavel Čeleda, Petr Velan, Tomáš Jirsík {celeda velan jirsik}@ics.muni.cz Part I Introduction P. Čeleda et al. Network Security Monitoring and Behavior

More information

Performance Verbesserung von SAP BW mit SQL Server Columnstore

Performance Verbesserung von SAP BW mit SQL Server Columnstore Performance Verbesserung von SAP BW mit SQL Server Columnstore Martin Merdes Senior Software Development Engineer Microsoft Deutschland GmbH SAP BW/SQL Server Porting AGENDA 1. Columnstore Overview 2.

More information

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications

More information

Richard Bejtlich richard@taosecurity.com www.taosecurity.com / taosecurity.blogspot.com BSDCan 14 May 04

Richard Bejtlich richard@taosecurity.com www.taosecurity.com / taosecurity.blogspot.com BSDCan 14 May 04 Network Security Monitoring with Sguil Richard Bejtlich richard@taosecurity.com www.taosecurity.com / taosecurity.blogspot.com BSDCan 14 May 04 Overview Introduction to NSM The competition (ACID, etc.)

More information

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #2610771

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #2610771 ENHANCEMENTS TO SQL SERVER COLUMN STORES Anuhya Mallempati #2610771 CONTENTS Abstract Introduction Column store indexes Batch mode processing Other Enhancements Conclusion ABSTRACT SQL server introduced

More information

Final exam review, Fall 2005 FSU (CIS-5357) Network Security

Final exam review, Fall 2005 FSU (CIS-5357) Network Security Final exam review, Fall 2005 FSU (CIS-5357) Network Security Instructor: Breno de Medeiros 1. What is an insertion attack against a NIDS? Answer: An insertion attack against a network intrusion detection

More information

DBMS / Business Intelligence, SQL Server

DBMS / Business Intelligence, SQL Server DBMS / Business Intelligence, SQL Server Orsys, with 30 years of experience, is providing high quality, independant State of the Art seminars and hands-on courses corresponding to the needs of IT professionals.

More information

Binary search tree with SIMD bandwidth optimization using SSE

Binary search tree with SIMD bandwidth optimization using SSE Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous

More information

Using Synology SSD Technology to Enhance System Performance. Based on DSM 5.2

Using Synology SSD Technology to Enhance System Performance. Based on DSM 5.2 Using Synology SSD Technology to Enhance System Performance Based on DSM 5.2 Table of Contents Chapter 1: Enterprise Challenges and SSD Cache as Solution Enterprise Challenges... 3 SSD Cache as Solution...

More information

Hypertable Architecture Overview

Hypertable Architecture Overview WHITE PAPER - MARCH 2012 Hypertable Architecture Overview Hypertable is an open source, scalable NoSQL database modeled after Bigtable, Google s proprietary scalable database. It is written in C++ for

More information

Limitations of Packet Measurement

Limitations of Packet Measurement Limitations of Packet Measurement Collect and process less information: Only collect packet headers, not payload Ignore single packets (aggregate) Ignore some packets (sampling) Make collection and processing

More information

LCMON Network Traffic Analysis

LCMON Network Traffic Analysis LCMON Network Traffic Analysis Adam Black Centre for Advanced Internet Architectures, Technical Report 79A Swinburne University of Technology Melbourne, Australia adamblack@swin.edu.au Abstract The Swinburne

More information

ACHIEVING STORAGE EFFICIENCY WITH DATA DEDUPLICATION

ACHIEVING STORAGE EFFICIENCY WITH DATA DEDUPLICATION ACHIEVING STORAGE EFFICIENCY WITH DATA DEDUPLICATION Dell NX4 Dell Inc. Visit dell.com/nx4 for more information and additional resources Copyright 2008 Dell Inc. THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES

More information

Intrusion Detection in AlienVault

Intrusion Detection in AlienVault Complete. Simple. Affordable Copyright 2014 AlienVault. All rights reserved. AlienVault, AlienVault Unified Security Management, AlienVault USM, AlienVault Open Threat Exchange, AlienVault OTX, Open Threat

More information

Performance Guideline for syslog-ng Premium Edition 5 LTS

Performance Guideline for syslog-ng Premium Edition 5 LTS Performance Guideline for syslog-ng Premium Edition 5 LTS May 08, 2015 Abstract Performance analysis of syslog-ng Premium Edition Copyright 1996-2015 BalaBit S.a.r.l. Table of Contents 1. Preface... 3

More information

Firewalls Overview and Best Practices. White Paper

Firewalls Overview and Best Practices. White Paper Firewalls Overview and Best Practices White Paper Copyright Decipher Information Systems, 2005. All rights reserved. The information in this publication is furnished for information use only, does not

More information

4 Internet QoS Management

4 Internet QoS Management 4 Internet QoS Management Rolf Stadler School of Electrical Engineering KTH Royal Institute of Technology stadler@ee.kth.se September 2008 Overview Network Management Performance Mgt QoS Mgt Resource Control

More information

Index Terms Domain name, Firewall, Packet, Phishing, URL.

Index Terms Domain name, Firewall, Packet, Phishing, URL. BDD for Implementation of Packet Filter Firewall and Detecting Phishing Websites Naresh Shende Vidyalankar Institute of Technology Prof. S. K. Shinde Lokmanya Tilak College of Engineering Abstract Packet

More information

Chapter 13 File and Database Systems

Chapter 13 File and Database Systems Chapter 13 File and Database Systems Outline 13.1 Introduction 13.2 Data Hierarchy 13.3 Files 13.4 File Systems 13.4.1 Directories 13.4. Metadata 13.4. Mounting 13.5 File Organization 13.6 File Allocation

More information

Chapter 13 File and Database Systems

Chapter 13 File and Database Systems Chapter 13 File and Database Systems Outline 13.1 Introduction 13.2 Data Hierarchy 13.3 Files 13.4 File Systems 13.4.1 Directories 13.4. Metadata 13.4. Mounting 13.5 File Organization 13.6 File Allocation

More information

Firewalls, Tunnels, and Network Intrusion Detection

Firewalls, Tunnels, and Network Intrusion Detection Firewalls, Tunnels, and Network Intrusion Detection 1 Part 1: Firewall as a Technique to create a virtual security wall separating your organization from the wild west of the public internet 2 1 Firewalls

More information