Comparing Files Using Structural Entropy
Ivan Sorokin
Doctor Web's Virus Lab, Ltd., 2 Malaya Monetnaya st., 970 Saint-Petersburg, Russia
[email protected]; [email protected]

Abstract. One of the main trends in the modern anti-virus industry is the development of algorithms that help estimate the similarity of files. Since malware writers tend to use increasingly complex techniques to protect their code, such as obfuscation and polymorphism, anti-virus software vendors face the increasing difficulty of file scanning, the considerable growth of anti-virus databases, and the overgrowth of file storages. For solving such problems, a static analysis of files appears to be of some interest. Its use helps determine those file characteristics that are necessary for their comparison without executing malware samples within a protected environment. The solution provided in this article is based on the assumption that different samples of the same malicious program have a similar order of code and data areas. Each such file area may be characterized not only by its length, but also by its homogeneity. In other words, the file may be characterized by the complexity of its data order. Our approach consists of using wavelet analysis for the segmentation of files into segments of different entropy levels and using the edit distance between segment sequences to determine the similarity of the files. The proposed solution has a number of advantages that help detect malicious programs efficiently on personal computers. First, the comparison does not take into account the functionality of the analysed files and is based solely on determining the similarity in code and data area positions, which makes the algorithm effective against many ways of protecting executable code. On the other hand, such a comparison may result in false alarms. Therefore, our solution is useful as a preliminary test that triggers the running of additional checks.
Second, the method is relatively easy to implement and does not require code disassembly or emulation. And, third, the method makes the malicious file record compact, which is significant when compiling anti-virus databases.

1. Introduction

One of the main trends in the modern anti-virus industry is the development of algorithms that help estimate the similarity of files. Since malware writers tend to use increasingly complex techniques to protect their code, e.g., obfuscation and polymorphism (Christodorescu & Jha, 2004), anti-virus software vendors face several problems. First, there is the issue of the increasing difficulty of file scanning (e.g., due to additional emulation). Second, the use of outdated signature detection methods results in the considerable growth of anti-virus databases. Third, there is also the problem of filling the file storages used by anti-virus software vendors with loads of sample files (Jacob et al., 2010). For solving such problems, a static analysis of files appears to be of some interest. Its use helps determine those file characteristics that are necessary for their comparison without executing malware samples within a protected environment.
Aside from the presence of similar byte sequences or headers in executable files undergoing comparison, a decision on their similarity may be drawn from more complex features such as file code patterns and data structures, e.g., a unique sequence of function calls or processor instructions. The solution provided in this article is based on the assumption that different samples (i.e., files) of the same malicious program have a similar order of code and data areas. Each such file area may be characterized not only by its length (i.e., the number of bytes), but also by its homogeneity (i.e., the distinctness of its bytes). In other words, the file may be characterized by the complexity of its data order. To indicate this characteristic of a file, we use the concept of structural entropy (Prangišvili, 2003). Our approach consists of two main parts. The first stage includes using wavelet analysis (Daubechies, 1992) for the segmentation of files into segments of different entropy levels. In the second stage, we use the edit distance between segment sequences to determine the similarity of the files (Wagner and Fischer, 1974). In summary, the main contributions of this article are the following:
- A description of an algorithm for the segmentation of files into segments that are characterized by length and average entropy.
- A review of the sequence alignment technique to compare files represented by sequences of segments.

2. Related Work

As was mentioned above, our solution is based on two basic techniques: entropy analysis and sequence alignment. Both of these approaches have ever-widening applications in information security. Entropy analysis allows for the estimation of the packing and encryption level of data. Such estimations may serve as a step in detecting packed data. Lyda and Hamrock (2007) calculate the average and maximum entropy of a whole segmented file to identify packed and encrypted data. Perdisci et al.
(2008) calculate the entropy of individual segments of executable files. Together with other characteristics, this method allows for the effective use of pattern recognition techniques to classify files into packed and unpacked categories. Ebringer, Sun and Boztas (2008), and later Sun, Versteeg, Boztas and Yann (2010), use Huffman codes to estimate entropy. By calculating a code for each byte, they use a sliding window method to build an entropy map of the whole file. This allows for a more detailed comparison of files and for classifying them by packer type. Breitenbacher (2010) uses his own algorithm for estimating the randomness of 16-byte blocks. Unfortunately, his research is limited to reviewing an entropy map of the whole file. In most cases, malicious code or activity can be represented as a sequence of elements or events. This allows for the use of comparison algorithms based on sequence alignment. For instance, the Smith-Waterman local alignment algorithm for sequences is convenient to use for the detection of malicious network traffic (Newsome, Karp, Song, 2005). In their subsequent research on network traffic analysis, Kreibich and Crowcroft (2006) propose an improved version of the Jacobson-Vo local alignment algorithm that is based on the identification of the longest increasing subsequence. Another approach (Fabjanski and Kruk, 2008) utilizes multiple sequence alignment (MSA) methods. The analysis of executable files is another field of use for sequence comparison algorithms. For example, in several articles (Sung et al., 2004; Gheorghescu, 2005; Wagener et al., 2007; Li et al., 2009), alignment algorithms are used for a behaviour comparison of malicious programs.
3. Methodology

The proposed solution lies in the static analysis of files. We do not take into account file types; that is, we ignore PE header attributes of Windows executables. The only thing of importance for us is the file structure, that is, the order of its distinctive code and data areas. To determine such areas, we first build an entropy map of the whole file and then use wavelet analysis to segment it (see Section 3.1). When we have a representation of a file as a sequence of segments, we compare the file with other files using the edit distance (see Section 3.2). Therefore, the algorithm as a whole includes the following stages: file segmentation and sequence comparison.

3.1. File Segmentation

Any file can be characterized by the properties of the data it contains. For instance, one may consider how well-ordered the data are or how much space the data occupy. Among other things, if we take a look at executable files, we may notice that they contain data of various kinds: executable code, text, and packed data. All of these file areas differ not only in size, but also in the level of informational entropy. When an executable file is considered as a system of such elements, we can use the term structural entropy (Prangišvili, 2003) for its characterization. Therefore, the main purpose of the suggested segmentation algorithm is splitting the file into segments that are characterized by size and entropy.

3.1.1. Entropy Analysis

Initially, the sliding window method is used to represent the source file as a time series Y = \{y_i : i = 1, \ldots, N\}, where N is the total number of windows. To calculate the entropy within each window, we use Shannon's formula:

    y_i = -\sum_{j=1}^{m} p(j) \log_2 p(j),    (1)

where p(j) is the frequency of occurrence of the j-th byte value within the i-th window, and m is the number of distinct byte values in the window. Please note that we consider the frequency of a byte's occurrence in an individual window and not within the whole file.
This helps keep the window entropy level from depending on other bytes in the file. For instance, some researchers (Ebringer et al., 2008; Sun et al., 2010) calculate Huffman codes across the whole file. This results in different entropy diagrams for files of similar structure but differing length.

3.1.2. Wavelet Analysis

The main task when segmenting a file is to determine those places within it where the average entropy changes. We suggest using wavelet analysis (Daubechies, 1992) to extract this information from our resulting time series Y. The essence of the analysis is as follows. First, we choose a mother wavelet whose properties determine our ability to identify changes in the analyzed data. Second, we calculate the wavelet transform at various scales. The obtained wavelet coefficients will contain information on the correlation between the used wavelet and the analyzed time series. As a result, we will be able to determine segments by analysing significant wavelet coefficients.
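To make the construction concrete, the sliding-window entropy map of Section 3.1.1 can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation; the function name is ours, and the window size of 256 bytes and shift of 128 bytes are the values used later in the experiment section.

```python
import math

def entropy_map(data: bytes, window: int = 256, shift: int = 128) -> list[float]:
    """Slide a window over the file and compute the Shannon entropy
    (in bits) of each window, per formula (1)."""
    series = []
    for start in range(0, max(len(data) - window + 1, 1), shift):
        chunk = data[start:start + window]
        counts = {}
        for b in chunk:
            counts[b] = counts.get(b, 0) + 1
        n = len(chunk)
        # Frequencies are taken within the window only, not the whole file,
        # so the entropy of a window does not depend on the rest of the file.
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        series.append(h)
    return series
```

Homogeneous data (e.g., a run of zero bytes) yields windows of zero entropy, while a window containing all 256 byte values equally often reaches the maximum of 8 bits.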
For the mother wavelet, we choose the Haar wavelet, which has an asymmetrical form and whose zeroth moment equals zero:

    \psi_{HAAR}(t) = \begin{cases} 1, & 0 \le t < 1/2, \\ -1, & 1/2 \le t < 1, \\ 0, & t < 0 \text{ or } t \ge 1. \end{cases}    (2)

Since the continuous wavelet transform (CWT) is redundant due to the continuous change of the scale coefficient and shift parameter, it is more cumbersome than the discrete wavelet transform (DWT). Therefore, we use the following estimate to calculate the DWT:

    W(a, b) = a^{-1/2} \sum_{i=1}^{N} y_i \, \psi_{HAAR}\left(\frac{t_i - b}{a}\right),    (3)

where a is the scale parameter, b is the mother wavelet shift parameter, y_i is the informational entropy level within the i-th window, and N is the total number of windows in the file. The main peculiarity of the DWT is that the scale parameter a changes according to a power of 2. This means, first, that we can use multi-resolution analysis, which allows us to reuse on each scale of transformation the values determined on the previous scale. This results in a reduced number of mathematical operations involving addition. Second, this places a restriction on the source data: the number of counts of the time series should be divisible by a power of two, a = 2^n, where n is the maximum scale. Therefore, we need to pad the time series on each side with averaged values. From the received coefficients, we need to identify the significant ones, i.e., the local extrema with respect to the a and b variables. If all of the points of the local extrema in the time-scale plane are connected, then the resulting lines build a skeleton. These lines fully represent the structure of the analysed data. Therefore, the segmentation algorithm's main task lies in building the skeleton of the input data, which is used afterwards to identify segments.
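A direct sketch of the Haar transform estimate (3) follows, assuming the formulas as reconstructed above. It evaluates the coefficients naively for illustration; a production implementation would use multi-resolution analysis to reuse values between scales, as described in the text.

```python
def haar(t: float) -> float:
    """Haar mother wavelet, formula (2)."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def dwt_coeffs(y: list[float], max_n: int = 4) -> dict:
    """Wavelet coefficients W(a, b) per formula (3), with the scale
    parameter a restricted to powers of two as in the DWT."""
    N = len(y)
    W = {}
    for n in range(1, max_n + 1):
        a = 2 ** n
        for b in range(0, N, a):
            W[(a, b)] = a ** -0.5 * sum(
                y[i] * haar((i - b) / a) for i in range(N)
            )
    return W
```

A step in the entropy series produces a large-magnitude coefficient at the step position, while a flat series produces zero coefficients everywhere; this is exactly what makes the significant coefficients usable as segment boundaries.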
The total number of segments is determined by the significant wavelet coefficients on the maximum scale, while their limits are determined by the significant wavelet coefficients on the minimal scale of transformation.

3.2. Sequence Comparison

In most cases, similar malicious files are alike in terms of size; therefore, we will use global alignment to compare them. This method allows us to compare whole sequences while taking into account all of their elements. In turn, algorithms based on local alignment are applied mostly to sequences that differ in size and have just a few similar fragments. For the global alignment of sequences, we will use the Wagner-Fischer dynamic programming method based on the Levenshtein distance. Using this method, insertion, deletion and substitution operations will receive penalties depending on the characteristics of the compared elements (see Section 3.2.1).
The comparison will be carried out in two steps. First, we will align the sequences (see Section 3.2.2), i.e., we will look for the correspondence between similar elements. Then we will estimate the total degree of similarity between the two sequences (see Section 3.2.3).

3.2.1. Edit Cost Function

Since each element of a sequence is identified by two characteristics (size and entropy), we need to select a general cost function that will determine a normalized penalty value for the mismatching of two elements depending on the difference in their sizes and averaged entropy values. We set the range for this function between zero and a certain constant, which indicate the absolute similarity and absolute difference of two elements accordingly. So, by selecting such a function, we will be able to align two sequences. Denoting the sizes of two elements by size_1 and size_2 and their averaged entropies by ent_1 and ent_2, we may set the size penalty according to the following function:

    cost_s = \frac{|size_1 - size_2|}{size_1 + size_2}    (4)

Here, the size of at least one of the compared elements should be non-zero. For this formula, the maximum penalty for the difference in sizes equals 1. If the sizes are equal, then there is no penalty. Now, let us set a penalty for the difference in entropy:

    cost_e = \frac{1}{1 + \exp(-(4 \cdot |ent_1 - ent_2| - 6.5))}    (5)

In this formula, we use a sigmoid (see Figure 1). Through its form we can regulate the normalized penalty value differently. In this case, two elements with a difference in entropy starting from 2 bits are considered different.

Figure 1: Sigmoid for normalizing the difference in entropy between two segments

The total penalty for two segments is calculated as a weighted sum of the penalties for the difference in size (4) and entropy (5):

    cost = cost_s \cdot PART\_SIZE + cost_e \cdot PART\_ENT    (6)

The use of the coefficients in formula (6) allows the fraction of the penalty for size or entropy to be set differently when comparing two segments.
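The cost function (4)-(6) can be sketched as below. Note the assumptions: the sigmoid constants 4 and 6.5 follow formula (5) as reconstructed here, and the weights PART_SIZE = 1.4 and PART_ENT = 0.6 are the values chosen later in the experiment section.

```python
import math

PART_SIZE = 1.4  # weights from the experiment section; raising PART_SIZE
PART_ENT = 0.6   # makes size differences dominate the penalty

def cost(size1: int, size2: int, ent1: float, ent2: float) -> float:
    """Penalty for mismatching two segments, formulas (4)-(6).
    At least one of the sizes must be non-zero."""
    cost_s = abs(size1 - size2) / (size1 + size2)                      # (4)
    cost_e = 1.0 / (1.0 + math.exp(-(4.0 * abs(ent1 - ent2) - 6.5)))   # (5)
    return cost_s * PART_SIZE + cost_e * PART_ENT                      # (6)
```

With these constants, two equally sized segments whose entropies differ by 2 bits already receive roughly 80% of the maximum entropy penalty, matching the "starting from 2 bits" remark above.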
3.2.2. Sequence Alignment

After we decide on the general cost function, we can determine an alignment algorithm for two sequences. Like any algorithm based on the Levenshtein distance, our algorithm utilizes dynamic programming. We set up an edit matrix d, i.e., a two-dimensional array in which each element determines the comparison of the corresponding subsequences. The essence of the algorithm lies in filling in this array and determining the last element, which represents the resulting penalty for comparing the two sequences. Although the general cost function takes into account differences in both the size and the averaged entropy values of two elements, we will additionally account for the logarithmic sizes of the corresponding elements when filling in array d. In addition, to allow for more flexible adjustment of the total penalty value, we will use the constant TAX, which represents the average share of the penalty for all elements. So, when filling in the first column, which represents the deletion of the corresponding elements from the first sequence s_1, we will use the following formula:

    d[i][0] = d[i-1][0] + TAX \cdot \log_{10}(s_1[i-1].size), \quad 1 \le i \le length(s_1).

Likewise, to fill in the first row, which represents the insertion of the corresponding elements from the second sequence s_2, we use a similar formula:

    d[0][j] = d[0][j-1] + TAX \cdot \log_{10}(s_2[j-1].size), \quad 1 \le j \le length(s_2).

All other elements of array d are set according to the following formula:

    d[i][j] = \min \begin{cases} d[i-1][j-1] + cost(s_1[i], s_2[j]) \cdot \log_{10}((s_1[i].size + s_2[j].size)/2), \\ d[i-1][j] + TAX \cdot \log_{10}(s_1[i].size), \\ d[i][j-1] + TAX \cdot \log_{10}(s_2[j].size). \end{cases}    (7)

In each step, we select the minimum of the three values. If the first summand is minimal, then it indicates that two elements are replaced. In this case, the edit operation receives a penalty not only from the cost function (6), but also from the average size of the two elements in logarithmic dependence.
If the second summand is minimal, then it indicates that the element from the first sequence s_1 is deleted. In this case, the operation receives a penalty depending on the size of the area. Finally, if the third summand is minimal, then it indicates that the element from the second sequence s_2 is inserted. The penalty in this case also depends on the size of the area. So, the resulting array d contains information on the penalties received when comparing the corresponding subsequences, and, therefore, its last element represents how large the penalty is when comparing the whole sequences s_1 and s_2. If we follow the array from its end while taking into account the minimal values, then we obtain the full alignment of the two sequences (Wagner & Fischer, 1974).

3.2.3. Estimating the Degree of Similarity

To estimate the degree of similarity between two sequences, we need to determine the maximum penalty that their comparison could have received. The maximum penalty means that all elements from the first sequence are deleted, and all elements from the second sequence are
inserted with the corresponding penalties. This value is already calculated when filling in array d. Namely, it equals the sum of the last element in the first row and the last element in the first column. It makes sense, however, to increase the estimate of the maximum penalty and thus bring the two sequences closer together. For this, we need to recalculate the maximum penalty while taking into account the performed alignment:

    cost\_max = \sum \begin{cases} 2 \cdot TAX \cdot (\log_{10}(s_1[i].size) + \log_{10}(s_2[j].size)), & s_1[i] \text{ substituted with } s_2[j], \\ TAX \cdot \log_{10}(s_1[i].size), & s_1[i] \text{ deleted}, \\ TAX \cdot \log_{10}(s_2[j].size), & s_2[j] \text{ inserted}. \end{cases}    (8)

In other words, calculating the value cost_max is similar to filling in the first column and the first row of array d, with the only difference being that we artificially increase the penalty by doubling it when two elements are replaced. Given that the last element of array d represents the true difference between the two sequences, while the maximal penalty cost_max represents how different the sequences could have been in the worst-case scenario, the resulting degree of similarity may be calculated as follows:

    similarity = 1 - \frac{d[length(s_1)][length(s_2)]}{cost\_max}    (9)

4. Experiment

To demonstrate the capabilities of the described method, let us examine the comparison of two files that differ in structure but belong to the same family of malicious programs. The structural differences are explained by the peculiarities of the polymorphic packer used to protect the malicious functionality. To detect this malware, Dr.Web Anti-virus uses a special procedure that analyses the functionality of an executable file. If particular evidence is found, the anti-virus reports the detection of BackDoor.Tdss.based.7. Use of the suggested method allows for the similarity of such files to be detected on the basis of their structural entropy only. Figures 3 and 4 display entropy diagrams of the first and second file, respectively. The diagrams are built using the sliding window method.
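The recurrence (7), the adjusted maximum penalty (8), and the similarity estimate (9) can be sketched together as follows. This is an illustrative reconstruction, not the author's reference implementation: segments are represented as (size, entropy) pairs with positive sizes, and the constants TAX, PART_SIZE and PART_ENT follow the experiment section.

```python
import math

TAX = 0.3
PART_SIZE, PART_ENT = 1.4, 0.6

def seg_cost(a, b):
    """Mismatch cost for two (size, entropy) segments, formula (6)."""
    cs = abs(a[0] - b[0]) / (a[0] + b[0])
    ce = 1.0 / (1.0 + math.exp(-(4.0 * abs(a[1] - b[1]) - 6.5)))
    return cs * PART_SIZE + ce * PART_ENT

def similarity(s1, s2):
    """Fill the edit matrix d per (7), trace back for the alignment,
    accumulate cost_max per (8), and return the similarity per (9)."""
    n, m = len(s1), len(s2)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):                  # deletions from s1
        d[i][0] = d[i - 1][0] + TAX * math.log10(s1[i - 1][0])
    for j in range(1, m + 1):                  # insertions from s2
        d[0][j] = d[0][j - 1] + TAX * math.log10(s2[j - 1][0])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + seg_cost(s1[i - 1], s2[j - 1]) * \
                math.log10((s1[i - 1][0] + s2[j - 1][0]) / 2)
            dele = d[i - 1][j] + TAX * math.log10(s1[i - 1][0])
            ins = d[i][j - 1] + TAX * math.log10(s2[j - 1][0])
            d[i][j] = min(sub, dele, ins)
    # Traceback from the last element; ties are resolved in the order
    # substitution, deletion, insertion. Substitutions contribute the
    # artificially doubled penalty 2*TAX*(log + log) to cost_max, per (8).
    i, j, cost_max = n, m, 0.0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + \
                seg_cost(s1[i - 1], s2[j - 1]) * \
                math.log10((s1[i - 1][0] + s2[j - 1][0]) / 2):
            cost_max += 2 * TAX * (math.log10(s1[i - 1][0]) +
                                   math.log10(s2[j - 1][0]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + TAX * math.log10(s1[i - 1][0]):
            cost_max += TAX * math.log10(s1[i - 1][0])
            i -= 1
        else:
            cost_max += TAX * math.log10(s2[j - 1][0])
            j -= 1
    return 1.0 - d[n][m] / cost_max            # (9)
```

Two identical segment sequences score close to 1, while single segments that differ sharply in both size and entropy score close to 0.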
For illustration purposes, intervals with packed data are shortened. As the window size, we selected 256 bytes. Therefore, the maximum entropy level in one window can reach up to 8 bits. For the window shift, we use 128 bytes. Such a selection means that our algorithm is effective on files of 2 megabytes and more. First, let us examine the segmentation algorithm in respect to the first file (see Figure 3). To understand the capability of the wavelet analysis used for segmentation, see Figure 2, which displays the wavelet coefficient surface W(a, b) built using the DWT. For this transform, we used the Haar wavelet as the mother wavelet. Each point on the surface represents the correspondence of the source data to the selected mother wavelet. In this figure, you can see that at larger transformation scales, insignificant changes in the source data are ignored, while on the lower scales, there is more detail. Therefore, the maximal transformation scale determines the resulting number of segments, while the minimal scale is responsible for accuracy within the limits of the obtained segments.
Figure 2: Wavelet coefficients built using DWT on the first 60 counts of the first file

To compare the two files, we need to apply the following parameters to our segmentation algorithm. Let us set the maximum transformation scale to 16, which means that the number of wavelet transformations is 4, i.e., formula (3) will be computed four times. The threshold for determining significant wavelet coefficients will be set to 0.5, which means that we will ignore peaks of wavelet coefficients less than 0.5 in height (as in Figure 2) when segmenting the file. As a result, when using this algorithm, we will receive segments whose borders are displayed at the top of the diagrams in Figures 3 and 4.

Figure 3: Entropy diagram of the first file

In these diagrams, you may see the differences in the structure of the selected files. In the first file, the 5th segment is located to the right of the 3rd segment of the second file. This is explained by the fact that the polymorphic packer placed the compressed data in a different order: in the first file, these data are placed after the import section (3rd segment), while in the second file, they are placed before the import section (4th segment). We should also note that in the first file, the segmentation algorithm singled out the area with zero entropy (4th segment). A similar area is also present in the second file (windows from 103 to 106), but it is shorter by one window and is placed to the left of the low-entropy area. Therefore, the segmentation algorithm has not split the 5th segment of the second file. Other areas of the files have similar characteristics and equal positions.
Figure 4: Entropy diagram of the second file

After representing the files as sequences of segments, each of which is characterized by its size and averaged entropy, we apply our alignment algorithm. But first, we need to determine its parameters. Let us set the coefficients from formula (6) as follows: PART_ENT = 0.6, PART_SIZE = 1.4. This will increase the effect of the difference in the size of the compared elements when calculating penalties. The parameter TAX will be set to 0.3. This means that the cost function for two mismatching elements will return 0.3 on average. After that, let us fill in array d (see Table 1). Note that column 0 in this table represents the cost of deleting all elements from the first sequence, while row 0 represents the cost of inserting all elements from the second sequence. In other words, in order to turn the first sequence into the second sequence, we need to delete all elements from the first sequence and then insert all elements from the second sequence.

Table 1: Cost array for comparing the segment sequences of two files

Once we have filled in array d, let us examine the alignment procedure for the two sequences (see Table 2). The procedure involves progressing through Table 1 from its lower right corner. Shifting to the element with the lowest cost, we determine one of the three edit operations. If the element with the lowest cost is located to the left of the current element, then it means that the current element from the second sequence is inserted. An example is the 4th segment of the second file. If we shift vertically, then it indicates that the element from the first sequence is deleted. Other examples are the 3rd and 4th segments of the first file. If we shift diagonally, then it indicates the substitution of the corresponding elements.
Table 2: Alignment of the segment sequences of two files (columns: segment number, number of windows and entropy in bits for each file, plus the real and maximum penalties)

As a result, to determine the degree of similarity between the two sequences, we need to compare the real penalty to the maximum one. By the real penalty, we mean the resulting penalty received after filling in array d (Table 1). The maximum penalty represents how different the sequences could have been. The maximum penalty is calculated after the alignment is completed in order to increase the penalty when comparing similar elements. For instance, if we look at the first segments of the files in our example, we will see in Table 1 the penalty that the deletion or insertion of each of them receives. Summing these penalties as-is would push apart the compared sequences, even though we can see that both segments are very alike. Therefore, we increase the maximal penalty and, taking into account formula (8), obtain the value of 1.04. So, when we determine the share of the real penalty in the maximum one (9), we determine the degree of similarity between the two sequences. In our example it equals %.

5. Conclusion

The proposed solution has a number of advantages that help detect malicious programs efficiently on personal computers. First, the comparison does not take into account the functionality of the analysed files and is based solely on determining the similarity in code and data area positions. Therefore, the algorithm is effective against many ways of protecting executable code. On the other hand, such a comparison may result in false alarms. Therefore, our solution is useful as a preliminary test that triggers the running of additional checks. Second, the method is relatively easy to implement and does not require code disassembly or emulation. And, third, the malicious file record is compact, which is significant when compiling anti-virus databases.
References

Z. Breitenbacher. Entropy based detection of polymorphic malware. Proceedings of the 19th Annual EICAR Conference "ICT Security: Quo Vadis?", pp. 17-28, 2010.
M. Christodorescu, S. Jha. Testing malware detectors. Proceedings of the ACM SIGSOFT International Symposium on Software Testing and Analysis, 2004.
I. Daubechies. Ten Lectures on Wavelets. SIAM, p. 377, 1992.
R. Ebringer, L. Sun, S. Boztas. A fast randomness test that preserves local detail. Proceedings of the Virus Bulletin (VB) Conference, 2008.
K. Fabjanski, T. Kruk. Network traffic classification by common subsequence finding. In M. Bubak, G. van Albada, P. Sloot (Eds.), Computational Science - ICCS 2008, 2008.
M. Gheorghescu. An automated virus classification system. Proceedings of the Virus Bulletin (VB) Conference, 2005.
G. Jacob, M. Neugschwandtner, P. M. Comparetti, C. Kruegel, G. Vigna. A static, packer-agnostic filter to detect similar malware samples. Technical Report, Department of Computer Science, University of California, Santa Barbara. Retrieved 29 November 2010.
C. Kreibich, J. Crowcroft. Efficient sequence alignment of network traffic. Proceedings of the Internet Measurement Conference, 2006.
J. Li, J. Xu, M. Xu, H. Zhao, N. Zheng. Malware obfuscation measuring via evolutionary similarity. Proceedings of the International Conference on Future Information Networks, 2009.
R. Lyda, J. Hamrock. Using entropy analysis to find encrypted and packed malware. IEEE Security and Privacy, 5(2), pp. 40-45, 2007.
J. Newsome, B. Karp, D. Song. Polygraph: Automatically generating signatures for polymorphic worms. Proceedings of the 2005 IEEE Symposium on Security and Privacy, 2005.
R. Perdisci, A. Lanzi, W. Lee. Classification of packed executables for accurate computer virus detection. Pattern Recognition Letters, 29(14), pp. 1941-1946, 2008.
I. V. Prangišvili. Èntropijnye i drugie sistemnye zakonomernosti: Voprosy upravlenija složnymi sistemami [Entropy and other system laws: Issues of managing complex systems]. Trapeznikov Institute of Control Sciences, Russian Academy of Sciences, Moscow, Nauka, p. 428, 2003.
L. Sun, S. Versteeg, S. Boztas, T. Yann. Pattern recognition techniques for the classification of malware packers. Proceedings of the 15th Australasian Conference on Information Security and Privacy, 2010.
A. H. Sung, J. Xu, P. Chavez, S. Mukkamala. Static analyzer of vicious executables (SAVE). Proceedings of the 20th Annual Computer Security Applications Conference, 2004.
G. Wagener, R. State, A. Dulaunoy. Malware behaviour analysis, extended version. Journal in Computer Virology, 4(4), 2008.
R. A. Wagner, M. J. Fischer. The string-to-string correction problem. Journal of the ACM, 21(1), pp. 168-173, 1974.
KEITH LEHNERT AND ERIC FRIEDRICH
MACHINE LEARNING CLASSIFICATION OF MALICIOUS NETWORK TRAFFIC KEITH LEHNERT AND ERIC FRIEDRICH 1. Introduction 1.1. Intrusion Detection Systems. In our society, information systems are everywhere. They
Chapter 6. Linear Programming: The Simplex Method. Introduction to the Big M Method. Section 4 Maximization and Minimization with Problem Constraints
Chapter 6 Linear Programming: The Simplex Method Introduction to the Big M Method In this section, we will present a generalized version of the simplex method that t will solve both maximization i and
Big Ideas in Mathematics
Big Ideas in Mathematics which are important to all mathematics learning. (Adapted from the NCTM Curriculum Focal Points, 2006) The Mathematics Big Ideas are organized using the PA Mathematics Standards
Classification of Fingerprints. Sarat C. Dass Department of Statistics & Probability
Classification of Fingerprints Sarat C. Dass Department of Statistics & Probability Fingerprint Classification Fingerprint classification is a coarse level partitioning of a fingerprint database into smaller
Simultaneous Gamma Correction and Registration in the Frequency Domain
Simultaneous Gamma Correction and Registration in the Frequency Domain Alexander Wong [email protected] William Bishop [email protected] Department of Electrical and Computer Engineering University
LASTLINE WHITEPAPER. Why Anti-Virus Solutions Based on Static Signatures Are Easy to Evade
LASTLINE WHITEPAPER Why Anti-Virus Solutions Based on Static Signatures Are Easy to Evade Abstract Malicious code is an increasingly important problem that threatens the security of computer systems. The
Amino Acids and Their Properties
Amino Acids and Their Properties Recap: ss-rrna and mutations Ribosomal RNA (rrna) evolves very slowly Much slower than proteins ss-rrna is typically used So by aligning ss-rrna of one organism with that
A Wavelet Based Prediction Method for Time Series
A Wavelet Based Prediction Method for Time Series Cristina Stolojescu 1,2 Ion Railean 1,3 Sorin Moga 1 Philippe Lenca 1 and Alexandru Isar 2 1 Institut TELECOM; TELECOM Bretagne, UMR CNRS 3192 Lab-STICC;
Load Balancing Algorithm Based on Services
Journal of Information & Computational Science 10:11 (2013) 3305 3312 July 20, 2013 Available at http://www.joics.com Load Balancing Algorithm Based on Services Yufang Zhang a, Qinlei Wei a,, Ying Zhao
A NEW APPROACH FOR COMPLEX ENCRYPTING AND DECRYPTING DATA
A NEW APPROACH FOR COMPLEX ENCRYPTING AND DECRYPTING DATA ABSTRACT Obaida Mohammad Awad Al-Hazaimeh Department of Information Technology, Al-balqa Applied University, AL-Huson University College, Irbid,
LASTLINE WHITEPAPER. In-Depth Analysis of Malware
LASTLINE WHITEPAPER In-Depth Analysis of Malware Abstract Malware analysis is the process of determining the purpose and functionality of a given malware sample (such as a virus, worm, or Trojan horse).
TIM 50 - Business Information Systems
TIM 50 - Business Information Systems Lecture 15 UC Santa Cruz March 1, 2015 The Database Approach to Data Management Database: Collection of related files containing records on people, places, or things.
PowerScheduler Load Process User Guide. PowerSchool Student Information System
PowerSchool Student Information System Released November 18, 2008 Document Owner: Documentation Services This edition applies to Release 5.2 of the PowerSchool software and to all subsequent releases and
Face detection is a process of localizing and extracting the face region from the
Chapter 4 FACE NORMALIZATION 4.1 INTRODUCTION Face detection is a process of localizing and extracting the face region from the background. The detected face varies in rotation, brightness, size, etc.
Characterization and Modeling of Packet Loss of a VoIP Communication
Characterization and Modeling of Packet Loss of a VoIP Communication L. Estrada, D. Torres, H. Toral Abstract In this work, a characterization and modeling of packet loss of a Voice over Internet Protocol
Concepts of digital forensics
Chapter 3 Concepts of digital forensics Digital forensics is a branch of forensic science concerned with the use of digital information (produced, stored and transmitted by computers) as source of evidence
A Review And Evaluations Of Shortest Path Algorithms
A Review And Evaluations Of Shortest Path Algorithms Kairanbay Magzhan, Hajar Mat Jani Abstract: Nowadays, in computer networks, the routing is based on the shortest path problem. This will help in minimizing
This document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore.
This document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore. Title Transcription of polyphonic signals using fast filter bank( Accepted version ) Author(s) Foo, Say Wei;
FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM
International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT
The Effects of Start Prices on the Performance of the Certainty Equivalent Pricing Policy
BMI Paper The Effects of Start Prices on the Performance of the Certainty Equivalent Pricing Policy Faculty of Sciences VU University Amsterdam De Boelelaan 1081 1081 HV Amsterdam Netherlands Author: R.D.R.
Tracking Moving Objects In Video Sequences Yiwei Wang, Robert E. Van Dyck, and John F. Doherty Department of Electrical Engineering The Pennsylvania State University University Park, PA16802 Abstract{Object
5.3 The Cross Product in R 3
53 The Cross Product in R 3 Definition 531 Let u = [u 1, u 2, u 3 ] and v = [v 1, v 2, v 3 ] Then the vector given by [u 2 v 3 u 3 v 2, u 3 v 1 u 1 v 3, u 1 v 2 u 2 v 1 ] is called the cross product (or
Research on Task Planning Based on Activity Period in Manufacturing Grid
Research on Task Planning Based on Activity Period in Manufacturing Grid He Yu an, Yu Tao, Hu Da chao Abstract In manufacturing grid (MG), activities of the manufacturing task need to be planed after the
Error Log Processing for Accurate Failure Prediction. Humboldt-Universität zu Berlin
Error Log Processing for Accurate Failure Prediction Felix Salfner ICSI Berkeley Steffen Tschirpke Humboldt-Universität zu Berlin Introduction Context of work: Error-based online failure prediction: error
Speed Performance Improvement of Vehicle Blob Tracking System
Speed Performance Improvement of Vehicle Blob Tracking System Sung Chun Lee and Ram Nevatia University of Southern California, Los Angeles, CA 90089, USA [email protected], [email protected] Abstract. A speed
Alternative Biometric as Method of Information Security of Healthcare Systems
Alternative Biometric as Method of Information Security of Healthcare Systems Ekaterina Andreeva Saint-Petersburg State University of Aerospace Instrumentation Saint-Petersburg, Russia [email protected]
Chapter 7 Memory Management
Operating Systems: Internals and Design Principles Chapter 7 Memory Management Eighth Edition William Stallings Frame Page Segment A fixed-length block of main memory. A fixed-length block of data that
Web Forensic Evidence of SQL Injection Analysis
International Journal of Science and Engineering Vol.5 No.1(2015):157-162 157 Web Forensic Evidence of SQL Injection Analysis 針 對 SQL Injection 攻 擊 鑑 識 之 分 析 Chinyang Henry Tseng 1 National Taipei University
Behavioral Entropy of a Cellular Phone User
Behavioral Entropy of a Cellular Phone User Santi Phithakkitnukoon 1, Husain Husna, and Ram Dantu 3 1 [email protected], Department of Comp. Sci. & Eng., University of North Texas [email protected], Department
A Fast Pattern-Matching Algorithm for Network Intrusion Detection System
A Fast Pattern-Matching Algorithm for Network Intrusion Detection System Jung-Sik Sung 1, Seok-Min Kang 2, Taeck-Geun Kwon 2 1 ETRI, 161 Gajeong-dong, Yuseong-gu, Daejeon, 305-700, Korea [email protected]
10.2 ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS. The Jacobi Method
578 CHAPTER 1 NUMERICAL METHODS 1. ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS As a numerical technique, Gaussian elimination is rather unusual because it is direct. That is, a solution is obtained after
Force on Moving Charges in a Magnetic Field
[ Assignment View ] [ Eðlisfræði 2, vor 2007 27. Magnetic Field and Magnetic Forces Assignment is due at 2:00am on Wednesday, February 28, 2007 Credit for problems submitted late will decrease to 0% after
Learning in Abstract Memory Schemes for Dynamic Optimization
Fourth International Conference on Natural Computation Learning in Abstract Memory Schemes for Dynamic Optimization Hendrik Richter HTWK Leipzig, Fachbereich Elektrotechnik und Informationstechnik, Institut
CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen
CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 3: DATA TRANSFORMATION AND DIMENSIONALITY REDUCTION Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major
Drawing a histogram using Excel
Drawing a histogram using Excel STEP 1: Examine the data to decide how many class intervals you need and what the class boundaries should be. (In an assignment you may be told what class boundaries to
On Entropy in Network Traffic Anomaly Detection
On Entropy in Network Traffic Anomaly Detection Jayro Santiago-Paz, Deni Torres-Roman. Cinvestav, Campus Guadalajara, Mexico November 2015 Jayro Santiago-Paz, Deni Torres-Roman. 1/19 On Entropy in Network
Statistical Modeling of Huffman Tables Coding
Statistical Modeling of Huffman Tables Coding S. Battiato 1, C. Bosco 1, A. Bruna 2, G. Di Blasi 1, G.Gallo 1 1 D.M.I. University of Catania - Viale A. Doria 6, 95125, Catania, Italy {battiato, bosco,
Solving Simultaneous Equations and Matrices
Solving Simultaneous Equations and Matrices The following represents a systematic investigation for the steps used to solve two simultaneous linear equations in two unknowns. The motivation for considering
Polymorphic Worm Detection Using Structural Information of Executables
Polymorphic Worm Detection Using Structural Information of Executables Christopher Kruegel 1,EnginKirda 1, Darren Mutz 2, William Robertson 2, and Giovanni Vigna 2 1 Technical University of Vienna [email protected]
Neovision2 Performance Evaluation Protocol
Neovision2 Performance Evaluation Protocol Version 3.0 4/16/2012 Public Release Prepared by Rajmadhan Ekambaram [email protected] Dmitry Goldgof, Ph.D. [email protected] Rangachar Kasturi, Ph.D.
Operation Count; Numerical Linear Algebra
10 Operation Count; Numerical Linear Algebra 10.1 Introduction Many computations are limited simply by the sheer number of required additions, multiplications, or function evaluations. If floating-point
Flexible Deterministic Packet Marking: An IP Traceback Scheme Against DDOS Attacks
Flexible Deterministic Packet Marking: An IP Traceback Scheme Against DDOS Attacks Prashil S. Waghmare PG student, Sinhgad College of Engineering, Vadgaon, Pune University, Maharashtra, India. [email protected]
Optimization of the physical distribution of furniture. Sergey Victorovich Noskov
Optimization of the physical distribution of furniture Sergey Victorovich Noskov Samara State University of Economics, Soviet Army Street, 141, Samara, 443090, Russian Federation Abstract. Revealed a significant
CHAPTER 2 LITERATURE REVIEW
11 CHAPTER 2 LITERATURE REVIEW 2.1 INTRODUCTION Image compression is mainly used to reduce storage space, transmission time and bandwidth requirements. In the subsequent sections of this chapter, general
Introduction to Medical Image Compression Using Wavelet Transform
National Taiwan University Graduate Institute of Communication Engineering Time Frequency Analysis and Wavelet Transform Term Paper Introduction to Medical Image Compression Using Wavelet Transform 李 自
SIGNATURE VERIFICATION
SIGNATURE VERIFICATION Dr. H.B.Kekre, Dr. Dhirendra Mishra, Ms. Shilpa Buddhadev, Ms. Bhagyashree Mall, Mr. Gaurav Jangid, Ms. Nikita Lakhotia Computer engineering Department, MPSTME, NMIMS University
Storage Optimization in Cloud Environment using Compression Algorithm
Storage Optimization in Cloud Environment using Compression Algorithm K.Govinda 1, Yuvaraj Kumar 2 1 School of Computing Science and Engineering, VIT University, Vellore, India [email protected] 2 School
Session 7 Bivariate Data and Analysis
Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares
Excel 2003 Tutorials - Video File Attributes
Using Excel Files 18.00 2.73 The Excel Environment 3.20 0.14 Opening Microsoft Excel 2.00 0.12 Opening a new workbook 1.40 0.26 Opening an existing workbook 1.50 0.37 Save a workbook 1.40 0.28 Copy a workbook
Determining Polar Axis Alignment Accuracy
Determining Polar Axis Alignment Accuracy by Frank Barrett 7/6/008 Abstract: In order to photograph dim celestial objects, long exposures on the order of minutes or hours are required. To perform this
Numerical Analysis Lecture Notes
Numerical Analysis Lecture Notes Peter J. Olver 5. Inner Products and Norms The norm of a vector is a measure of its size. Besides the familiar Euclidean norm based on the dot product, there are a number
Signature verification using Kolmogorov-Smirnov. statistic
Signature verification using Kolmogorov-Smirnov statistic Harish Srinivasan, Sargur N.Srihari and Matthew J Beal University at Buffalo, the State University of New York, Buffalo USA {srihari,hs32}@cedar.buffalo.edu,[email protected]
Review Study on Techniques for Network worm Signatures Automation
Review Study on Techniques for Network worm Signatures Automation 1 Mohammed Anbar, 2 Sureswaran Ramadass, 3 Selvakumar Manickam, 4 Syazwina Binti Alias, 5 Alhamza Alalousi, and 6 Mohammed Elhalabi 1,
Impact of Feature Selection on the Performance of Wireless Intrusion Detection Systems
2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore Impact of Feature Selection on the Performance of ireless Intrusion Detection Systems
Wavelet Analysis Based Estimation of Probability Density function of Wind Data
, pp.23-34 http://dx.doi.org/10.14257/ijeic.2014.5.3.03 Wavelet Analysis Based Estimation of Probability Density function of Wind Data Debanshee Datta Department of Mechanical Engineering Indian Institute
Malware Detection in Android by Network Traffic Analysis
Malware Detection in Android by Network Traffic Analysis Mehedee Zaman, Tazrian Siddiqui, Mohammad Rakib Amin and Md. Shohrab Hossain Department of Computer Science and Engineering, Bangladesh University
Computer Networks and Internets, 5e Chapter 6 Information Sources and Signals. Introduction
Computer Networks and Internets, 5e Chapter 6 Information Sources and Signals Modified from the lecture slides of Lami Kaya ([email protected]) for use CECS 474, Fall 2008. 2009 Pearson Education Inc., Upper
Commonly Used Excel Functions. Supplement to Excel for Budget Analysts
Supplement to Excel for Budget Analysts Version 1.0: February 2016 Table of Contents Introduction... 4 Formulas and Functions... 4 Math and Trigonometry Functions... 5 ABS... 5 ROUND, ROUNDUP, and ROUNDDOWN...
MICROSOFT EXCEL STEP BY STEP GUIDE
IGCSE ICT SECTION 14 DATA ANALYSIS MICROSOFT EXCEL STEP BY STEP GUIDE Mark Nicholls ICT Lounge Data Analysis Self Study Guide Contents Learning Outcomes Page 3 What is a Data Model?... Page 4 Spreadsheet
CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction
CA200 Quantitative Analysis for Business Decisions File name: CA200_Section_04A_StatisticsIntroduction Table of Contents 4. Introduction to Statistics... 1 4.1 Overview... 3 4.2 Discrete or continuous
How To Model A System
Web Applications Engineering: Performance Analysis: Operational Laws Service Oriented Computing Group, CSE, UNSW Week 11 Material in these Lecture Notes is derived from: Performance by Design: Computer
Obfuscation of sensitive data in network flows 1
Obfuscation of sensitive data in network flows 1 D. Riboni 2, A. Villani 1, D. Vitali 1 C. Bettini 2, L.V. Mancini 1 1 Dipartimento di Informatica,Universitá di Roma, Sapienza. E-mail: {villani, vitali,
On Correlating Performance Metrics
On Correlating Performance Metrics Yiping Ding and Chris Thornley BMC Software, Inc. Kenneth Newman BMC Software, Inc. University of Massachusetts, Boston Performance metrics and their measurements are
INTRUSION PREVENTION AND EXPERT SYSTEMS
INTRUSION PREVENTION AND EXPERT SYSTEMS By Avi Chesla [email protected] Introduction Over the past few years, the market has developed new expectations from the security industry, especially from the intrusion
Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 4, April 2015,
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati [email protected], [email protected]
Fast Sequential Summation Algorithms Using Augmented Data Structures
Fast Sequential Summation Algorithms Using Augmented Data Structures Vadim Stadnik [email protected] Abstract This paper provides an introduction to the design of augmented data structures that offer
How To Filter Spam Image From A Picture By Color Or Color
Image Content-Based Email Spam Image Filtering Jianyi Wang and Kazuki Katagishi Abstract With the population of Internet around the world, email has become one of the main methods of communication among
Broadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29.
Broadband Networks Prof. Dr. Abhay Karandikar Electrical Engineering Department Indian Institute of Technology, Bombay Lecture - 29 Voice over IP So, today we will discuss about voice over IP and internet
A Secure File Transfer based on Discrete Wavelet Transformation and Audio Watermarking Techniques
A Secure File Transfer based on Discrete Wavelet Transformation and Audio Watermarking Techniques Vineela Behara,Y Ramesh Department of Computer Science and Engineering Aditya institute of Technology and
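The comparison pipeline described in the abstract — splitting a file into segments of different entropy levels and measuring the edit distance between the resulting segment sequences — can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it quantizes a windowed Shannon-entropy profile into discrete levels instead of performing the wavelet-based segmentation, and the window size, level count, and all function names are assumptions introduced here.

```python
import math
from collections import Counter

def window_entropy(data: bytes) -> float:
    """Shannon entropy (bits per byte) of a byte window."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

def entropy_profile(data: bytes, window: int = 256) -> list:
    """Entropy of consecutive non-overlapping windows (window size assumed)."""
    return [window_entropy(data[i:i + window]) for i in range(0, len(data), window)]

def segment(profile: list, levels: int = 4, max_entropy: float = 8.0) -> list:
    """Quantize the profile into discrete entropy levels and collapse runs of
    equal level into (level, run_length) segments. A stand-in for the paper's
    wavelet-based segmentation."""
    segments = []
    for e in profile:
        level = min(int(e / max_entropy * levels), levels - 1)
        if segments and segments[-1][0] == level:
            segments[-1] = (level, segments[-1][1] + 1)
        else:
            segments.append((level, 1))
    return segments

def edit_distance(a, b) -> int:
    """Levenshtein distance between two sequences (single-row DP)."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,            # deletion
                        dp[j - 1] + 1,        # insertion
                        prev + (a[i - 1] != b[j - 1]))  # substitution
            prev = cur
    return dp[n]

def similarity(f1: bytes, f2: bytes) -> float:
    """Normalized similarity of two files' entropy-level segment sequences."""
    s1 = [lvl for lvl, _ in segment(entropy_profile(f1))]
    s2 = [lvl for lvl, _ in segment(entropy_profile(f2))]
    if not s1 and not s2:
        return 1.0
    return 1.0 - edit_distance(s1, s2) / max(len(s1), len(s2))
```

In this sketch, a similarity close to 1.0 would act as the cheap preliminary test the abstract describes, triggering the additional, more expensive checks before a sample is flagged.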
