Learning Behavioral Fingerprints from NetFlows... using Timed Automata Nino Pellegrino October the 20th, 2015 Nino Pellegrino Learning Behavioral Fingerprints October the 20th, 2015 1 / 32

Use case
What exactly does "behavioral fingerprint" mean?
How is it possible to detect a malicious host using behavioral fingerprints?

NetFlows
Characterized by several properties derived from the aggregation of packet-based features.

FEATURE        TYPE       VALUES
source-ip      string     147.32.84.193
protocol       string     TCP, UDP
direction      string     ->, <-, <->, <?>
start-time     timestamp  2011-08-17 15:51:08.499
duration       float      0.103, 2.696
total-packets  integer    9, 1
total-bytes    integer    1030, 66, 43

NetFlows
PRO: frequently logged by network operators and easy to obtain.
PRO: (more) privacy preserving. In contrast to network packets, NetFlows do not contain content and format fields.
PRO: scale smoothly to large amounts of data.
CON: automata learned from NetFlows define behavior at a high level of abstraction.

Timed Events
Timed Events are pairs (timestamp, symbol):

NETFLOW                                               TIMED EVENT
147.32.84.193, udp, ->,  15:51:09, 0.000304, 1, 68    (1,a)
147.32.84.193, udp, <->, 15:52:01, 0.000442, 5, 590   (52,b)
147.32.84.193, tcp, <->, 15:53:46, 0.000527, 3, 479   (150,c)

Timed Events are a symbolic abstraction of the actual NetFlow data.

Obtaining Events
The symbolic part of a Timed Event is set so that two NetFlows exhibiting the same features get the same symbol.
Categorical features, such as direction or protocol, have been mapped to consecutive integers based on their values.
EXAMPLE: if protocol=udp then 0; if protocol=tcp then 1; etc.
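
The categorical mapping can be sketched in a few lines. This is only an illustration: the helper name `build_categorical_mapping` is hypothetical, and numbering values in order of first appearance is an assumption, since the slides give only the udp=0, tcp=1 example.

```python
def build_categorical_mapping(values):
    """Assign 0, 1, 2, ... to distinct values in order of first appearance.

    Assumption: the slides only state that values map to consecutive
    integers; the ordering rule used here is chosen for illustration.
    """
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping


# Observed feature values, e.g. read from a column of NetFlow records.
protocol_map = build_categorical_mapping(["udp", "udp", "tcp", "udp"])
direction_map = build_categorical_mapping(["->", "<->", "<-", "<?>"])
```

Two NetFlows with the same protocol then always receive the same integer, which is the property the symbolic part of a Timed Event relies on.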

Obtaining Events
Numerical features, such as duration or total packets, have been mapped according to the 20th, 40th, 60th, and 80th percentiles.
EXAMPLE (total-packets feature): [Figure: percentile bins for the total-packets feature]
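
A minimal sketch of percentile binning, using only the Python standard library. The sample values are made up, the function names are hypothetical, and sending a value equal to a cut point to the upper bin is an assumption; the slides do not say how ties are handled or how the percentiles are interpolated.

```python
import bisect
import statistics


def percentile_cuts(samples):
    """Return four cut points: the 20th, 40th, 60th and 80th percentiles."""
    return statistics.quantiles(samples, n=5)


def bin_index(value, cuts):
    """Map a value to a bin in 0..len(cuts), given sorted cut points.

    Assumption: a value equal to a cut point falls into the upper bin.
    """
    return bisect.bisect_right(cuts, value)


total_packets = [1, 1, 3, 5, 6, 9, 10, 14, 22, 40]  # made-up sample data
cuts = percentile_cuts(total_packets)
bins = [bin_index(v, cuts) for v in total_packets]
```

Four cut points yield five bins, so each numerical feature contributes a domain of size 5 to the encoding.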

Obtaining TIMED sequences of events
The temporal part of a Timed Event t_i is set to the difference in start-time from the previously seen Timed Event t_{i-1}.

prot  dir  time  duration  packet  byte  event
udp   ->   0     0.000304  1       68    (0,a)
udp   <->  5     0.000442  5       590   (5,b)
tcp   <->  17    0.000527  3       479   (12,c)
udp   ->   22    0.17121   10      7701  (5,c)
tcp   <->  24    0.120181  6       212   (2,d)

Sequences of Timed Events are generated by sliding a temporal window of fixed duration (20 milliseconds).
s_1 = (0,a)(5,b)(12,c)
s_2 = (5,b)(12,c)(5,b)(2,d)
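
The delta computation and windowing can be sketched as follows. This assumes a window opens at every event and covers all events starting less than `width` time units later; the slides do not specify the stride, and the function names are hypothetical. Times and symbols are taken from the time and event columns of the table.

```python
def to_timed_events(flows):
    """Turn (start_time, symbol) pairs, sorted by time, into (delta, symbol)."""
    events, previous = [], None
    for start, symbol in flows:
        delta = 0 if previous is None else start - previous
        events.append((delta, symbol))
        previous = start
    return events


def sliding_windows(flows, width):
    """One window per event: all events starting less than `width` after it.

    Assumption: the window stride is one event; each event keeps the
    delta computed on the full stream.
    """
    events = to_timed_events(flows)
    windows = []
    for i, (start, _) in enumerate(flows):
        window = [events[j] for j in range(i, len(flows))
                  if flows[j][0] - start < width]
        windows.append(window)
    return windows


flows = [(0, "a"), (5, "b"), (17, "c"), (22, "c"), (24, "d")]
windows = sliding_windows(flows, width=20)
```

With a width of 20, the first window holds the first three events and the second window holds the remaining four, matching the lengths of s_1 and s_2.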

Learning a stateful model
Positive data: aa, b, bba; Negative data: a, aaa, aabb
1. Select two nodes
2. Move input transitions
3. Move output transitions
4. Delete the obsolete state
5. Determinization
6. Select two nodes, iterate
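
The merge steps can be illustrated on the untimed example data with a small greedy state-merging sketch. This is a simplification for illustration, not the timed-automata learner used in this work: it ignores time, tries candidate pairs in index order, and accepts the first merge that does not mix accepting and rejecting states. All function names are hypothetical.

```python
def build_prefix_tree(positive, negative):
    """Prefix tree: trans[state][symbol] -> state; label is True/False/None."""
    trans, label = {0: {}}, {0: None}
    for words, accepted in ((positive, True), (negative, False)):
        for word in words:
            state = 0
            for symbol in word:
                if symbol not in trans[state]:
                    new_state = len(trans)
                    trans[new_state], label[new_state] = {}, None
                    trans[state][symbol] = new_state
                state = trans[state][symbol]
            label[state] = accepted
    return trans, label


def _find(rep, state):
    """Follow merge links to the state that currently represents `state`."""
    while state in rep:
        state = rep[state]
    return state


def _fold(trans, label, rep, a, b):
    """Merge b into a (lower index survives), then determinize recursively."""
    keep, drop = sorted((_find(rep, a), _find(rep, b)))
    if keep == drop:
        return True
    if None not in (label[keep], label[drop]) and label[keep] != label[drop]:
        return False  # would mix an accepting and a rejecting state
    if label[keep] is None:
        label[keep] = label[drop]
    rep[drop] = keep  # input transitions of drop now resolve to keep
    for symbol, target in trans.pop(drop).items():  # move output transitions
        keep = _find(rep, keep)
        if symbol in trans[keep]:  # two targets for one symbol: merge them too
            if not _fold(trans, label, rep, trans[keep][symbol], target):
                return False
        else:
            trans[keep][symbol] = target
    return True


def merge(trans, label, keep, drop):
    """Return the merged (trans, label), or None if the merge is inconsistent."""
    trans = {s: dict(out) for s, out in trans.items()}
    label, rep = dict(label), {}
    if not _fold(trans, label, rep, keep, drop):
        return None
    trans = {s: {a: _find(rep, t) for a, t in out.items()}
             for s, out in trans.items()}
    return trans, {s: label[s] for s in trans}


def learn(positive, negative):
    """Greedily apply consistent merges until none is left."""
    trans, label = build_prefix_tree(positive, negative)
    merged = True
    while merged:
        merged = False
        states = sorted(trans)
        for i, keep in enumerate(states):
            for drop in states[i + 1:]:
                result = merge(trans, label, keep, drop)
                if result is not None:
                    (trans, label), merged = result, True
                    break
            if merged:
                break
    return trans, label


def accepts(trans, label, word):
    state = 0
    for symbol in word:
        if symbol not in trans[state]:
            return False
        state = trans[state][symbol]
    return label[state] is True


trans, label = learn(["aa", "b", "bba"], ["a", "aaa", "aabb"])
```

Because a merge is rejected whenever it would join an accepting with a rejecting state, the resulting automaton still accepts every positive string and rejects every negative one.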

Recognizing a host as infected
Two different strategies for finding infection on candidate hosts.
Both strategies rely on infection symptoms.
Infection symptoms are all pairs (state, timed-event) collected by running a model on testing data.

Error Based Strategy
Evaluates whether a candidate host C shows the same symptom occurrences as a known malicious host M.
Let $\mathrm{Counts}^M_i$ and $\mathrm{Counts}^C_i$ be the counts of symptom $i$ in M and C, respectively. Host C is classified as infected if
$$|\mathrm{Counts}^M_i - \mathrm{Counts}^C_i| < \tau_i$$
i.e. if the absolute error between the expected and observed symptom counts is below a pre-computed threshold.
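
A minimal sketch of the error-based test, assuming the condition must hold for every symptom that has a threshold (the slide does not state the quantifier). The symptom keys, counts and thresholds below are made up for illustration, and the function name is hypothetical.

```python
from collections import Counter


def is_infected_error_based(counts_m, counts_c, thresholds):
    """True if |Counts_M[i] - Counts_C[i]| < tau[i] for every symptom i.

    Assumption: "every symptom in `thresholds`" is our reading of the slide.
    Counter returns 0 for symptoms a host never showed.
    """
    return all(abs(counts_m[i] - counts_c[i]) < tau
               for i, tau in thresholds.items())


# Hypothetical symptoms, keyed as (state, symbol).
counts_m = Counter({(1, "a"): 40, (2, "b"): 10})
thresholds = {(1, "a"): 5, (2, "b"): 5}

similar_host = Counter({(1, "a"): 38, (2, "b"): 12})
different_host = Counter({(1, "a"): 3})

similar_flagged = is_infected_error_based(counts_m, similar_host, thresholds)
different_flagged = is_infected_error_based(counts_m, different_host, thresholds)
```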

Fingerprint Based Strategy
Uses a configuration dataset to look for distinguishing symptoms.
Distinguishing symptoms characterize malicious hosts, but never occur in any host in the configuration dataset.
Let $\mathrm{Counts}^F_i$ denote the occurrences of a symptom $i$ in such a dataset. Host C is considered malicious if:
$$\mathrm{Counts}^F_i = 0 \;\wedge\; \mathrm{Counts}^M_i > 0 \;\wedge\; \mathrm{Counts}^C_i > 0$$
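
The fingerprint test can be sketched with set operations over symptom counts. This assumes a single distinguishing symptom observed in C is enough to flag it (the slide does not state the quantifier); the symptom keys and counts are made up, and the function names are hypothetical.

```python
from collections import Counter


def fingerprint(counts_f, counts_m):
    """Symptoms seen in the malicious host M but never in the configuration data."""
    return {i for i, n in counts_m.items() if n > 0 and counts_f[i] == 0}


def is_infected_fingerprint_based(counts_f, counts_m, counts_c):
    """True if candidate C shows at least one distinguishing symptom.

    Assumption: one matching symptom suffices.
    """
    return any(counts_c[i] > 0 for i in fingerprint(counts_f, counts_m))


# Hypothetical symptoms, keyed as (state, symbol).
counts_f = Counter({(1, "a"): 120, (2, "b"): 7})   # configuration dataset
counts_m = Counter({(1, "a"): 40, (3, "c"): 9})    # known malicious host
infected_host = Counter({(3, "c"): 2, (1, "a"): 5})
clean_host = Counter({(1, "a"): 50, (2, "b"): 1})

distinguishing = fingerprint(counts_f, counts_m)
```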

Per-scenario Performance

Scenario  Conf Size  Train Size  Eval Size  Infected
9         877        1           386        10
10        359        1           162        10
11        82         1           37         3
12        14         1           9          3

          error based          fingerprint based
Scenario  TP  TN   FP  FN      TP  TN   FP  FN
9         9   377  0   0       9   376  1   0
10        9   153  0   0       9   152  1   0
11        1   35   0   1       1   33   2   1
12        0   7    0   2       2   6    1   0

Error Based Strategy
[Figure: plot of Observed Occurrence against Expected Occurrence, both axes from 0 to 4000. Legend:
147.32.86.165: True Negative (S1), False Positive (S2)
147.32.84.193: True Positive (S1, S2)
147.32.84.204: True Positive (S1, S2)
204.12.234.66: True Negative (S1, S2)
205.188.17.129: True Negative (S1, S2)]

Detection performance on unseen malware

Scenario   1    2       6       8       9
TP         1    1       0       1       10
FP         8    26      21      49      82
Accuracy   0.6  0.9549  0.9310  0.9461  0.9351
F-measure  0.2  0.0714  0       0.0392  0.1961

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \qquad \text{F-measure} = \frac{2\,TP}{2\,TP + FP + FN}$$
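
The two metrics translate directly into code. The example counts are the error-based results for scenario 11 from the per-scenario table (TP=1, TN=35, FP=0, FN=1); returning 0.0 when the F-measure denominator is zero is a convention chosen here, not something the slides specify.

```python
def accuracy(tp, tn, fp, fn):
    """(TP + TN) / (TP + FP + TN + FN)."""
    return (tp + tn) / (tp + fp + tn + fn)


def f_measure(tp, fp, fn):
    """2TP / (2TP + FP + FN); 0.0 by convention if the denominator is zero."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Scenario 11, error-based strategy.
acc = accuracy(tp=1, tn=35, fp=0, fn=1)
f1 = f_measure(tp=1, fp=0, fn=1)
```

Accuracy is dominated by the many true negatives, while the F-measure ignores them, which is why the two can diverge sharply when infected hosts are rare.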

Obtaining Events
Using the mapping values, we assign each NetFlow a single non-negative integer using the following encoding algorithm:

Input: a NetFlow n = (a_0, a_1, ..., a_k) with k + 1 features
Input: an attribute mapping M_i for i = 0, 1, ..., k
Output: integer code for n

code ← 0
spacesize ← ∏_{i=0}^{k} |Dom(M_i)|
for i ← 0 to k do
    code ← code + M_i(a_i) · spacesize / |Dom(M_i)|
    spacesize ← spacesize / |Dom(M_i)|
return code

Example
If we have two features:

feature        mapping                                  domain size
protocol       TCP=0, UDP=1, <unknown>=2                3
total-packets  20th: 7, 40th: 22, 60th: 30, 80th: 32    5

Then:
Encode(TCP, 12)  = 0 · 15/3 + 1 · 5/5 = 1
Encode(UDP, 45)  = 1 · 15/3 + 4 · 5/5 = 9
Encode(ICMP, 25) = 2 · 15/3 + 2 · 5/5 = 12
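
The encoding is a mixed-radix number built from the per-feature codes. The sketch below reproduces the example with the two mappings given above; the function and constant names are hypothetical, and putting a value equal to a percentile into the upper bin is an assumption.

```python
import bisect
from math import prod

PROTOCOLS = {"TCP": 0, "UDP": 1}
PACKET_CUTS = [7, 22, 30, 32]  # 20th, 40th, 60th, 80th percentiles


def map_protocol(value):
    """TCP=0, UDP=1, anything else is <unknown>=2."""
    return PROTOCOLS.get(value, 2)


def map_packets(value):
    """Bin index 0..4; assumption: ties go to the upper bin."""
    return bisect.bisect_right(PACKET_CUTS, value)


# Each entry pairs a mapping function with its domain size.
MAPPINGS = [(map_protocol, 3), (map_packets, 5)]


def encode(netflow, mappings=MAPPINGS):
    """Combine the per-feature codes into a single integer."""
    code = 0
    spacesize = prod(size for _, size in mappings)
    for value, (mapping, size) in zip(netflow, mappings):
        spacesize //= size               # spacesize / |Dom(M_i)|
        code += mapping(value) * spacesize
    return code


codes = [encode(("TCP", 12)), encode(("UDP", 45)), encode(("ICMP", 25))]
```

Every combination of feature codes gets a distinct integer in the range 0 to 14, so the code can serve directly as the symbol of a Timed Event.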

Host behavior descriptions: PDRTAs
[Figure: a learned PDRTA with nine states (state 1 is the root). Transitions carry labels such as TCP, UDP, Q-TCP [171,195] and Q-UDP [0,203]; each state is annotated with two vectors, e.g. [0.40 0.28 0.06 0.25] and [0.63 0.21 0.11 0.05] for the root state.]