Machine Learning Forensics for Law Enforcement, Security, and Intelligence

Jesus Mena

CRC Press
Taylor & Francis Group
Boca Raton London New York

CRC Press is an imprint of the Taylor & Francis Group, an Informa business
AN AUERBACH BOOK
Contents

Introduction
The Author

Chapter 1 What Is Machine Learning Forensics?
  1.1 Definition
  1.2 Digital Maps and Models: Strategies and Technologies
  1.3 Extractive Forensics: Link Analysis and Text Mining
  1.4 Inductive Forensics: Clustering Incidents and Crimes
  1.5 Deductive Forensics: Anticipating Attacks and Precrime
  1.6 Fraud Detection: On the Web, Wireless, and in Real Time
  1.7 Cybersecurity Investigations: Self-Organizing and Evolving Analyses
  1.8 Corporate Counterintelligence: Litigation and Competitive Investigations
  1.9 A Machine Learning Forensic Worksheet

Chapter 2 Digital Investigative Maps and Models: Strategies and Techniques
  2.1 Forensic Strategies
  2.2 Decompose the Data
  2.3 Criminal Data Sets, Reports, and Networks
  2.4 Real Estate, Auto, and Credit Data Sets
  2.5 Psychographic and Demographic Data Sets
  2.6 Internet Data Sets
  2.7 Deep Packet Inspection (DPI)
  2.8 Designing a Forensic Framework
  2.9 Tracking Mechanisms
  2.10 Assembling Data Streams
  2.11 Forensic Techniques
  2.12 Investigative Maps
  2.13 Investigative Models

Chapter 3 Extractive Forensics: Link Analysis and Text Mining
  3.1 Data Extraction
  3.2 Link Analysis
  3.3 Link Analysis Tools
  3.4 Text Mining
  3.5 Text Mining Tools
    3.5.1 Online Text Mining Analytics Tools
    3.5.2 Commercial Text Mining Analytics Software
  3.6 From Extraction to Clustering

Chapter 4 Inductive Forensics: Clustering Incidents and Crimes
  4.1 Autonomous Forensics
  4.2 Self-Organizing Maps
  4.3 Clustering Software
    4.3.1 Commercial Clustering Software
    4.3.2 Free and Open-Source Clustering Software
  4.4 Mapping Incidents
  4.5 Clustering Crimes
  4.6 From Induction to Deduction

Chapter 5 Deductive Forensics: Anticipating Attacks and Precrime
  5.1 Artificial Intelligence and Machine Learning
  5.2 Decision Trees
  5.3 Decision Tree Techniques
  5.4 Rule Generators
  5.5 Decision Tree Tools
    5.5.1 Free and Shareware Decision Tree Tools
    5.5.2 Rule Generator Tools
    5.5.3 Free Rule Generator Tools
  5.6 The Streaming Analytical Forensic Processes
  5.7 Forensic Analysis of Streaming Behaviors
  5.8 Forensic Real-Time Modeling
  5.9 Deductive Forensics for Precrime

Chapter 6 Fraud Detection: On the Web, Wireless, and in Real Time
  6.1 Definition and Techniques: Where, Who, and How
  6.2 The Interviews: The Owners, Victims, and Suspects
  6.3 The Scene of the Crime: Search for Digital Evidence
    6.3.1 Four Key Steps in Dealing with Digital Evidence
  6.4 Searches for Associations: Discovering Links and Text Concepts
  6.5 Rules of Fraud: Conditions and Clues
  6.6 A Forensic Investigation Methodology
    6.6.1 Step One: Understand the Investigation Objective
    6.6.2 Step Two: Understand the Data
    6.6.3 Step Three: Data Preparation Strategy
    6.6.4 Step Four: Forensic Modeling
    6.6.5 Step Five: Investigation Evaluation
    6.6.6 Step Six: Detection Deployment
  6.7 Forensic Ensemble Techniques
    6.7.1 Stage One: Random Sampling
    6.7.2 Stage Two: Balance the Data
    6.7.3 Stage Three: Split the Data
    6.7.4 Stage Four: Rotate the Data
    6.7.5 Stage Five: Evaluate Multiple Models
    6.7.6 Stage Six: Create an Ensemble Model
    6.7.7 Stage Seven: Measure False Positives and Negatives
    6.7.8 Stage Eight: Deploy and Monitor
    6.7.9 Stage Nine: Anomaly Detection
  6.8 Fraud Detection Forensic Solutions
  6.9 Assembling an Evolving Fraud Detection Framework

Chapter 7 Cybersecurity Investigations: Self-Organizing and Evolving Analyses
  7.1 What Is Cybersecurity Forensics?
  7.2 Cybersecurity and Risk
  7.3 Machine Learning Forensics for Cybersecurity
  7.4 Deep Packet Inspection (DPI)
    7.4.1 Layer 7: Application
    7.4.2 Layer 6: Presentation
    7.4.3 Layer 5: Session
    7.4.4 Layer 4: Transport
    7.4.5 Layer 3: Network
    7.4.6 Layer 2: Data Link
    7.4.7 Layer 1: Physical
    7.4.8 Software Tools Using DPI
  7.5 Network Security Tools
  7.6 Combating Phishing
  7.7 Hostile Code
  7.8 The Foreign Threat
    7.8.1 The CNCI Initiative Details
  7.9 Forensic Investigator Toolkit
  7.10 Wireless Hacks
  7.11 Incident Response Check-Off Checklists
  7.12 Digital Fingerprinting

Chapter 8 Corporate Counterintelligence: Litigation and Competitive Investigations
  8.1 Corporate Counterintelligence
  8.2 Ratio, Trending, and Anomaly Analyses
  8.3 E-Mail Investigations
  8.4 Legal Risk Assessment Audit
    8.4.2 Inventory of External Inputs to the Process
    8.4.3 Identify Assets and Threats
    8.4.4 List Risk Tolerance for Major Events
    8.4.5 List and Evaluate Existing Protection Mechanisms
    8.4.6 List and Assess Underprotected Assets and Unaddressed Threats
  8.5 Competitive Intelligence Investigations
  8.6 Triangulation Investigations

Index