Simple Machine Heuristic (SMH) Intelligent Agent (IA) Framework
Tuesday, November 20, 2011
Randall Mora, David Harris, Wyn Hack
Avum, Inc.
Outline
- Solution Objective
- Problem Definition
- SMH IA Framework Solution to the Problem
- SMH IA Framework Core Functionality
- Description of Use Case Scenarios
  - Heuristic Analysis Across Domains
  - Distributed Analysis of High Volume Network Traffic
  - Predictive Analytics of Aggregated Data
- Demonstration
- Metrics and Performance Discussion
- Applied Solutions With SMH IA Framework
- Future Applications
Solution Objective
Develop a prototype framework operating within a diverse real-time and Big Data environment that enables rapid extraction of intrusion detection (or any) tradecraft into interacting communities of intelligent agents.
The IA Framework Prototype Will Demonstrate:
- Extended IAs that perform intrusion detection through continuous monitoring.
- Distributed IAs threaded to handle a large volume of low-latency data for extended time periods.
- A Big Data operating environment and rapid design-to-test IA base functionality for analysts/developers to perform intrusion analysis and the migration of tradecraft into the machine.
- IAs collaborating in a team environment for the identification, generation, publication, and storing of meaning and relevance.
- IBM InfoSphere Streams streaming data to the IA Framework message broker and aggregated data being streamed back to InfoSphere Streams for scoring with a PMML model.
Problem Definition
Drowning in Data
- Innovative architectures are needed to allow the analyst to interact with data in new ways.
- Technologies are needed to leverage Big Data, including initial analysis, design-to-test, and deployment of non-brittle, scalable solutions.
Methods Are Needed to Combine Multiple Streams or Sources of Disparate Data
- Technologies to build a comprehensive picture of data across domains
- Flexible tools, technologies, and frameworks to allow the analyst/developer to incorporate new domains of data and build a unified view of data
Scalable Analytics and Data Processing
Continuous Monitoring Architectures
Today's [or Current] Architectures
Flat Architecture
Today's agent frameworks operate within a single-core Operating System (OS) and require multiple interfaces for receiving and communicating their knowledge base. Each agent is initialized based on its defined functionality.
Single Domain Functionality
- Interfaces required for multiple domains
- Single view of data
Brittle Architecture
- Extensive design-to-test and deployment processes for new developments
- Short useful lifespan in today's rapidly changing environment
Container-Based Scalability
- Application Server or Application Operating Environment
SMH IA Framework
The SMH IA Framework supplies an open architecture that enables the programmer/analyst to build an IA suite for mining, fusing, examining, and evaluating heterogeneous data for semantic representations.
- Extensible and Portable Framework
- Game-Changing, Yet Proven, Components for Interoperability and Big Data
- Communities of Distributed and Collaborating Agents
- Analyst/Developer Rapid Design-to-Test Intrusion Analysis and Tradecraft Extraction
- Real-Time, Near-Real-Time, and Big Data Analysis
- Highly Scalable
Components for Interoperability
- Java/JEE
- Spring Framework
- Open Services Gateway initiative (OSGi)
- Apache Kafka
- Apache ZooKeeper
- Hadoop
- Hypertable
- IBM InfoSphere Streams
- Predictive Model Markup Language (PMML)
[Diagram labels: Users, Analyst, Programmer/Analyst]
SMH IA Framework Value
- Independent and Distributed Development and Execution of Domain Solutions
- Unified Framework that a Programmer/Analyst Can Plug Into for Cyber Analytics
- Enables the Creation of Independent Agents that Can Be Shared Interagency
- Facilitates Intercommunicating Agent Communities Working Independently or Together with a Common Goal
- Extremely Convenient Environment for Design-to-Test and Deployment of Domain Data Fusion Solutions
- Built-in Hadoop Archival and Playback of Both Raw and Aggregated Data
- Post-Operation Analysis
- Plug-In Visualization Capabilities
- Easily Plug In and Integrate with Other Solutions/Technologies/Architectures Solving Unique Problems
  - IBM InfoSphere Streams
Heuristic Analysis Across Domains
Scenario to demonstrate the use of SMH IAs in the context of illicit download detection
- Communities/teams of interacting IAs collaborating to detect illicit downloads
- Fusing information across domains to identify perpetrators
- Rapid design-to-test and deployment of IAs
- Pushing a new IA to perform deep analysis and reporting for the possible perpetrator
- Training IAs for acceptable threshold recognition using vetted data
- Training IAs with user responses
- Dynamic interaction with live users and IAs to identify illicit activities
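The threshold-training idea in this scenario can be illustrated with a minimal Java sketch. It assumes, purely for illustration, that "acceptable threshold recognition using vetted data" means learning a cutoff from known-acceptable per-user download totals as mean plus k standard deviations. The class name, method names, and the mean-plus-k-sigma rule are all hypothetical and are not taken from the SMH IA Framework.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a download-threshold agent; not the framework's actual API. */
final class DownloadThresholdAnalyst {
    private final double thresholdBytes;

    private DownloadThresholdAnalyst(double thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    /** Trains on vetted (known-acceptable) download totals: threshold = mean + k * stddev. */
    static DownloadThresholdAnalyst train(List<Long> vettedBytes, double k) {
        if (vettedBytes.isEmpty()) {
            throw new IllegalArgumentException("vetted data must not be empty");
        }
        double mean = 0.0;
        for (long b : vettedBytes) {
            mean += b;
        }
        mean /= vettedBytes.size();
        double variance = 0.0;
        for (long b : vettedBytes) {
            variance += (b - mean) * (b - mean);
        }
        variance /= vettedBytes.size(); // population variance of the vetted sample
        return new DownloadThresholdAnalyst(mean + k * Math.sqrt(variance));
    }

    double threshold() {
        return thresholdBytes;
    }

    /** Returns the users whose observed download total exceeds the threshold, sorted by name. */
    List<String> flag(Map<String, Long> observedBytesByUser) {
        List<String> flagged = new ArrayList<>();
        for (Map.Entry<String, Long> e : new TreeMap<>(observedBytesByUser).entrySet()) {
            if (e.getValue() > thresholdBytes) {
                flagged.add(e.getKey());
            }
        }
        return flagged;
    }
}
```

In a deployment along the lines the slide describes, the flagged users would presumably be handed to other agents (for example an email alert/response agent) whose answers could feed back into training; that loop is left out of this sketch.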
Distributed Analysis of High Volume Network Traffic
Scenario to demonstrate the distributed analysis of continuous high volume network traffic within the SMH IA Framework
- Distributed nature of OSGi
- Visualization of data fused from multiple Kafka queues
- Archiving data that can be replayed from Hadoop for later analysis
- Utilizing IBM InfoSphere Streams as an IA
Predictive Analytics of Aggregated Data
Scenario to demonstrate the use of PMML to perform predictive analytics against aggregated data from the IA Framework
- Aggregated data being pushed into IBM InfoSphere Streams
- Scoring of IA streams with a simple PMML model in IBM InfoSphere Streams
- Aggregated/scored stream being returned to the IA Framework
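To make the scoring step concrete, here is a minimal Java sketch of what a simple linear regression model computes for one aggregated record: an intercept plus a weighted sum of numeric fields. It is a plain-Java stand-in only. It does not parse PMML or call IBM InfoSphere Streams, and the class name, field names, and coefficients are hypothetical rather than taken from the prototype's model.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a simple linear scoring model; not a PMML parser or evaluator. */
final class LinearScoreModel {
    private final double intercept;
    private final Map<String, Double> coefficients = new LinkedHashMap<>();

    LinearScoreModel(double intercept) {
        this.intercept = intercept;
    }

    /** Adds one numeric predictor and returns this model so calls can be chained. */
    LinearScoreModel with(String field, double coefficient) {
        coefficients.put(field, coefficient);
        return this;
    }

    /** Scores one aggregated record: intercept + sum of coefficient * field value. */
    double score(Map<String, Double> record) {
        double result = intercept;
        for (Map.Entry<String, Double> c : coefficients.entrySet()) {
            Double value = record.get(c.getKey());
            if (value == null) {
                throw new IllegalArgumentException("missing field: " + c.getKey());
            }
            result += c.getValue() * value;
        }
        return result;
    }
}
```

For example, a model built as `new LinearScoreModel(1.0).with("bytesOut", 0.5).with("flows", 2.0)` scores a record with `bytesOut = 10` and `flows = 3` as 1.0 + 5.0 + 6.0 = 12.0. In the scenario above, the scored value would be attached to the aggregated stream before it is returned to the IA Framework.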
IAs Created for Prototype
Base (Fundamental Servers) IAs
- Kafka, Hypertable, Database, Email services, IBM InfoSphere Integration
Heuristic Analysis
- PCAP File Reader
- Reduced PCAP Archiver
- User Activity Log Reader (with PCAP synchronization)
- User Activity Parser
- Current Login Status Dictionary
- Traffic Flow Accumulator (monitors traffic in and out of servers)
- User Download Threshold Analyst
- Training Alert Response Agent
- User Email Alert/Response Agent
- Data-At-Rest and Illicit Activity Analyst/Reporter
Distributed Analysis
- IBM InfoSphere Streams PCAP Splitter
- Conversation Flow Packet Aggregator
- Conversation Flow Archiver
- Intrusion Detection Analyst
- Flow Grapher
Predictive Analytics (PMML Scoring)
- PCAP File Reader
- Traffic Flow Accumulator (monitors traffic in and out of various servers)
- IBM InfoSphere Streams (Kafka Consumer IA)
- PMML Data Mining Toolkit
- IBM InfoSphere Streams (Kafka Producer IA)
- PMML Results Analyst/Logger
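As an illustration of one of the agents listed above, the following Java sketch shows how a Traffic Flow Accumulator might keep running totals of bytes in and out of monitored servers. It assumes packets have already been reduced to source address, destination address, and length. The class and method names are hypothetical, and the sketch omits the Kafka messaging the prototype uses between agents.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a traffic flow accumulator; not the prototype's actual agent. */
final class TrafficFlowAccumulator {
    /** Running byte totals for one monitored server. */
    private static final class Totals {
        long bytesIn;
        long bytesOut;
    }

    private final Map<String, Totals> totalsByServer = new HashMap<>();

    /** Registers a server address whose inbound and outbound traffic should be tracked. */
    void monitor(String serverIp) {
        totalsByServer.putIfAbsent(serverIp, new Totals());
    }

    /** Records one packet summary; traffic between unmonitored hosts is ignored. */
    void onPacket(String srcIp, String dstIp, int lengthBytes) {
        Totals source = totalsByServer.get(srcIp);
        if (source != null) {
            source.bytesOut += lengthBytes;
        }
        Totals destination = totalsByServer.get(dstIp);
        if (destination != null) {
            destination.bytesIn += lengthBytes;
        }
    }

    /** Bytes received by the server so far, or 0 if the server is not monitored. */
    long bytesIn(String serverIp) {
        Totals t = totalsByServer.get(serverIp);
        return t == null ? 0L : t.bytesIn;
    }

    /** Bytes sent by the server so far, or 0 if the server is not monitored. */
    long bytesOut(String serverIp) {
        Totals t = totalsByServer.get(serverIp);
        return t == null ? 0L : t.bytesOut;
    }
}
```

Totals of this kind are the sort of aggregated data a downstream agent, such as a download threshold analyst or a scoring step, could consume.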
Results, Scalability, Throughput, Flexibility
2½ Months Building the Framework, ½ Month to Build the Demo Scenarios
- The prototype demonstrates rapid design-to-test development of IAs
- The prototype architecture allows an analyst/developer to interact with data in new ways
- The prototype demonstrates rapid design-to-test of intrusion detection and continuous monitoring
- The prototype demonstrates flexible tools, technologies, and frameworks that allow the analyst/developer to incorporate new data domains and build a unified view of data
Every Component and IA Can Be Distributed Across Any Number of Machines
For a Single Producer, Single Consumer, and Single Kafka Machine, Performance Tests Demonstrated Throughput of:
- 50 MB/sec writing to the queues
- 100 MB/sec reading from the queues
Standard Java/JEE Environment
- Huge Collection of Affordable Talent Familiar With the Underlying Architecture
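The throughput figures above translate directly into transfer times. The short Java sketch below works through that arithmetic for a hypothetical 90 GB capture, taking 1 GB as 1000 MB. The capture size and the class name are assumptions for illustration only; the two rates are the ones reported for the single-producer, single-consumer, single-Kafka-machine test.

```java
/** Back-of-the-envelope transfer times from the reported single-machine queue rates. */
final class QueueThroughputEstimate {
    /** Reported rate for writing to the queues, in MB per second. */
    static final double WRITE_MB_PER_SEC = 50.0;
    /** Reported rate for reading from the queues, in MB per second. */
    static final double READ_MB_PER_SEC = 100.0;

    /** Seconds needed to move volumeMb megabytes at rateMbPerSec megabytes per second. */
    static double secondsToMove(double volumeMb, double rateMbPerSec) {
        if (rateMbPerSec <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        return volumeMb / rateMbPerSec;
    }

    public static void main(String[] args) {
        double captureMb = 90_000.0; // hypothetical 90 GB capture, with 1 GB = 1000 MB
        System.out.printf("write: %.0f s, read: %.0f s%n",
                secondsToMove(captureMb, WRITE_MB_PER_SEC),
                secondsToMove(captureMb, READ_MB_PER_SEC));
    }
}
```

At these rates the hypothetical capture takes 1800 seconds (30 minutes) to write and 900 seconds (15 minutes) to read back on a single machine. How the figures change as components are distributed across more machines is not stated on the slide.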
SMH IA Applied To Relevant Problems
Design-To-Test Cycles
- Current As-Is Intelligent Agent Solutions vs. SMH IA Framework
- Unique Wired Operating Environment for Data Analysis
Flexible and Adaptable Framework
- Integration With Existing Architectures, Operating Environments, Industrial Machinery, and Mobile Devices
Big Data Operating Environment For Machine Learning
- Quickly and effectively integrate structured, semi-structured, and unstructured data for rapid design-to-test of existing, new, and improved algorithms
Create New Views of Data Across Domains For Analysis
Interactively Simulate Solutions for Cyber Intelligence
- Interactively reaching new levels in tradecraft extraction
Areas of Future Work
Implement Predictive Components
- Add predictive modeling to the IA's functionality
- Integrate machine learning into the base IA functionality (algorithms, predictive modeling, etc.)
Fuse Additional Data Domains
- Implement algorithms to assign value to and classify structured and unstructured data from additional sources
- Transform/fuse additional domains to feed into algorithms to mine intelligence and make predictions
Improve the System to Be Self-Administered
- Tailor an additional Administrator IA to control the management subsystem for IA deployment
Collaborate to Innovate
- Work with Subject Matter Experts (SMEs) to incorporate their ideas
Investigate and Implement IA Clusters Running Other Domains