How To Use Data For Abstract Reasoning




Applying High-speed Pattern Recognition to Generate Queryable Semantics from Big Data

Chuck Rehberg
CTO, Trigent Software
Chief Scientist at Semantic Insights
October 2014
copyright 2014 Semantic Insights

The Problem: Using Raw Data for Abstract Reasoning

In some cases, where big data is collected in real time, the data being collected is at a level of abstraction that is too low for domain-knowledgeable investigators to query directly.

[Figure: the same activity shown at increasing levels of abstraction]

Raw data: ...1110010101010001001101001000100010001000100010001000100001001000101001001001000...

Key clicks and responses: Key 12, Key 37 + Resp 13, Key 33, Key 14 + Resp 11, Key 12, Key 37 + Resp 15, Key 33, Key 14 + Resp 19, Key 12, Key 37 + Resp 15, Key 33, Key 14 + Resp 11, Key 12, Key 39 + Resp 13, Key 33, Key 14 + Resp 11, ...

Transactions: Login, Check Balance, Transfer *failed*, Check Balance, Transfer *failed*, Read Notice, Idle time out, Login *failed*, Login, View Stock

Interpretation: [possible hacking attempt] or [possible phishing attempt]

Example: Data at the Wrong Level of Abstraction

1. Suppose you have a data stream where a massive amount of real-time data = individual key clicks + responses tagged to individual interactive sessions.
2. Suppose domain-knowledgeable investigators are interested in a group of nearly simultaneous transactions from multiple distributed interactive sessions involving any one of a set of specific data items.
3. Suppose the key clicks + responses tagged to each interactive session are interspersed with a wide variety of other (non-relevant) key clicks + responses within the same interactive session.
4. Suppose the domain-knowledgeable investigators wish to conduct their investigation with ad hoc queries that are not in terms of relative sequences of key clicks + responses. Rather, they wish to make queries in terms of high-level transactions.
5. And finally, suppose you have more than one massive interspersed data stream with similar and/or related information, which taken together are used to infer the high-level transactions referenced in the queries of the domain-knowledgeable investigators.

[Figure: Fraud Detection Unit]
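To make point 2 concrete, here is a minimal sketch of the kind of query an investigator might want to ask once data exists at the transaction level. The `Transaction` record, its field names, the account identifiers, and the time window are all hypothetical and are not taken from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    session: str   # interactive session the transaction was inferred from
    time: float    # seconds since an arbitrary reference point
    kind: str      # high-level transaction name, e.g. "Transfer"
    item: str      # data item involved (hypothetical account identifier)


def near_simultaneous(transactions, items_of_interest, window_seconds):
    """Return groups of transactions on a watched item that come from at
    least two distinct sessions and start within window_seconds of the
    first transaction in the group."""
    groups = []
    for item in sorted(items_of_interest):
        hits = sorted((t for t in transactions if t.item == item),
                      key=lambda t: t.time)
        start = 0
        while start < len(hits):
            end = start
            # Extend the group while the next hit is inside the time window.
            while (end + 1 < len(hits)
                   and hits[end + 1].time - hits[start].time <= window_seconds):
                end += 1
            group = hits[start:end + 1]
            if len({t.session for t in group}) >= 2:
                groups.append(group)
                start = end + 1
            else:
                start += 1
    return groups


# Hypothetical transaction-level data.
TXS = [
    Transaction("s1", 10.0, "Transfer", "acct-42"),
    Transaction("s2", 10.4, "Transfer", "acct-42"),
    Transaction("s3", 11.0, "Check Balance", "acct-42"),
    Transaction("s1", 95.0, "Transfer", "acct-42"),
    Transaction("s4", 10.2, "Transfer", "acct-7"),
]
GROUPS = near_simultaneous(TXS, {"acct-42"}, window_seconds=2.0)
```

A query like this is easy to state over transactions but has no direct equivalent over raw key clicks, which is the gap the rest of the presentation addresses.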

A Semantic Translation Solution

The raw data represents the lowest level of abstraction available. Often it is the only level of abstraction available.

A high-speed complex pattern recognizer can be employed to sift through, correlate, and recognize the raw data, and to generate aggregate data representing a higher level of abstraction. Multiple possible interpretations can be maintained.

Additional high-speed complex pattern recognizers can operate on this new higher-level data to generate new aggregate data representing an even higher level of abstraction. This layering can be continued, and it allows each data transformation level to be changed flexibly. Again, multiple possible interpretations can be maintained.

The goal is to produce data at higher levels of abstraction that constitute the semantic representations required by the domain-knowledgeable investigators.
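The idea of maintaining multiple possible interpretations can be sketched as follows. The rule table is entirely hypothetical: the presentation does not say which transaction sequences indicate hacking or phishing, so the two patterns below are invented for illustration only.

```python
# Hypothetical rules: an ordered pattern of transaction-level tokens and the
# candidate higher-level interpretation it suggests.
RULES = [
    (("Transfer *failed*", "Transfer *failed*", "Login *failed*"),
     "possible hacking attempt"),
    (("Read Notice", "Login *failed*", "Login"),
     "possible phishing attempt"),
]


def contains_in_order(tokens, pattern):
    """True if every pattern token occurs in tokens, in order, with any
    number of other tokens allowed in between."""
    remaining = iter(tokens)
    return all(any(tok == wanted for tok in remaining) for wanted in pattern)


def interpretations(tokens):
    """Return every interpretation whose pattern is present. Nothing is
    discarded early, so competing readings are kept side by side."""
    return [label for pattern, label in RULES
            if contains_in_order(tokens, pattern)]


SESSION = [
    "Login", "Check Balance", "Transfer *failed*", "Check Balance",
    "Transfer *failed*", "Read Notice", "Idle time out",
    "Login *failed*", "Login", "View Stock",
]
CANDIDATES = interpretations(SESSION)
```

With these invented rules, the session yields both candidate interpretations, and a later level (or an analyst) can decide between them.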

Two Key Translation Challenges Must Be Met

1. One key challenge is to flexibly transform the raw data into data that domain-knowledgeable investigators can directly query.
   The transformation may require considering a vast amount of data in sequence before a proper transformation can be made.
   Multiple possible interpretations of the data may need to be maintained.
   Individual transformations should be allowed to evolve over time to become more accurate.
   The data may contain information that does not contribute to (or even detracts from) the semantic interpretation of the data. The data of interest may be surrounded by semantic noise.
2. For time-sensitive data, you will need to transform the data in real time as it arrives.

Real-time Semantic Transformation

Our high-speed complex pattern recognizer is based on the Fast Rule Selection Engine (FRSE), US Patent #7,433,858.

We have made additional modifications to the FRSE Engine to handle time-sequenced pattern matching efficiently. It can match adjacent tokens or span intervening non-matching tokens (noise). The number of tokens to be scanned at a time can be pre-specified or adjusted dynamically as needed.

Multiple simultaneous levels of transformation in real time are accomplished by pipelining, where each level hands off to the next level for further processing while continuing to process new input.
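The difference between matching adjacent tokens and spanning intervening noise can be illustrated with a small sketch. This is not the FRSE algorithm, whose internals the presentation does not describe; it is a plain backtracking search, and the `max_gap` parameter and the noise token "Key 99" are assumptions made for the example.

```python
def find_match(tokens, pattern, max_gap=0):
    """Return the indices in tokens that match pattern in order, or None.

    max_gap is the largest number of non-matching (noise) tokens allowed
    between two consecutive matched tokens; 0 means strictly adjacent.
    """
    if not pattern:
        return []

    def extend(pos, k):
        # pos is the index matched for pattern[k - 1]; match pattern[k:].
        if k == len(pattern):
            return []
        last = min(pos + 2 + max_gap, len(tokens))
        for j in range(pos + 1, last):
            if tokens[j] == pattern[k]:
                rest = extend(j, k + 1)
                if rest is not None:   # backtrack if this choice dead-ends
                    return [j] + rest
        return None

    for i, tok in enumerate(tokens):
        if tok == pattern[0]:
            rest = extend(i, 1)
            if rest is not None:
                return [i] + rest
    return None


# "Key 99" is a hypothetical noise token inserted between two relevant ones.
STREAM = ["Key 12", "Key 99", "Key 37 + Resp 13", "Key 33", "Key 14 + Resp 11"]
LOGIN = ["Key 12", "Key 37 + Resp 13"]

ADJACENT_ONLY = find_match(STREAM, LOGIN, max_gap=0)   # noise blocks the match
SPANNING = find_match(STREAM, LOGIN, max_gap=1)        # noise is skipped
```

Bounding the gap keeps the amount of input each match has to examine small, which is one plausible reason to pre-specify or adjust the scan width.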

Transforming Streamed Data in Real Time: A Multi-layered Pipelined Approach

1. The continuous stream of data is transformed into objects recognized in another transformation process, which in turn produces objects that are sent to a third transformation process, and so forth.
2. This can continue an arbitrary number of times, with each transformation running independently in parallel.
3. This is well suited to problems requiring a cascade of information toward increasing levels of abstraction.

[Figure: a continuous stream of Level 0 data passes through three stages of pattern recognizers, defined by Level 0-1, Level 1-2, and Level 2-3 domain knowledge respectively. Each stage emits streaming higher-level data to the next. Pipeline output is loaded into HDFS, a database, and an RDF triple store, which the Fraud Detection Unit queries using Hadoop and SPARQL.]
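The hand-off between levels can be sketched with chained Python generators. This is only a structural illustration: generators pass each token downstream as soon as it is produced, but they run in a single thread rather than in parallel as the slide describes. The mappings from key sequences to transactions are hypothetical, since the presentation does not say which keys correspond to which transactions.

```python
from collections import deque


def recognize(stream, patterns):
    """One pipeline level: consume lower-level tokens, yield higher-level ones.

    patterns maps a tuple of adjacent input tokens to a single output token.
    Input that completes no pattern is silently dropped as noise.
    """
    window = deque(maxlen=max(len(p) for p in patterns))
    for token in stream:
        window.append(token)
        recent = tuple(window)
        for pattern, result in patterns.items():
            if recent[-len(pattern):] == pattern:
                yield result       # hand off immediately to the next level
                window.clear()
                break


# Hypothetical Level 0-1 and Level 1-2 domain knowledge.
LEVEL_0_1 = {
    ("Key 12", "Key 37 + Resp 13"): "Login",
    ("Key 33", "Key 14 + Resp 11"): "Check Balance",
    ("Key 12", "Key 37 + Resp 15"): "Transfer *failed*",
}
LEVEL_1_2 = {
    ("Transfer *failed*", "Transfer *failed*"): "Repeated failed transfer",
}

RAW = [
    "Key 12", "Key 37 + Resp 13", "Key 33", "Key 14 + Resp 11",
    "Key 12", "Key 37 + Resp 15", "Key 33", "Key 14 + Resp 19",
    "Key 12", "Key 37 + Resp 15", "Key 33", "Key 14 + Resp 11",
]

LEVEL_1 = list(recognize(RAW, LEVEL_0_1))
# Chaining: level 2 pulls from level 1, which pulls from the raw stream.
LEVEL_2 = list(recognize(recognize(RAW, LEVEL_0_1), LEVEL_1_2))
```

Because each level is defined only by its own pattern table, a level can be changed or replaced without touching the others.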

Specifying Transformations Flexibly

Transformations processed by the FRSE Engine are powerful and simple to define:

1. Transformations are sequences of Token Expressions (i.e., expressions that match Match Tokens).
2. Transformations also produce a Pattern Result.
3. A Pattern Result creates a new sequence of one or more Result Tokens in the output data.
4. Match Tokens are any data items that can be matched in the input data.
5. Result Tokens are created by Transformations.
6. Token Expressions state what must be true for a Match Token to be correctly identified.
7. Result Tokens can act like Match Tokens in subsequent Transformations.

Specifically:

    <Transformation Pattern> ::= <Token Expression>(1,n) --> [<Directive>] <Result Token>(1,n)
    <Token Expression> ::= [<Match Operator>] <Match Token>   /* default Match Operator is '=' */
    <Match Operator> ::= { = <> > < <= => }
    <Directive> ::= { ADD REPLACE RESULT }   /* ADD/REPLACE modifies input; RESULT sends to output */
    <Match Token> ::= <Token Set ID> token(1,n)
    <Token Set ID> ::= token(1,n)   /* Token Set ID acts like a synonym */
    <Result Token> ::= token
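One possible reading of this grammar is sketched below as plain Python data structures. Several details are assumptions rather than statements from the presentation: that `=>` means greater-than-or-equal, that ADD inserts the Result Tokens after the matched input tokens, that REPLACE substitutes them for the matched tokens, that RESULT is the default directive, and that Token Expressions match adjacent tokens. Token Set IDs are omitted.

```python
import operator
from dataclasses import dataclass

# Match Operators from the grammar; "=>" is assumed to mean ">=".
OPERATORS = {
    "=": operator.eq, "<>": operator.ne, ">": operator.gt,
    "<": operator.lt, "<=": operator.le, "=>": operator.ge,
}


@dataclass(frozen=True)
class TokenExpression:
    match_token: object
    match_operator: str = "="   # default Match Operator per the grammar

    def matches(self, token):
        # Ordering operators need comparable types (e.g. numbers).
        return OPERATORS[self.match_operator](token, self.match_token)


@dataclass(frozen=True)
class TransformationPattern:
    expressions: tuple          # one or more TokenExpression
    result_tokens: tuple        # one or more Result Tokens
    directive: str = "RESULT"   # assumed default; grammar leaves it open


def apply_pattern(pattern, tokens):
    """Scan tokens left to right; return (modified_input, output)."""
    n = len(pattern.expressions)
    modified, output, i = [], [], 0
    while i < len(tokens):
        window = tokens[i:i + n]
        hit = len(window) == n and all(
            e.matches(t) for e, t in zip(pattern.expressions, window))
        if not hit:
            modified.append(tokens[i])
            i += 1
            continue
        if pattern.directive == "REPLACE":
            modified.extend(pattern.result_tokens)
        elif pattern.directive == "ADD":
            modified.extend(window)
            modified.extend(pattern.result_tokens)
        else:  # RESULT: input is left unchanged, result goes to output
            modified.extend(window)
            output.extend(pattern.result_tokens)
        i += n
    return modified, output


# Hypothetical numeric key codes: 12 followed by any code above 30 is a login.
EXPRS = (TokenExpression(12), TokenExpression(30, ">"))
KEYS = [12, 37, 33, 14]
REPLACED = apply_pattern(TransformationPattern(EXPRS, ("Login",), "REPLACE"), KEYS)
EMITTED = apply_pattern(TransformationPattern(EXPRS, ("Login",)), KEYS)
```

Under these assumptions, REPLACE rewrites the input in place so that the Result Token can act as a Match Token for a later pattern, while RESULT leaves the input untouched and emits the token downstream.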

Summary

Big Data, representing real-world events, often needs to be filtered and transformed into semantically meaningful data before domain-knowledgeable analysts can perform queries.

Today, interactive data visualization techniques are often used to display data aggregations and relationships to help domain-knowledgeable analysts perform generalized visual queries.

However, data visualization can be particularly unrevealing when data aggregation requires recognizing content-dependent patterns in a field of physical and semantic noise.

High-speed pattern recognizers are capable of filtering and transforming real-time Big Data for immediate high-level evaluation.

For more information, please visit us at www.semanticinsights.com.

Chuck Rehberg

As CTO at Trigent Software and Chief Scientist at Semantic Insights, Chuck Rehberg has developed patented high-performance rules engine technology and advanced natural language understanding technologies that empower a new generation of semantic research solutions. Chuck has more than thirty years in the high-tech industry, developing leading-edge solutions in the areas of Artificial Intelligence, Semantic Technologies, analysis, and large-scale configuration software.
