Primary Key Associates Limited




Our approach to analytics

Data analytics is at the core of Primary Key Associates' work. In this paper Andrew Lea, our Technical Director, describes some of the paradigms, models, and techniques we have developed to allow us to accomplish remarkable things for our clients.

Defining an Analytic

Analytics analyse data in response to queries, subject to constraints, and produce lists of answers.

Analytics consume and analyse data in response to queries, and produce answers. We consider an analytic to have several components, some of which are optional. Although analytics do not of themselves own data, have a user interface, or constitute a data visualisation, they are normally part of a system that does. Primary Key analytics are frequently based on Artificial Intelligence techniques, and may operate on unstructured data (such as summarising text) as well as on numbers.

Paradigms and Principles

Five business-focused analytic paradigms

Based on years of experience we have developed five paradigms which guide our use of analytics, and in particular keep them focused on solving the particular business problem at hand. These paradigms are:

- Answer Based Analytics
- Scenario Based Analytics
- The Primary Key Principle: Minimal Complexity
- Minimum Infrastructure Impact
- Evidential Quality
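The definition of an analytic given above can be expressed as a minimal interface. The sketch below is our own illustration (the function and field names are invented for the example, not part of any Primary Key product): an analytic consumes data in response to a query, subject to constraints, and produces a list of answers.

```python
from typing import Any, Callable, Iterable, Optional

def run_analytic(data: Iterable[Any],
                 query: Callable[[Any], bool],
                 constraints: Callable[[Any], bool] = lambda item: True,
                 limit: Optional[int] = None) -> list:
    """Consume data, apply the query subject to constraints, return answers."""
    answers = [item for item in data if query(item) and constraints(item)]
    return answers[:limit] if limit is not None else answers

# Example: which transactions over 10,000 happened out of hours?
txs = [{"id": 1, "value": 12000, "hour": 3},
       {"id": 2, "value": 500,   "hour": 11},
       {"id": 3, "value": 40000, "hour": 23}]
hits = run_analytic(txs,
                    query=lambda t: t["value"] > 10000,
                    constraints=lambda t: t["hour"] < 6 or t["hour"] > 22)
print([t["id"] for t in hits])  # [1, 3]
```

Note that the analytic itself owns no data and has no user interface; both are supplied by the surrounding system, as described above.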

Answer Based Analytics

Our analytics produce directly actionable intelligence

Many tools present attractive visualisations of analysed data, such as social media, but still leave the question: what does this mean for the business? That question then has to be answered by analysts before the results can be fed up to business leaders. Our systems produce results that are immediately actionable, because they directly address business questions, supported by evidence. To know which questions to answer, we work with clients to identify their interests, and the analytics which produce the answers they need. (If we have no such analytic, we develop one in short order.)

By way of example, Primary Key Distil is designed to turn data (consumer transactions, account data, transaction history) into actionable business answers, rather than the more typical complex technical reports, which then need to be further refined by analysts. [For more information, please see our paper on Fraud Discovery with Primary Key Distil.]

Scenario Based Analytics

Describe the desired outcome, not the steps to find it

This recent refinement of Answer Based Analytics observes that people often cannot describe the criteria by which the thing they are looking for can be located, but would recognise it when they see it. Using scenario based analytics we work with clients to develop for-instance pictures of situations that would concern them, and use those pictures to develop the scenarios which our analytics look for. We use a declarative analytic language of our own devising, in which the desired answer is modelled and the system itself works out the steps to be taken to find it.

The Primary Key Principle: Minimal Complexity

Options produce complexity. We keep it simple, so it just works

Many systems give end-users multiple options to set, which has several disadvantages:

- Different users, with different settings, get different results, causing confusion.
- Users want to get their job done, rather than become experts in complex software.
- The end-user is often the person least qualified to know what settings should apply, especially when the developer could not decide what they should be.

The Primary Key Principle is therefore to:

- Have no settings for the user to adjust, and;
- Where settings are needed (under the hood, as it were), decide them in software, handling the complexity ourselves.

Minimum Infrastructure Impact

Lowest possible impact on your Information Systems

We design our systems and services to have minimum impact on your IT. If we are analysing your data, we always:

- Load it read-only, so there is no possibility whatsoever of our software corrupting your existing data or interrupting your systems.
- Operate within your security perimeter, to ensure there can be no data loss. (Our fraud discovery process, for example, is generally deployed in this way.)

If we are providing data to you, we provide it via secure web sites, so no IT modifications are required. We know what IT departments prefer.

Evidential Quality

Evidential quality processing: high value results

Based on our many years' experience in capturing and analysing digital data for use in legal matters, we designed our systems to carry out data acquisition and analysis to an evidential quality, so that results can be produced if necessary as part of legal proceedings.

Data and Data Acquisition

Different data presents different challenges

Data analytics clearly requires data to analyse, and indeed, when contemplating data analytics it is appropriate to ask: what data would best answer this problem? And what data do I have? Before data can be analysed, it must be imported. Different types of data present different challenges, but we have experience in analysing:

- Structured data, including transactions.
- Social media, for which we have our own advanced acquisition engines. [For more information, please see our Social Media Personnel Protection paper.]
- Natural language data: we can use not only metadata about text but, drawing on thirty years of natural language analysis, can and normally do use the meaning within that text, in most languages.
- Image data: we can analyse both image metadata (for example, for litigation) and image content, in which we have experience from spacecraft remote sensing.
- Video data: again, we can analyse and improve video, for example extracting good quality images from low quality video.

Data Storage

We store data safely, evidentially, and flexibly

Our data storage is underpinned by three principles:

1. We use our own storage infrastructure (not "the cloud"), so that we can be sure where the data is for the whole of its lifecycle.
2. We store data that we acquire evidentially, so we can prove where it came from, and when.
3. We store data using a model that reflects the acquisition. Only when we come to analyse that data do we re-model it into the model on which the analytic is based. This helps keep us future-proof.

Data Models

Data analytics models the world to make discoveries

To understand the world we live in, people make models of it. For example, the model we all carry in our heads of where a moving ball tends to go next enables us to catch a ball. In a similar way, data analytics needs a model of the world into which the data fits. This can either be explicit or, for simpler scenarios, implicit and unstated. We have developed several models with which to analyse the world, including:

- The Primary Key Seven Layer Model of Social Analytics
- The Three Entity Model
- The Bayesian Fraud Model (not discussed here)

The Primary Key Seven Layer Model of Social Analytics

Structure from chaos via seven layers

We analyse social media in terms of seven layers of actors, publications, and data:

1. Population: the overall ebb and flow of ideas
2. Groups: groups of actors with similar interests
3. Communities: actors who share interests, and are associated in communities
4. Actors: individual people, companies, web sites
5. Publications: individual publications, such as tweets or web pages
6. Cleaned data: identified publications, rendered in a common way
7. Raw data: the raw data acquired by the acquisition engines

The Three Entity Model

The human social world consists of people, events, and assets (which might be places)

We designed our three entity model for conducting scenario based analytics on data which describes the interaction of people, web sites, companies, and so forth: in other words, the human social world. We often prefer to use triples to represent data when using scenario-based analytics. (Triples are a particularly flexible way of representing data; each consists of a subject, a link, and an object. For example: Andrew Lea / works-at / Primary Key.) This model has three supra-entities, and mitigates a common trap of triples, namely the need to reify or redesign triples when designs change. The three supra-entities are:

- Actors: people, web sites, companies; anyone who can do or say something. Actors participate in:
- Events: such as meetings, conversations, emails, and tweets, which occur at a point in time, either at or with:
- Assets: things which can be owned, such as locations (or cyber-locations) or objects, such as a car.

Each entity has very few properties, namely:

- Id: a unique reference
- Typ: what sort of entity it is
- Name: its name
- Details: one set of details, description, or text

Anything else is (in this model) better represented as a link to another entity. Entities have links between each other, including within the same type.

It may look complex, but this model reduces the world to only three classes of entity, four properties, and a few link types. We have not yet found a scenario that we wished to model with it but could not.
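The model can be sketched as a tiny entity-and-link store. The class and method names below are our own illustration, not Primary Key's implementation; they show how every record is one of the three supra-entities with the same four properties, and how everything else becomes a link.

```python
from dataclasses import dataclass

@dataclass
class Entity:
    id: str        # a unique reference
    typ: str       # "actor", "event", or "asset"
    name: str      # its name
    details: str = ""  # one set of details, description, or text

@dataclass
class Link:
    subject: str   # entity id
    relation: str  # e.g. "works-at", "participates-in", "occurs-at"
    obj: str       # entity id

class Model:
    """A minimal three-entity store: entities plus subject-link-object triples."""
    def __init__(self):
        self.entities = {}
        self.links = []

    def add(self, e: Entity) -> None:
        self.entities[e.id] = e

    def link(self, subject: str, relation: str, obj: str) -> None:
        self.links.append(Link(subject, relation, obj))

    def related(self, subject: str, relation: str) -> list:
        """All entities reachable from `subject` via `relation`."""
        return [self.entities[l.obj] for l in self.links
                if l.subject == subject and l.relation == relation]

# The triple from the text: Andrew Lea / works-at / Primary Key
m = Model()
m.add(Entity("p1", "actor", "Andrew Lea"))
m.add(Entity("c1", "actor", "Primary Key"))
m.link("p1", "works-at", "c1")
print([e.name for e in m.related("p1", "works-at")])  # ['Primary Key']
```

Because new relationships are added as links rather than as new entity properties, a design change never forces the existing triples to be reified or restructured, which is the trap the supra-entity design avoids.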

Analysing Big Data

Smart and fast parallel processing

We overcome the challenges of big data with three complementary approaches:

1. Parallel processing
2. Powerful integrated architectures
3. Smart algorithms, including Order(N) processing

We use these approaches on a daily basis. Our Illuminate service, for example, uses them to enable complex questions about social media to be answered in a timely fashion. Without these approaches, our daily processing would, we estimate, take about three months.

Parallel Processing

Parallel processing in Primary Key Illuminate

In general, we design our analytics to proceed in parallel. Our Primary Key Illuminate service uses a parallel architecture, supported by the highly virtualised infrastructure that we run, in which CPU cores can be re-allocated as required. For resilience, the architecture runs duplicated across sites and cross-linked, providing a hot backup. AutoPilot both schedules acquisition and analytic builds, and cross-checks system health. AutoMonitor monitors the health of the replicated system, and raises alerts if there are problems.
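The general parallel pattern can be sketched as follows. This is our own illustration of partition-process-merge (not Illuminate's actual pipeline); the stand-in analytic simply counts publications per author.

```python
from multiprocessing import Pool

def analyse_partition(records):
    """Stand-in analytic: count publications per author in one partition."""
    counts = {}
    for author, _text in records:
        counts[author] = counts.get(author, 0) + 1
    return counts

def merge(partials):
    """Combine the partial results from each partition."""
    total = {}
    for partial in partials:
        for author, n in partial.items():
            total[author] = total.get(author, 0) + n
    return total

if __name__ == "__main__":
    data = [("alice", "t1"), ("bob", "t2"), ("alice", "t3"), ("carol", "t4")]
    partitions = [data[0::2], data[1::2]]      # split the work across cores
    with Pool(processes=2) as pool:
        partials = pool.map(analyse_partition, partitions)
    print(merge(partials))  # {'alice': 2, 'bob': 1, 'carol': 1}
```

Because each partition is independent, cores can be re-allocated to partitions as required, which is the property the virtualised infrastructure described above exploits.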

Powerful Multi-Technology Integrated Architectures

Multiple techniques to address complex questions

We use multiple AI (artificial intelligence) and data analytic techniques to solve the problem at hand. One example is an AI architecture we designed to answer queries, expressed as natural language emails, about spacecraft image data. This architecture used natural language processing, image understanding, and inferencing with (as it happens) a partially self-deducing knowledge base: the incoming email caused the system to select the most appropriate knowledge base with which to answer the question. It was able to operate in multiple languages.

Order(N) Processing

We answer hard questions despite the growth in processing time implied by big data

It is an unfortunate property of the real world that the most useful questions to address are often the hardest, and so it is with data analytics. For example, it is frequently useful to know which of many items are like which others: perhaps, which financial transactions are so similar to other transactions that they indicate a potential fraud. For this type of question there is really no avoiding the all-against-all query. Generally, data analytic houses will avoid even mentioning this type of query, since they cannot do it on big data. The reason is that it runs in Order(N²): the time it takes is proportional to the square of the number of data items. However fast your processing, N² can always grow too large, and with big data, it does.
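One standard family of work-arounds for the all-against-all cost is blocking (bucketing): hash each item to a cheap key and compare only items that share a bucket, so the expensive comparison runs on small candidate sets rather than all N²/2 pairs. The sketch below is our own illustration of that idea, not Primary Key's proprietary heuristic; the bucket key and tolerance are invented for the example. Like any such heuristic, it can miss pairs whose keys land far apart.

```python
from collections import defaultdict

def bucket_key(amount: float) -> int:
    # Cheap key: transactions of similar value land in the same bucket.
    return round(amount)

def similar_pairs(transactions, tolerance=0.05):
    """Find pairs of (id, amount) whose amounts differ by <= tolerance,
    comparing only within a bucket and its immediate neighbour."""
    buckets = defaultdict(list)
    for tid, amount in transactions:
        buckets[bucket_key(amount)].append((tid, amount))
    pairs = []
    for k, items in buckets.items():
        # Peek at the next bucket so near-boundary values still meet.
        candidates = items + buckets.get(k + 1, [])
        for i, (id1, a1) in enumerate(items):
            for id2, a2 in candidates[i + 1:]:
                if abs(a1 - a2) <= tolerance:
                    pairs.append((id1, id2))
    return pairs

txs = [("t1", 100.01), ("t2", 100.03), ("t3", 57.20), ("t4", 250.00)]
print(similar_pairs(txs))  # [('t1', 't2')]
```

With roughly constant bucket sizes, the work grows in Order(N) rather than Order(N²), at the cost of occasionally missing a pair whose members were bucketed more than one key apart.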

We developed a smart heuristic which allows such questions to be tackled in Order(N). Being a heuristic, it does not visit all pairs, so it can miss viable answers. The question is: how often does that happen? We conducted exhaustive tests and measured the accuracy as never dropping below 99.97%, which is of course considerably better than the accuracy of any similarity metric that might be employed anyway. And the alternative is simply to have no answer. The graph shows the growth in execution time.

Using Primary Key Analytics

Our analytics at your service

You can use our analytics in several ways, for:

- Social media analytics, by taking our Illuminate service
- Fraud discovery, by taking our Distil service
- Custom analytics, including investigations and trial support, by engaging us on a consultancy basis
- Integrating our analytics into your systems and services, by discussing integration and licensing
- Using our skills to enhance your developments, by engaging our programming and design consultancy services

About us

Expert consultancy and services

Andy Clark and Andrew Lea formed Primary Key Associates Limited in 2010 to build a team of experts that delivers world-class consultancy and services to businesses both large and small. Primary Key Associates are experienced professionals in IT and related areas. Our team has in-depth knowledge of:

- Information Security, Cyber Security and Physical Security
- Cyber threat intelligence and Competitive Intelligence
- Open source investigations
- Business Continuity and Disaster Recovery
- Fraud and money-laundering discovery
- Civil and criminal investigations
- Expert Witness
- Digital Forensics
- Programme and Project Management
- Crisis Management, including data loss emergencies
- Enterprise and Security Architecture, including SABSA and TOGAF
- Systems and Software Development
- Cryptography
- Embedded systems, including real-time and spacecraft systems
- Virtualisation
- Servers and mainframes, including IBM
- Mobile platforms: iOS and Android
- Web platforms: HTTP, JavaScript, HTML, CSS
- Computer languages: Assembler, C, C++, C#, Fortran, Delphi, Java, Python, SQL
- Artificial Intelligence, including computational linguistics and image analysis
- Supervised and unsupervised learning
- Timely analysis of very big data sets

We invest resources in generating IPR. Our current portfolio includes:

- Primary Key Illuminate: a social media analysis and intelligence service
- Primary Key Distil: a fraud detection toolkit we deploy on client sites
- Patents to protect our growing portfolio of inventions

e: cyberinfo@primarykey.co.uk
w: www.primarykey.co.uk
t: +44 1403 599900
@pkaluk