Fighting Cyber Fraud with Hadoop
Niel Dunnage, Senior Solutions Architect




Summary
Big Data is an increasingly powerful enterprise asset. This talk explores the relationship between big data and cyber security, and how we preserve privacy whilst exploiting the advantages of data collection and processing.

Big Data technologies give both governments and corporations powerful tools for offering more efficient and personalized services, and their rapid adoption has created tremendous social benefits. Unfortunately, an unwanted side effect is the rich pickings available to those with malicious intentions. Increasingly, the sophisticated cyber attacker can exploit the rich array of public data to build detailed profiles of their targets in support of those intentions.

© 2014 Cloudera, Inc. All rights reserved.

Agenda
- Data: the new oil
- Defend your data
- The security value of Big Data
Source: Grant Thornton LLP, 2014 Corporate General Counsel Survey, conducted by American Lawyer Media

Cyber Security: Data Is a Valuable Commodity
- DDoS
- Data exfiltration
- Confidential customer records
- Transaction data
- Reputation attack
- False flag
- Fake data
- Insider threat
False flag: operations designed to deceive in such a way that they appear to be carried out by entities, groups or nations other than those who actually planned and executed them (http://en.wikipedia.org/wiki/false_flag).
Examples:
- The @SQLiNairb hacker has released a database dump from a US fantasy football website (http://www.fftoday.com/), claiming it was timed to coincide with the NFL draft.
- @security_511 has continued to support OpSaudi, claiming further attacks on websites connected to Saudi Aramco.
- Anonymous Italy and Operation Green Rights (OpGR) have released the contents of an email account connected to an Italian steel producer, in connection with accusations of pollution against the company.

Typical Security Layers (layer: example)
- Access: physical (lock and key), virtual (firewalls, VLANs)
- Authentication: logins verify users are who they say they are
- Authorization: permissions verify what a user can do
- Encryption at rest: data protection for files on disk
- Encryption in transport: data protection on the wire
- Auditing: keep track of who accessed what
- Policy / procedure: protect against human error and social engineering

Cloudera's Approach to Hadoop Security
Comprehensive:
- Standards-based authentication
- Centralized, granular authorization
- Native data protection
- End-to-end data audit and lineage
Compliance-ready:
- Meet compliance requirements: HIPAA, PCI-DSS
- Encryption and key management
Transparent:
- Security at the core
- Minimal performance impact
- Compatible with new components
- Insight with compliance

Defense: Security Features
- Hadoop security: Kerberos, with simplified deployment through Cloudera Manager
- Sentry: unified authorization with a single policy for Hive, Impala and Search
- HDFS extended ACLs and HBase cell-level access control
- Navigator Encrypt and Key Trustee (via the Gazzang acquisition) deliver compliant data security
- Navigator provides a data management layer including audit, access control reviews, data classification and discovery, and lineage

Kerberos: Perimeter Security
Guarding access to the cluster itself.
Technical concepts: authentication, network isolation.
Kerberos is a computer network authentication protocol that works on the basis of tickets. Tickets allow nodes to prove their identity to each other securely, through extensive use of encryption.
- Messages are exchanged between the client, the server and the Kerberos Key Distribution Center (KDC). The KDC is not part of Hadoop, but most Linux distributions ship the MIT Kerberos KDC.
- Passwords are not sent across the network. Instead, they are used to compute encryption keys.
- Authentication status is cached, so credentials do not need to be sent with each request.
- Timestamps are essential to Kerberos, so make sure system clocks are synchronized!
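The slide stresses synchronized clocks. One reason is that a Kerberos service rejects an authenticator whose timestamp is too far from its own clock, and it keeps a replay cache so the same authenticator cannot be reused. The sketch below is a simplified model of those two checks only. `KerberosChecks` and its methods are hypothetical names, and a real KDC or service also decrypts and validates tickets, lifetimes and microsecond fields, all of which are omitted here. The 300-second value matches MIT Kerberos's default `clockskew`.

```scala
import java.time.{Duration, Instant}
import scala.collection.mutable

// Simplified, hypothetical model of two Kerberos server-side checks; not MIT Kerberos code.
object KerberosChecks {
  // MIT Kerberos allows 300 seconds of clock skew by default (krb5.conf "clockskew").
  val DefaultMaxSkew: Duration = Duration.ofSeconds(300)

  /** Accept an authenticator only if its timestamp is close to the server's clock. */
  def withinSkew(clientTime: Instant, serverTime: Instant,
                 maxSkew: Duration = DefaultMaxSkew): Boolean = {
    // Absolute difference: the client clock may be ahead of or behind the server.
    val diffMillis = math.abs(Duration.between(clientTime, serverTime).toMillis)
    diffMillis <= maxSkew.toMillis
  }

  /** Replay cache: an authenticator (principal, timestamp) is accepted only once. */
  final class ReplayCache {
    private val seen = mutable.Set.empty[(String, Instant)]
    // mutable.Set.add returns false when the entry is already present, i.e. a replay.
    def firstUse(principal: String, timestamp: Instant): Boolean = seen.add((principal, timestamp))
  }
}
```

A node whose clock has drifted by more than the allowed skew fails `withinSkew` even with valid credentials. This is why unsynchronized clocks show up as authentication failures on a Kerberized cluster.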

Apache Sentry: Access Security
Defining what users and applications can do with data.
Technical concepts: permissions, authorization.
- Sentry provides unified authorization across multiple access paths: a single authorization policy is enforced for Impala, Hive and Search.
- Role-based access at server, database, table or view granularity.
- Multi-tenant: separate policies for each database / schema.
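The role-based model above can be pictured with a small hypothetical sketch. Groups map to roles, roles hold privileges, and a privilege granted on a server or database also covers every table beneath it. All types and names here (`RoleBasedAccess`, `Policy`, `Privilege`) are illustrative assumptions, not Sentry's API or policy-file format.

```scala
// Hypothetical model of Sentry-style role-based authorization; not Sentry's API.
object RoleBasedAccess {
  sealed trait Action
  case object Select extends Action
  case object Insert extends Action
  case object All extends Action

  /** Privilege on an object path such as List("server1", "sales", "orders"). */
  final case class Privilege(path: List[String], action: Action)

  final case class Policy(groupRoles: Map[String, Set[String]],
                          rolePrivileges: Map[String, Set[Privilege]]) {

    // A grant covers the requested object if it was made on that object or an ancestor,
    // and its action is the requested one or ALL.
    private def covers(granted: Privilege, path: List[String], action: Action): Boolean =
      path.startsWith(granted.path) && (granted.action == All || granted.action == action)

    /** True if any role of any of the user's groups grants `action` on `path`. */
    def allowed(userGroups: Set[String], path: List[String], action: Action): Boolean =
      userGroups.iterator
        .flatMap(g => groupRoles.getOrElse(g, Set.empty[String]))
        .flatMap(r => rolePrivileges.getOrElse(r, Set.empty[Privilege]))
        .exists(p => covers(p, path, action))
  }
}
```

Keying privileges by object path is what gives server, database and table granularity from a single check. A per-database grant also keeps tenants' policies separate.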

Cloudera Navigator: Visibility and Reporting
Visibility into where data came from and how it is being used.
Technical concepts: auditing, lineage.
Auditing and access management:
- View, grant and revoke permissions across the Hadoop stack
- Identify access to a data asset around the time of a security breach
- Generate an alert when a restricted data asset is accessed
Lineage:
- Given a data set, trace back to the original source
- Understand the downstream impact of purging or modifying a data set
Metadata tagging and discovery:
- Search through metadata to find data sets of interest
- Given a data set, view its schema, metadata and policies
Lifecycle management:
- Automate periodic ingestion of data
- Compress or encrypt a data set at rest
- Purge a data set, or replicate it to a remote site
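Several of the auditing and lineage questions above reduce to simple operations: a time-window filter over audit events, and a graph walk over lineage edges. The sketch below assumes a hypothetical in-memory model. `AuditEvent`, the `Lineage` map and all function names are illustrative, not Cloudera Navigator's data model or API.

```scala
import java.time.{Duration, Instant}

// Hypothetical audit and lineage model; names are illustrative, not Navigator's API.
object AuditAndLineage {
  final case class AuditEvent(user: String, asset: String, operation: String, time: Instant)

  /** Accesses to `asset` within +/- `window` of a suspected breach time. */
  def accessesAround(events: Seq[AuditEvent], asset: String,
                     breach: Instant, window: Duration): Seq[AuditEvent] =
    events.filter { e =>
      e.asset == asset &&
      !e.time.isBefore(breach.minus(window)) &&
      !e.time.isAfter(breach.plus(window))
    }

  /** Events that touch a restricted asset: candidates for an alert. */
  def restrictedAccesses(events: Seq[AuditEvent], restricted: Set[String]): Seq[AuditEvent] =
    events.filter(e => restricted.contains(e.asset))

  /** Lineage edges: each derived data set -> the data sets it was built from. */
  type Lineage = Map[String, Set[String]]

  // Depth-first walk collecting every node reachable from `start` (assumes a DAG).
  private def reachable(edges: Map[String, Set[String]], start: String): Set[String] = {
    @annotation.tailrec
    def loop(frontier: List[String], seen: Set[String]): Set[String] = frontier match {
      case Nil => seen
      case d :: rest =>
        val next = edges.getOrElse(d, Set.empty[String]).filterNot(seen).toList
        loop(next ++ rest, seen ++ next)
    }
    loop(List(start), Set.empty)
  }

  /** Trace back to the original sources: upstream data sets with no recorded parents. */
  def originalSources(lineage: Lineage, dataSet: String): Set[String] =
    (reachable(lineage, dataSet) + dataSet).filter(d => lineage.getOrElse(d, Set.empty[String]).isEmpty)

  /** Downstream impact: every data set derived, directly or indirectly, from `dataSet`. */
  def downstream(lineage: Lineage, dataSet: String): Set[String] = {
    // Invert parent links into child links, then walk forward.
    val children: Map[String, Set[String]] =
      lineage.toSeq
        .flatMap { case (child, parents) => parents.toSeq.map(p => p -> child) }
        .groupBy(_._1)
        .map { case (parent, pairs) => parent -> pairs.map(_._2).toSet }
    reachable(children, dataSet)
  }
}
```

The same lineage graph answers both directions. Walking parent links finds the original sources. Walking the inverted links shows what a purge or modification would affect.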


Encryption at Rest: Navigator Encrypt and Key Trustee
- Encrypt any file or directory with AES-256 encryption
- Unique access controls: process-based, NOT users / groups
- 100% transparent
- Separation of duties in key management: AES encryption keys are stored on a separate Key Trustee server
- Key manager breach: data is safe. Data server breach: data is safe.
[Diagram: a Linux server / VM running the Navigator Encrypt client (process-based ACLs; AES-256 file and directory encryption), GPG, and a separate Linux server / VM running the Key Trustee Server.]
Source: Gazzang, gazzang.com/products/cloudencrypt-for-aws
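The separation-of-duties idea can be illustrated with plain JDK cryptography. The data side stores only ciphertext plus a key identifier, and the AES-256 key lives in a separate key service. Access is gated on the calling process rather than on a user. This is a sketch under stated assumptions. `KeyServer` is a hypothetical in-memory stand-in for a separate key manager such as Key Trustee, `decryptFor` is a hypothetical process-based check, and the use of AES-GCM is our choice for the example, not a description of how Navigator Encrypt is implemented.

```scala
import java.security.SecureRandom
import javax.crypto.{Cipher, KeyGenerator, SecretKey}
import javax.crypto.spec.GCMParameterSpec
import scala.collection.mutable

// Hypothetical sketch of encryption at rest with keys held apart from the data.
object SeparateKeyEncryption {
  /** Stand-in for a key manager on a different host; real access would be authenticated. */
  final class KeyServer {
    private val keys = mutable.Map.empty[String, SecretKey]
    def createKey(id: String): Unit = {
      val gen = KeyGenerator.getInstance("AES")
      gen.init(256) // AES-256
      keys(id) = gen.generateKey()
    }
    def fetch(id: String): SecretKey = keys(id)
  }

  /** What the data server keeps on disk: ciphertext and a key id, never the key. */
  final case class EncryptedFile(keyId: String, iv: Array[Byte], ciphertext: Array[Byte])

  private val random = new SecureRandom()

  def encrypt(ks: KeyServer, keyId: String, plaintext: Array[Byte]): EncryptedFile = {
    val iv = new Array[Byte](12) // fresh 96-bit nonce per file
    random.nextBytes(iv)
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, ks.fetch(keyId), new GCMParameterSpec(128, iv))
    EncryptedFile(keyId, iv, cipher.doFinal(plaintext)) // output includes a 16-byte tag
  }

  def decrypt(ks: KeyServer, f: EncryptedFile): Array[Byte] = {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, ks.fetch(f.keyId), new GCMParameterSpec(128, f.iv))
    cipher.doFinal(f.ciphertext) // throws AEADBadTagException if the data was altered
  }

  /** Process-based ACL: only listed executables may obtain plaintext. */
  def decryptFor(process: String, allowedProcesses: Set[String],
                 ks: KeyServer, f: EncryptedFile): Option[Array[Byte]] =
    if (allowedProcesses.contains(process)) Some(decrypt(ks, f)) else None
}
```

Stealing the data server's disk yields only `EncryptedFile` records, and compromising the key service yields keys without data. This is the "either breach, data is safe" property the slide describes.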

Our Design Strategy: The Enterprise Data Hub
A fully integrated Hadoop ecosystem:
- One pool of data
- One metadata model
- One security framework
- One set of system resources
Components shown in the architecture diagram:
- Batch processing: Spark, MapReduce, Hive and Pig
- Interactive SQL: Cloudera Impala, e.g. SELECT a.CPU_Met FROM application a LEFT OUTER JOIN … b ON a.application_id = b.application_id WHERE a.usage > 1000 AND a.application_type = 'Non_Critical'
- Interactive search: Cloudera Search
- Machine learning: Spark MLlib, Mahout, Oryx
- Math and statistics: SAS, R
- Stream processing: Spark Streaming
- Example graph query: graph.vertices.filter { case (id, _) => id == 13669222 }.collect
- Resource management: YARN
- Storage: HDFS (text, RCFile, Parquet, Avro, etc.); HBase / Accumulo (records)
- Integration: REST (WebHDFS), file (FUSE), Flume, Sqoop
- Metadata: Navigator
- Security: Navigator, Sentry

Enterprise Data Hub Use Cases
- OSINT analysis, fraud detection, log processing, performance management, risk management
Innovation and Advantage: ask bigger questions in the pursuit of discovering something incredible.
Operational Efficiency: perform existing workloads faster, cheaper, better.
- ETL acceleration, active archive, EDW optimization, deep exploratory BI, historical compliance

Offence: Fraud Detection, Fully Automated at Scale
Use cases:
- Distributed parallel execution with chained joins
- Historical processing at scale
- Machine learning: malware / anomaly detection, spam filters, etc.
- Combined real-time and batch predictors
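"Combined real-time and batch predictors" can be illustrated with a deliberately simple rule. A batch job summarises each account's historical transaction amounts (mean and standard deviation). A real-time step then scores each new transaction against that baseline and also checks a list of known-bad merchants. The z-score rule, the default threshold of 3 and all names are assumptions for illustration, not a method described in the talk.

```scala
// Hypothetical batch + real-time fraud predictor; the rule and threshold are assumptions.
object CombinedPredictor {
  final case class Baseline(mean: Double, stdDev: Double)

  /** Batch step: summarise an account's historical transaction amounts. */
  def baseline(history: Seq[Double]): Baseline = {
    require(history.nonEmpty, "need at least one historical amount")
    val mean = history.sum / history.size
    val variance = history.map(x => (x - mean) * (x - mean)).sum / history.size
    Baseline(mean, math.sqrt(variance))
  }

  /** Real-time step: distance of a new amount from the baseline, in standard deviations. */
  def zScore(b: Baseline, amount: Double): Double =
    if (b.stdDev == 0.0) { if (amount == b.mean) 0.0 else Double.PositiveInfinity }
    else math.abs(amount - b.mean) / b.stdDev

  /** Flag a transaction that is anomalous for the account or uses a known-bad merchant. */
  def isSuspicious(b: Baseline, amount: Double, merchant: String,
                   badMerchants: Set[String], threshold: Double = 3.0): Boolean =
    zScore(b, amount) > threshold || badMerchants.contains(merchant)
}
```

The split mirrors the slide. The expensive `baseline` computation runs in batch over full history, while `isSuspicious` is cheap enough to run per event in a stream.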

Big Data Economics
- Ask bigger questions
- Predictably process large data sets
- Linear scaling
- Robust and economical crypto security
- Creative, fail-fast innovation
- Powers productivity insights
- Increasing infrastructure ROI
- Increasing business ROI
- Defeating fraudulent activity
- Evaluating risk
Cycle: Innovate, Predict, Ingest, Discover

Data Ingest
Near-real-time (NRT) ingest:
- Flume, optimized to flow real-time event data into the Hadoop cluster
- Spark Streaming for near-real-time micro-batch aggregations
- Twitter streaming, Kafka, log API
Bulk load:
- Sqoop for structured data
- FUSE file system access
- API
- Web / Hue
Data enrichment:
- Flume interceptors
- Kite Morphlines module: configuration-based interceptors that can enrich data, for example extracting facets, entity extraction and applying regulatory tags
Pipeline: clients (collect) → agents (enrich, buffer) → store
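The enrichment stage can be pictured as a chain of functions applied to each event in a configured order. The sketch extracts a facet (the client IP from an Apache log line) and applies a regulatory tag when the body contains something that looks like a card number. `Event`, `Interceptor` and the header names are hypothetical stand-ins, not the Flume `Interceptor` interface or Morphlines commands. The 13 to 19 digit pattern is a crude assumption, not a real PCI detector.

```scala
// Hypothetical enrichment chain in the spirit of Flume interceptors / Kite Morphlines.
object EnrichmentChain {
  final case class Event(headers: Map[String, String], body: String)
  type Interceptor = Event => Event

  /** Facet extraction: the client IP at the start of an Apache access-log line. */
  val clientIp: Interceptor = e => {
    val ip = e.body.takeWhile(_ != ' ')
    if (ip.nonEmpty) e.copy(headers = e.headers + ("client_ip" -> ip)) else e
  }

  // Crude stand-in for card-number detection: a run of 13 to 19 digits.
  private val CardLike = """\b\d{13,19}\b""".r

  /** Regulatory tagging: mark events whose body looks like it holds card data. */
  val pciTag: Interceptor = e =>
    if (CardLike.findFirstIn(e.body).isDefined) e.copy(headers = e.headers + ("regulatory" -> "PCI"))
    else e

  /** Run interceptors in their configured order, like a Flume interceptor chain. */
  def chain(interceptors: Seq[Interceptor]): Interceptor =
    interceptors.foldLeft[Interceptor](e => e)(_ andThen _)
}
```

Tagging in headers rather than rewriting the body keeps the raw record intact. Downstream policies, such as Sentry roles or Navigator classification, can then key off the tag.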

Near-Real-Time Access to Threats
- View the geographic distribution of a Slowloris DDoS attack, taken from Apache web server logs
- Help isolate unpatched servers
- Identify the source of attacks
Example Spark Streaming snippets:
    LogUtils.createStream(...).filter(_.getText.contains("408 Error")).countByWindow(Seconds(10), Seconds(10))
    stream.join(historiccounts).filter { case (word, (curcount, oldcount)) => curcount > oldcount }
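The streaming snippets count HTTP 408 (Request Timeout) responses in a time window and compare current counts with historic ones. That pattern suits Slowloris, which holds connections open until the server times them out. The sketch below applies the same logic to one window of log lines using plain Scala collections instead of Spark, so it runs without a cluster. It assumes Apache common/combined log format, and the "current count exceeds historic count" rule is taken from the slide's filter. `SlowlorisDetector` and its methods are hypothetical names.

```scala
// Plain-collections sketch of windowed 408 counting per client IP; not Spark code.
object SlowlorisDetector {
  // Apache common/combined format: ip ident user [time] "request" status ...
  private val LogLine = """^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3})(?: .*)?$""".r

  /** Returns (client IP, HTTP status) for a parseable log line. */
  def parse(line: String): Option[(String, Int)] = line match {
    case LogLine(ip, status) => Some((ip, status.toInt))
    case _                   => None
  }

  /** Number of 408 Request Timeout responses per client IP within one window of lines. */
  def timeoutsPerIp(window: Seq[String]): Map[String, Int] =
    window.flatMap(parse)
      .collect { case (ip, 408) => ip }
      .groupBy(identity)
      .map { case (ip, hits) => ip -> hits.size }

  /** IPs whose count in this window exceeds their historic count (absent history = 0). */
  def suspects(current: Map[String, Int], historic: Map[String, Int]): Set[String] =
    current.collect { case (ip, n) if n > historic.getOrElse(ip, 0) => ip }.toSet
}
```

In Spark Streaming the per-IP counts would come from a windowed operation over the live stream rather than a `Seq`. The comparison step stays the same.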

Machine Learning
Real-time, large-scale machine learning and predictive analytics infrastructure built on Hadoop:
- Collaborative filtering and recommendation
- Classification and regression
- Clustering
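To make "clustering" concrete, here is a minimal k-means (Lloyd's algorithm) in plain Scala. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points, until nothing changes. It is illustrative only and is not the Oryx, MLlib or Mahout implementation. For determinism it seeds centroids with the first `k` points, whereas production implementations use smarter seeding such as k-means++.

```scala
// Minimal k-means (Lloyd's algorithm) sketch; not a library implementation.
object KMeansSketch {
  final case class Point(x: Double, y: Double) {
    def dist2(o: Point): Double = (x - o.x) * (x - o.x) + (y - o.y) * (y - o.y)
  }

  private def mean(ps: Seq[Point]): Point =
    Point(ps.map(_.x).sum / ps.size, ps.map(_.y).sum / ps.size)

  /** Index of the centroid nearest to `p`. */
  def closest(centroids: Seq[Point], p: Point): Int =
    centroids.indices.minBy(i => centroids(i).dist2(p))

  /** Alternate assignment and update steps until centroids stop moving or maxIter is hit. */
  def fit(points: Seq[Point], k: Int, maxIter: Int = 20): Seq[Point] = {
    @annotation.tailrec
    def loop(centroids: Seq[Point], iter: Int): Seq[Point] = {
      val clusters = points.groupBy(p => closest(centroids, p))
      // An empty cluster keeps its previous centroid.
      val updated = centroids.indices.map(i => clusters.get(i).map(mean).getOrElse(centroids(i)))
      if (updated == centroids || iter + 1 >= maxIter) updated else loop(updated, iter + 1)
    }
    loop(points.take(k), 0)
  }
}
```

In a fraud or threat setting the points might be per-user behaviour features. Users far from every centroid are then candidates for review.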

Internal Threat Dashboard
Overall Risk Assessment
Risk per category: Online Banking Access; Public Records; Financial Transaction Rate; Online Activity; Social Media Activity; Regular Purchases; Foreign Travel
Ranked list of high-risk personnel (name: risk score):
- Kim Burgess: 94
- Guy Hughes: 93
- Jeff Maclaen: 87
- Ed Snowden: 86
- Mary Smith: 82
Open Cases
Customers with risk scores that recently changed (name: old score → new score):
- John Smith: 34 → 94
- Rob Jones: 26 → 93
- Jim Fisher: 17 → 87
- Henry Johnson: 45 → 86
- Sue Leefield: 12 → 82
Name: risk score; customers:
- Dodgy Ecomm.biz: 94; John Smith, Rob Jones
- Brentford Shopping Centre: 93; Jim Fisher, Henry Johnson
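Two of the dashboard panels can be reproduced with small queries over the slide's own figures. The first ranks customers whose risk score rose sharply. The second joins high-risk entities, which we read as merchants, to the customers linked to them. The minimum-increase threshold, the risk cut-off, the tie-breaking by name and all names in the code are assumptions for illustration.

```scala
// Hypothetical queries behind two dashboard panels; thresholds are assumptions.
object RiskDashboard {
  final case class ScoreChange(name: String, oldScore: Int, newScore: Int) {
    def delta: Int = newScore - oldScore
  }

  /** Customers whose risk rose by at least `minIncrease`, largest rise first, then by name. */
  def recentlyEscalated(changes: Seq[ScoreChange], minIncrease: Int): Seq[ScoreChange] =
    changes.filter(_.delta >= minIncrease).sortBy(c => (-c.delta, c.name))

  /** Join entities at or above a risk cut-off to the customers linked to them. */
  def exposedCustomers(entityRisk: Map[String, Int],
                       entityCustomers: Map[String, Seq[String]],
                       cutoff: Int): Map[String, Seq[String]] =
    entityRisk.collect {
      case (entity, risk) if risk >= cutoff && entityCustomers.contains(entity) =>
        entity -> entityCustomers(entity)
    }
}
```

At scale the same join would run as a distributed query, for example in Impala or Spark. This is the "chained joins" use case listed under fraud detection.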

Analytics