Fighting Cyber Fraud with Hadoop
Niel Dunnage, Senior Solutions Architect
Summary

Big Data is an increasingly powerful enterprise asset. This talk explores the relationship between big data and cyber security, and how we preserve privacy whilst exploiting the advantages of data collection and processing. Big Data technologies give both governments and corporations powerful tools to offer more efficient and personalized services, and the rapid adoption of these technologies has created tremendous social benefits. An unwanted side effect is the rich pickings available to those with malicious intentions. Increasingly, the sophisticated cyber attacker can exploit the rich array of public data to build detailed profiles of their adversaries.

2014 Cloudera, Inc. All rights reserved.
Agenda

- Data: the new oil
- Defend your data
- The security value of Big Data

Source: Grant Thornton LLP 2014 Corporate General Counsel Survey, conducted by American Lawyer Media
Cyber Security: Data is a valuable commodity

Threats:
- DDOS
- Data Exfiltration
- Confidential customer records
- Transaction data
- Reputation attack
- False flag
- Fake data
- Insider Threat

False flag: "Operations designed to deceive in such a way that the operations appear as though they are being carried out by entities, groups or nations other than those who actually planned and executed them" (http://en.wikipedia.org/wiki/false_flag)

Recent examples:
- The @SQLiNairb hacker has released a database dump from a US fantasy football website (http://www.fftoday.com/), claiming that it was timed to coincide with the NFL draft.
- @security_511 has continued to support OpSaudi, claiming further attacks on websites connected to Saudi Aramco.
- Anonymous Italy and Operation Green Rights (OpGR) have released the contents of an email account connected to an Italian steel producer, in connection with accusations of pollution against the company.
Typical Security Layers

Type                      Example
Access                    Physical (lock and key), Virtual (Firewalls, VLANs)
Authentication            Logins verify users are who they say they are
Authorization             Permissions verify what a user can do
Encryption at Rest        Data protection for files on disk
Encryption in transport   Data protection on the wire
Auditing                  Keep track of who accessed what
Policy / Procedure        Protect against Human Error & Social Engineering
Cloudera's Approach to Hadoop Security

Comprehensive
- Standards-based Authentication
- Centralized, Granular Authorization
- Native Data Protection
- End-to-End Data Audit and Lineage

Compliance-Ready
- Meet compliance requirements: HIPAA, PCI-DSS
- Encryption and key management

Transparent
- Security at the core
- Minimal performance impact
- Compatible with new components
- Insight with compliance
Defense: Security Features

- Hadoop Security: Kerberos, with simplified deployment through Cloudera Manager
- Sentry: provides unified authorization with a single policy for Hive, Impala and Search
- HDFS extended ACLs and HBase cell-level access control
- Navigator Encrypt and Key Trustee deliver compliant data security (via the Gazzang acquisition)
- Navigator provides a data management layer including audit, access control reviews, data classification and discovery, and lineage
Kerberos Security

Perimeter Security: guarding access to the cluster itself
Technical concepts: authentication, network isolation

Kerberos
- A computer network authentication protocol that works on the basis of tickets, allowing nodes to prove their identity to each other securely, with extensive use of encryption
- Messages are exchanged between the client, the server and the Kerberos Key Distribution Center (KDC). Note that the KDC is not part of Hadoop, but most Linux distributions come with the MIT Kerberos KDC.
- Passwords are not sent across the network; instead, passwords are used to compute encryption keys
- Authentication status is cached (credentials don't need to be sent with each request)
- Timestamps are essential to Kerberos (make sure system clocks are synchronized!)
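The point about timestamps can be illustrated with a small check. This is a minimal sketch, not part of Kerberos itself: it assumes a tolerance of 300 seconds (the usual MIT Kerberos default for the `clockskew` setting), and the function name `within_clock_skew` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed tolerance: 300 seconds, the usual MIT Kerberos default for "clockskew".
MAX_CLOCK_SKEW = timedelta(seconds=300)


def within_clock_skew(client_time, server_time, max_skew=MAX_CLOCK_SKEW):
    """Return True if the two clocks differ by no more than max_skew."""
    return abs(server_time - client_time) <= max_skew


# Fixed example times so the result does not depend on when this runs.
server_now = datetime(2014, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
in_sync = within_clock_skew(server_now - timedelta(seconds=30), server_now)
drifted = within_clock_skew(server_now - timedelta(minutes=10), server_now)
```

A request stamped by a client whose clock has drifted beyond the tolerance would be rejected, which is why time synchronization (for example with NTP) is normally set up on every node before Kerberos is enabled.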
Apache Sentry

Access Security: defining what users and applications can do with data
Technical concepts: permissions, authorization

Sentry
- Provides unified authorization across multiple access paths
- A single authorization policy is enforced for Impala, Hive and Search
- Role-based access at Server, Database, Table or View granularity
- Multi-tenant: separate policies for each database / schema
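Role-based access at different granularities can be sketched as follows. This is an illustrative model only, not the Sentry API: the `Privilege` class, the role and group names, and `is_authorized` are all hypothetical. It assumes groups map to roles, roles hold privileges, and a privilege granted at database scope covers every table in that database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Privilege:
    action: str                     # e.g. "SELECT", "INSERT" or "ALL"
    server: str
    database: Optional[str] = None  # None means every database on the server
    table: Optional[str] = None     # None means every table in the database

    def allows(self, action, server, database, table):
        """A privilege covers a request if each scope matches or is a wildcard."""
        return (
            self.action in ("ALL", action)
            and self.server == server
            and self.database in (None, database)
            and self.table in (None, table)
        )


# Hypothetical policy: one set of role definitions shared by every access path.
ROLE_PRIVILEGES = {
    "analyst": [Privilege("SELECT", "server1", "sales")],
    "fraud_admin": [Privilege("ALL", "server1", "fraud", "cases")],
}
GROUP_ROLES = {"analysts": ["analyst"], "fraud_team": ["analyst", "fraud_admin"]}


def is_authorized(groups, action, server, database, table):
    """Check the request against every privilege reachable from the user's groups."""
    return any(
        privilege.allows(action, server, database, table)
        for group in groups
        for role in GROUP_ROLES.get(group, [])
        for privilege in ROLE_PRIVILEGES.get(role, [])
    )
```

Because the policy lives in one place, the same check would give the same answer whichever engine receives the query, which is the property the slide calls unified authorization.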
Cloudera Navigator

Visibility and Reporting: visibility on where data came from and how it's being used
Technical concepts: auditing, lineage

Cloudera Navigator
- Auditing and Access Management
  - View, grant and revoke permissions across the Hadoop stack
  - Identify access to a data asset around the time of a security breach
  - Generate an alert when a restricted data asset is accessed
- Lineage
  - Given a data set, trace back to the original source
  - Understand the downstream impact of purging or modifying a data set
- Metadata Tagging and Discovery
  - Search through metadata to find data sets of interest
  - Given a data set, view schema, metadata and policies
- Lifecycle Management
  - Automate periodic ingestion of data
  - Compress or encrypt a data set at rest
  - Purge a data set or replicate it to a remote site
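The two lineage questions, tracing a data set back to its original source and finding what is affected downstream, are both graph traversals. The sketch below is not the Navigator API; the data set names and the `DERIVED_FROM` mapping are made up for illustration.

```python
from collections import deque

# Hypothetical lineage: each data set maps to the data sets it was derived from.
DERIVED_FROM = {
    "fraud_report": ["scored_transactions"],
    "scored_transactions": ["clean_transactions", "customer_profiles"],
    "clean_transactions": ["raw_transactions"],
}


def original_sources(dataset, derived_from):
    """Walk upstream and return the data sets that have no parents."""
    seen, queue, sources = {dataset}, deque([dataset]), set()
    while queue:
        current = queue.popleft()
        parents = derived_from.get(current, [])
        if not parents:
            sources.add(current)
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sources


def downstream_of(dataset, derived_from):
    """Return every data set that depends, directly or indirectly, on dataset."""
    children = {}
    for child, parents in derived_from.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    affected, queue = set(), deque([dataset])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected
```

Running `downstream_of` before purging a data set lists everything that would need to be rebuilt or retired along with it.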
Encryption at rest

Navigator Encrypt and Key Trustee
- Encrypt any file or directory with AES-256 encryption
- Unique access controls: process based, NOT users / groups
- 100% transparent
- Separation of duties
- Key management: AES encryption keys are stored on a separate Key Trustee server
  - Key manager breach: data is safe
  - Data server breach: data is safe

[Diagram labels: Linux server / VM running the encrypt client, with process-based ACLs and file / directory AES-256 encryption; GPG; separate Linux server / VM running the Key Trustee Server]

Source: Gazzang, gazzang.com/products/cloudencrypt-for-aws
Our Design Strategy: The Enterprise Data Hub

A fully integrated Hadoop ecosystem:
- One pool of data
- One metadata model
- One security framework
- One set of system resources

Platform components (from the architecture diagram):
- Engines
  - Batch Processing: Spark, MapReduce, Hive & Pig
  - Interactive SQL: Cloudera Impala
  - Interactive Search: Cloudera Search
  - Machine Learning: Spark MLlib, Mahout, Oryx
  - Math & Statistics: SAS, R
  - Stream Processing: Spark Streaming
- Resource Management: YARN
- Storage: HDFS (Text, RCFile, Parquet, Avro, etc.); HBase / Accumulo (records)
- Integration: REST (WebHDFS), File (FUSE), Flume, Sqoop
- Metadata: Navigator
- Security: Navigator, Sentry

Example snippets shown on the diagram:
    Select CPU_Met from application WHERE (USAGE > 1000) LEFT OUTER JOIN ON application_id where application_type IS Non_Critical
    graph.vertices.filter { case (id, _) => id == 13669222 }.collect
Enterprise Data Hub Use Cases

Innovation and Advantage: ask bigger questions in the pursuit of discovering something incredible
- OSINT Analysis
- Fraud Detection
- Log Processing
- Performance Management
- Risk Management

Operational Efficiency: perform existing workloads faster, cheaper, better
- ETL Acceleration
- Active Archive
- EDW Optimization
- Deep Exploratory BI
- Historical Compliance

2013 Cloudera, Inc. All Rights Reserved.
Offence: Fraud Detection

Fully automated at scale

Use cases:
- Distributed parallel execution with chained joins
- Historical processing at scale
- Machine learning: malware / anomaly detection, spam filters, etc.
- Combined real-time and batch predictors
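As one concrete instance of the anomaly detection mentioned above, a transaction can be flagged when it lies far outside a customer's history. This is a deliberately simple z-score sketch, not the method used in the talk; the threshold of 3 standard deviations and the sample amounts are assumptions for illustration.

```python
from statistics import mean, stdev


def is_anomalous(amount, history, threshold=3.0):
    """Flag an amount more than `threshold` standard deviations from the mean."""
    if len(history) < 2:
        return False  # not enough history to estimate a spread
    spread = stdev(history)
    if spread == 0:
        return amount != history[0]  # any deviation from a constant history
    return abs(amount - mean(history)) / spread > threshold


# Hypothetical past transaction amounts for one customer.
history = [20.0, 25.0, 22.0, 30.0, 24.0, 27.0]
suspicious = is_anomalous(500.0, history)
routine = is_anomalous(26.0, history)
```

At scale the same per-customer statistics would be computed in a batch job over the full history, with the comparison applied to each new event as it arrives.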
Big Data Economics

- Ask bigger questions
- Predictably process large data sets
- Linear scaling
- Robust and economic crypto security
- Creative fail-fast innovation
- Powers productivity insights
- Increasing infrastructure ROI
- Increasing business ROI
- Defeating fraudulent activity
- Evaluating risk

[Cycle diagram labels: Innovate, Predict, Ingest, Discover]
Data Ingest

NRT Ingest
- Flume: optimized to flow real-time event data into the Hadoop cluster
- Spark Streaming for near-real-time micro-batch aggregations
- Twitter streaming
- Kafka
- Log API

Bulk Load
- Sqoop for structured data
- FUSE file system access
- API
- Web / Hue

Data Enrichment
- Flume interceptors
- Kite Morphlines module
- Configuration-based interceptors that can enrich data, for example by extracting facets, extracting entities or applying regulatory tags

[Diagram: collect (Clients) -> enrich (Agents) -> buffer -> store]
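The enrichment step can be pictured as a function that receives an event and returns it with extra headers. The sketch below is plain Python rather than the Flume interceptor or Morphlines API, and its event layout, the log format it parses, the 16-digit card-number rule and the `PCI-DSS` tag value are all assumptions made for illustration.

```python
import re

# Assumed input: Apache-style access log lines.
LOG_PATTERN = re.compile(
    r'^(?P<client_ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3})'
)
# Assumed rule: a bare 16-digit number is treated as a possible card number.
CARD_PATTERN = re.compile(r"\b\d{16}\b")


def enrich(event):
    """Return a copy of the event with facet headers and a regulatory tag added."""
    headers = dict(event.get("headers", {}))
    body = event["body"]
    match = LOG_PATTERN.match(body)
    if match:
        # Facets extracted from the raw log line.
        headers["client_ip"] = match.group("client_ip")
        headers["status"] = match.group("status")
    if CARD_PATTERN.search(body):
        # Regulatory tag so downstream storage can apply stricter controls.
        headers["regulatory_tag"] = "PCI-DSS"
    return {"headers": headers, "body": body}


raw_event = {
    "headers": {"source": "web01"},
    "body": '203.0.113.7 - - [01/Jun/2014:12:00:00 +0000] '
            '"POST /pay?card=4111111111111111 HTTP/1.1" 408 0',
}
enriched = enrich(raw_event)
```

Tagging at ingest time means the facets and regulatory labels travel with the event, so later stages can route or protect it without parsing the body again.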
Near Real-Time Access to Threats

- View the geographic distribution of Slowloris DDOS attacks, taken from Apache web server logs
- Help isolate unpatched servers
- Identify the source of attacks

Count request timeouts in a 10-second window:
    LogUtils.createStream(...)
      .filter(_.getText.contains("408 Error"))
      .countByWindow(Seconds(10))

Keep only the keys whose current count exceeds the historic count:
    stream.join(historiccounts).filter {
      case (word, (curcount, oldcount)) => curcount > oldcount
    }
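The logic of the two streaming fragments on this slide (count timeouts in a 10-second window, then keep the keys whose count exceeds a historic baseline) can be tried without a cluster. This plain-Python sketch is not Spark code; keying by client IP, the event tuples and the baseline figures are assumptions. Unlike an inner join, it treats an IP with no history as having a baseline of zero.

```python
from collections import Counter

WINDOW_SECONDS = 10


def timeout_counts(events, window_start):
    """Count HTTP 408 responses per client IP inside one window."""
    window_end = window_start + WINDOW_SECONDS
    return Counter(
        ip
        for timestamp, ip, status in events
        if window_start <= timestamp < window_end and status == 408
    )


def above_baseline(current, historic):
    """Keep the IPs whose current count exceeds their historic count."""
    return {ip: n for ip, n in current.items() if n > historic.get(ip, 0)}


# Hypothetical parsed log events: (seconds since start, client IP, HTTP status).
events = [
    (0, "203.0.113.7", 408),
    (2, "203.0.113.7", 408),
    (3, "203.0.113.7", 408),
    (4, "198.51.100.2", 200),
    (5, "198.51.100.9", 408),
    (12, "203.0.113.7", 408),
]
historic = {"203.0.113.7": 1, "198.51.100.9": 2}

suspects = above_baseline(timeout_counts(events, window_start=0), historic)
```

A burst of 408 responses from a single address is a typical signature of a Slowloris-style attack, since the attacker holds connections open until the server times them out.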
Machine Learning

- Real-time, large-scale machine learning and predictive analytics infrastructure built on Hadoop
- Collaborative filtering and recommendation
- Classification and regression
- Clustering
Internal Threat Dashboard

Overall Risk Assessment:

Risk Per Category:
- Online Banking Access
- Public Records
- Financial transaction rate
- Online Activity
- Social Media Activity
- Regular purchases
- Foreign Travel

Ranked List of High Risk Personnel:

Name            Risk Score
Kim Burgess     94
Guy Hughes      93
Jeff Maclaen    87
Ed Snowden      86
Mary Smith      82

Open Cases:

Customers with Risk Scores that Recently Changed:

Name            Old Score   New Score
John Smith      34          94
Rob Jones       26          93
Jim Fisher      17          87
Henry Johnson   45          86
Sue Leefield    12          82

Name                        Risk Score   Customers
Dodgy Ecomm.biz             94           John Smith, Rob Jones
Brentford Shopping Centre   93           Jim Fisher, Henry Johnson
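The "recently changed" panel amounts to comparing each customer's old and new score and surfacing the largest jumps. The sketch below uses the scores shown on the dashboard; the minimum increase of 50 points and the function name `recently_escalated` are assumptions for illustration.

```python
# (name, old score, new score), taken from the dashboard mock-up.
SCORES = [
    ("John Smith", 34, 94),
    ("Rob Jones", 26, 93),
    ("Jim Fisher", 17, 87),
    ("Henry Johnson", 45, 86),
    ("Sue Leefield", 12, 82),
]


def recently_escalated(scores, min_increase):
    """Return (name, increase) pairs at or above min_increase, largest first."""
    jumps = [
        (name, new - old)
        for name, old, new in scores
        if new - old >= min_increase
    ]
    # sorted() is stable, so customers with equal increases keep their input order.
    return sorted(jumps, key=lambda item: item[1], reverse=True)


escalated = recently_escalated(SCORES, min_increase=50)
```

Ranking by the size of the change, rather than by the new score alone, brings forward customers whose behaviour shifted abruptly even if their absolute score is not yet the highest.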
Analytics