Big Data Security Kevvie Fowler kpmg.ca
About myself Kevvie Fowler, CISSP, GCFA Partner, Advisory Services KPMG Canada Industry contributions
Big data security: definitions
- Big data: datasets so large or complex that they become difficult to work with using existing technology
- Big data technology: specialized technology developed to manage large or complex datasets
Big data security industry demand The big data landscape
Big data security: industry demand
Explosive growth is occurring within the big data market
- 2012: $11.6B
- 2018: $46.34B
- Apache Hadoop: 54.7% growth (~2018), $20.9B market by 2018
Source: Big Data Market By Types - Worldwide Forecasts & Analysis (2013-2018)
Big data security: Hadoop architecture
[Figure: Hadoop architecture]
Big data security: the challenge
The Hadoop security challenge
- Architectural design
- Sheer volume of data to be secured
- Minimal native security features
Big data security: the challenge
Can't you secure Hadoop with 3rd party products?
Several overlays are on the market:
- RBAC
- Logging
- Encryption
The problem with many Hadoop security overlays:
- They don't scale with the data
- They are point solutions
- They can't substitute for ground-up security builds
Big data security: the challenge
Big data can be a perfect storm of risk for an organization
- Massive amount of data
- Little effective security
Big data breaches are inevitable, and they will dwarf the large breaches of today
- Cost to recover
- Investigative abilities
- Decentralized storage
You can significantly increase your protection against attack by following 8 steps
Big data security Step #1: Identify big data use, data and associated security/privacy requirements
- If you don't need sensitive data, don't store it
- Obfuscate sensitive information whenever possible
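As an illustration of the obfuscation advice in Step #1, the sketch below pseudonymizes and masks sensitive fields before a record is loaded into a cluster. The field names (`email`, `card_number`), the helper names, and the hard-coded key are hypothetical and chosen only for the example; a real pipeline would take its key from a key store and its field list from the data classification exercise.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real key would come from a key store.
SECRET_KEY = b"example-only-key"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace a sensitive value with a keyed, non-reversible token (HMAC-SHA256)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_card_number(card_number: str) -> str:
    """Keep only the last four digits of a card number; mask the rest."""
    digits = [c for c in card_number if c.isdigit()]
    return "*" * (len(digits) - 4) + "".join(digits[-4:])


def obfuscate_record(record: dict) -> dict:
    """Return a copy of the record with the assumed sensitive fields obfuscated."""
    cleaned = dict(record)
    cleaned["email"] = pseudonymize(record["email"])
    cleaned["card_number"] = mask_card_number(record["card_number"])
    return cleaned
```

Because the token is keyed and deterministic, the same email always maps to the same token, so records can still be joined or counted without the raw value being stored.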
Hadoop security Step #2: Use a configuration management tool to deploy and manage your cluster
Diagram labels: Logging, Management, Cluster Mgt. Solution
Hadoop security Step #3: Validation of nodes and requests
- Validate nodes and client applications before admission to the cluster
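One simple form of node validation is an include/exclude host list that is checked before a node is admitted. HDFS offers a comparable mechanism through its `dfs.hosts` and `dfs.hosts.exclude` settings; the sketch below is not that implementation, only a minimal illustration of the idea, and the function names and host names are hypothetical.

```python
def load_host_list(text: str) -> set:
    """Parse one hostname per line, ignoring blank lines and '#' comments."""
    hosts = set()
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip().lower()
        if entry:
            hosts.add(entry)
    return hosts


def may_join_cluster(hostname: str, include: set, exclude: set) -> bool:
    """Admit a node only if it is explicitly included and not excluded."""
    name = hostname.strip().lower()
    return name in include and name not in exclude
```

A host-name check like this is only a first filter; it does not prove a node's identity, which is what the authentication measures on the next slide address.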
Hadoop security Step #3: Validation of nodes and requests (continued)
Authentication
- By default there is no authentication
- Secure RPC & HTTP
- Web consoles (Hadoop's Web UIs, WebHDFS, and HttpFS)
- Simple Authentication and Security Layer (SASL)
- Kerberos
Authorization
- Set your HDFS file permissions
- MapReduce ACLs
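To make the authentication and authorization settings concrete, the sketch below generates a `core-site.xml` style fragment that switches authentication to Kerberos and turns on service-level authorization. The two property names are standard Hadoop settings; the helper `build_site_xml` is hypothetical, and a working Kerberos deployment also needs principals, keytabs, and per-service settings that are not shown here.

```python
import xml.etree.ElementTree as ET

# Minimal security-related properties; a full deployment needs many more.
CORE_SITE_SECURITY = {
    "hadoop.security.authentication": "kerberos",  # the default value is "simple"
    "hadoop.security.authorization": "true",       # enable service-level authorization
}


def build_site_xml(properties: dict) -> str:
    """Render name/value pairs in the <configuration><property> layout Hadoop uses."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

Generating the fragment from a single dictionary fits the Step #2 advice: the same settings can be pushed to every node by a configuration management tool rather than edited by hand.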
Hadoop security Step #4: Secure the underlying OS
- Server hardening
- Encrypt sensitive data at rest
Hadoop security Step #5: Use transmission-level security
- Most clusters use RPC, TCP/IP & HTTP
- Use SSL/TLS to authenticate and ensure privacy of communications between cluster nodes
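The idea of using TLS both to authenticate peers and to keep traffic private can be sketched with Python's standard `ssl` module. This is a generic illustration, not Hadoop's own configuration: the function name is hypothetical, and certificate loading is left as a comment because the file paths would be deployment-specific.

```python
import ssl


def build_server_tls_context() -> ssl.SSLContext:
    """Server-side TLS context that also requires a certificate from the client."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    ctx.verify_mode = ssl.CERT_REQUIRED           # mutual authentication
    # In a real deployment the node's certificate and the trusted CA would be
    # loaded here with ctx.load_cert_chain(...) and ctx.load_verify_locations(...).
    return ctx
```

Requiring a client certificate means each side proves its identity to the other, which covers the "authenticate" half of the slide as well as the "privacy" half.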
Hadoop security Step #6: Have a choke point
- Clients communicate directly with resource managers and nodes
- Implement a choke point to block access to users/IPs as required
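The decision a choke point makes can be reduced to a small allow/block check on the client address. The sketch below assumes a hypothetical allowed network and a hypothetical blocked address; in practice this logic would sit in a gateway, firewall, or proxy in front of the cluster rather than in application code.

```python
import ipaddress

# Hypothetical policy for illustration: one allowed network, one blocked host.
ALLOWED_NETWORKS = [ipaddress.ip_network("10.20.0.0/16")]
BLOCKED_ADDRESSES = {ipaddress.ip_address("10.20.5.99")}


def is_permitted(client_ip: str) -> bool:
    """Allow a client only if it is inside an allowed network and not blocked."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed addresses are rejected outright
    if addr in BLOCKED_ADDRESSES:
        return False
    return any(addr in net for net in ALLOWED_NETWORKS)
```

The value of a single choke point is that a block list like this needs updating in only one place when an account or address has to be cut off during an incident.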
Hadoop security Step #7: Secure Hadoop-related applications
- Hadoop extensions
- 3rd party applications
Hadoop security Step #7: Secure Hadoop-related applications (Hive)
- Hive is a data warehouse system for Hadoop
- HiveQL is a SQL-based language that provides a user-friendly front end to MapReduce
Hadoop security Step #7: Secure Hadoop-related applications (Hive, continued)
SQL injection meets the Hive
Hadoop security Step #7: Secure Hadoop-related applications (Hive, continued)
HiveQL includes many operators, functions and expressions commonly abused by SQL injection attacks:
- Count
- Union
- Distinct
- Wait for
- Sub queries
- Expressions joined by OR in a WHERE clause
- Comparisons between two constants

Type of injection       Simple SQL/ASP.NET   HiveQL/Hue
Dynamic SQL injection   X                    X
Blind SQL injection     X                    X
Stacked queries         X                    X
Hadoop security Step #7: Secure Hadoop-related applications (Hive, continued)
Protecting against HiveQL injection
- Accountability (user-developed functions, views, logic)
- Security reviews of MapReduce/HiveQL applications
- Revoke access where possible
- Use HiveServer2!
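A security review of a HiveQL application would typically look for query strings built by concatenating user input. The sketch below contrasts that pattern with a version that validates the input against a strict allowlist first. The table and column names (`orders`, `customer_id`, `order_id`, `total`) and both function names are hypothetical, and allowlist validation is one possible mitigation chosen for illustration, not a measure taken from the slides.

```python
import re

# Assumed identifier format: letters, digits, underscore or hyphen, 1 to 64 characters.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")


def unsafe_query(customer_id: str) -> str:
    """Vulnerable pattern: user input is concatenated straight into HiveQL."""
    return "SELECT * FROM orders WHERE customer_id = '" + customer_id + "'"


def safer_query(customer_id: str) -> str:
    """Reject anything that is not a plain identifier before building the query."""
    if not _SAFE_ID.fullmatch(customer_id):
        raise ValueError("rejected customer_id")
    return ("SELECT order_id, total FROM orders WHERE customer_id = '"
            + customer_id + "'")
```

Passing `x' OR '1'='1` to `unsafe_query` yields a WHERE clause containing an expression joined by OR and a comparison between two constants, two of the abused constructs listed on the previous slide; `safer_query` raises `ValueError` for the same input.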
Hadoop security Step #8: Ensure your IR and forensics program incorporates big data technology
- Traditional IR/forensics practices aren't effective against big data technology
- Potential for enormous organizational impact, with little information on how to manage it
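One practical way to bring big data technology into an IR program is to make sure investigators can query cluster audit trails. The sketch below parses lines in a tab-separated `key=value` layout similar to an HDFS audit log and lists who touched a given path. The exact log layout, the sample values, and both function names are assumptions for illustration; the format should be checked against the logs your own cluster produces.

```python
def parse_audit_line(line: str) -> dict:
    """Split the part after 'audit: ' into key/value fields (tab-separated)."""
    _, _, payload = line.partition("audit: ")
    fields = {}
    for part in payload.split("\t"):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def accesses_to(lines, path_prefix: str) -> list:
    """Return (user, ip, command, path) for every entry under path_prefix."""
    hits = []
    for line in lines:
        f = parse_audit_line(line)
        if f.get("src", "").startswith(path_prefix):
            hits.append((f.get("ugi"), f.get("ip"), f.get("cmd"), f.get("src")))
    return hits
```

Lines that do not contain the `audit: ` marker produce no fields and are skipped, so the helper can be run over mixed log files.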
Hadoop security: future enhancements
Upcoming Hadoop security enhancements
- HBase security (HBASE-6222)
- Token-based authentication (HADOOP-9466)
- Encrypted data at rest (HADOOP-9331)
Hadoop security References www.intel.com www.cloudera.com www.hortonworks.com
Thank you Kevvie Fowler, CISSP, GCFA Partner Advisory Services Office: (416) 777-3742 Email: kevviefowler@kpmg.ca
The information contained herein is of a general nature and is not intended to address the circumstances of any particular individual or entity. Although we endeavor to provide accurate and timely information, there can be no guarantee that such information is accurate as of the date it is received or that it will continue to be accurate in the future. No one should act on such information without appropriate professional advice after a thorough examination of the particular situation. © 2013 KPMG LLP, a limited liability partnership and the Canadian member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative ("KPMG International"), a Swiss entity. All rights reserved. The KPMG name, logo and "cutting through complexity" are registered trademarks or trademarks of KPMG International Cooperative ("KPMG International").