Big Data: Controlling the Perfect Storm
  September 24, 2013
  Start Time: 9 AM US Pacific, Noon US Eastern, 5 PM London
Generously sponsored by:
Welcome
  Conference Moderator: Matt Mosley
  Northern Virginia, USA Chapter
  ISSA Web Conference Committee
Agenda
  Speakers
    Chris Diehl, Principal and Co-Founder, The Data Guild
    Kathy Zeidenstein, Technology Evangelist, IBM InfoSphere Guardium
  Open Panel with Audience Q&A
  Closing Remarks
Organizational Resilience in an Uncertain World
  Chris Diehl
  Principal and Co-Founder, The Data Guild
Question and Answer
  Chris Diehl
  Principal and Co-Founder, The Data Guild
Data protection for the new world of Hadoop and NoSQL systems
  Kathy Zeidenstein
  Technology Evangelist, IBM InfoSphere Guardium
  krzeide@us.ibm.com
Agenda
  Overview of NoSQL and Hadoop
  Protecting data at rest (file encryption)
  Protecting data in use (monitoring)
  Integrating with SIEM
Information protection in the new era of computing
  Sensitive data is everywhere; traditional boundaries don't exist
  [Diagram: where sensitive data lives]
    Business Application Systems (SAP, PeopleSoft, Oracle Financials, In-house, CRM, etc.): Application Server
    Unstructured Data, File Systems (Office documents, PDF, Vision, Audio & other): Fax/Print Servers, File Servers
    Remote locations & Systems
    Storage & Backup Systems: SAN/NAS, Backup Systems
    Security & Other Systems (Event logs, Error logs, Cache, Encryption keys, & other secrets): Security Systems
    Structured Data, Database Systems (SQL, Oracle, DB2, Informix, MySQL): Database Server
    NoSQL and Hadoop (MongoDB, CouchDB, HBase, Cassandra, HDFS)
    Data Communications: VoIP Systems, FTP/Dropbox Server, Email Servers
NoSQL and Hadoop 101
  NoSQL ("Not Only SQL") is a generic term for a very nongeneric landscape of data stores
  Hadoop is a framework for processing large and varied data sets at low cost and with a high degree of fault tolerance
  Components include (among others):
    A file system (Hadoop Distributed File System, HDFS)
    A framework for processing data (MapReduce)
    May include a NoSQL database (HBase)
  [Diagram: Hadoop stack. Application: Hive, MapReduce, Oozie. Storage: HBase, HDFS.]
Categories of NoSQL
  Source: Akmal Chaudhri's NoSQL presentation
Market is growing and fragmented
  Hadoop-NoSQL Software and Services Market Forecast 2012-2017
  http://wikibon.org/wiki/v/hadoop-nosql_software_and_services_market_forecast_2012-2017
FUD (for a good reason)
  ...more than half (56%) [of senior-level IT and security respondents] admitted that these security concerns have kept them from starting or finishing cloud or big data projects. [1]
  Big Data may well present a rich target of opportunity for cybercriminals. [2]
  "Security and access control are part of the reason why Hadoop is not ready to replace relational databases" [3]
  [1] Study conducted by Voltage at InfoSecurity Europe in April 2013 with over 300 IT professionals. Reported in Dark Reading: http://www.darkreading.com/management/over-half-of-big-data-cloud-projects-st/240155524
  [2] Rob Livingstone in CFO.com: http://www3.cfo.com/article/2013/5/analytics_big-data-concerns-promises-cfo-role-cyber-data-security-livingstone
  [3] David Menninger, an analyst with Ventana Research. Quoted in http://www.computerworld.com/s/article/9221652/it_must_prepare_for_hadoop_security_issues
Perimeter Security is Not Enough
  Dynamic Data (in use)
  Static Data (at rest)
Protecting data at rest (encryption)
  Static Data (at rest)
  WHO is attempting to access protected data?
    Configure the groups or applications that can access protected data
  WHAT data is being accessed?
    Configure appropriate file and directory access
  WHEN is the data being accessed?
    Configure a range of hours and days of the week for authorized access
  HOW is the data being accessed?
    Configure the file system operations allowed on the data, e.g. read, write, delete, rename, application or process, etc.
  EFFECT: Permit; Deny; Encrypt; Audit
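The WHO / WHAT / WHEN / HOW / EFFECT structure of the slide above can be sketched as a small rule evaluator. This is only an illustration under stated assumptions: the `Rule` class, the `evaluate` function, the glob-style path matching, and the default of deny plus audit are all hypothetical and are not taken from the encryption product's actual policy engine.

```python
from dataclasses import dataclass
from datetime import datetime
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    """One hypothetical policy rule mirroring the slide's five questions."""
    who: frozenset       # groups or applications the rule applies to
    what: str            # glob pattern over protected file paths
    hours: range         # hours of the day when access is authorized
    weekdays: frozenset  # authorized days of the week, Monday = 0
    how: frozenset       # file system operations the rule covers
    effects: tuple       # any of "permit", "deny", "encrypt", "audit"


def evaluate(rules, who, path, when, operation, default=("deny", "audit")):
    """Return the effects of the first rule matching all four criteria.

    Falling back to deny plus audit is an assumption made for this sketch.
    """
    for rule in rules:
        if (who in rule.who
                and fnmatchcase(path, rule.what)
                and when.hour in rule.hours
                and when.weekday() in rule.weekdays
                and operation in rule.how):
            return rule.effects
    return default


# Hypothetical rule: the "hdfs" service may read and write under /data/hdfs
# on weekdays between 08:00 and 17:59, with encryption applied.
RULES = [
    Rule(who=frozenset({"hdfs"}),
         what="/data/hdfs/*",
         hours=range(8, 18),
         weekdays=frozenset(range(5)),
         how=frozenset({"read", "write"}),
         effects=("permit", "encrypt")),
]

in_hours = evaluate(RULES, "hdfs", "/data/hdfs/blk_1",
                    datetime(2013, 9, 24, 10, 0), "read")
after_hours = evaluate(RULES, "hdfs", "/data/hdfs/blk_1",
                       datetime(2013, 9, 24, 23, 0), "read")
```

With these sample values, the in-hours read is permitted with encryption, while the same read at 23:00 falls through to the default.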
Data Encryption Architecture
  [Architecture diagram]
    Authenticated users and applications
    DBMS server, file server, FTP server: each runs a DE Agent above the file system (online files)
    DE Agents connect to the IBM DE Server over SSL with X.509 certificates
    Web administration over HTTPS
    IBM DE Server (active/active), the Data Encryption Security Server:
      Policy and key management
      Centralized administration
      Separation of duties
      Key, policy, and audit log store
Protecting big data at rest
  All data sources potentially contain sensitive information
  Data is distributed as needed throughout the cluster by the Big Data application
  Deploy encryption agents to all systems hosting Data Stores
  Agents protect the data store at the file system or volume level
  From Vormetric
Implement database activity monitoring
  Dynamic Data (in use)
  Create a secure, detailed, verifiable audit trail of all database activities
    User activity, including privileged users
    User creation and object creation and manipulation
  Gain visibility into all database activity involving sensitive data
    Who, what, when and how
    Real-time alerts for suspicious activity
  Integrate with business processes for audit compliance
    Dissemination of reports to appropriate personnel for signoff and review
    Retain reports and signoffs per audit requirements
High-level architecture of the InfoSphere Guardium DAM solution
  Network traffic sent to tamper-proof appliance by lightweight agent
  Heavy lifting done on appliance, not on data cluster
  Separation of duties
    Role-based GUI
    Security policies separate from DBA
    Audit data is accessible only through APIs and reports
  Real-time alerts can be integrated with SIEM systems
  [Diagram: Clients connect to the name node or routing node and the data nodes; S-TAPs on the routing server or name node (where applicable) send traffic to the InfoSphere Guardium Collector, which serves reports and ad hoc searches]
Challenges of protecting the Hadoop stack
  [Diagram: Hadoop stack layers and the risks at each]
    User Interface: hackers
    Application (MapReduce, Hive, Oozie): unvetted applications or ad hoc processes
    Storage (HBase, HDFS): privileged users
Hadoop example
  Select count(*) from page_click where
    Hive query: who ran it and on what object
    Breaks down into a MapReduce job
    Low-level HDFS commands
Use case: What MapReduce jobs are accessing the cluster?
  Do you know when new MapReduce jobs are running in the cluster?
  Do you know who is submitting them and from where?
  Are they accessing sensitive data?
  Now, reduce the noise by filtering out authorized jobs.
Use case: What MapReduce jobs are accessing the cluster?
  Focus your resources on the unknown
  Unauthorized MapReduce jobs
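Filtering out authorized jobs so that only the unknown ones remain can be sketched as a simple allowlist check. This is a minimal sketch, not how Guardium implements its reports; the `JobEvent` fields, the job names, and the sample events are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobEvent:
    """Hypothetical record of one observed MapReduce job submission."""
    job_name: str
    user: str       # who submitted the job
    client_ip: str  # where it was submitted from


def unauthorized_jobs(events, authorized_job_names):
    """Keep only the events whose job name is not on the authorized list."""
    authorized = set(authorized_job_names)
    return [event for event in events if event.job_name not in authorized]


# Hypothetical sample data.
AUTHORIZED = {"nightly_click_rollup", "weekly_billing_export"}
EVENTS = [
    JobEvent("nightly_click_rollup", "svc_etl", "10.0.0.5"),
    JobEvent("adhoc_customer_dump", "jdoe", "10.0.3.17"),
    JobEvent("weekly_billing_export", "svc_etl", "10.0.0.5"),
]

suspects = unauthorized_jobs(EVENTS, AUTHORIZED)
for event in suspects:
    print(f"{event.job_name} submitted by {event.user} from {event.client_ip}")
```

Matching on job name alone is a simplification; a real review would probably also consider the submitting user and the data the job touches.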
Use case: What MapReduce jobs are accessing the cluster?
  Audit process workflow and administrative automation
  Unauthorized MapReduce Job List Report: should this job be approved?
  How can we automate the process of review, approval, and administration?
Automate approval workflow
  Audit process workflow and administrative automation
    Business Owner approves or rejects
    Information Security confirms Business Owner recommendation
    Guardium Admin adds authorized jobs to authorized job list
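The three review steps on the slide above can be sketched as one function that walks a job through each role in turn. This is an illustration only: the function name, the status strings, and the handling of an unconfirmed recommendation are assumptions, since the slide does not say what happens when Information Security disagrees.

```python
def review_job(job_name, owner_approves, security_confirms, authorized_jobs):
    """Walk one unauthorized job through the three review steps.

    Step 1: the business owner recommends approval or rejection.
    Step 2: information security confirms (or does not confirm) that
            recommendation.
    Step 3: for a confirmed approval, the admin step adds the job to the
            authorized job list.
    Returns a status string; the status names are hypothetical.
    """
    if not security_confirms:
        # Assumption: an unconfirmed recommendation goes back for review.
        return "needs further review"
    if not owner_approves:
        return "rejected"
    authorized_jobs.add(job_name)
    return "authorized"


authorized_jobs = {"nightly_click_rollup"}
status_a = review_job("quarterly_churn_model", True, True, authorized_jobs)
status_b = review_job("adhoc_customer_dump", False, True, authorized_jobs)
status_c = review_job("unknown_scan", True, False, authorized_jobs)
```

Only a job that is both approved and confirmed ends up on the authorized list, so later reports stop flagging it.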
Automate approval workflow
  Populate new vetted applications automatically
MongoDB Example
  High Performance: Indexes, RAM
  Horizontally Scalable: Sharding
  Native API (Java, C#, C++, 9 more)
  Highly Available: Replica Sets
  Dynamic Document Data Model
    {
      customer_id : 1,
      first_name : "Mark",
      accounts : [
        { account_num : 13, branch_id : 200, type : "Checking" },
        { account_num : 15, branch_id : 300, type : "401(k)", co : "10gen" }
      ]
    }
Sample activity
  db.creditcard.find()
  db.creditcard.update({"name":"sundari Voruganti"},{$set:{"product":"Platinum card"}},false, true)
Monitor users and roles
  Who created the user, and when did they do it?
  The full message details include the actual command, which shows you the roles
Automate review of roles
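An automated role review could, for example, flag user-creation events that grant administrative roles. The sketch below assumes the audit trail has already been parsed into dictionaries with `created_by`, `new_user`, `when`, and `roles` keys; that record shape is hypothetical and is not Guardium's report format. The role names on the watch list are MongoDB built-in roles, but which roles count as privileged is a local policy decision.

```python
# Roles treated as privileged for this sketch (a policy assumption).
PRIVILEGED_ROLES = {
    "userAdmin",
    "userAdminAnyDatabase",
    "dbAdminAnyDatabase",
    "clusterAdmin",
}


def flag_privileged_grants(user_events, privileged_roles=PRIVILEGED_ROLES):
    """Return (creator, new user, risky roles) for each event that grants
    at least one role on the watch list."""
    flagged = []
    for event in user_events:
        risky = sorted(set(event["roles"]) & set(privileged_roles))
        if risky:
            flagged.append((event["created_by"], event["new_user"], risky))
    return flagged


# Hypothetical parsed audit records.
USER_EVENTS = [
    {"created_by": "admin1", "new_user": "report_reader",
     "when": "2013-09-20T14:02:00", "roles": ["read"]},
    {"created_by": "jdoe", "new_user": "tmp_ops",
     "when": "2013-09-21T02:13:00", "roles": ["readWrite", "clusterAdmin"]},
]

flagged = flag_privileged_grants(USER_EVENTS)
```

Here only the second event is flagged, because it grants `clusterAdmin`.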
Text search for ad-hoc investigations
  Who dropped my collection?!
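An ad hoc "who dropped my collection?" investigation amounts to a text search over the recorded commands. The sketch below is a plain substring search over hypothetical audit records (dictionaries with `user`, `when`, and `command` keys); it does not reflect Guardium's actual search feature or data model.

```python
def search_audit(records, text):
    """Return the audit records whose command contains the given text,
    ignoring case."""
    needle = text.lower()
    return [record for record in records if needle in record["command"].lower()]


# Hypothetical audit records.
AUDIT_RECORDS = [
    {"user": "app_user", "when": "2013-09-23T09:15:00",
     "command": "db.creditcard.find()"},
    {"user": "jdoe", "when": "2013-09-23T23:41:00",
     "command": "db.creditcard.drop()"},
]

hits = search_audit(AUDIT_RECORDS, "drop")
for record in hits:
    print(f'{record["when"]} {record["user"]}: {record["command"]}')
```

A substring match is deliberately crude: it would also match a collection whose name contains "drop", so a real investigation would narrow the results by command type.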
Integrate with QRadar to add data security insights to your security intelligence
  [Diagram: Extensive Data Sources + Deep Intelligence = Exceptionally Accurate and Actionable Insight]
    Extensive data sources: Security Devices, Servers & Hosts, Network & Virtual Activity, Database Activity, Application Activity, Configuration Info, Vulnerability Info, User Activity
    Deep intelligence: Event Correlation, Activity Baselining & Anomaly Detection, Offense Identification
  In-depth data activity monitoring and security insights from InfoSphere Guardium
    Databases
    Data Warehouses
    Hadoop/NoSQL Big Data environments
    File shares
    Applications
  Specific vulnerability assessment for database infrastructure
  Send real-time data activity security alerts from Guardium to QRadar in LEEF format
  Send data activity audit reports (syslog) from Guardium to Q1 to enhance analytics
  Share database vulnerability findings (CVE) between Guardium and QRadar in AXIS or SCAP
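To make the LEEF alert format concrete, here is a minimal sketch of building a LEEF 1.0 event line: a pipe-delimited header followed by tab-separated key=value attributes. The vendor, product, version, event ID, and attribute values are hypothetical; the slides do not show which fields Guardium actually sends, and the `to_leef` helper is not part of any IBM API.

```python
def to_leef(vendor, product, version, event_id, attributes):
    """Format one event as a LEEF 1.0 line.

    Header fields are separated by pipes; attributes follow as
    tab-separated key=value pairs. Tabs inside values are replaced with
    spaces so they cannot be mistaken for attribute separators.
    """
    header = "|".join(["LEEF:1.0", vendor, product, version, event_id])
    body = "\t".join(
        f"{key}={str(value).replace(chr(9), ' ')}"
        for key, value in attributes.items()
    )
    return f"{header}|{body}"


# Hypothetical alert for an unauthorized MapReduce job.
alert = to_leef(
    "IBM", "Guardium", "9.0", "UnauthorizedMapReduceJob",
    {"usrName": "jdoe", "src": "10.0.3.17", "sev": 7},
)
print(alert)
```

In practice such a line would be carried to the SIEM inside a syslog message; that transport step is omitted here.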
InfoSphere Guardium protects sensitive data in Big Data and NoSQL environments
  Protect your sensitive data with real-time activity monitoring
    Gain insights into activity throughout the Hadoop stack: Hive, MapReduce, HBase and HDFS
    Gain insight into NoSQL systems such as MongoDB, Cassandra, CouchDB
    Detect unauthorized applications or users
    Real-time alerts reduce time to discovery for possible breach or infraction of compliance
  Automate compliance and management tasks
  Integrates with security information and event management systems for action and correlation analysis
  [Diagram: Hadoop stack. User Interface. Application: Hive, MapReduce, Oozie. Storage: HBase, HDFS.]
Protect sensitive data wherever it lies
  [Diagram: sources covered by InfoSphere Guardium: databases (including Exadata), Optim Archival, InfoSphere BigInsights, Master Data Management, Data Stage, CICS, FTP, and applications such as Siebel, PeopleSoft, E-Business]
Next steps?
  E-book: NoSQL does not have to mean no security
    http://public.dhe.ibm.com/common/ssi/ecm/en/nib03019usen/nib03019usEN.PDF
  E-book: Planning a security and auditing deployment for Hadoop
    http://www.ibm.com/software/swlibrary/en_us/detail/i804665j74548g31.html
  WW Sales leader: Dave Valovcin (dvalovcin@us.ibm.com)
Thank you
  Dziękuję (Polish), Gracias (Spanish), Merci (French), Спасибо (Russian), Obrigado (Brazilian Portuguese), Danke (German), Tack (Swedish), Grazie (Italian)
Question and Answer
  Kathy Zeidenstein
  Technology Evangelist, IBM InfoSphere Guardium
Open Panel with Audience Q&A
  Chris Diehl, Principal and Co-Founder, The Data Guild
  Kathy Zeidenstein, Technology Evangelist, IBM InfoSphere Guardium
Closing Remarks
  Thank you to our Sponsor
  Thank you to Citrix for donating this Webcast service
CPE Credit
  Within 24 hours of the conclusion of this webcast, you will receive a link via email to a post-Web Conference quiz.
  After successfully completing the quiz, you will be given an opportunity to PRINT a certificate of attendance to use for the submission of CPE credits.
  On-Demand Viewers Quiz Link: http://www.surveygizmo.com/s3/1385133/issa-web-Conference-Big-Data-Controlling-the-Perfect-Storm-September-24-2013