
DATAGUISE WHITE PAPER

SECURING HADOOP: DISCOVERING AND SECURING SENSITIVE DATA IN HADOOP DATA STORES

OVERVIEW

The rapid expansion of corporate data being transferred or collected and stored in Hadoop HDFS is creating a critical problem for Chief Information Security Officers, compliance professionals, and IT staff responsible for data management and security. Frequently, the people responsible for securing corporate data are not even aware that Hadoop has been installed and is in use within the company. Dataguise DG for Hadoop scans data stores, locates sensitive content, and then applies remedies such as data masking and encryption to ensure compliance with industry regulations, such as HIPAA and PCI, as well as internal corporate data governance policies.

BIG DATA EXPLOSION

Petabytes of data - structured, semi-structured, and unstructured - are accumulating and propagating across your business. A good portion of this data comes from external sources and from customer interaction channels such as web sites, call center records, log files, Facebook, and Twitter. To mine these large volumes and varieties of data in a cost-efficient way, companies are adopting new technologies such as Hadoop.

TRADITIONAL DATA WAREHOUSES

What about traditional data warehouses? While they offer many advantages for decision support, traditional data warehouses are hugely expensive and require that the schema be decided well in advance, taking away the flexibility to decide how to slice and dice data as new methods of analysis emerge. Because Hadoop can be set up and expanded rapidly using commodity hardware, and the schema may be defined at the time the data is read (see the schema-on-read sketch below), it is becoming the new platform of choice for processing and analyzing big data.

BUT WE DON'T HAVE HADOOP

Chief Information Security Officers (CISOs), CIOs, and others involved in corporate information security will often say that their organization does not have any Hadoop clusters. They rely on processes in place to ensure that any software installed in the enterprise has gone through an extensive approval and procurement process before it is implemented. They are therefore often surprised to learn that Hadoop is already installed and running. This happens because Hadoop is a free download, available directly from the Apache website or from one of the leading distributors of Hadoop such as Cloudera, MapR, IBM (InfoSphere BigInsights), and Hortonworks. It is very simple for any number of employees to create a Hadoop installation and be up and running very quickly. Even if Hadoop is only being used in a sandbox (isolated from the production systems) or in a test environment, corporate data stored in Hadoop must still adhere to the same rigorous corporate standards in place for the rest of the data infrastructure, or the company risks the consequences of failing a compliance audit or, even worse, a data breach.

THE AGE OF BIG DATA HAS ARRIVED. Even companies that think they don't have Hadoop are surprised to learn it has been downloaded and is in use with sensitive corporate data. DO YOU KNOW WHERE YOUR RISK EXPOSURES ARE?
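To make the schema-on-read point concrete, here is a minimal sketch using PySpark, one common way to query data in HDFS. The HDFS path and column names are illustrative assumptions, not part of any product described here:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType

    spark = SparkSession.builder.appName("schema-on-read").getOrCreate()

    # The raw CSV files were dropped into HDFS as-is; unlike a traditional
    # warehouse, no schema was declared when the data was loaded.
    schema = StructType([
        StructField("customer_id", StringType()),
        StructField("email", StringType()),
        StructField("card_number", StringType()),
    ])

    # Structure is imposed only now, at read time; a different analysis
    # tomorrow can reinterpret the very same files with another schema.
    df = spark.read.csv("hdfs:///landing/customers/", schema=schema)
    df.show()

That flexibility is also the risk: nothing in this flow forces anyone to notice that card_number is sensitive before the data spreads.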

FINDING SENSITIVE DATA IN HADOOP

Sensitive data includes items such as taxpayer IDs, employee names, addresses, credit card numbers, and many more. Data theft prevention, sound governance practices, and the need to satisfy compliance requirements for industry regulations such as PCI, HIPAA, and PII make it imperative that organizations implement the necessary processes to identify and protect sensitive information. The first step in securing Hadoop is to search for and locate sensitive information, and to determine the volume and types of data that are at risk. The challenge is that many search products are designed to work only with structured data, using basic regular expressions. Scanning data in Hadoop requires a sophisticated discovery tool that can scan large volumes of both structured and unstructured data, and do it rapidly.

THE NEED TO PROTECT SENSITIVE DATA

Once it has been determined that Hadoop is in the corporate infrastructure and contains sensitive information, CIOs and CISOs should be very nervous about potential exposure. Because Hadoop has so far been used mostly by social media companies, and is only now being adopted by financial, healthcare, and other security-conscious enterprises, options for data protection in Hadoop have been limited. Whereas there are numerous options for legacy databases and structured data stores, Hadoop poses a new challenge for companies that need to maintain compliance. The same type of strong protection in use for traditional data stores is needed for the Hadoop environment as well.

CHOOSE MASKING OR ENCRYPTION

Whether sensitive data was stored in Hadoop intentionally or unintentionally, once it is discovered and documented there are two main approaches to remediation: encryption and masking. Encryption is typically used when access to the sensitive content is needed for analytical purposes; the encrypted data can be decrypted by an authorized user at the time of use. Masking is used when there is no need for the actual sensitive content, as masking replaces sensitive data with realistic (but not real) data. Optionally, consistency may be maintained to retain the statistical distribution of the data. Although there are some similarities between data masking and encryption, they differ in usage, technology, and deployment strategy: encryption conceals private data and can reveal it again given the encryption keys, while masked values cannot be reversed. The sketch below contrasts the two approaches.
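The following minimal sketch is generic Python, not Dataguise's implementation; the fake-name pool and the salt are illustrative assumptions. It contrasts reversible encryption with irreversible, consistency-preserving masking:

    import hashlib
    from cryptography.fernet import Fernet  # pip install cryptography

    # Encryption: reversible by an authorized user who holds the key.
    key = Fernet.generate_key()                # in practice, from a key manager
    cipher = Fernet(key)
    token = cipher.encrypt(b"4111-1111-1111-1111")
    assert cipher.decrypt(token) == b"4111-1111-1111-1111"  # recoverable at use time

    # Masking: replaces the real value with a realistic (but not real) one.
    FAKE_NAMES = ["Alice Smith", "Bob Jones", "Carol White", "Dan Brown"]

    def mask_name(real_name: str, salt: bytes = b"per-deployment-salt") -> str:
        # Consistent masking: the same input always yields the same fake value,
        # preserving joins and statistical distribution, yet no key exists
        # that can recover the original.
        digest = hashlib.sha256(salt + real_name.encode()).digest()
        return FAKE_NAMES[digest[0] % len(FAKE_NAMES)]

    print(mask_name("John Doe"))  # deterministic, but irreversible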

"It doesn't take a clairvoyant or in this case, a research analyst to see that 'big data' is becoming (if it isn't already, perhaps) a major buzzword in security circles. Much of the securing of big data will need to be handled by thoroughly understanding the data and its usage patterns. Having the ability to identify, control access to, and where possible mask sensitive data in big data environments based on policy is an important part of the overall approach."
- Ramon Krikken, Research VP, Security and Risk Management Strategies Analyst, Gartner

DO YOU HAVE THE TOOLS TO PROTECT YOUR DATA?

THE DATAGUISE SOLUTION

Dataguise specializes in sensitive data protection in large repositories. We began with relational databases, expanded to shared file systems and Microsoft SharePoint, and now we are bringing our enterprise-class expertise to secure Hadoop. Bringing together experienced technology professionals from the database, security, and search specialties, we combine the best of these disciplines to secure Hadoop in the enterprise. Dataguise's core product for Hadoop, DG for Hadoop, combines sensitive data discovery, user and event 4R reports, and options for both encryption and masking to provide the most comprehensive data security solution in the market today.

INTRODUCING DGSECURE

The purpose of DG for Hadoop is simple yet crucial: to detect and protect sensitive data in Hadoop implementations. As part of the Dataguise DgSecure product line, DG for Hadoop is the ideal solution to help ensure that compliance standards are met while reaping the benefits of using Hadoop to manage large amounts of structured and unstructured data. To accommodate various usage patterns, DG for Hadoop supports detection and protection at the source (before moving data to Hadoop), in flight (while moving data to Hadoop), and at rest (after moving unprotected data to HDFS). Just-in-time protection is provided through incremental scans of newly added data in HDFS. Once sensitive data is located and identified, either masking or encryption can be chosen as the protection method, based on the specific requirements of the organization or the purpose of storing and managing the data.

HOW IT WORKS

DG for Hadoop gives the Chief Security Officer, and other entities responsible for conforming to industry regulations and corporate Governance, Risk, and Compliance (GRC) requirements, the ability to define policies.

DEFINE POLICY

Policies define what data is considered sensitive, based on a combination of pre-built data types and custom data types that the user can add. Policies also allow the sensitive data types to be grouped in alignment with regulations, and allow remedial actions to be specified, providing guidance on what to do for those handling the data. All of the details of the data repositories and the actions taken are fully logged, so, on the back end of this process, the Chief Security Officer and others can track risk profiles through the dashboard, which also provides actionable details, and can use reports to audit actions to ensure that the right people have the right access to the right data.

DISCOVER

The first step is to search for any sensitive data in Hadoop data stores located on premises or in the cloud. A user operating under the corporate policy guidance (PCI, PII, HIPAA, etc.) creates a task definition against one or more files, directories, or combinations of them, and executes the job. DG for Hadoop scans all the targeted data stores to find data that meet those criteria and takes the appropriate, task-specified remediation actions. Additional search features (a sketch of such a policy-driven scan follows this list):

- Custom data types: add custom expressions to augment the built-in capabilities
- Columnar searches: allow for searches of structured data
- Incremental scans: quickly search only the new data that has arrived since the last scan
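As an illustration only, here is what a policy-driven scan with custom data types and incremental rescans might look like. This is plain Python over a local directory; a real deployment would read from HDFS and distribute the work across the cluster, and every pattern, path, and name below is a hypothetical stand-in, not the DG for Hadoop API:

    import json
    import os
    import re

    # Hypothetical policy: data types grouped by the regulation they map to.
    # "employee_id" plays the role of a user-added custom data type.
    POLICY = {
        "PCI": {"credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b")},
        "PII": {
            "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
            "employee_id": re.compile(r"\bEMP-\d{6}\b"),  # custom data type
        },
    }

    STATE_FILE = ".last_scan"  # marker recording when the previous scan ran

    def scan(root):
        last = os.path.getmtime(STATE_FILE) if os.path.exists(STATE_FILE) else 0.0
        findings = []
        for dirpath, _, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                if os.path.getmtime(path) <= last:
                    continue  # incremental scan: skip data seen on the last run
                with open(path, errors="ignore") as fh:
                    text = fh.read()
                for regulation, types in POLICY.items():
                    for dtype, pattern in types.items():
                        hits = len(pattern.findall(text))
                        if hits:
                            findings.append({"file": path, "regulation": regulation,
                                             "type": dtype, "count": hits})
        open(STATE_FILE, "w").close()  # next run scans only newer files
        return findings

    if __name__ == "__main__":
        print(json.dumps(scan("/data/landing"), indent=2))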

ANALYZE

After collecting the information about sensitive data, DG for Hadoop delivers risk assessment analytics to users in the form of easy-to-interpret graphical summaries. Users can then evaluate their compliance exposure profiles and decide on the most appropriate remediation policies to implement.

REMEDIATION

DG for Hadoop provides three main options for remediation (a generic sketch of the dispatch logic follows this list):

1) Notification (search only): whenever new data has been ingested into Hadoop, DG for Hadoop processes the content and informs the designated users of the presence of sensitive data.
2) Search and mask: as part of locating sensitive data, masking can also be executed based on a predefined policy, either in flight as data enters Hadoop or within Hadoop once the data is there.
3) Search and encrypt: as part of locating sensitive data, DG for Hadoop can optionally encrypt the entire row or just specific fields, in flight or in Hadoop HDFS.
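To make the three modes concrete, here is a generic dispatch sketch in plain Python. The regular expression, the in-memory key, and the function names are illustrative assumptions rather than the product's internals; the same hook could run in flight during ingest or at rest over files already in HDFS:

    import re
    from cryptography.fernet import Fernet  # pip install cryptography

    CARD = re.compile(r"\b(?:\d[ -]?){13,16}\b")
    cipher = Fernet(Fernet.generate_key())  # in practice, a managed key

    def notify(record):
        print("ALERT: sensitive data detected")  # stand-in for alerting users
        return record                            # search only: data untouched

    def mask(record):
        return CARD.sub("XXXX-XXXX-XXXX-XXXX", record)  # irreversible

    def encrypt(record):
        # Encrypt just the matched field; the rest of the row stays readable.
        return CARD.sub(lambda m: cipher.encrypt(m.group().encode()).decode(),
                        record)

    ACTIONS = {"notify": notify, "mask": mask, "encrypt": encrypt}

    def remediate(record, action):
        # Apply the policy-selected action only when sensitive data is found.
        return ACTIONS[action](record) if CARD.search(record) else record

    print(remediate("order 17, card 4111 1111 1111 1111", "mask"))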

DASHBOARD AND REPORTING

DG for Hadoop provides top-level summary (directory-level) and in-depth (file-level) detail about where sensitive content resides and which remediation method(s) have been applied, highlighting the gaps in protection and providing actionable data for appropriate follow-up.

[Figure: DG for Hadoop - a solution for all usage patterns]

BENEFITS

DG for Hadoop provides a unique and important solution for enabling data security in Hadoop. Tangible benefits include the ability to conform to regulatory requirements, avoid the risk of failing a compliance audit, and ensure that valuable corporate data is safe from security breaches. Implementing DG for Hadoop enables organizations to:

- Simplify Data Compliance Management in Hadoop: eliminates the need to build custom applications or patch together disparate tools to search for and protect sensitive data.
- Improve Operational Efficiencies: less staff time is required to administer custom Hadoop security controls and custom reports, or to move data to databases or other data stores for remediation.
- Reduce Regulatory Compliance Costs: one tool can now take care of tasks that previously took multiple software products and costly consulting hours to achieve.
- Automate Compliance Assessment and Enforcement: the DG for Hadoop dashboard and reports summarize sensitive data content with actionable details of exposure risks.

CONCLUSION

Protecting sensitive data in Hadoop is critical as volumes of data continue to expand in the enterprise and Hadoop becomes the technology of choice at an increasing rate. An effective data protection strategy must start with finding all of the sensitive data, putting the proper remediation policies in place, and monitoring data flow to ensure that the established procedures are followed. DG for Hadoop is the leading solution for securing Hadoop effectively and quickly, and for ensuring adherence to sound data governance practices across the entire big data environment.

ABOUT DATAGUISE

Dataguise helps organizations safely leverage their enterprise data with a comprehensive, risk-based data protection solution. By automatically locating sensitive data; transparently protecting it with high-performance masking, encryption, or quarantine; and providing enterprise security intelligence to the managers responsible for regulatory compliance and governance, Dataguise improves data risk management and operational efficiencies while reducing regulatory compliance costs. For more information, call 510-824-1036 or visit www.dataguise.com.

Dataguise, Inc. 2012. All rights reserved.