Securing Your Enterprise Hadoop Ecosystem
Comprehensive Security for the Enterprise with Cloudera
Version: 103
Table of Contents
Introduction
Importance of Security
Growing Pains
Security Requirements of the Enterprise
Security Capabilities and Hadoop
Guard the Perimeter: Authentication
Access Controls: Authorization
Gain Visibility: Data Governance for Hadoop
Data Protection
Compliance-Ready
Conclusion
About Cloudera
Abstract
Enterprises are transforming data management with unified systems that combine distributed storage and computation at limitless scale, for any amount and type of data, and with powerful, flexible analytics, from batch processing to interactive SQL to full-text search. Yet to realize their full potential, these enterprise data hubs require authentication, authorization, audit, and data protection controls. Cloudera makes the enterprise data hub a reality for information-driven business by delivering applications and frameworks for deploying, managing, and integrating the necessary data security controls demanded by today's regulatory environments and compliance policies. Cloudera's enterprise data hub provides comprehensive security and governance across the four pillars of security, including Cloudera Manager for centralized, automated authentication management; Apache Sentry for unified role-based authorization; Cloudera Navigator for comprehensive audit, lineage, and metadata management; and Navigator Encrypt and Navigator Key Trustee for transparent encryption and enterprise-grade key management.

Introduction
Information-driven enterprises recognize that the next generation of data management is a single, central system that stores all data in any form or amount and provides business users with a wide array of tools and applications for working with this data. This evolution in data management is the enterprise data hub, and Apache Hadoop is at its core. Yet as organizations increasingly store data in this system, a growing portion is business-sensitive and subject to regulations and governance controls. This information requires that Hadoop provide strong capabilities and measures to ensure security and compliance. This paper explores and summarizes the common security approaches, tools, and practices found within Hadoop at large, and how Cloudera is uniquely positioned to provide, strengthen, and manage the data security required for an enterprise data hub.

Importance of Security
Data security is a top business and IT priority for most enterprises. Many organizations operate within a business environment awash in compliance regulations that dictate controls and protection of data. For example, retail merchants and banking service firms must adhere to the Payment Card Industry Data Security Standard (PCI DSS) to protect their clients and their transactions, services, and account information. Healthcare providers and insurers, as well as biotechnology and pharmaceutical firms involved in research, are subject to the Health Insurance Portability and Accountability Act (HIPAA) compliance standards for patient information. And US federal organizations and agencies must comply with the various information security mandates of the Federal Information Security Management Act of 2002 (FISMA). Organizations also have business mandates and objectives that establish internal information security standards to guard their data. For instance, firms may endeavor to maintain strict client privacy as a market differentiator and business asset. Others may view data security as a wall protecting their hard-earned intellectual property or politically sensitive information. These internal policies can also be seen as insurance against the negative public relations and press that may come as a result of a breach or accidental exposure.
Growing Pains
Historically, early Hadoop developers did not prioritize data security, as the use of Hadoop during this time was limited to small, internal audiences and designed for workloads and data sets that were not deemed overly sensitive and subject to regulation. The resulting early controls were designed to address user error (to protect against accidental deletion, for example) and not to protect against misuse. For stronger protection, Hadoop relied on the surrounding security capabilities inherent to existing data management and networking infrastructure. As Hadoop matured, the various projects within the platform, such as HDFS, MapReduce, Oozie, and Pig, addressed their own particular security needs. As is often the case with distributed, open source efforts, these projects established different methods for configuring the same types of security controls.
The data management environment and needs are vastly different today. The flexible, scalable, and cost-efficient storage capabilities of Hadoop now offer businesses the opportunity to acquire a greater range of data for analysis and the ability to retain this data for longer periods of time. This concentration of data assets makes Hadoop a natural target for those with malicious intent, but with the advances in data security and security management in Hadoop today, enterprises have robust and comprehensive capabilities to prevent intrusions and abuse. The maturity of Hadoop today also means there are many data paths into and out of this information source. The breadth of ways in which a business can work with data in Hadoop, from interactive SQL and full-text search to advanced machine-learning analysis, offers organizations unprecedented technical capabilities and insights for their multiple business audiences and constituencies. Enterprise security standards demand examination of each pathway, and this poses challenges for IT teams, as some channels and methods may have the concept of fields, tables, and views, for example, while other channels might only address the same data as more granular files and directories. While all of these capabilities open up this data to more users, with various frameworks to access and analyze the data, those users may also have varying degrees of permissions that must be enforced. Additionally, unified data storage means sensitive data is also being stored and must be properly protected, while still maintaining the flexibility for organizations to explore data more fully and realize the true value of their data. The powerful and far-reaching capabilities inherent in Hadoop also pose substantial challenges to organizations. While Hadoop stands as the core of the platform for big data, employing Hadoop as the foundation of an enterprise data hub requires that the data and processes of this platform meet the security requirements of the enterprise. Without such assurances, organizations simply cannot realize the full potential of, and capitalize on, a production-ready enterprise data hub for data management.

[Figure: Cloudera enterprise data hub architecture - process, analyze, and serve engines (batch: Spark, Hive, Pig, MapReduce; stream: Spark; SQL: Impala; search: Solr; SDK: partners); unified services for resource management (YARN) and security (Sentry, RecordService); storage in HDFS (filesystem), HBase (NoSQL), and Kudu (relational); data management with Cloudera Navigator, Navigator Encrypt and Key Trustee, and Optimizer; batch and real-time integration (Sqoop; Kafka, Flume); and operations with Cloudera Manager and Cloudera Director.]
Security Requirements of the Enterprise
How then can organizations provide these assurances for enterprise-grade data security? When evaluating security for Hadoop, it's important to look beyond just installing a firewall and evaluate the platform based on the four pillars of security: perimeter, access, visibility, and data protection. Due to the sheer scale of Hadoop in terms of data, users, and access frameworks, it is critical that security controls are both managed centrally and automated across multiple systems and data sets in order to provide consistency of configuration and application according to IT policies and to ease and streamline operations. This requirement is particularly critical for distributed systems, where deployment and management events are common throughout the lifetime of the services and underlying infrastructure. Lastly, enterprises require that systems and data integrate with existing IT tools such as Kerberos; directories, like Microsoft's Active Directory or the Lightweight Directory Access Protocol (LDAP); and logging and governance infrastructures, like security information and event management (SIEM) tools. Without the ability to leverage and integrate with existing security tools, it can be difficult to implement a new system, and resource-intensive and repetitive to manage multiple tools.

Security Capabilities and Hadoop
Hadoop has matured dramatically in the past few years, and this maturity means it is being used in production to power mission-critical applications. That is why Cloudera has been focused on adding the security and governance necessary both to run Hadoop in production and to realize the true value of data. Cloudera is the leader in comprehensive security and governance for Hadoop. Across the four pillars of security, Cloudera's solution preserves the flexibility of Hadoop while providing the compliance-ready security required for the enterprise. The remainder of this paper focuses on how Hadoop now provides enterprise-grade security and governance and how IT teams can support deployments in environments that face frequent regulatory compliance audits and governance mandates. To do so, it is helpful to explore in greater detail the technical underpinnings, design patterns, and practices manifested in Hadoop for the four common factors of enterprise security: Perimeter, Access, Visibility, and Data Protection.

Guard the Perimeter: Authentication
Authentication mitigates the risk of unauthorized usage of services, and the purpose of authentication in Hadoop, as in other systems, is simply to prove that a user or service is who they claim to be.
Typically, enterprises manage identities, profiles, and credentials through a single distributed system, such as a Lightweight Directory Access Protocol (LDAP) directory. LDAP authentication consists of straightforward username/password services backed by a variety of storage systems, ranging from file to database.

Another common enterprise-grade authentication system is Kerberos. Kerberos is a centralized network authentication system providing secure single sign-on and data privacy for users and applications. Developed as a research project at the Massachusetts Institute of Technology (MIT) in the 1980s to tackle mutual authentication and encrypted communication across unsecured networks, Kerberos has become a critical part of any enterprise security plan. The Kerberos protocol offers multiple encryption options that satisfy virtually all current security standards, ranging from single DES and DIGEST-MD5 to RC4-HMAC and the latest NIST encryption standard, the Advanced Encryption Standard (AES). Kerberos provides strong security benefits, including capabilities that render intercepted authentication packets unusable by an attacker. It virtually eliminates the threat of impersonation present in earlier versions of Hadoop and never sends a user's credentials in the clear over the network. Learn more about Kerberos at http://web.mit.edu/kerberos/

Both authentication schemes have been widely used for more than 20 years, and many enterprise applications and frameworks leverage them at their foundation. For example, Microsoft's Active Directory (AD) is an LDAP directory that also provides Kerberos for additional authentication security. Virtually all of the components of the Hadoop ecosystem are converging on Kerberos authentication, with the option to manage and store credentials in LDAP or AD.

Patterns and Practices
Hadoop, given its distributed nature, has many touchpoints, services, and operational processes that require robust authentication in order for a cluster to operate securely. Hadoop provides critical facilities for accomplishing this goal.

Perimeter Authentication
The first common approach concerns secured access to the Hadoop cluster itself: access that is external to, or on the perimeter of, Hadoop. As mentioned above, there are two principal forms of external authentication commonly available within Hadoop: AD/LDAP and Kerberos. AD/LDAP and Kerberos are authentication staples within the enterprise. With AD/LDAP, you can manage users, groups, and services within the environment. Users provide username and password authentication when they log in, and groups control which services users have access to. For example, if there is a group called HR and there is a payroll system, users in the HR group have access to payroll and users outside it do not. Best practices suggest that client-to-Hadoop services use Kerberos for authentication, and, as mentioned earlier, virtually all ecosystem clients now support Kerberos. When Hadoop security is configured to use Kerberos, client applications authenticate to the edge of the cluster via Kerberos RPC, and web consoles use HTTP SPNEGO Kerberos authentication. Despite the different methods used by clients to handle Kerberos tickets, all leverage a common core Kerberos and an associated central directory of users and groups, thus maintaining continuity of identification across all services of the cluster and beyond.
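To make this concrete, the following is a minimal sketch (not Cloudera's own tooling) of how a Java client application might authenticate to a Kerberos-secured cluster with a keytab before issuing HDFS calls. The principal name, keytab path, and HDFS path are hypothetical placeholders; a real deployment would pick most of this up from the cluster's client configuration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberizedHdfsClient {
  public static void main(String[] args) throws Exception {
    // core-site.xml on the classpath normally carries this setting,
    // but it is set explicitly here to show the relevant switch.
    Configuration conf = new Configuration();
    conf.set("hadoop.security.authentication", "kerberos");
    UserGroupInformation.setConfiguration(conf);

    // Authenticate with a keytab rather than a password prompt;
    // the principal and keytab location below are placeholders.
    UserGroupInformation.loginUserFromKeytab(
        "svc-etl@EXAMPLE.COM", "/etc/security/keytabs/svc-etl.keytab");

    // Subsequent RPCs to the NameNode are made as the authenticated principal.
    FileSystem fs = FileSystem.get(conf);
    System.out.println(fs.exists(new Path("/data/landing")));
  }
}
```

Interactive users typically just run kinit and let the Hadoop client pick up the ticket cache; the keytab pattern above is the common approach for services and scheduled jobs.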
By leveraging existing standards for authentication, Cloudera not only integrates with AD and Kerberos but also automates much of the tedious setup process through Cloudera Manager. Active Directory has a built-in Kerberos server, and Cloudera Manager integrates directly with it, eliminating the need to set up a separate Kerberos server. This allows users to continue to authenticate directly against AD, allows Hadoop services to be defined directly in AD as part of the overall service management layer, and controls user access to Hadoop services via AD groups. In addition, Cloudera Manager automates the process of Kerberizing the cluster. Through a wizard, authorized users configure Kerberos for individual servers (with recommended defaults) and trigger an automated workflow to secure their cluster. Cloudera Manager can also be used to tune the configuration between the Kerberos server for the cluster
and the primary Kerberos service, as well as monitor the services for all of them. Finally, Cloudera Manager manages and deploys Kerberos client configurations and Hadoop TLS/SSL-related configurations.

To further aid IT teams with access integration, Cloudera updated Hue and Cloudera Manager to offer additional single sign-on (SSO) options by employing the Security Assertion Markup Language (SAML) for authentication. "The Security Assertion Markup Language (SAML) is an XML-based framework for communicating user authentication, entitlement, and attribute information. By defining standardized mechanisms for the communication of security and identity information between business partners, SAML makes federated identity, and the cross-domain transactions that it enables, a reality." - Organization for the Advancement of Structured Information Standards (OASIS). Learn more about SAML at http://saml.xml.org/

Cluster and RPC Authentication
Virtually all intra-Hadoop services mutually authenticate each other using Kerberos RPC. These internal checks prevent rogue services from introducing themselves into cluster activities via impersonation and subsequently capturing potentially critical data as a member of a distributed job or service.

Distributed User Accounts
Many Hadoop projects operate on data sets within HDFS, and this activity requires that a given user's account exist on all processing and storage nodes (NameNode, DataNode, JobTracker/ApplicationMaster, and TaskTracker/NodeManager) within a cluster in order to validate access rights and provide resource segmentation. The host operating system provides the mechanics to ensure user account propagation - for example, Pluggable Authentication Modules (PAM) or the System Security Services Daemon (SSSD) on Linux hosts - allowing user accounts to be centrally managed via LDAP or AD, which facilitates the required process isolation. (See "Authorization: Process Isolation.")

Access Controls: Authorization
Authorization is concerned with who or what has access or control over a given resource or service. Hadoop merges together the capabilities of multiple, varied, and previously separate IT systems, and then adds a wide variety of open source and commercial software tools into the mix. These different methods of accessing data deal with it at multiple levels of granularity. The challenge for Hadoop administrators is to be able to provide a single set of authorization policies that can operate across all of this infrastructure. For example, common data flows in both Hadoop and traditional data management environments require different authorization controls at each stage of processing. First, a commercial ETL tool may gather data from a variety of sources and formats, ingesting that data into a common repository. During this stage, an ETL administrator has full access to view all elements of input data in order to verify formats and configure data parsing. Then the processed data is exposed to business intelligence (BI) tools through a SQL interface to the common repository. End users of the BI tools have varying levels of access to the data according to their assigned roles. Finally, some users query this same data through a commercial GUI-based tool that runs MapReduce jobs. Apache Sentry enables policies to be created that are enforced across all of these access methods.
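Both the OS-level account propagation described above and the group-based role assignments discussed in the next section depend on Hadoop resolving the same centrally managed groups. The following is a minimal sketch, roughly equivalent to the hdfs groups command, of checking that resolution from a Java client; it assumes a core-site.xml on the classpath whose hadoop.security.group.mapping points at an LDAP/AD- or SSSD-backed mapping, and "alice" is a placeholder user name.

```java
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Groups;

public class GroupResolutionCheck {
  public static void main(String[] args) throws Exception {
    // Picks up hadoop.security.group.mapping and related settings
    // from the client configuration on the classpath.
    Configuration conf = new Configuration();
    Groups groups = Groups.getUserToGroupsMappingService(conf);

    // "alice" is a placeholder; the result should match what the
    // NameNode and Sentry resolve for the same account.
    List<String> memberships = groups.getGroups("alice");
    System.out.println("Groups resolved for alice: " + memberships);
  }
}
```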
Patterns and Practices
Authorization for data access in Hadoop typically manifests in three forms: POSIX-style permissions on files and directories, Access Control Lists (ACLs) for management of services and resources, and Role-Based Access Control (RBAC) for certain services with advanced access controls to data. Authorization across different components of Hadoop can increasingly be performed within a single tool - Apache Sentry. For example, data can now be shared between Hive and other jobs such as MapReduce, Apache Spark, and Apache Pig that access that same data directly through HDFS, while only a single set of permissions needs to be set for that data.
Process Isolation
Hadoop, as a shared services environment, relies on the isolation of computing processes from one another within the cluster, thus providing strict assurance that each process and user has explicit authorization to a given set of resources. This is particularly important within MapReduce, as the Tasks of a given Job can execute Unix processes (e.g., MapReduce Streaming), individual Java VMs, and even arbitrary code on a host server. In frameworks like MapReduce, the Task code is executed on the host server using the Job owner's UID, thus providing reliable process isolation and resource segmentation at the OS level of the Hadoop cluster.

Role-Based Access Control (RBAC) Across Hadoop
Central management of access policies is key, given the number of access paths and users, so that users can access the data needed to do their jobs. Cloudera 5.x delivers this unified authorization with Apache Sentry (incubating). Sentry is an open source project that provides unified authorization via fine-grained role-based access controls (RBAC) for Impala, Apache Hive, Apache Spark, MapReduce, Apache Pig, and Search. Apache Sentry has quickly emerged as the open standard for authorization in Hadoop, with a broad set of contributions for sustained engineering quality, multi-vendor support for portability, and third-party integrations for compatibility, all resulting in wide industry adoption across even the most regulated verticals.

Figure 3

Figure 3 provides an overview of how Sentry authorization works. Here we have a Sentry role (Fraud Analyst Role), and this role has permissions (in this example, read access to all transaction data), though permissions can range from access to an entire database down to a specific set of columns and a particular set of rows. As defined with the perimeter security, there is also a group in AD called Fraud Analysts that is connected to the Sentry Fraud Analyst Role. Sam Smith, a member of this AD group, inherits this Sentry role and the corresponding permissions.

Figure 4
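As an illustration of how the role in Figure 3 might be defined, the following sketch issues Sentry's SQL GRANT statements through a Hive JDBC connection. The HiveServer2 host, Kerberos principal, database, and group names are hypothetical placeholders, and the statements must be run by a user with Sentry admin privileges; the same statements can also be run interactively through Beeline or the Hue interface described below.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class FraudAnalystRoleSetup {
  public static void main(String[] args) throws Exception {
    // Requires the Hive JDBC driver on the classpath; the URL assumes a
    // Kerberos-secured HiveServer2 and uses placeholder host/principal values.
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    String url = "jdbc:hive2://hs2.example.com:10000/default;"
        + "principal=hive/hs2.example.com@EXAMPLE.COM";

    try (Connection conn = DriverManager.getConnection(url);
         Statement stmt = conn.createStatement()) {
      // Define the role and give it read access to the transactions database.
      stmt.execute("CREATE ROLE fraud_analyst_role");
      stmt.execute("GRANT SELECT ON DATABASE transactions TO ROLE fraud_analyst_role");
      // Tie the role to the AD/LDAP group so membership changes in the
      // directory flow through to Sentry automatically.
      stmt.execute("GRANT ROLE fraud_analyst_role TO GROUP fraud_analysts");
    }
  }
}
```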
Sentry can be configured to use AD to determine a user's group assignments, so any changes to group assignments in AD are automatically picked up by Sentry, resulting in updated Sentry role assignments. This eliminates manual mirroring between the two and allows organizations to continue to leverage existing AD tools and skill sets. For managing Sentry policies, Cloudera offers a UI in Hue for getting all of this done. Users can select a TABLE or DATABASE, define roles and permissions, or create group associations, all within a single visual interface.

Figure 5

HDFS Permissions
Many services within the Hadoop ecosystem, from client applications like the CLI shell to tools written against the Hadoop API, directly access data stored within HDFS. HDFS uses POSIX-style permissions for directories and files, as well as HDFS ACLs; each directory and file can be assigned multiple users and groups. Each assignment has a basic set of permissions available; file permissions are simply read, write, and execute, and directories have an additional permission to determine access to child directories. HDFS permissions/ACLs are sufficient when file-level granularity is good enough and when data does not need to be shared with SQL components like Hive and Impala. Sentry integration with HDFS enables customers to easily share data between Hive, Impala, and all the other Hadoop components that interact with HDFS (MapReduce, Spark, Pig, Sqoop, and so on) while ensuring that user access permissions only need to be set once, and that they are uniformly enforced. For example, users that have SELECT access to a Hive table will automatically have read access to all of the table's underlying files.

Access Control Lists
Hadoop also maintains general access controls for the services themselves, in addition to the data within each service and in HDFS. Service access control lists (ACLs) range from NameNode access to client-to-DataNode communication. In the context of MapReduce and YARN, user and group identifiers form the basis for determining permission for job submission or modification. Apache HBase also uses ACLs for data-level authorization. HBase ACLs authorize various operations (READ, WRITE, CREATE, ADMIN) by table, column family, and column qualifier, down to the cell level. HBase ACLs can be granted to and revoked from both users and groups.
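Where file-level granularity is sufficient, HDFS ACLs can be managed programmatically through the FileSystem API (or with the equivalent hdfs dfs -setfacl command). The sketch below uses a hypothetical directory and group name to grant an additional group read and execute access beyond the directory's POSIX owner/group/other bits; HBase's AccessControlClient utility offers comparable grant and revoke calls for HBase ACLs.

```java
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.fs.permission.AclEntryScope;
import org.apache.hadoop.fs.permission.AclEntryType;
import org.apache.hadoop.fs.permission.FsAction;

public class HdfsAclGrant {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // Grant the (placeholder) fraud_analysts group read/execute access to a
    // directory, in addition to the existing owner/group/other permissions.
    AclEntry analysts = new AclEntry.Builder()
        .setScope(AclEntryScope.ACCESS)
        .setType(AclEntryType.GROUP)
        .setName("fraud_analysts")
        .setPermission(FsAction.READ_EXECUTE)
        .build();

    fs.modifyAclEntries(new Path("/data/transactions"), Arrays.asList(analysts));
    // Equivalent CLI: hdfs dfs -setfacl -m group:fraud_analysts:r-x /data/transactions
  }
}
```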
Gain Visibility: Data Governance for Hadoop
For the data in the cluster, it is critical to understand where the data is coming from and how it is being used. This is the classic question of audit. The goal of auditing is to capture a complete and immutable record of all activity within a system. Auditing plays a central role in three key activities within the enterprise. First, auditing is part of a system's security regime and can explain what happened, when, and to whom or what in case of a breach or other malicious intent. For example, if a rogue administrator deletes a user's data set, auditing provides the details of this action, and the correct data may be retrieved from backup. The second activity is compliance: auditing participates in satisfying the core requirements of regulations associated with sensitive or personally identifiable information (PII), such as the Health Insurance Portability and Accountability Act (HIPAA) or the Payment Card Industry (PCI) Data Security Standard. Auditing provides the touchpoints necessary to construct the trail of who, how, when, and how often data is produced, viewed, and manipulated. Lastly, auditing provides the historical data and context for data forensics. Audit information leads to an understanding of how various populations use different data sets and can help establish the access patterns of these data sets. This examination, such as trend analysis, is broader in scope than compliance and can assist content and system owners in their data optimization and ROI efforts. The challenges facing auditing are the reliable, timely, and tamper-proof capture of all activity, including administrative actions. Until recently, the native Hadoop ecosystem relied primarily on log files. Log files are unacceptable for most audit use cases in the enterprise, as real-time monitoring is impossible and log mechanics can be unreliable - a system crash before or during a write commit can compromise integrity and lose data. Other alternatives have included application-specific databases and other single-purpose data stores, yet these approaches fail to capture all activity across the entire cluster. Cloudera Navigator is the only native end-to-end governance solution for Apache Hadoop. It provides administrators, analysts, and data scientists alike with unified visibility into the vast, diverse data being stored in Hadoop.

Patterns and Practices
Cloudera Navigator is a core part of Cloudera's platform and is critical for meeting compliance requirements. Key features include:

Comprehensive Auditing
Cloudera Navigator maintains a full audit history for HDFS, Impala, Hive, HBase, and Sentry, all in a single place. It tracks every access attempt, right down to the user ID, IP address, and full query text. This provides visibility into who has been accessing what data, the ability to see point-in-time permissions and how they have changed (leveraging Sentry), and the ability to review and verify HDFS permissions. Cloudera Navigator also provides out-of-the-box integration with leading enterprise metadata, lineage, and SIEM applications.
Figure 6

Metadata Management
For metadata and discovery, Cloudera Navigator features complete metadata storage. First, it consolidates the technical metadata for all data inside Hadoop into a single, searchable interface and allows for automatic tagging of data based on the external sources entering the cluster. For example, if there is an external ETL process, data can be automatically tagged as such when it enters Hadoop. Second, it supports user-based tagging to augment files, tables, and individual columns with custom business context, tags, and key/value pairs. Combined, these capabilities allow data to be easily discovered, classified, and located, supporting not only governance and compliance but also user discovery within Hadoop. Cloudera Navigator also includes metadata policy management that can trigger actions (such as the auto-classification of metadata) for specific datasets based on arrival or on scheduled intervals. This allows users to easily set, monitor, and enforce data management policies, while also integrating with common third-party tools.

Lineage
Cloudera Navigator provides automatic collection and easy visualization of upstream and downstream data lineage to verify reliability. For each data source, it shows, down to the column level within that data source, what the precise upstream data sources were, the transforms performed to produce it, and the impact that data has on downstream artifacts.

Figure 7
Data Protection
The goal of data protection is to ensure that only authorized users can view, use, and contribute to a data set. These security controls not only add another layer of protection against potential threats by end users, but also thwart attacks from administrators and malicious actors on the network or in the data center. The means of achieving this goal consist of two elements: protecting data when it is persisted to disk or other storage media, commonly called data-at-rest, and protecting data while it moves from one process or system to another, or data-in-transit. Several common compliance regulations call for data protection in addition to the other security controls discussed in this paper. Cloudera provides security for data in transit through TLS and other mechanisms, centrally deployed through Cloudera Manager. Transparent data-at-rest protection is provided through the combination of HDFS encryption, Navigator Encrypt, and Navigator Key Trustee.

Patterns and Practices
Hadoop, as a central data hub for many kinds of data within an organization, naturally has different needs for data protection depending on the audience and processes using a given data set. Only Cloudera provides enterprise-grade encryption and key management. CDH provides transparent encryption, ensuring that all sensitive data is encrypted before being stored on disk. This capability, together with enterprise-grade encryption key management through Navigator Key Trustee, delivers the protection necessary to meet regulatory compliance for most enterprises. HDFS encryption together with Navigator Encrypt (available with Cloudera Enterprise) provides transparent encryption for Hadoop, for both data and metadata. These solutions automatically encrypt data while the cluster continues to run as usual, with a very low performance impact. Encryption is massively scalable, happening in parallel across all the data nodes; as the cluster grows, encryption grows with it. Additionally, Cloudera's transparent encryption is optimized for Intel chipsets, which provide the AES-NI instruction set to make encryption workloads run extremely fast. Not all AES-NI is equal, and Cloudera is able to take advantage of the latest Intel advances for even faster performance. Additionally, HDFS encryption and Navigator Encrypt feature separation of duties, preventing even IT administrators and root users from accessing data that they are not authorized to see.

[Figure 8: Data-at-rest encryption architecture - HDFS Encryption and Navigator Encrypt cover HDFS and HBase data as well as log, configuration, and spill files and metadata stores for Cloudera Manager, Navigator, Impala, Hive, and Sentry, with keys managed by Navigator Key Trustee and an optional HSM.]
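As a rough illustration of the HDFS side of this (not of Navigator Encrypt or Key Trustee themselves), the sketch below creates an HDFS encryption zone over an empty directory using the public HdfsAdmin API. The NameNode URI, path, and key name are hypothetical, and the named key must already exist in the cluster's key management service, which Navigator Key Trustee can back.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsAdmin;

public class EncryptionZoneSetup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    URI nameNode = URI.create("hdfs://nn.example.com:8020");  // placeholder

    // The zone directory must exist and be empty before the zone is created.
    FileSystem fs = FileSystem.get(nameNode, conf);
    Path zone = new Path("/data/pci");
    fs.mkdirs(zone);

    // Creating the zone means every file subsequently written under /data/pci
    // is encrypted with data keys derived from the named zone key in the KMS.
    HdfsAdmin admin = new HdfsAdmin(nameNode, conf);
    admin.createEncryptionZone(zone, "pci-zone-key");
  }
}
```

This mirrors the hdfs crypto -createZone command; once the zone exists, encryption and decryption are transparent to authorized clients reading and writing beneath it.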
Occasionally, if an organization's security mandates go beyond compliance, it may choose to implement additional controls on top of encryption, such as static data masking and tokenization. Cloudera's partners provide these two options. The comparison below shows how transparent encryption is used as the baseline protection for compliance, while the other options may be added for particular use cases.

Tokenization
Data types: credit card numbers, SSNs
Usage: PCI DSS scope reduction
Implementation: intercept data flows at applications; make code changes to call tokenization APIs

Static Data Masking
Data types: credit card numbers, SSNs, names, addresses
Usage: generate test data; prepare data sets being sent to third parties
Implementation: identify the location of all fields to be masked; data sits in the clear until the masking batch job runs

Transparent Encryption
Data types: data and metadata - all data types and all data formats (structured, unstructured, discovered, undiscovered)
Usage: ensure no sensitive data ever exists on disk; meet compliance requirements
Implementation: no integration needed; enabled through Cloudera Manager

Key Management
Encryption requires encryption keys. Key generation, storage, and access all need to be carefully managed to avoid a security breach or data loss. For example, keys must never be stored with the data, so that encrypted data remains useless if stolen. Cloudera Navigator Key Trustee provides enterprise-grade key management for securing encryption keys and other Hadoop security artifacts. Navigator Key Trustee is a software-based key management system that provides key management for Navigator Encrypt. In order for Hadoop to meet compliance regulations, the encryption keys need to be stored on a separate, dedicated machine, out of the cluster and away from the data, which is what Navigator Key Trustee provides. It also integrates with leading hardware security modules (HSMs), so Navigator Key Trustee can use an HSM as the root of trust for all cryptography within the environment. This also means all key management policies built around the HSM will continue to be executed within the HSM. While the primary use case for Navigator Key Trustee is to store encryption keys, it is essentially a centralized, virtual safe deposit box that can store any sensitive security-related artifact within the Hadoop architecture.
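For completeness, here is a minimal sketch of creating a zone key through Hadoop's generic KeyProvider API, which talks to whatever KMS the client configuration points at (a Key Trustee-backed KMS, for example). The key name and cipher settings below are illustrative, not prescribed values; real key policies are governed by the KMS and Key Trustee configuration.

```java
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.crypto.key.KeyProvider;
import org.apache.hadoop.crypto.key.KeyProviderFactory;

public class ZoneKeyCreation {
  public static void main(String[] args) throws Exception {
    // Assumes hadoop.security.key.provider.path in the client configuration
    // points at the cluster's KMS endpoint.
    Configuration conf = new Configuration();
    List<KeyProvider> providers = KeyProviderFactory.getProviders(conf);
    if (providers.isEmpty()) {
      throw new IllegalStateException("No KeyProvider configured for this client");
    }
    KeyProvider kms = providers.get(0);

    // Illustrative defaults for the new key's material.
    KeyProvider.Options options = new KeyProvider.Options(conf);
    options.setCipher("AES/CTR/NoPadding");
    options.setBitLength(256);

    kms.createKey("pci-zone-key", options);  // placeholder key name
    kms.flush();
  }
}
```

The same operation is available from the command line via hadoop key create.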
Compliance-Ready
For many industries, storing sensitive data also means complying with various regulatory requirements, such as PCI for storing credit card data and HIPAA for storing patient health records. These regulations mean not only that the system storing this data must achieve compliance, but that any system integrating with it must as well. Cloudera is the first and only Hadoop platform to achieve PCI compliance. Leveraging the four pillars of security, Cloudera's platform is well equipped to pass technology reviews for most common compliance regulations and has experts on staff to help users through the rest of the process.

Conclusion
Next-generation data management - the enterprise data hub - requires authentication, authorization, governance, and data protection controls in order to establish a place to store and operate on all data within the enterprise, from batch processing to interactive SQL to advanced analytics. Hadoop is at its foundation, yet the core elements of Hadoop are only part of a complete enterprise solution. The more data, and the greater the variety of data, moved to the enterprise data hub, the more likely it is that this data will be subject to compliance and security mandates. Comprehensive and integrated security within Hadoop, then, is a keystone to establishing the new center of data management. Cloudera's enterprise data hub has built-in security that is comprehensive, transparent, and compliance-ready. Across the four pillars, Cloudera offers a set of security and governance capabilities that is unmatched within the Hadoop environment and ecosystem.

About Cloudera
Cloudera delivers the modern platform for data management and analytics. The world's leading organizations trust Cloudera to help solve their most challenging business problems with Cloudera Enterprise, the fastest, easiest, and most secure data platform built on Apache Hadoop. Our customers can efficiently capture, store, process, and analyze vast amounts of data, empowering them to use advanced analytics to drive business decisions quickly, flexibly, and at lower cost than has been possible before. To ensure our customers are successful, we offer comprehensive support, training, and professional services. Learn more at cloudera.com.

cloudera.com | 1-888-789-1488 or 1-650-362-0488 | Cloudera, Inc., 1001 Page Mill Road, Palo Alto, CA 94304, USA
© 2015 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of Cloudera Inc. in the USA and other countries. All other trademarks are the property of their respective companies. Information is subject to change without notice.