Enabling Big Data by Removing Security and Compliance Barriers

A SANS Survey
Written by Barbara Filkins
Advisor: John Pescatore
April 2015
Sponsored by Cloudera
© 2015 SANS Institute

Executive Summary

The rewards that big data can bring are widely recognized: scientific insight, competitive intelligence and improved fraud detection, as well as the benefits derived from sophisticated analyses of vast sets of transactional and behavioral data. As a result of easier-to-implement programs that can scrape this data for analysis, big data analysis is taking off in a variety of businesses, according to the results of a recently conducted SANS survey on the security of big data systems in the enterprise. In it, 55% of the more than 200 respondents were in pilot, proof of concept or full production of big data systems, while another 28% were planning such implementations.

IDC has predicted that revenue from big data products would reach $16.55 billion during 2014 and rise at more than 26% per year to reach $41.52 billion. [1]

The types of information being processed through these big data applications make it particularly important to secure them and to ensure compliance with applicable standards. In this SANS survey on big data security, 73% of companies with big data applications use them to store personal data on customers, and 72% store important business data. Respondents also report that, because of the distributed nature of big data systems, they're protecting those sensitive data types by putting security closer to the data, focusing on access controls, encryption and other safety measures. They are also looking for more guidance about the risks, compliance and security processes to deploy around big data systems.

In the context of these results, this paper explains how to develop an appropriate data-centric security and governance program to support big data architectures in a secure manner from the start.

Stage of Big Data Implementations
28% have big data applications in development
27% have big data applications in production
28% plan big data applications in the next two years

Sensitive Information in Big Data Implementations [2]
73% contain personally identifiable information (PII)
72% contain business information
83% must meet some level of compliance

[1] Worldwide Big Data Technology and Services Forecast
[2] These statistics do not add up to 100% because multiple responses were allowed. Please read the complete report for additional information.

Big Data Realized

Big data has proved its value in a number of business applications, and the systems used to collect, collate, store and analyze this data continue to improve. At the same time, the amount of data organizations need to analyze is growing exponentially, as shown in Figure 1, requiring organizations to take a closer look at how they can make decisions efficiently and intelligently.

[Figure 1. The Projected Growth of Big Data] [3]

The 5 V's that Define Complex Data

Big data is about not just the volume of data, but also the complexity of data. As such, big data is usually defined using several dimensions: [4]

Volume (data at rest): Terabytes to petabytes (1K TBs) to zettabytes (1B TBs) and beyond.

Velocity (data in motion): Rate of change in data in relation to the window of analysis, or how quickly the data can be made available for analysis, such as streaming data with only milliseconds to analyze, detect and respond to an event.

Variety (data in many forms): Different types of information that can be collected, even by one organization, such as transactions, video/audio, text and log files.

Veracity (data in doubt): Data integrity and how to establish trust in the data to confidently use it to make crucial decisions. Uncertainty/variability can stem from data inconsistency and incompleteness, ambiguities, latency, deception or model approximations.

Value (type and value of data): Different types of data have different values to both the company and bad actors.

Who's Adopting

From this survey's results, it's clear that big data is moving to prime time for the majority of the organizations represented by the 206 respondents who qualified to participate. Organizations with workforces over 10,000 make up 43% of the respondents that are actively implementing a big data application, clearly indicating that larger entities have the need, as well as the resources, to invest in big data initiatives.

In keeping with the SANS membership base, 80% of the survey respondents work in technical roles within IT and have visibility into their big data implementations (or lack thereof). That 80% comprises 52% with job titles directly related to security and 28% with titles indicating a variety of technical and managerial roles. The remaining 20% include business unit managers, application owners and executive management, as well as people with titles indicating specialties in compliance, incident response and forensics, and application development.

[4] Although there is no one reference for the V's, the first acknowledgment of them was by Doug Laney: META Group. 3D Data Management: Controlling Data Volume, Velocity, and Variety. February

Government, finance and IT organizations were the top three industries to collect and utilize big data for business intelligence, according to the survey. Table 1 presents the top five industries that are in the development or implementation phase of their big data initiative. The remaining 36% include firms involved in education (6%), manufacturing (5%), utilities (4%) and transportation (4%).

Table 1. Top Five Industries Represented

Industry                  Response Percent
Government                17.1%
Banking and finance       15.3%
Information technology    13.5%
Telecommunications         9.9%
Health care                8.1%

Different industries present different use cases for big data, according to the results. For example, compliance and regulatory reporting are critical big data uses for banking and finance, while fraud detection is the most common use of big data, as shown in Table 2.

Table 2. Sample Big Data Use Cases

Government: Fraud detection; Compliance and regulatory analysis; Climate analysis/weather prediction

Banking and Finance: Compliance and regulatory analysis; Risk analysis and management; Fraud detection and security analytics; CRM and customer loyalty programs; Credit risk scoring and analysis; High-speed arbitrage trading; Trade surveillance; Abnormal trading pattern analysis

Information Technology: Information security analytics; Threat intelligence; Log management

Telecommunications: Revenue assurance and price optimization; Customer churn prevention; Campaign management and customer loyalty; Call detail record (CDR) analysis; Network performance and optimization; Mobile user location analysis

Health Care and Life Sciences: Clinical trials data analysis; Disease pattern analysis; Campaign and sales program optimization; Medicaid and Medicare fraud; Patient care quality and program analysis; Population health; Medical device and pharma supply chain management; Drug discovery and development analysis

More than half of survey respondents are actively involved in some form of big data implementation, whether establishing a proof of concept, piloting or running in production. Just under 5% of respondents overall indicated that they have no big data implementation and no plans for one. Of the 28% of respondents who currently have no big data implementation but plan one, 49% indicated that their implementations are delayed by funding and resource issues. See Figure 2 for a look at the state of big data implementations.

[Figure 2. Stage of Big Data Implementations. Survey question: "At what stage are you in your big data implementation?" Response options: Proof of concept; Pilot; Running production workloads; Do not have a big data implementation, but plan to; Do not have a big data implementation, and do not plan to; Unknown.]

Of those respondents actively pursuing a big data implementation, 70% indicated that they have selected an architecture for implementation. These respondents, whose organizations take a variety of approaches to handling large data sets and analytics, form the population for the balance of the responses in the survey. In the survey, 42% rely primarily on more traditional relational database and data warehouse systems and tools. Modern frameworks, Hadoop-based (19%) and NoSQL (8%), are used by 27% of respondents, as shown in Figure 3.

[Figure 3. Approaches to Big Data Implementations. Survey question: "Please describe your approach to your big data implementations. Select the best answer." Response options: Relational database/data warehouse systems and tools; Hadoop-based framework; Custom applications and connectors; Outsource big data services; NoSQL framework; Other.]

Hadoop, the open source framework written in Java to distribute the storage and processing of large volumes of data across hardware clusters, is a big part of these big data ecosystems. It is a foundational technology that provides a framework for distributed processing of large data sets using simple programming models. [5]

History of Hadoop [6]

Hadoop started with two Internet Archive developers in 2002 as Nutch. By 2008, Hadoop was behind every batch process Yahoo ran to provide its information services. In 2009, Yahoo turned the source code of Hadoop over to the Apache Software Foundation.

Various elements of the Hadoop ecosystem have been published under the open source Apache license since the first editions. Many open source and commercial development organizations have contributed to the core framework and to efforts to expand the original framework with better cluster management, manageability, security, programming tools and other elements that have helped drive adoption within large corporations.

[5] Visit for a list of every Hadoop-related project.
[6] The history of Hadoop: From 4 nodes to the future of data, https://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data

Table 3 presents a core subset of Hadoop components. All are open source components that have become standards within the Hadoop ecosystem, with broad engineering contributions from across the community, multiple vendors shipping and supporting them, and compatibility across the ecosystem.

Table 3. The Apache Hadoop Ecosystem [7]

Project           Definition
Common            Set of components and interfaces for distributed file systems and general I/O
Avro              Framework for modeling, serializing and making Remote Procedure Calls (RPC)
MapReduce         Distributed data processing model and execution environment
HDFS              Distributed file system
Pig               Data flow language/execution environment
Hive              Distributed data warehouse
HBase             Distributed column-oriented database
ZooKeeper         Distributed, highly available coordination service
Sqoop             Tool for bulk transfer of data
Oozie             Service for running and scheduling workflows of Hadoop jobs
Spark             Fast and general engine for large-scale data processing, cluster computing
Impala            Open source, native analytic database for Hadoop
Apache Parquet    Columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language
Apache Flume      Distributed, reliable and available service for efficiently collecting, aggregating and moving large amounts of log data
HUE               Web interface for analyzing data with Hadoop
Apache Sentry     Tool that provides fine-grained, role-based authorization to both data and metadata stored on a Hadoop cluster

[7] Adapted from White, Tom. (2012). Hadoop: The Definitive Guide, 3rd Edition, Sebastopol, CA: O'Reilly. Later additions have been made by SANS.

General Architectures

A big data implementation can combine both traditional and more modern computational resources, as reflected in the types of architectures (Hadoop, custom, outsourced and so on) that respondents reported using. However, all of these architectures share common components, as shown in Figure 4.

[Figure 4. Big Data Reference Architecture. Stages: Data Sources (unstructured, semi-structured, structured); Data Foundation; Data Processing/Transformation; Data Analytics; Analysis, Visualization, Utilization. Cross-cutting layers: Governance, Management, Security; Storage, Computation, Networking Infrastructure.]

With such complex architectures, organizations implementing big data applications must consider security and information governance across the variety of technologies, platforms, access points and connectors involved in pulling data into these environments and, of course, in accessing the resulting datasets for analytics.

Respondents to the survey indicated that their current implementation focus is on those elements of the big data architecture dealing with technical integration and management: monitoring, integration with existing enterprise architectures and services, security, virtualization and the use of Hadoop constructs.

As big data implementations mature over the next two years, the focus shifts more to those elements that are increasingly information-oriented: data processing, organization and analytics. See Figure 5.

[Figure 5. Key Elements of the Big Data Architecture. Survey question: "What elements of your big data application architecture are you focused on today? And, based on your big data roadmap, where do you plan to place your emphasis over the next two years?" Response options (Now vs. Next 2 Years): Management, monitoring; Integration with existing enterprise architecture and services; Security tools (e.g., Sentry, Knox); Virtualization/Storage (e.g., HDFS, HBase); Hadoop; Full-text search (e.g., Solr); Interactive analytics/Self-service business intelligence tools (e.g., Impala); Batch processing (e.g., Hive, Spark, MapReduce); Real-time/stream processing tools (e.g., Spark, Flume, HBase); Data science tools (e.g., Spark, Mahout, SAS, R); Data ingest tools (e.g., Sqoop, Flume, Kafka); Other.]

Log management, data storage and archiving are top uses for big data at respondents' organizations, with more emphasis planned on advanced analytics and the mechanics of extract/transfer and load balancing, including data warehouse offloading.

Respondents with big data implementations are currently focused on use cases that are infrastructure-oriented, such as log management, data archival and operational data stores (ODS), as opposed to function-oriented use cases, including data acquisition, organization and analysis. See Figure 6.

[Figure 6. Key Use Cases Now and in the Future. Survey question: "What key use cases of your big data application architecture are you focused on today, and which are you planning to implement in the next two years? Please select all that apply." Response options (Now vs. Next 2 Years): Log management; Data archival; Operational data store; Advanced analytics; Data discovery; Search application; Extract, transfer and load (ETL) processing; Data warehouse offloading; Other.]

Those respondents who have not yet implemented a big data application but plan to during the next two years also follow this current trend: 58% of them say log management is a top priority, while 39% focus on ODS. However, over the next two years, respondents indicate that the emphasis will shift away from use cases that emphasize infrastructure capabilities and become more oriented toward accessing and analyzing the data itself.

Data-Centric Security

Data-centric security and information governance will grow in importance in proportion to the amount of sensitive information gathered in these systems. In our survey, PII and business records are each managed by more than 70% of respondents, in structured, unstructured or semi-structured formats, as shown in Figure 7.

[Figure 7. Sensitive Data in Big Data Initiatives. Survey question: "What types of sensitive data does your organization manage in its big data applications? Please indicate whether data is stored in structured or unstructured forms for all that apply." Response options (Structured, Unstructured, Both): Personally identifiable information; Business records; Employee records; Intellectual property; Transaction data; Payment card information; Health/Clinical records; Citizen information/public sector/Government; National security intelligence data; Student records; Other.]

Because of the types of information they process in their big data systems, many respondents reported that the need to comply with the requirements of one or more federal security and privacy standards has provided either the structure or the test of security for big data systems. Of those with big data implementations, 83% indicated that those systems must comply with one or more regulations or standards. And 40% of respondents prove compliance based on audits by external third parties (see Figure 8).

[Figure 8. Big Data Compliance. Survey question 1: "What are the applicable regulations or standards you must comply with?" Response options: Local/State jurisdictional laws or standards; HIPAA; PCI; SOX; FISMA; GLBA; EU Data Protection Directive; EURO-SOX; Other; Financial Instruments and Exchange Law of 2006 (Japan); PIPEDA (Canada); FERPA; FDA Title 21 CFR Part 11. Survey question 2: "What evidence does your organization rely on to prove that your big data systems are within compliance of the regulations that apply? Select the most appropriate answer." Response options: External third-party audit; Multiple report sources manually combined; Multiple report sources automatically combined in a single interface; Single manual search and report; Through our big data management interface; Other; No answer.]

Distributed Data: A New Emphasis for Security

Aligning with trends noted previously, respondents indicate a move toward using more data-centric controls tied to identity and data classification. Today, 54% of respondents are focused on integration with existing identity and access management infrastructure, 45% on implementation of role-based access controls (RBAC) and 27% on monitoring around data aggregation. See Figure 9.

[Figure 9. Growth of Information/Data-centric Controls. Survey question: "How do you manage access to sensitive data across your big data applications? Check all that apply today, and tell us what you'd like to incorporate in the next 12 months." Response options (Today vs. Next 12 Months): Integrate with existing identity and access management systems; Authorize user categories and roles to specific data sets based on roles (RBAC); Monitor for and limit data aggregation based on sensitivity of data and user role; Monitor for rogue services impersonating authorized user or system accounts; Classify and tag all data that enters the big data environment for data access permissions; Authorize user categories and roles according to specific data types or tags (such as "PCI" or "Red Zone"), based on policies (ABAC); Apply a single set of access controls across the entire big data environment; Prevent sensitive data pull to devices through sandboxing the user session; Data de-identification; Other.]

Over the next 12 months, however, these methods give way as the key focal points to stronger, more unified data-centric controls: data classification, access controlled by tagging and policy-aware infrastructure (ABAC), and de-identification, along with closer monitoring of session and service controls.
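The difference between the role-based (RBAC) and tag-based (ABAC) controls that respondents describe can be sketched in a few lines of Python. This is a minimal illustration, not the survey's or any product's implementation: the `User` and `Dataset` classes, the `ROLE_GRANTS` table, and the role, tag and data set names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset                     # job roles, e.g. {"fraud_analyst"}
    clearances: frozenset = frozenset()  # data tags this user may read


@dataclass(frozen=True)
class Dataset:
    name: str
    tags: frozenset = frozenset()        # classification tags applied at ingest


# RBAC: each role is granted a fixed list of named data sets (hypothetical names).
ROLE_GRANTS = {
    "fraud_analyst": {"card_transactions"},
    "marketer": {"web_clickstream"},
}


def rbac_allows(user: User, dataset: Dataset) -> bool:
    """Allow access if any of the user's roles is granted this data set by name."""
    return any(dataset.name in ROLE_GRANTS.get(role, set()) for role in user.roles)


def abac_allows(user: User, dataset: Dataset) -> bool:
    """Allow access only if the user is cleared for every tag on the data set."""
    return dataset.tags <= user.clearances


alice = User("alice", frozenset({"fraud_analyst"}), frozenset({"PCI"}))
bob = User("bob", frozenset({"marketer"}))
cards = Dataset("card_transactions", frozenset({"PCI"}))
new_feed = Dataset("partner_card_feed", frozenset({"PCI"}))  # not listed in ROLE_GRANTS
```

In this sketch, the RBAC check rejects `alice` on `new_feed` until someone adds the new data set to a role grant, while the ABAC check admits her as soon as the feed is tagged "PCI" at ingest. That is one reason classification and tagging have to be in place before tag-based policies can work.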
Policies and controls also will be following big data into the cloud. Of those respondents who have implemented a big data solution, 15% have a unified interface provided by their cloud service. However, 22% still have to use separate controls for tracking access and protecting sensitive data in the cloud, and another 21% plan to cover big data access and security in the cloud within the next 12 months.

Information Governance

Information governance: The activities and technologies that organizations employ to maximize the value of their information while minimizing associated risks and costs [8]

A big data environment may include information derived from a data source with proprietary research information, a data source requiring regulatory compliance and a data source with PII. What policies and restrictions apply to the consolidated information, and how do they align with the role of the analyst or researcher wanting to use the dataset? Protecting big data may be a case-by-case balancing act between the privacy and security requirements for each data source as well as the quality of the data itself.

As with traditional data-oriented systems, big data implementers need to be able to establish the appropriate level of trust both in the information derived (especially information that leads to an actionable decision, like flying into bad weather) and in the source data. That means you need to fully understand each data source and its consistency, uncertainties and integrity limitations.

Especially with the volumes of data being stored and the diversity of users accessing it, information governance is a critical part of big data applications. A strong governance solution must cater to compliance regulations without disrupting the business users and should rely on automation to handle the scalability demanded by big data implementations.

Integrated Approach

Those in charge of big data implementations have a responsibility to make sure that big data sources don't expose the organization to further governance risks or security threats.
So a key element of security is how the organization defines data governance as part of its overall approach to information governance. Consider the following steps:

Understand each data source. Know who will have access to the source (both initially and after its data has been analyzed and understood), the risks inside the source (such as malware that can compromise your analytics engine), the necessary privacy protections and the strategy to accomplish those protections before moving any data into production.

Identify known sensitive data and datasets. Remember that data covered by jurisdictional policies, company intellectual property and data subject to pertinent industry and government regulations, including derived linkages or metadata, may all constitute sensitive data. New ways of using data bring new privacy implications: laws or guidelines related to protecting consumers against the collection of private information in scenarios such as the connected home and smart electric meters; mobile phones broadcasting physical location; health devices such as medical, fitness and lifestyle trackers; and telematics data that tracks automobile locations.

Know where data stores are located. Where is your data linked? How do you know those links aren't compromised? Where does it reside? Can you be certain that all your data is onshore?

Understand the collection interface for the data source. A big data repository collects streaming data at high volumes and velocities from a number of different data sources, each with its own interface workflows. These multiple connections can increase the attack surface of the big data implementation, especially if they are not well understood and protected.

Be aware of derived metadata and its rules of behavior. Much of the value of big data lies in uncovering patterns that do not require identification of the individual. Organizations apply de-identification techniques to mask sensitive data, remove individual identifiers and still gain utility from the information, but it takes planning. Even if identifying information is stripped out, it may still be possible to identify an individual because of how the anonymized data sources were combined and the information patterns that result. The Netflix data mining contest showed that an adversary who knows a little bit about a Netflix subscriber can easily identify that individual's record based only on an analysis of viewing preferences. [9]

Automate to keep the users from being overwhelmed by the demands of governance and security at the grander scale of big data. IT security teams are simply overwhelmed by the volume of data being handled, which is one reason why a core philosophy behind the Critical Security Controls is to automate any defense that can be automated. Automation and analytics can improve the handling, correlation and prioritization of this data, thus allowing the user to concentrate on the meaning and the models behind the data, as well as the governance and security of the information being extracted.

Takeaway: If You Use Hadoop

1. Implement basic security best practices, most of which are applicable to the Hadoop platform.
2. Seek executive sponsorship for the security initiatives you need to lock down the enterprise's big data platform.
3. Harden the operating system and lock down the Java virtual machine (VM) according to best practices. Hadoop runs on an operating system and most of the software runs in a Java VM.

Take Hadoop Security to the Next Level

1. Build a perimeter. Hadoop supports industry-standard Kerberos to block access to non-authenticated users. With integration to LDAP and Active Directory, Hadoop can tie into centralized user and identity management systems.
2. Configure users and permissions. Set permissions for users, groups or roles by defining access control lists. Look to the open source community and industry projects for Hadoop security and role-based access control.
3. Encrypt data. Protect data that would otherwise travel in cleartext over the wire using SSL, and protect data at rest using Linux encryption or HDFS encryption.
4. Monitor, audit, detect and resolve issues. Provide a comprehensive and integrated approach to monitoring and management across the Hadoop-based implementation, including its data sources and repositories.
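The re-identification risk noted under derived metadata can be illustrated with a small linkage check. The sketch below is an assumption-laden toy, not the method used in the Netflix study: the field names, the sample records and the helper functions are hypothetical. It counts how many de-identified records share each combination of quasi-identifiers, then shows how a record that is unique on those fields can be joined to a second data set that still carries names.

```python
from collections import Counter

# Fields that are not identifiers on their own but can single someone out
# in combination (hypothetical choice for this illustration).
QUASI_IDENTIFIERS = ("zip_code", "birth_year", "gender")


def quasi_key(record, fields=QUASI_IDENTIFIERS):
    """Return the tuple of quasi-identifier values for one record."""
    return tuple(record[f] for f in fields)


def smallest_group(records, fields=QUASI_IDENTIFIERS):
    """Size of the smallest group of records sharing the same quasi-identifiers.

    A result of 1 means at least one record is unique on these fields.
    """
    counts = Counter(quasi_key(r, fields) for r in records)
    return min(counts.values())


def link(deidentified, named, fields=QUASI_IDENTIFIERS):
    """Join two data sets on quasi-identifiers; keep only unambiguous matches."""
    named_by_key = {}
    for person in named:
        named_by_key.setdefault(quasi_key(person, fields), []).append(person)
    group_sizes = Counter(quasi_key(r, fields) for r in deidentified)

    matches = []
    for record in deidentified:
        key = quasi_key(record, fields)
        candidates = named_by_key.get(key, [])
        if len(candidates) == 1 and group_sizes[key] == 1:
            matches.append((candidates[0]["name"], record))
    return matches


# Made-up sample data: names were removed from the first set only.
DEIDENTIFIED = [
    {"zip_code": "02138", "birth_year": 1975, "gender": "F", "diagnosis": "asthma"},
    {"zip_code": "02138", "birth_year": 1980, "gender": "M", "diagnosis": "diabetes"},
    {"zip_code": "02139", "birth_year": 1980, "gender": "M", "diagnosis": "flu"},
    {"zip_code": "02139", "birth_year": 1980, "gender": "M", "diagnosis": "asthma"},
]
NAMED = [
    {"name": "Pat Example", "zip_code": "02138", "birth_year": 1975, "gender": "F"},
    {"name": "Sam Sample", "zip_code": "02139", "birth_year": 1980, "gender": "M"},
]
```

With this sample data, `link(DEIDENTIFIED, NAMED)` re-attaches one name to one diagnosis, while the two records that share the same quasi-identifiers stay ambiguous. Running a check like `smallest_group` before data sets are combined is one way to plan de-identification rather than discover the problem afterward.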

Best Practices Today and Tomorrow

Over the next 12 months, respondents will use a variety of data protection and de-identification techniques, some of which are defined in the sidebar, as shown in Figure 10.

Tokenization: Substituting a sensitive data element with a nonsensitive equivalent (a token) that has no extrinsic or exploitable meaning or value

Masking: Creating a structurally similar but inauthentic version of an organization's data that can be used for purposes such as aggregated analysis, software testing or user training to protect the actual data while having a functional substitute for occasions when the real data is not required

Redaction: Obscuring (or removing) information before use

[Figure 10. Data De-identification Trends. Survey question: "What data de-identification technologies are you using or planning to use as data flows into or out of your big data environment? Check all that apply today, and tell us what you'd like to incorporate in the next 12 months." Response options (Today vs. Next 12 Months): Tokenization inbound; Persistent field-level encryption inbound; Transparent at-rest data encryption; Masking inbound; Redaction inbound; Tokenization outbound; Redaction outbound; Masking outbound; Persistent field-level encryption outbound; Other.]

Choosing the right strategy will take planning, including how to preserve the accuracy of data analysis across all data aggregation dimensions even if "the names have been changed to protect the innocent."
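The three techniques defined in the sidebar behave differently in practice, which a short Python sketch can make concrete. Everything here is illustrative and assumed: the `TokenVault` class keeps its mapping in memory (a real token vault would be a separately secured service), the masking rule is a simple keep-the-last-four-digits variant, and the pattern for U.S. Social Security numbers is only one example of content that might be redacted.

```python
import re
import secrets


class TokenVault:
    """Toy in-memory token vault: swaps values for random tokens and back."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        """Return a random token for the value, reusing it on repeat calls."""
        if value not in self._token_by_value:
            token = "tok_" + secrets.token_hex(8)  # no relation to the value
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def detokenize(self, token: str) -> str:
        """Recover the original value; only the vault can do this."""
        return self._value_by_token[token]


def mask_card_number(card_number: str, keep_last: int = 4) -> str:
    """Replace all but the last few digits with X, keeping the layout intact."""
    total_digits = sum(ch.isdigit() for ch in card_number)
    seen = 0
    out = []
    for ch in card_number:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen > total_digits - keep_last else "X")
        else:
            out.append(ch)  # separators such as "-" are preserved
    return "".join(out)


SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_ssns(text: str) -> str:
    """Remove anything shaped like a U.S. Social Security number from free text."""
    return SSN_PATTERN.sub("[REDACTED]", text)
```

Because the same input always maps to the same token, tokenized fields can still be joined and counted, which helps preserve the accuracy of aggregate analysis. Masked values keep their format for testing or training, and redacted values are simply gone.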

Commitment Is Key

The starting point for any big data or large-scale implementation project should be identifying the strategy and goals of the business owner. One of the characteristics of a big data implementation is that it is usually tied to specific business objectives of an organization. The impetus for and the decisions around a big data implementation will most likely come from executive management. As shown in Figure 11, most respondents indicated that executive management (the C-level staff) is responsible for the governance of big data in their organizations.

[Figure 11. Responsibility for Governance. Survey question: "Who, ultimately, is accountable for governance of your big data, including security and compliance?" Response options: CIO/CTO; CSO/CISO; Governance/Compliance team; Data manager/Data administrator; CEO/COO; Board of directors; Business unit/Application owner; System administrator; Security administrator; Application developer/Application manager.]

Data-centric security controls need to place the emphasis on the protection of data through its useful life cycle, as opposed to the infrastructure that supports the data, whether endpoints, servers or networks. Many controls are in current use: data management techniques; encryption at rest, in motion and in memory; and separation of bits of identifying data or information critical to keeping stolen data from being used (for example, storing credit card numbers separately from names).

If organizational leadership can't provide guidance on the problem to be solved by the big data application (for example, speeding up fraud protection or evaluating the effect of climate trends on sales), it may be best not to pursue the project.

Effectiveness of Controls

Respondent organizations are using a variety of policy controls around their big data implementations. Of the 86% of respondents able to rank the effectiveness of their controls, 78% said host-based and 72% said network-based security technologies were the most effective controls overall, while 40% ranked encryption as very effective. See Table 4.

Table 4. Effective Security Controls [12]

Security Controls for Big Data                            Very Effective  Effective  Total Effective  Not Effective
Host-based application firewalls/IDS                      46.3%           31.3%      77.6%            4.5%
Network-based IDS/IPS/UTM                                 28.4%           43.3%      71.6%            4.5%
Encryption                                                40.3%           19.4%      59.7%            7.5%
Centralized SIEM                                          23.9%           29.9%      53.7%            10.4%
Security controls within our big data management system   10.4%           43.3%      53.7%            9.0%
User-activity monitoring                                  17.9%           35.8%      53.7%            10.4%
Secure development and life cycle practices               20.9%           29.9%      50.7%            9.0%
Database activity monitoring                              11.9%           38.8%      50.7%            4.5%
Unified authorization mechanism                           25.4%           17.9%      43.3%            3.0%
Data de-identification (masking, tokenization, etc.)      28.4%           13.4%      41.8%            4.5%
SAML integration                                          13.4%           22.4%      35.8%            4.5%
Data redaction                                            16.4%           17.9%      34.3%            4.5%
Digital Rights Management                                 10.4%           20.9%      31.3%            11.9%
API (Knox) gateway                                        11.9%           14.9%      26.9%            7.5%
Automated audit aggregation                               10.4%           14.9%      25.4%            7.5%
Data lineage methodologies                                9.0%            11.9%      20.9%            6.0%

In the "very effective" rating, encryption (40%) ranked second to host-based application firewalls and IDS (46%). This ranking also indicates that perimeter defenses are still a vital technology for monitoring inbound and outbound data and communications.

[12] Columns do not add up to 100% because respondents rated only those controls they use.

Big Data Security: A Call to Action

Big data is not just a buzzword; it's a real, distributed architecture that supports powerful, effective technologies with real vulnerabilities. The volumes of unstructured data being consumed in big data projects create new kinds of security challenges. As the growth of big data and data-centric security accelerates over the next two years, the security community must take steps to protect data that is expanding in volume, variety and velocity, whether it is used to defend against a cyber attack, improve factory automation or deliver better patient care through population health.

The cornerstones of big data implementations (governance and security) must be established. The key to understanding and managing security for big data implementations appears not to be control of specific protocols or protection from certain techniques, but an overall awareness and ability to understand and govern the movement and use of data at the highest level. This level of big data requires a new mindset around security: implementing data-centric security measures, not just firewalls. Moreover, in some instances, there may be no traditional way to detect an attack because the whole big data application is too new for security personnel to understand what constitutes normal behavior.

The security community should listen to the urging of Gartner analysts that CISOs must take steps to protect data that is expanding in volume, variety and velocity. [13] Security professionals need to take a more holistic approach to their trade, fully understanding the data sources and the characteristics of each. They need to comprehend the analytics and automation being applied to determine how best to protect a big data enterprise, because there is no practical way to fully maintain situational awareness of the data at the accelerated rates of acquisition and change.

With that level of understanding, organizations and vendors working in big data will continue to evolve in their tools, techniques and best practices to enable today's big data explosion without compromising their crown jewels in the process.

About the Author and Advisor

Barbara Filkins has done extensive work in system procurement, vendor selection and vendor negotiations in her career as a systems engineering and infrastructure design consultant. Based in Southern California, she sees security as a process that she calls "policy, process, platforms, pipes and people." Barbara has focused most recently on HIPAA security issues in the health and human services industry, with clients ranging from federal agencies (Department of Defense and Department of Veterans Affairs) to municipalities and commercial businesses. Her interest in information security comes from its impact on all aspects of the system development life cycle, as well as its relation to many of the issues faced by a modern society dependent on automation: privacy, identity theft, exposure to fraud, and the legal aspects of enforcing information security. She holds the (ISC)2 CISSP, SANS GSEC (Gold) and GCIH (Gold), and the GHSC certifications.

John Pescatore joined SANS as director of emerging technologies in January 2013, bringing with him over 35 years of experience in computer, network and information security. Prior to SANS, he was Gartner's lead security analyst for more than 13 years, working with Global 5000 corporations, government agencies and major technology and service providers. In 2008, John was named one of the top 15 most influential people in security and has frequently testified before Congress on issues relating to cybersecurity.

Sponsor

SANS would like to thank this survey's sponsor, Cloudera.

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Datenverwaltung im Wandel - Building an Enterprise Data Hub with Datenverwaltung im Wandel - Building an Enterprise Data Hub with Cloudera Bernard Doering Regional Director, Central EMEA, Cloudera Cloudera Your Hadoop Experts Founded 2008, by former employees of Employees

More information

The Future of Data Management

The Future of Data Management The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

More information

HDP Hadoop From concept to deployment.

HDP Hadoop From concept to deployment. HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some

More information

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Highly competitive enterprises are increasingly finding ways to maximize and accelerate

More information

The Future of Data Management with Hadoop and the Enterprise Data Hub

The Future of Data Management with Hadoop and the Enterprise Data Hub The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah Cofounder & CTO, Cloudera, Inc. Twitter: @awadallah 1 2 Cloudera Snapshot Founded 2008, by former employees of Employees

More information

White paper. The Big Data Security Gap: Protecting the Hadoop Cluster

White paper. The Big Data Security Gap: Protecting the Hadoop Cluster The Big Data Security Gap: Protecting the Hadoop Cluster Introduction While the open source framework has enabled the footprint of Hadoop to logically expand, enterprise organizations face deployment and

More information

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce

More information

Big Data Management and Security

Big Data Management and Security Big Data Management and Security Audit Concerns and Business Risks Tami Frankenfield Sr. Director, Analytics and Enterprise Data Mercury Insurance What is Big Data? Velocity + Volume + Variety = Value

More information

Cloudera Enterprise Data Hub in Telecom:

Cloudera Enterprise Data Hub in Telecom: Cloudera Enterprise Data Hub in Telecom: Three Customer Case Studies Version: 103 Table of Contents Introduction 3 Cloudera Enterprise Data Hub for Telcos 4 Cloudera Enterprise Data Hub in Telecom: Customer

More information

Are You Big Data Ready?

Are You Big Data Ready? ACS 2015 Annual Canberra Conference Are You Big Data Ready? Vladimir Videnovic Business Solutions Director Oracle Big Data and Analytics Introduction Introduction What is Big Data? If you can't explain

More information

The Next Wave of Data Management. Is Big Data The New Normal?

The Next Wave of Data Management. Is Big Data The New Normal? The Next Wave of Data Management Is Big Data The New Normal? Table of Contents Introduction 3 Separating Reality and Hype 3 Why Are Firms Making IT Investments In Big Data? 4 Trends In Data Management

More information

Detecting Anomalous Behavior with the Business Data Lake. Reference Architecture and Enterprise Approaches.

Detecting Anomalous Behavior with the Business Data Lake. Reference Architecture and Enterprise Approaches. Detecting Anomalous Behavior with the Business Data Lake Reference Architecture and Enterprise Approaches. 2 Detecting Anomalous Behavior with the Business Data Lake Pivotal the way we see it Reference

More information

Securing Hadoop. Sudheesh Narayanan. Chapter No.1 "Hadoop Security Overview"

Securing Hadoop. Sudheesh Narayanan. Chapter No.1 Hadoop Security Overview Securing Hadoop Sudheesh Narayanan Chapter No.1 "Hadoop Security Overview" In this package, you will find: A Biography of the author of the book A preview chapter from the book, Chapter NO.1 "Hadoop Security

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

Big Data and New Paradigms in Information Management. Vladimir Videnovic Institute for Information Management

Big Data and New Paradigms in Information Management. Vladimir Videnovic Institute for Information Management Big Data and New Paradigms in Information Management Vladimir Videnovic Institute for Information Management 2 "I am certainly not an advocate for frequent and untried changes laws and institutions must

More information

Fighting Cyber Fraud with Hadoop. Niel Dunnage Senior Solutions Architect

Fighting Cyber Fraud with Hadoop. Niel Dunnage Senior Solutions Architect Fighting Cyber Fraud with Hadoop Niel Dunnage Senior Solutions Architect 1 Summary Big Data is an increasingly powerful enterprise asset and this talk will explore the relationship between big data and

More information

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future

More information

Securing Your Enterprise Hadoop Ecosystem Comprehensive Security for the Enterprise with Cloudera

Securing Your Enterprise Hadoop Ecosystem Comprehensive Security for the Enterprise with Cloudera Securing Your Enterprise Hadoop Ecosystem Comprehensive Security for the Enterprise with Cloudera Version: 103 Table of Contents Introduction 3 Importance of Security 3 Growing Pains 3 Security Requirements

More information

Metrics that Matter Security Risk Analytics

Metrics that Matter Security Risk Analytics Metrics that Matter Security Risk Analytics Rich Skinner, CISSP Director Security Risk Analytics & Big Data Brinqa rskinner@brinqa.com April 1 st, 2014. Agenda Challenges in Enterprise Security, Risk

More information

HDP Enabling the Modern Data Architecture

HDP Enabling the Modern Data Architecture HDP Enabling the Modern Data Architecture Herb Cunitz President, Hortonworks Page 1 Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform) Founded in 2011 Original 24 architects,

More information

Interactive data analytics drive insights

Interactive data analytics drive insights Big data Interactive data analytics drive insights Daniel Davis/Invodo/S&P. Screen images courtesy of Landmark Software and Services By Armando Acosta and Joey Jablonski The Apache Hadoop Big data has

More information

Ganzheitliches Datenmanagement

Ganzheitliches Datenmanagement Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist

More information

IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems

IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems Proactively address regulatory compliance requirements and protect sensitive data in real time Highlights Monitor and audit data activity

More information

PALANTIR CYBER An End-to-End Cyber Intelligence Platform for Analysis & Knowledge Management

PALANTIR CYBER An End-to-End Cyber Intelligence Platform for Analysis & Knowledge Management PALANTIR CYBER An End-to-End Cyber Intelligence Platform for Analysis & Knowledge Management INTRODUCTION Traditional perimeter defense solutions fail against sophisticated adversaries who target their

More information

Upcoming Announcements

Upcoming Announcements Enterprise Hadoop Enterprise Hadoop Jeff Markham Technical Director, APAC jmarkham@hortonworks.com Page 1 Upcoming Announcements April 2 Hortonworks Platform 2.1 A continued focus on innovation within

More information

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi Getting Started with Hadoop Raanan Dagan Paul Tibaldi What is Apache Hadoop? Hadoop is a platform for data storage and processing that is Scalable Fault tolerant Open source CORE HADOOP COMPONENTS Hadoop

More information

Addressing Open Source Big Data, Hadoop, and MapReduce limitations

Addressing Open Source Big Data, Hadoop, and MapReduce limitations Addressing Open Source Big Data, Hadoop, and MapReduce limitations 1 Agenda What is Big Data / Hadoop? Limitations of the existing hadoop distributions Going enterprise with Hadoop 2 How Big are Data?

More information

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION Syed Rasheed Solution Manager Red Hat Corp. Kenny Peeples Technical Manager Red Hat Corp. Kimberly Palko Product Manager Red Hat Corp.

More information

Luncheon Webinar Series May 13, 2013

Luncheon Webinar Series May 13, 2013 Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration

More information

Comprehensive Analytics on the Hortonworks Data Platform

Comprehensive Analytics on the Hortonworks Data Platform Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page

More information

Optimized for the Industrial Internet: GE s Industrial Data Lake Platform

Optimized for the Industrial Internet: GE s Industrial Data Lake Platform Optimized for the Industrial Internet: GE s Industrial Lake Platform Agenda The Opportunity The Solution The Challenges The Results Solutions for Industrial Internet, deep domain expertise 2 GESoftware.com

More information

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics ESSENTIALS EMC ISILON Use the industry's first and only scale-out NAS solution with native Hadoop

More information

White Paper: What You Need To Know About Hadoop

White Paper: What You Need To Know About Hadoop CTOlabs.com White Paper: What You Need To Know About Hadoop June 2011 A White Paper providing succinct information for the enterprise technologist. Inside: What is Hadoop, really? Issues the Hadoop stack

More information

Securing Your Enterprise Hadoop Ecosystem Comprehensive Security for the Enterprise with Cloudera

Securing Your Enterprise Hadoop Ecosystem Comprehensive Security for the Enterprise with Cloudera Securing Your Enterprise Hadoop Ecosystem Comprehensive Security for the Enterprise with Cloudera Version: 102 Table of Contents Introduction 3 Importance of Security 3 Growing Pains 3 Security Requirements

More information

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,

More information

How to Enhance Traditional BI Architecture to Leverage Big Data

How to Enhance Traditional BI Architecture to Leverage Big Data B I G D ATA How to Enhance Traditional BI Architecture to Leverage Big Data Contents Executive Summary... 1 Traditional BI - DataStack 2.0 Architecture... 2 Benefits of Traditional BI - DataStack 2.0...

More information

Big Data Analytics. Copyright 2011 EMC Corporation. All rights reserved.

Big Data Analytics. Copyright 2011 EMC Corporation. All rights reserved. Big Data Analytics 1 Priority Discussion Topics What are the most compelling business drivers behind big data analytics? Do you have or expect to have data scientists on your staff, and what will be their

More information

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Defining Big Not Just Massive Data Big data refers to data sets whose size is beyond the ability of typical database software tools

More information

Secure Data Transmission Solutions for the Management and Control of Big Data

Secure Data Transmission Solutions for the Management and Control of Big Data Secure Data Transmission Solutions for the Management and Control of Big Data Get the security and governance capabilities you need to solve Big Data challenges with Axway and CA Technologies. EXECUTIVE

More information

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013 Integrating Hadoop Into Business Intelligence & Data Warehousing Philip Russom TDWI Research Director for Data Management, April 9 2013 TDWI would like to thank the following companies for sponsoring the

More information

Data Governance in the Hadoop Data Lake. Michael Lang May 2015

Data Governance in the Hadoop Data Lake. Michael Lang May 2015 Data Governance in the Hadoop Data Lake Michael Lang May 2015 Introduction Product Manager for Teradata Loom Joined Teradata as part of acquisition of Revelytix, original developer of Loom VP of Sales

More information

Big Data & Analytics: Your concise guide (note the irony) Wednesday 27th November 2013

Big Data & Analytics: Your concise guide (note the irony) Wednesday 27th November 2013 Big Data & Analytics: Your concise guide (note the irony) Wednesday 27th November 2013 Housekeeping 1. Any questions coming out of today s presentation can be discussed in the bar this evening 2. OCF is

More information

Detect & Investigate Threats. OVERVIEW

Detect & Investigate Threats. OVERVIEW Detect & Investigate Threats. OVERVIEW HIGHLIGHTS Introducing RSA Security Analytics, Providing: Security monitoring Incident investigation Compliance reporting Providing Big Data Security Analytics Enterprise-wide

More information

Exploiting Data at Rest and Data in Motion with a Big Data Platform

Exploiting Data at Rest and Data in Motion with a Big Data Platform Exploiting Data at Rest and Data in Motion with a Big Data Platform Sarah Brader, sarah_brader@uk.ibm.com What is Big Data? Where does it come from? 12+ TBs of tweet data every day 30 billion RFID tags

More information

Data Refinery with Big Data Aspects

Data Refinery with Big Data Aspects International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

Ensure PCI DSS compliance for your Hadoop environment. A Hortonworks White Paper October 2015

Ensure PCI DSS compliance for your Hadoop environment. A Hortonworks White Paper October 2015 Ensure PCI DSS compliance for your Hadoop environment A Hortonworks White Paper October 2015 2 Contents Overview Why PCI matters to your business Building support for PCI compliance into your Hadoop environment

More information

WHITEPAPER. A Technical Perspective on the Talena Data Availability Management Solution

WHITEPAPER. A Technical Perspective on the Talena Data Availability Management Solution WHITEPAPER A Technical Perspective on the Talena Data Availability Management Solution BIG DATA TECHNOLOGY LANDSCAPE Over the past decade, the emergence of social media, mobile, and cloud technologies

More information

Adopt a unified, holistic approach to a broad range of data security challenges with IBM Data Security Services.

Adopt a unified, holistic approach to a broad range of data security challenges with IBM Data Security Services. Security solutions To support your IT objectives Adopt a unified, holistic approach to a broad range of data security challenges with IBM Data Security Services. Highlights Balance effective security with

More information

#TalendSandbox for Big Data

#TalendSandbox for Big Data Evalua&on von Apache Hadoop mit der #TalendSandbox for Big Data Julien Clarysse @whatdoesdatado @talend 2015 Talend Inc. 1 Connecting the Data-Driven Enterprise 2 Talend Overview Founded in 2006 BRAND

More information

More Data in Less Time

More Data in Less Time More Data in Less Time Leveraging Cloudera CDH as an Operational Data Store Daniel Tydecks, Systems Engineering DACH & CE Goals of an Operational Data Store Load Data Sources Traditional Architecture Operational

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013 Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software SC13, November, 2013 Agenda Abstract Opportunity: HPC Adoption of Big Data Analytics on Apache

More information

Transforming the Telecoms Business using Big Data and Analytics

Transforming the Telecoms Business using Big Data and Analytics Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe

More information

INDUSTRY BRIEF DATA CONSOLIDATION AND MULTI-TENANCY IN FINANCIAL SERVICES

INDUSTRY BRIEF DATA CONSOLIDATION AND MULTI-TENANCY IN FINANCIAL SERVICES INDUSTRY BRIEF DATA CONSOLIDATION AND MULTI-TENANCY IN FINANCIAL SERVICES Data Consolidation and Multi-Tenancy in Financial Services CLOUDERA INDUSTRY BRIEF 2 Table of Contents Introduction 3 Security

More information

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning

More information

Integrating a Big Data Platform into Government:

Integrating a Big Data Platform into Government: Integrating a Big Data Platform into Government: Drive Better Decisions for Policy and Program Outcomes John Haddad, Senior Director Product Marketing, Informatica Digital Government Institute s Government

More information

High End Information Security Services

High End Information Security Services High End Information Security Services Welcome Trion Logics Security Solutions was established after understanding the market's need for a high end - End to end security integration and consulting company.

More information

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84 Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics

More information

Data Security in Hadoop

Data Security in Hadoop Data Security in Hadoop Eric Mizell Director, Solution Engineering Page 1 What is Data Security? Data Security for Hadoop allows you to administer a singular policy for authentication of users, authorize

More information

Discover & Investigate Advanced Threats. OVERVIEW

Discover & Investigate Advanced Threats. OVERVIEW Discover & Investigate Advanced Threats. OVERVIEW HIGHLIGHTS Introducing RSA Security Analytics, Providing: Security monitoring Incident investigation Compliance reporting Providing Big Data Security Analytics

More information

Deploying an Operational Data Store Designed for Big Data

Deploying an Operational Data Store Designed for Big Data Deploying an Operational Data Store Designed for Big Data A fast, secure, and scalable data staging environment with no data volume or variety constraints Sponsored by: Version: 102 Table of Contents Introduction

More information

BIG DATA TECHNOLOGY. Hadoop Ecosystem

BIG DATA TECHNOLOGY. Hadoop Ecosystem BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big

More information

Hadoop Big Data for Processing Data and Performing Workload

Hadoop Big Data for Processing Data and Performing Workload Hadoop Big Data for Processing Data and Performing Workload Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3 1 M Tech Student, 2 Assosiate professor, 3 Professor & Head (PG), of Computer

More information

Microsoft Big Data Solutions. Anar Taghiyev P-TSP E-mail: b-anarta@microsoft.com;

Microsoft Big Data Solutions. Anar Taghiyev P-TSP E-mail: b-anarta@microsoft.com; Microsoft Big Data Solutions Anar Taghiyev P-TSP E-mail: b-anarta@microsoft.com; Why/What is Big Data and Why Microsoft? Options of storage and big data processing in Microsoft Azure. Real Impact of Big

More information

Washington State s Use of the IBM Data Governance Unified Process Best Practices

Washington State s Use of the IBM Data Governance Unified Process Best Practices STATS-DC 2012 Data Conference July 12, 2012 Washington State s Use of the IBM Data Governance Unified Process Best Practices Bill Huennekens Washington State Office of Superintendent of Public Instruction,

More information

Integrating Cloudera and SAP HANA

Integrating Cloudera and SAP HANA Integrating Cloudera and SAP HANA Version: 103 Table of Contents Introduction/Executive Summary 4 Overview of Cloudera Enterprise 4 Data Access 5 Apache Hive 5 Data Processing 5 Data Integration 5 Partner

More information

Data Governance in the Hadoop Data Lake. Kiran Kamreddy May 2015

Data Governance in the Hadoop Data Lake. Kiran Kamreddy May 2015 Data Governance in the Hadoop Data Lake Kiran Kamreddy May 2015 One Data Lake: Many Definitions A centralized repository of raw data into which many data-producing streams flow and from which downstream

More information

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform

More information

and Hadoop Technology

and Hadoop Technology SAS and Hadoop Technology Overview SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2015. SAS and Hadoop Technology: Overview. Cary, NC: SAS Institute

More information

Information Builders Mission & Value Proposition

Information Builders Mission & Value Proposition Value 10/06/2015 2015 MapR Technologies 2015 MapR Technologies 1 Information Builders Mission & Value Proposition Economies of Scale & Increasing Returns (Note: Not to be confused with diminishing returns

More information

Securing and protecting the organization s most sensitive data

Securing and protecting the organization s most sensitive data Securing and protecting the organization s most sensitive data A comprehensive solution using IBM InfoSphere Guardium Data Activity Monitoring and InfoSphere Guardium Data Encryption to provide layered

More information

A Modern Data Architecture with Apache Hadoop

A Modern Data Architecture with Apache Hadoop Modern Data Architecture with Apache Hadoop Talend Big Data Presented by Hortonworks and Talend Executive Summary Apache Hadoop didn t disrupt the datacenter, the data did. Shortly after Corporate IT functions

More information

Secure Your Hadoop Cluster With Apache Sentry (Incubating) Xuefu Zhang Software Engineer, Cloudera April 07, 2014

Secure Your Hadoop Cluster With Apache Sentry (Incubating) Xuefu Zhang Software Engineer, Cloudera April 07, 2014 1 Secure Your Hadoop Cluster With Apache Sentry (Incubating) Xuefu Zhang Software Engineer, Cloudera April 07, 2014 2 Outline Introduction Hadoop security primer Authentication Authorization Data Protection

More information

Big Data Zurich, November 23. September 2011

Big Data Zurich, November 23. September 2011 Institute of Technology Management Big Data Projektskizze «Competence Center Automotive Intelligence» Zurich, November 11th 23. September 2011 Felix Wortmann Assistant Professor Technology Management,

More information

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014 5 Keys to Unlocking the Big Data Analytics Puzzle Anurag Tandon Director, Product Marketing March 26, 2014 1 A Little About Us A global footprint. A proven innovator. A leader in enterprise analytics for

More information

Addressing Risk Data Aggregation and Risk Reporting Ben Sharma, CEO. Big Data Everywhere Conference, NYC November 2015

Addressing Risk Data Aggregation and Risk Reporting Ben Sharma, CEO. Big Data Everywhere Conference, NYC November 2015 Addressing Risk Data Aggregation and Risk Reporting Ben Sharma, CEO Big Data Everywhere Conference, NYC November 2015 Agenda 1. Challenges with Risk Data Aggregation and Risk Reporting (RDARR) 2. How a

More information

Big Data, Integration and Governance: Ask the Experts

Big Data, Integration and Governance: Ask the Experts Big, Integration and Governance: Ask the Experts January 29, 2013 1 The fourth dimension of Big : Veracity handling data in doubt Volume Velocity Variety Veracity* at Rest Terabytes to exabytes of existing

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Copyright Vertica Systems, Inc. October 2009. Cloudera and Vertica

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014

Big Data transforms Business. Data created every minute. Source: http://mashable.com/2012/06/22/data-created-every-minute/

Apache Sentry. Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com

Agenda: Various aspects of data security; Apache Sentry for authorization; Key concepts of Apache Sentry; Sentry features; Sentry architecture

How the oil and gas industry can gain value from Big Data?

Arild Kristensen, Nordic Sales Manager, Big Data Analytics. arild.kristensen@no.ibm.com, tel. +4790532591. April 25, 2013. 2013 IBM Corporation. Dilbert

IBM Software Top tips for securing big data environments

Why big data doesn't have to mean big security challenges. Comprehensive data protection for physical,

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

Big Data Is Less About Size, And More About Freedom (TechCrunch). Total data: bigger than big data (451 Group). Findings: Big Data Is More Extreme Than Volume (Gartner).

Clavister InSight TM. Protecting Values

Clavister SSP Security Services Platform: firewall, VPN termination, intrusion prevention, anti-virus, anti-spam, content filtering, traffic shaping, authentication. Protecting Values & Enterprise-wide

IBM Big Data Platform

Turning big data into smarter decisions. Stefan Söderlund, IBM client architect, Försvarsmakten (Swedish Armed Forces). Sesam spring seminar "Big Data, Bigga byte kräver Pigga Hertz!" May 16, 2013. By 2015, 80% of

III Big Data Technologies

Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

We are building the next generation of Big Data and Analytics solutions!

Background: 26 Years Experience IT Industry. 12 Years Solutions Architect - International Profile. Passionate about Technology. Genuine

Big Data Analytics for Space Exploration, Entrepreneurship and Policy Opportunities. Tiffani Crawford, PhD

Big Data Analytics Characteristics: Large quantities of many data types. Structured, Unstructured, Human, Machine

The Five Most Common Big Data Integration Mistakes To Avoid. Oracle White Paper, April 2015

Executive Summary: Big Data projects have fascinated business executives with the promise of

VIEWPOINT. High Performance Analytics. Industry Context and Trends

In the digital age of social media and connected devices, enterprises have a plethora of data that they can mine, to discover hidden correlations

The Enterprise Data Hub and The Modern Information Architecture

Dr. Amr Awadallah, CTO & Co-Founder, Cloudera. Twitter: @awadallah. 2013 Cloudera, Inc. All rights reserved. Cloudera Overview: The Leader

Teradata and Protegrity High-Value Protection for High-Value Data

03.16 EB7178 DATA SECURITY. Table of Contents: Data-Centric Security: Providing High-Value Protection for High-Value Data; Visibility:

Oracle Big Data SQL Technical Update

Jean-Pierre Dijcks, Oracle, Redwood City, CA, USA. Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance. Introduction: This technical

Architecture Modernization

Pragmatic Data Engineering and Pipeline Creation. Trends in the Market: Explosion of Unstructured Data; Data Warehouse Limitations; Increased Processing Demands. 16 billion connected

Apache Hadoop: The Big Data Refinery

Architecting the Future of Big Data. Whitepaper. Introduction: Big data has become an extremely popular term, due to the well-documented explosion in the amount of data

CHANGING THE SECURITY MONITORING STATUS QUO Solving SIEM problems with RSA Security Analytics

TRADITIONAL SIEMS ARE SHOWING THEIR AGE. Security Information and Event Management (SIEM) tools have been a

BIG DATA TRENDS AND TECHNOLOGIES

THE WORLD OF DATA IS CHANGING. Cloud. WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using on-hand database management tools.