Hadoop Security Design: Just Add Kerberos? Really?
isec Partners, Inc. is an information security firm that specializes in application, network, host, and product security. For more information about isec, please refer to Appendix A or isecpartners.com.

Andrew Becherer ([email protected])

The Apache Foundation's Hadoop Distributed File System (HDFS) and MapReduce engine comprise a distributed computing framework inspired by Google MapReduce and the Google File System (GFS). As originally implemented, Hadoop security was completely ineffective. In late 2009 and early 2010, Apache Foundation and Yahoo! developers embarked on an effort to improve the state of Hadoop security. This paper documents and analyzes those efforts.

This whitepaper is organized as follows:

- What is Hadoop
- Hadoop Risks
- A New Approach to Security
- Concerns
- An Alternative Strategy
- Conclusion
- About the Author
- References

This paper focuses on the design of the new Hadoop security features available in Hadoop 0.20.S, and on whether the new security mechanisms will scale to meet the requirements of large enterprise users.
What is Hadoop?

The Apache Foundation's Hadoop Distributed File System (HDFS) and MapReduce engine comprise a distributed computing infrastructure inspired by Google MapReduce and the Google File System (GFS). The Hadoop framework allows processing of massive data sets with distributed computing techniques by leveraging large numbers of physical hosts. Hadoop's use is spreading far beyond its open source search engine roots, and the framework is also being offered by Platform as a Service cloud computing providers.

In 2003 and 2004, Google employees released two papers describing a method for large-scale, distributed, data-intensive applications. Inspired by these papers, Doug Cutting created a distributed computing framework, called Hadoop, to support the open source Nutch search engine. At that time, secure deployment and use of Hadoop was not a concern: the data in Hadoop was not sensitive, and access to the cluster could be sufficiently limited.

Hadoop is made up of two primary components: the Hadoop Distributed File System (HDFS) and the MapReduce engine. HDFS is made up of geographically distributed Data Nodes, and access to these Data Nodes is coordinated by a service called the Name Node. Data Nodes communicate over the network in order to rebalance data and ensure data is replicated throughout the cluster. The MapReduce engine is likewise made up of two main components: users submit jobs to a Job Tracker, which then distributes tasks to Task Trackers as physically close to the required data as possible. While these are the primary components of a Hadoop cluster, there are often other services running in a Hadoop cluster, such as a workflow manager.

Hadoop is in use at many of the world's largest online media companies, including Yahoo, Facebook, Fox Interactive Media, LinkedIn and Twitter. Hadoop is entering the enterprise, as evidenced by Hadoop World 2009 presentations from Booz Allen Hamilton and JP Morgan Chase.
Hadoop is making its way into the federal government as well. In 2009, the National Security Agency began testing a Hadoop-based system for intelligence gathering, designed to link disparate intelligence data sources. The size of Hadoop deployments can grow quite large: according to media reports, Yahoo currently maintains 38,000 machines distributed across 20 different clusters. Hadoop has even been elevated to the "cloud" and made available as a Platform as a Service offering by Amazon and Sun.
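The MapReduce programming model described above can be illustrated with a minimal, framework-free sketch. This is plain Python standing in for Hadoop's Java API; the function names and the in-process "shuffle" are illustrative only, since in a real cluster each phase runs distributed across Task Trackers.

```python
from collections import defaultdict

# Map phase: each mapper emits (key, value) pairs from its input split.
def map_wordcount(line):
    for word in line.split():
        yield (word.lower(), 1)

# Shuffle phase: the framework groups all emitted values by key.
def shuffle(pairs):
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

# Reduce phase: each reducer aggregates the values for one key.
def reduce_wordcount(word, counts):
    return (word, sum(counts))

lines = ["hadoop security design", "hadoop kerberos"]
pairs = [p for line in lines for p in map_wordcount(line)]
result = dict(reduce_wordcount(w, c) for w, c in shuffle(pairs).items())
print(result)  # {'hadoop': 2, 'security': 1, 'design': 1, 'kerberos': 1}
```

Because the Job Tracker hands these user-supplied map and reduce functions to Task Trackers for execution, controlling who may submit jobs is central to the security discussion that follows.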
Hadoop Risks

When Hadoop development began in 2004, no effort was expended on creating a secure distributed computing environment. The Hadoop framework performed insufficient authentication and authorization of both users and services. The insufficient authentication and authorization of users allowed any user to impersonate any other user. The framework did not perform mutual authentication, which allowed a malicious network user to impersonate cluster services. HDFS's lax authorization allowed anyone to write data, and any data to be read. Deploying a secure Hadoop cluster was essentially impossible.

As a result of the insufficient authentication and authorization performed by both HDFS and the MapReduce engine, any user could impersonate any other user. Arbitrary Java code could be submitted to Job Trackers to be executed under the Job Tracker's user account. HDFS file permissions were easily circumvented. Because the framework did not perform mutual authentication, malicious network users could impersonate cluster services. If a malicious user could discover a data block's ID, the data could be read; write access was essentially unlimited. The only way to securely deploy Hadoop was to enforce strict network segregation, and in that scenario any user given access to the cluster was trusted absolutely.

A New Approach to Security

In 2009, discussion about Hadoop security reached a boiling point, and security was made a high priority. The Hadoop developers' 2010 goals included strong mutual authentication of users and services that would be transparent to end users. In addition to the changes to Hadoop core, a new workflow manager, Oozie, was introduced. The developers chose to use the Simple Authentication and Security Layer (SASL) with Kerberos, via GSSAPI, to authenticate users to the edge services. When a user connects to a Job Tracker, that connection is mutually authenticated using Kerberos.
Operating system principals are matched to a set of user and group access control lists maintained in flat configuration files. In order to improve performance and ensure the KDC is not a bottleneck, the developers chose to use a number of tokens for communication, secured with an RPC Digest scheme. The new Hadoop security design makes use of Delegation Tokens, Job Tokens and Block Access Tokens. Each of these tokens is similar in structure and based on HMAC-SHA1. Delegation Tokens are used by clients to communicate with the Name Node in order to gain access to HDFS data. Block Access Tokens are used to secure communication between the Name Node and Data Nodes and to enforce HDFS filesystem permissions. The Job Token is used to secure communication between the MapReduce engine's Task Tracker and individual tasks. It is important to note that this scheme uses symmetric cryptography and, depending upon the token type, the shared key may be distributed to hundreds or even thousands of hosts.

At the same time the new Kerberos and RPC Digest security mechanisms were unveiled, the Hadoop developers at Yahoo open sourced a new workflow manager called Oozie. Oozie allows users to streamline the submission and management of MapReduce jobs. In order for Oozie to perform its function, it has been designated a superuser and can perform actions on behalf of any Hadoop user. Authentication to Oozie has not been implemented: there is a pluggable authentication interface for Oozie, but there are no public authentication mechanisms ready to plug in. Anyone planning to make use of Oozie will need to develop their own authentication mechanism. According to the Hadoop Security Design whitepaper, the Hadoop developers considered writing an authentication plugin based on SPNEGO, to support browser-based Kerberos authentication, but the limitations of Jetty 6 and uneven browser support dissuaded them from this effort. In subsequent presentations by Hadoop developers, the need for a default authentication plugin, with a preference for SPNEGO, has been discussed.

In order to meet their development schedule and maintain backwards compatibility with previous versions of Hadoop, the developers made several compromises. The new design requires that end users not have administrative rights on any machine in the cluster.
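The shared structure of these tokens can be sketched as follows. This is a simplified illustration of an HMAC-SHA1-based token scheme, not Hadoop's exact wire format; the field names in the token identifier are assumptions for illustration.

```python
import hmac
import hashlib

def make_token(secret_key: bytes, token_id: bytes) -> tuple[bytes, bytes]:
    """Build a token: an identifier plus an HMAC-SHA1 'password' over it.

    In Hadoop's design the identifier carries fields such as the user,
    the block or job ID, and an expiry time; here it is an opaque blob.
    """
    password = hmac.new(secret_key, token_id, hashlib.sha1).digest()
    return token_id, password

def verify_token(secret_key: bytes, token_id: bytes, password: bytes) -> bool:
    """Any service holding the same shared key can recompute and check the MAC."""
    expected = hmac.new(secret_key, token_id, hashlib.sha1).digest()
    return hmac.compare_digest(expected, password)

# The Name Node issues a Block Access Token...
key = b"shared-secret-distributed-to-all-data-nodes"
token_id, password = make_token(key, b"user=alice;block=blk_123;expiry=1268000000")

# ...and any Data Node holding the shared key can verify it.
assert verify_token(key, token_id, password)
# A tampered identifier fails verification.
assert not verify_token(key, b"user=mallory;block=blk_123;expiry=1268000000", password)
```

Note that verification requires the same secret used for issuance, which is why the shared key must be replicated to every verifying host.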
If end users had administrative access to cluster machines, they could discover Delegation Tokens, Job Tokens, Block Access Tokens or symmetric cryptographic keys and subvert the security guarantees of the system. In developing the new security features, it was decided that these features must not impact GridMix performance by more than 3%. This decision guided the developers toward symmetric cryptographic algorithms and did not encourage the use of secure network transports.

Concerns

The limitations of the threat model used in the development of the new security design produce a number of concerns. Chief among these are the poor default SASL Quality of Protection (QoP), the wide distribution of symmetric cryptographic keys, incomplete pluggable web UI authentication, and the use of IP-based authentication.
Because of the emphasis on performance, and the perception that encryption is expensive, Hadoop uses a poor default SASL Quality of Protection (QoP). The SASL framework used to add Kerberos support to Hadoop RPC communication can do much better: other QoP options protect the integrity and privacy of network communication. The default QoP for Hadoop is authentication only, which provides neither integrity nor privacy of network communication. This leaves Hadoop RPC communication vulnerable to eavesdropping and modification.

The new Hadoop security design relies on the use of HMAC-SHA1, a symmetric key cryptographic algorithm. In the case of the Block Access Token, the symmetric key used in the HMAC-SHA1 must be distributed to the Name Node and every Data Node in the cluster. This is potentially hundreds or thousands of geographically distributed machines. If the shared key is disclosed to an attacker, the data on all Data Nodes is vulnerable: given block IDs, the attacker could craft Block Access Tokens, reducing the security of Hadoop to its previous level.

Many Hadoop services include HTTP interfaces, among them the Job Tracker, Task Tracker, Name Node, Data Node and the new Oozie workflow manager. In order to provide authentication for these web interfaces, the Hadoop developers have implemented pluggable web UI authentication. This requires the end user of Hadoop to provide a web UI authentication mechanism.

In some Hadoop deployments, HDFS proxies are used for server-to-server bulk data transfer. The Hadoop platform uses the proxy IP addresses, and a database of roles, to perform authentication and authorization. IP addresses are not a strong method of authentication, and their use here could lead to the bulk disclosure of all data the HDFS proxy is authorized to access.

An Alternative Strategy

There are currently several proposals for the secure use of Hadoop. One such proposal is Hadoop over the Tahoe Least-Authority File System (Tahoe-LAFS).
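The key-distribution concern can be made concrete with a short sketch. In a simplified HMAC-SHA1 token model (the field layout is an assumption, not Hadoop's wire format), anyone holding the shared key can mint a valid token for any block ID, so a Data Node cannot distinguish the Name Node from an attacker who obtained the key:

```python
import hmac
import hashlib

# One key, replicated to every Data Node: a single disclosure breaks all nodes.
SHARED_KEY = b"key-replicated-to-every-data-node"

def sign(token_id: bytes) -> bytes:
    # Both the Name Node (legitimately) and an attacker (after key
    # disclosure) compute exactly the same MAC.
    return hmac.new(SHARED_KEY, token_id, hashlib.sha1).digest()

def data_node_accepts(token_id: bytes, password: bytes) -> bool:
    # The Data Node's only check is that the MAC matches.
    return hmac.compare_digest(sign(token_id), password)

# An attacker who has the key and has learned a block ID forges a token
# granting read access; the Data Node cannot tell the difference.
forged_id = b"user=attacker;block=blk_42;access=READ"
assert data_node_accepts(forged_id, sign(forged_id))
```

An asymmetric scheme, with the Name Node signing tokens under a private key and Data Nodes verifying with a public key, would avoid replicating the secret, but at a higher computational cost; the 3% GridMix performance budget discussed above pushed the design away from such alternatives.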
Tahoe-LAFS is an open source, decentralized data store that attempts to preserve privacy and security even when an individual server has been compromised. Aaron Cordova and colleagues developed this method of running Hadoop over Tahoe, a Least-Authority File System. Hadoop over Tahoe-LAFS assumes the disk cannot be trusted and the network cannot be trusted, but that the memory on compute nodes can be trusted. As such, individual nodes encrypt files on disk and only communicate over a secure transport. Hadoop over Tahoe-LAFS has a significant impact on GridMix performance; write performance is especially impacted.

Conclusion

The new Hadoop security model requires additional time and effort before it will meet the requirements of many large enterprises. Changing the SASL Quality of Protection (QoP) should improve the security posture of the system, but will have an unknown impact on performance. The wide distribution of symmetric cryptographic keys should be reviewed for alternative solutions. The incomplete pluggable web UI authentication and the use of IP-based authentication are issues that must be resolved.
About the Author

Andrew Becherer is a Senior Security Consultant with isec Partners, a strategic digital security organization. His focus is web and mobile application security, as well as emerging cloud computing models. Prior to joining isec Partners, he was a Senior Consultant with Booz Allen Hamilton, and he spent several years as a Risk and Credit Analyst in the financial services industry. His experience in the software security, consulting, financial, non-profit and defense sectors has exposed him to a wide range of technologies. Mr. Becherer has lectured on a number of topics, including emerging cloud computing threat models, virtualization, network security tools and embedded Linux development. At the Black Hat Briefings USA 2009, Andrew, along with researchers Alex Stamos and Nathan Wilcox, presented on the topic "Cloud Computing Models and Vulnerabilities: Raining on the Trendy New Parade." Andrew's research on this topic focused on the effect of elasticity and virtualization on the Linux pseudorandom number generator (PRNG). At Black Hat USA 2008, he was a Microsoft Defend the Flag (DTF) instructor, and he is a recurring speaker at the Linuxfest Northwest conference. In addition to his educational outreach work with user groups, he is a member of several nationally recognized organizations, including the Association for Computing Machinery (ACM), FBI InfraGard and the Open Web Application Security Project (OWASP).

References

- "Hadoop Security Design" by Owen O'Malley, Kan Zhang, Sanjay Radia, Ram Marti, and Christopher Harrell
- "MapReduce: Simplified Data Processing on Large Clusters" by Jeffrey Dean and Sanjay Ghemawat
- "MapReduce Over Tahoe" by Aaron Cordova
Appendix A: About isec Partners, Inc.

isec Partners is a proven full-service security firm, dedicated to making Software Secure. Our focus areas include:

- Mobile Application Security
- Web Application Security
- Client/Server Security
- OnDemand Web Application Scanning (Automated/Manual)

Published books, notable presentations, whitepapers, tools, advisories, and SDL products:

- 12 published whitepapers, including the first whitepaper on CSRF
- 37 free security tools covering application, infrastructure, mobile, VoIP, and storage
- 9 advisories, including products from Google, Apple, and Adobe
- Free SDL products:
  - SecurityQA Toolbar (Automated Web Application Testing)
  - Code Coach (Secure Code Enforcement and Scanning)
  - Computer Based Training (Java & WebApp Security)
What IT Auditors Need to Know About Secure Shell. SSH Communications Security
What IT Auditors Need to Know About Secure Shell SSH Communications Security Agenda Secure Shell Basics Security Risks Compliance Requirements Methods, Tools, Resources What is Secure Shell? A cryptographic
FileCloud Security FAQ
is currently used by many large organizations including banks, health care organizations, educational institutions and government agencies. Thousands of organizations rely on File- Cloud for their file
Analyzing HTTP/HTTPS Traffic Logs
Advanced Threat Protection Automatic Traffic Log Analysis APTs, advanced malware and zero-day attacks are designed to evade conventional perimeter security defenses. Today, there is wide agreement that
Hadoop Architecture. Part 1
Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,
BIG DATA TRENDS AND TECHNOLOGIES
BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.
BIG DATA USING HADOOP
+ Breakaway Session By Johnson Iyilade, Ph.D. University of Saskatchewan, Canada 23-July, 2015 BIG DATA USING HADOOP + Outline n Framing the Problem Hadoop Solves n Meet Hadoop n Storage with HDFS n Data
Application Development. A Paradigm Shift
Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the
Introduction to Hadoop
Introduction to Hadoop 1 What is Hadoop? the big data revolution extracting value from data cloud computing 2 Understanding MapReduce the word count problem more examples MCS 572 Lecture 24 Introduction
Autodesk PLM 360 Security Whitepaper
Autodesk PLM 360 Autodesk PLM 360 Security Whitepaper May 1, 2015 trust.autodesk.com Contents Introduction... 1 Document Purpose... 1 Cloud Operations... 1 High Availability... 1 Physical Infrastructure
Application Layer Encryption: Protecting against Application Logic and Session Theft Attacks. Whitepaper
Application Layer Encryption: Protecting against Application Logic and Session Theft Attacks Whitepaper The security industry has extensively focused on protecting against malicious injection attacks like
Hadoop Security Analysis NOTE: This is a working draft. Notes are being collected and will be edited for readability.
Hadoop Security Analysis NOTE: This is a working draft. Notes are being collected and will be edited for readability. Introduction This document describes the state of security in a Hadoop YARN cluster.
End-to-end Secure Cloud Services a Pertino whitepaper
a Pertino whitepaper Executive summary Whether companies use the cloud as a conduit to connect remote locations and mobile users or use cloud-based applications, corporations have found that they can reduce
Hadoop as a Service. VMware vcloud Automation Center & Big Data Extension
Hadoop as a Service VMware vcloud Automation Center & Big Data Extension Table of Contents 1. Introduction... 2 1.1 How it works... 2 2. System Pre-requisites... 2 3. Set up... 2 3.1 Request the Service
Workday Mobile Security FAQ
Workday Mobile Security FAQ Workday Mobile Security FAQ Contents The Workday Approach 2 Authentication 3 Session 3 Mobile Device Management (MDM) 3 Workday Applications 4 Web 4 Transport Security 5 Privacy
Hadoop: Distributed Data Processing. Amr Awadallah Founder/CTO, Cloudera, Inc. ACM Data Mining SIG Thursday, January 25 th, 2010
Hadoop: Distributed Data Processing Amr Awadallah Founder/CTO, Cloudera, Inc. ACM Data Mining SIG Thursday, January 25 th, 2010 Outline Scaling for Large Data Processing What is Hadoop? HDFS and MapReduce
Security Considerations for DirectAccess Deployments. Whitepaper
Security Considerations for DirectAccess Deployments Whitepaper February 2015 This white paper discusses security planning for DirectAccess deployment. Introduction DirectAccess represents a paradigm shift
An Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov
An Industrial Perspective on the Hadoop Ecosystem Eldar Khalilov Pavel Valov agenda 03.12.2015 2 agenda Introduction 03.12.2015 2 agenda Introduction Research goals 03.12.2015 2 agenda Introduction Research
HDFS Federation. Sanjay Radia Founder and Architect @ Hortonworks. Page 1
HDFS Federation Sanjay Radia Founder and Architect @ Hortonworks Page 1 About Me Apache Hadoop Committer and Member of Hadoop PMC Architect of core-hadoop @ Yahoo - Focusing on HDFS, MapReduce scheduler,
