Extended Attributes and Transparent Encryption in Apache Hadoop
|
|
- Laurence Preston
- 8 years ago
- Views:
Transcription
1 Extended Attributes and Transparent Encryption in Apache Hadoop Uma Maheswara Rao G Yi Liu ( 刘 轶 )
2 Who we are? Uma Maheswara Rao G - umamahesh@apache.org - Software Engineer at Intel - PMC/committer, Apache Hadoop - PMC/committer, Apache BookKeeper Yi Liu ( 刘 轶 ) - yliu@apache.org - Software Engineer at Intel - Active committer, Apache Hadoop - PMC/committer, Apache Tajo - Senior security expert of Big data
3 Intel BigData Team Global team, local focus Worldwide (China, US and India) teams, >80% in China Local collaborations (industry & academic) a high priority Greater impact thru open source Active open source development (Spark, Hadoop, HBase, Storm, etc.) Widely used in the industry (from Facebook to Alibaba to Cloudera to China Mobile ) Strong influence in the open source community ~10 project committers in the team Technology and innovation oriented Next generations of Big Data Technologies Real-time, in-memory, complex analytics (statistic modeling, machine learning, graph analysis, ) Bridging advanced research and real-world applications
4 Agenda Extended Attributes Transparent Encryption
5 ZooKeeper HBase HADOOP Ecosystem Batch Processing MAPREDUCE, HIVE, PIG Search SQL Stream SPARK Machine Learning YARN (Resource Management) HDFS (Hadoop Distributed File System) DATA INTEGRATION (Sqoop, Flume )
6 HDFS Extended Attributes HDFS-2006
7 Introduction Allows user to associate additional metadata with files/directories Extended Attributes(Xattrs) can be set as Key-Value pair on any INode XAttrs will not be interpreted by File System Derived from Linux XAttrs feature, so it is functionally similar Allows user to set custom encoding format to XAttrs
8 Namespaces of XAttrs XAttrs should be prefixed with namespace HDFS support 5 XAttrs namespaces USER Access permission defined by file/directory permission bits For Sticky directories, only owner and privileged users can write TRUSTED Only visible and accessed by privileged users SYSTEM Not visible to users Only available for System kernel SECURITY Not visible to users Only available for System kernel for storing security information RAW They are like SYSTEM attributes, but they can be accessed the files/directories under./reserved/raw by the super users only.
9 Implementation details XAttrs implemented as separate INode feature in Namenode XAttrs will be persisted as part of INode information XAttrs will be validated against the Namespaces at the Namenode No compatibility issues. Upgrades automatically handled as Xattrs stored as Inode feature. XAttrs development was tracked under HDFS-2006
10 Configuration dfs.namenode.xattrs.enabled Whether the support of XAttrs is enabled in HDFS dfs.namenode.fs-limits.max-xattrs-per-inode Max number of XAttrs per Inode. Default 32 dfs.namenode.fs-limits.max-xattr-size Max combined size of name and value of XAttrs. Default bytes
11 Use Cases Storing the Encrypted Data Encryption Keys as XAttrs in HDFS Encrypted cluster environment Storing policy for Heterogeneous Storage Release HDFS-2006 branch merged to Trunk and Branch-2 Feature released in hadoop-2.5.0
12 How to use? Java API Command line
13 Transparent Encryption in Hadoop (HADOOP & HDFS-6134)
14 Outlines Transparent to upper layer applications and transparent access to encrypted files by all HDFS clients. High performance, it s not bottleneck. Encryption is independent of the file type, data format. Scalable key management. End-to-end encryption: data can only be encrypted and decrypted by the client. This satisfies two typical requirements for encryption: at-rest encryption and in-transit encryption. Security: HDFS never handles unencrypted data or data encryption keys.
15 Write file 5. Encrypt data using DEK DFS Client 4. Decrypt EDEK and get DEK KMS Backing keystore Fill EDEK cache in background DN DN DN NN NN 2. EDEK from cache and persist to File metadata. HDFS
16 Read file 6. Decrypt data using DEK DFS Client 4. Decrypt EDEK and get DEK KMS Backing keystore DN DN DN NN NN 2. Read EDEK from File metadata. HDFS 16
17 Implementation details Pread support. Original file and Cipher file have the same length and 1:1 corresponding by using AES-CTR Use AES-NI support on Intel platform to improve encryption performance, 20x speedup. We define encryption zone and files are transparently encrypted/decrypted in the zone. We use two layer keys: encryption zone key (EZK), and data encryption key (DEK) which is encrypted by EZK. Each file has a different DEK. 17
18 Encryption/Decryption for HDFS Blocks 18
19 User Ops Create Key hadoop key create <keyname> [-cipher <cipher>] [-size <size>] [-description <description>] [-attr <attribute=value>] [-provider <provider>] Roll Key hadoop key roll <keyname> [-provider <provider>] Delete Key hadoop key delete <keyname> [-provider <provider>] List Keys hadoop key list [-provider <provider>] [-metadata] 19
20 Admin Ops Create Encryption Zone hdfs crypto -createzone -keyname <keyname> -path <path> List Encryption Zones hdfs crypto -listzones 20
21 Usage Example As a normal user, create a new encryption key: $ hadoop key create mykey As the super user, create a new empty directory and make it an encryption zone: $ sudo -u hdfs hadoop fs -mkdir /zone $ sudo -u hdfs hdfs crypto -createzone -keyname mykey -path /zone Change its ownership to the normal user: $ sudo -u hdfs hadoop fs -chown myuser:myuser /zone As the normal user, put a file in, read it out: $ hadoop fs -put helloworld /zone $ hadoop fs -cat /zone/helloworld 21
22 Release Fs-encryption branch merged to trunk and branch-2 Feature released in hadoop
23 Performance AES-NI enabled TestDFSIO Benchmark
24 Call for Collaborations Close collaborations with local ecosystems Intel Big Data engineering teams, industry partners and academic research Building next generations of Big Data Technologies Real-time, in-memory, complex analytics, etc. Bridging advanced research and real-world applications Highly impactful through open source, university research (e.g., UC Berkeley) and industry adoptions (e.g., Alibaba, Cloudera, etc.) 24
25 Q & A Thanks!
26 Notices and Disclaimers: Intel, the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information about performance and benchmark results, visit Optimization Notice Intel's compilers may or may not optimize to the same degree for non-intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Intel technologies may require enabled hardware, specific software, or services activation. Check with your system manufacturer or retailer. No computer system can be absolutely secure. Intel does not assume any liability for lost or stolen data or systems or any damages resulting from such losses. You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. The products described may contain design defects or errors known as errata which may cause the product to deviate from publish. 26
Next-Gen Big Data Analytics using the Spark stack
Next-Gen Big Data Analytics using the Spark stack Jason Dai Chief Architect of Big Data Technologies Software and Services Group, Intel Agenda Overview Apache Spark stack Next-gen big data analytics Our
More informationFast, Low-Overhead Encryption for Apache Hadoop*
Fast, Low-Overhead Encryption for Apache Hadoop* Solution Brief Intel Xeon Processors Intel Advanced Encryption Standard New Instructions (Intel AES-NI) The Intel Distribution for Apache Hadoop* software
More informationHadoop Applications on High Performance Computing. Devaraj Kavali devaraj@apache.org
Hadoop Applications on High Performance Computing Devaraj Kavali devaraj@apache.org About Me Apache Hadoop Committer Yarn/MapReduce Contributor Senior Software Engineer @Intel Corporation 2 Agenda Objectives
More informationIntel Media SDK Library Distribution and Dispatching Process
Intel Media SDK Library Distribution and Dispatching Process Overview Dispatching Procedure Software Libraries Platform-Specific Libraries Legal Information Overview This document describes the Intel Media
More informationEncryption and Anonymization in Hadoop
Encryption and Anonymization in Hadoop Current and Future needs Sept-28-2015 Page 1 ApacheCon, Budapest Agenda Need for data protection Encryption and Anonymization Current State of Encryption in Hadoop
More informationBig Data for Big Science. Bernard Doering Business Development, EMEA Big Data Software
Big Data for Big Science Bernard Doering Business Development, EMEA Big Data Software Internet of Things 40 Zettabytes of data will be generated WW in 2020 1 SMART CLIENTS INTELLIGENT CLOUD Richer user
More informationHDFS 2015: Past, Present, and Future
Apache: Big Data Europe 2015 HDFS 2015: Past, Present, and Future 9/30/2015 NTT DATA Corporation Akira Ajisaka Copyright 2015 NTT DATA Corporation Self introduction Akira Ajisaka (NTT DATA) Apache Hadoop
More informationIntroduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data
Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give
More informationLecture 2 (08/31, 09/02, 09/09): Hadoop. Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015
Lecture 2 (08/31, 09/02, 09/09): Hadoop Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015 K. Zhang BUDT 758 What we ll cover Overview Architecture o Hadoop
More informationIntroduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.
Big Data Hadoop Administration and Developer Course This course is designed to understand and implement the concepts of Big data and Hadoop. This will cover right from setting up Hadoop environment in
More informationAccelerating Enterprise Big Data Success. Tim Stevens, VP of Business and Corporate Development Cloudera
Accelerating Enterprise Big Data Success Tim Stevens, VP of Business and Corporate Development Cloudera 1 Big Opportunity: Extract value from data Revenue Growth x = 50 Billion 35 ZB Cost Savings Margin
More informationHadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
More informationIntel and Qihoo 360 Internet Portal Datacenter - Big Data Storage Optimization Case Study
Intel and Qihoo 360 Internet Portal Datacenter - Big Data Storage Optimization Case Study The adoption of cloud computing creates many challenges and opportunities in big data management and storage. To
More informationSujee Maniyam, ElephantScale
Hadoop PRESENTATION 2 : New TITLE and GOES Noteworthy HERE Sujee Maniyam, ElephantScale SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member
More informationDeploying Hadoop with Manager
Deploying Hadoop with Manager SUSE Big Data Made Easier Peter Linnell / Sales Engineer plinnell@suse.com Alejandro Bonilla / Sales Engineer abonilla@suse.com 2 Hadoop Core Components 3 Typical Hadoop Distribution
More informationBig Data. Value, use cases and architectures. Petar Torre Lead Architect Service Provider Group. Dubrovnik, Croatia, South East Europe 20-22 May, 2013
Dubrovnik, Croatia, South East Europe 20-22 May, 2013 Big Data Value, use cases and architectures Petar Torre Lead Architect Service Provider Group 2011 2013 Cisco and/or its affiliates. All rights reserved.
More informationWorkshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
More informationIntel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013
Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software SC13, November, 2013 Agenda Abstract Opportunity: HPC Adoption of Big Data Analytics on Apache
More informationInfomatics. Big-Data and Hadoop Developer Training with Oracle WDP
Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools
More informationCloudera Enterprise Reference Architecture for Google Cloud Platform Deployments
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and
More informationApache Sentry. Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com
Apache Sentry Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com Agenda Various aspects of data security Apache Sentry for authorization Key concepts of Apache Sentry Sentry features Sentry architecture
More informationIntel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual
Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual Overview Metrics Monitor is part of Intel Media Server Studio 2015 for Linux Server. Metrics Monitor is a user space shared library
More informationCloud Computing. Big Data. High Performance Computing
Cloud Computing Big Data High Performance Computing Intel Corporation copy right 2013 Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
More informationHetero Streams Library 1.0
Release Notes for release of Copyright 2013-2016 Intel Corporation All Rights Reserved US Revision: 1.0 World Wide Web: http://www.intel.com Legal Disclaimer Legal Disclaimer You may not use or facilitate
More informationHadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org dhruba@facebook.com
Hadoop Distributed File System Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org dhruba@facebook.com Hadoop, Why? Need to process huge datasets on large clusters of computers
More informationChase Wu New Jersey Ins0tute of Technology
CS 698: Special Topics in Big Data Chapter 4. Big Data Analytics Platforms Chase Wu New Jersey Ins0tute of Technology Some of the slides have been provided through the courtesy of Dr. Ching-Yung Lin at
More informationLustre* HSM in the Cloud. Robert Read, Intel HPDD
Lustre* HSM in the Cloud Robert Read, Intel HPDD Overview Lustre in the Cloud HSM for Cloud Importing from Amazon Simple Storage Service* (S3) General archive with S3 Crazy snapshot idea 2 Lustre in the
More informationAnkush Cluster Manager - Hadoop2 Technology User Guide
Ankush Cluster Manager - Hadoop2 Technology User Guide Ankush User Manual 1.5 Ankush User s Guide for Hadoop2, Version 1.5 This manual, and the accompanying software and other documentation, is protected
More informationIntel Cyber Security Briefing: Trends, Solutions, and Opportunities. Matthew Rosenquist, Cyber Security Strategist, Intel Corp
Intel Cyber Security Briefing: Trends, Solutions, and Opportunities Matthew Rosenquist, Cyber Security Strategist, Intel Corp Legal Notices and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION
More informationReal-Time Analytical Processing (RTAP) Using the Spark Stack. Jason Dai jason.dai@intel.com Intel Software and Services Group
Real-Time Analytical Processing (RTAP) Using the Spark Stack Jason Dai jason.dai@intel.com Intel Software and Services Group Project Overview Research & open source projects initiated by AMPLab in UC Berkeley
More informationReal-Time Big Data Analytics SAP HANA with the Intel Distribution for Apache Hadoop software
Real-Time Big Data Analytics with the Intel Distribution for Apache Hadoop software Executive Summary is already helping businesses extract value out of Big Data by enabling real-time analysis of diverse
More informationHadoop: Embracing future hardware
Hadoop: Embracing future hardware Suresh Srinivas @suresh_m_s Page 1 About Me Architect & Founder at Hortonworks Long time Apache Hadoop committer and PMC member Designed and developed many key Hadoop
More informationHadoop implementation of MapReduce computational model. Ján Vaňo
Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed
More informationCray XC30 Hadoop Platform Jonathan (Bill) Sparks Howard Pritchard Martha Dumler
Cray XC30 Hadoop Platform Jonathan (Bill) Sparks Howard Pritchard Martha Dumler Safe Harbor Statement This presentation may contain forward-looking statements that are based on our current expectations.
More informationHadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
More informationChukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84
Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics
More informationHDFS. Hadoop Distributed File System
HDFS Kevin Swingler Hadoop Distributed File System File system designed to store VERY large files Streaming data access Running across clusters of commodity hardware Resilient to node failure 1 Large files
More informationHadoop Scalability at Facebook. Dmytro Molkov (dms@fb.com) YaC, Moscow, September 19, 2011
Hadoop Scalability at Facebook Dmytro Molkov (dms@fb.com) YaC, Moscow, September 19, 2011 How Facebook uses Hadoop Hadoop Scalability Hadoop High Availability HDFS Raid How Facebook uses Hadoop Usages
More informationReal-Time Big Data Analytics for the Enterprise
White Paper Intel Distribution for Apache Hadoop* Big Data Real-Time Big Data Analytics for the Enterprise SAP HANA* and the Intel Distribution for Apache Hadoop* Software Executive Summary Companies are
More informationTRAINING PROGRAM ON BIGDATA/HADOOP
Course: Training on Bigdata/Hadoop with Hands-on Course Duration / Dates / Time: 4 Days / 24th - 27th June 2015 / 9:30-17:30 Hrs Venue: Eagle Photonics Pvt Ltd First Floor, Plot No 31, Sector 19C, Vashi,
More informationCloud-based Analytics and Map Reduce
1 Cloud-based Analytics and Map Reduce Datasets Many technologies converging around Big Data theme Cloud Computing, NoSQL, Graph Analytics Biology is becoming increasingly data intensive Sequencing, imaging,
More informationDell* In-Memory Appliance for Cloudera* Enterprise
Built with Intel Dell* In-Memory Appliance for Cloudera* Enterprise Find out what faster big data analytics can do for your business The need for speed in all things related to big data is an enormous
More informationPrepared By : Manoj Kumar Joshi & Vikas Sawhney
Prepared By : Manoj Kumar Joshi & Vikas Sawhney General Agenda Introduction to Hadoop Architecture Acknowledgement Thanks to all the authors who left their selfexplanatory images on the internet. Thanks
More informationScaling up to Production
1 Scaling up to Production Overview Productionize then Scale Building Production Systems Scaling Production Systems Use Case: Scaling a Production Galaxy Instance Infrastructure Advice 2 PRODUCTIONIZE
More informationComprehensive Analytics on the Hortonworks Data Platform
Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page
More informationIntel Service Assurance Administrator. Product Overview
Intel Service Assurance Administrator Product Overview Running Enterprise Workloads in the Cloud Enterprise IT wants to Start a private cloud initiative to service internal enterprise customers Find an
More informationQsoft Inc www.qsoft-inc.com
Big Data & Hadoop Qsoft Inc www.qsoft-inc.com Course Topics 1 2 3 4 5 6 Week 1: Introduction to Big Data, Hadoop Architecture and HDFS Week 2: Setting up Hadoop Cluster Week 3: MapReduce Part 1 Week 4:
More informationITG Software Engineering
Introduction to Apache Hadoop Course ID: Page 1 Last Updated 12/15/2014 Introduction to Apache Hadoop Course Overview: This 5 day course introduces the student to the Hadoop architecture, file system,
More informationHortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015
Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015 We Do Hadoop Fall 2014 Page 1 HDP delivers a comprehensive data management platform GOVERNANCE Hortonworks Data Platform
More informationThe ROI from Optimizing Software Performance with Intel Parallel Studio XE
The ROI from Optimizing Software Performance with Intel Parallel Studio XE Intel Parallel Studio XE delivers ROI solutions to development organizations. This comprehensive tool offering for the entire
More informationCloudera Distributed Hadoop (CDH) Installation and Configuration on Virtual Box
Cloudera Distributed Hadoop (CDH) Installation and Configuration on Virtual Box By Kavya Mugadur W1014808 1 Table of contents 1.What is CDH? 2. Hadoop Basics 3. Ways to install CDH 4. Installation and
More informationSOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera
SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce
More informationProgramming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview
Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce
More informationIntel Platform and Big Data: Making big data work for you.
Intel Platform and Big Data: Making big data work for you. 1 From data comes insight New technologies are enabling enterprises to transform opportunity into reality by turning big data into actionable
More informationHadoop* on Lustre* Liu Ying (emoly.liu@intel.com) High Performance Data Division, Intel Corporation
Hadoop* on Lustre* Liu Ying (emoly.liu@intel.com) High Performance Data Division, Intel Corporation Agenda Overview HAM and HAL Hadoop* Ecosystem with Lustre * Benchmark results Conclusion and future work
More informationDeveloping High-Performance, Scalable, cost effective storage solutions with Intel Cloud Edition Lustre* and Amazon Web Services
Reference Architecture Developing Storage Solutions with Intel Cloud Edition for Lustre* and Amazon Web Services Developing High-Performance, Scalable, cost effective storage solutions with Intel Cloud
More informationHDFS Snapshots and Beyond
HDFS Snapshots and Beyond Tsz-Wo (Nicholas) Sze Jing Zhao October 29, 2013 Page 1 About Us Tsz-Wo Nicholas Sze, Ph.D. Software Engineer at Hortonworks PMC Member at Apache Hadoop One of the most active
More informationCloudera Enterprise Reference Architecture for Google Cloud Platform Deployments
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2016 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and
More informationComplete Java Classes Hadoop Syllabus Contact No: 8888022204
1) Introduction to BigData & Hadoop What is Big Data? Why all industries are talking about Big Data? What are the issues in Big Data? Storage What are the challenges for storing big data? Processing What
More informationBIG DATA HADOOP TRAINING
BIG DATA HADOOP TRAINING DURATION 40hrs AVAILABLE BATCHES WEEKDAYS (7.00AM TO 8.30AM) & WEEKENDS (10AM TO 1PM) MODE OF TRAINING AVAILABLE ONLINE INSTRUCTOR LED CLASSROOM TRAINING (MARATHAHALLI, BANGALORE)
More informationHur hanterar vi utmaningar inom området - Big Data. Jan Östling Enterprise Technologies Intel Corporation, NER
Hur hanterar vi utmaningar inom området - Big Data Jan Östling Enterprise Technologies Intel Corporation, NER Legal Disclaimers All products, computer systems, dates, and figures specified are preliminary
More informationThe Big Data Ecosystem at LinkedIn. Presented by Zhongfang Zhuang
The Big Data Ecosystem at LinkedIn Presented by Zhongfang Zhuang Based on the paper The Big Data Ecosystem at LinkedIn, written by Roshan Sumbaly, Jay Kreps, and Sam Shah. The Ecosystems Hadoop Ecosystem
More informationA Brief Outline on Bigdata Hadoop
A Brief Outline on Bigdata Hadoop Twinkle Gupta 1, Shruti Dixit 2 RGPV, Department of Computer Science and Engineering, Acropolis Institute of Technology and Research, Indore, India Abstract- Bigdata is
More informationConstructing a Data Lake: Hadoop and Oracle Database United!
Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.
More informationOverview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics
Overview Big Data in Apache Hadoop - HDFS - MapReduce in Hadoop - YARN https://hadoop.apache.org 138 Apache Hadoop - Historical Background - 2003: Google publishes its cluster architecture & DFS (GFS)
More informationCertified Big Data and Apache Hadoop Developer VS-1221
Certified Big Data and Apache Hadoop Developer VS-1221 Certified Big Data and Apache Hadoop Developer Certification Code VS-1221 Vskills certification for Big Data and Apache Hadoop Developer Certification
More informationHadoop & Spark Using Amazon EMR
Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?
More informationImplementation and Performance of AES-NI in CyaSSL. Embedded SSL
Implementation and Performance of AES-NI in CyaSSL Embedded SSL In 2010, Intel introduced the 32nm Intel microarchitecture code name Westmere. With this introduction, Intel announced support for a new
More informationUpcoming Announcements
Enterprise Hadoop Enterprise Hadoop Jeff Markham Technical Director, APAC jmarkham@hortonworks.com Page 1 Upcoming Announcements April 2 Hortonworks Platform 2.1 A continued focus on innovation within
More informationIntel Unite. User Guide
Intel Unite User Guide Intel Unite Users Guide September 2015 Legal Disclaimers & Copyrights All information provided here is subject to change without notice. Contact your Intel representative to obtain
More information<Insert Picture Here> Big Data
Big Data Kevin Kalmbach Principal Sales Consultant, Public Sector Engineered Systems Program Agenda What is Big Data and why it is important? What is your Big
More informationHADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM
HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM 1. Introduction 1.1 Big Data Introduction What is Big Data Data Analytics Bigdata Challenges Technologies supported by big data 1.2 Hadoop Introduction
More informationAccelerating Business Intelligence with Large-Scale System Memory
Accelerating Business Intelligence with Large-Scale System Memory A Proof of Concept by Intel, Samsung, and SAP Executive Summary Real-time business intelligence (BI) plays a vital role in driving competitiveness
More informationDominik Wagenknecht Accenture
Dominik Wagenknecht Accenture Improving Mainframe Performance with Hadoop October 17, 2014 Organizers General Partner Top Media Partner Media Partner Supporters About me Dominik Wagenknecht Accenture Vienna
More informationMySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering
MySQL and Hadoop: Big Data Integration Shubhangi Garg & Neha Kumari MySQL Engineering 1Copyright 2013, Oracle and/or its affiliates. All rights reserved. Agenda Design rationale Implementation Installation
More informationBig Data Too Big To Ignore
Big Data Too Big To Ignore Geert! Big Data Consultant and Manager! Currently finishing a 3 rd Big Data project! IBM & Cloudera Certified! IBM & Microsoft Big Data Partner 2 Agenda! Defining Big Data! Introduction
More informationHadoop and Map-Reduce. Swati Gore
Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data
More informationHDFS Under the Hood. Sanjay Radia. Sradia@yahoo-inc.com Grid Computing, Hadoop Yahoo Inc.
HDFS Under the Hood Sanjay Radia Sradia@yahoo-inc.com Grid Computing, Hadoop Yahoo Inc. 1 Outline Overview of Hadoop, an open source project Design of HDFS On going work 2 Hadoop Hadoop provides a framework
More informationA Tour of the Zoo the Hadoop Ecosystem Prafulla Wani
A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to
More informationCDH AND BUSINESS CONTINUITY:
WHITE PAPER CDH AND BUSINESS CONTINUITY: An overview of the availability, data protection and disaster recovery features in Hadoop Abstract Using the sophisticated built-in capabilities of CDH for tunable
More informationHadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org June 3 rd, 2008
Hadoop Distributed File System Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org June 3 rd, 2008 Who Am I? Hadoop Developer Core contributor since Hadoop s infancy Focussed
More informationSignificantly Speed up real world big data Applications using Apache Spark
Significantly Speed up real world big data Applications using Apache Spark Mingfei Shi(mingfei.shi@intel.com) Grace Huang ( jie.huang@intel.com) Intel/SSG/Big Data Technology 1 Agenda Who are we? Case
More informationLecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop
Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social
More informationPage Modification Logging for Virtual Machine Monitor White Paper
Page Modification Logging for Virtual Machine Monitor White Paper This document is intended only for VMM or hypervisor software developers and not for application developers or end-customers. Readers are
More informationHadoop Job Oriented Training Agenda
1 Hadoop Job Oriented Training Agenda Kapil CK hdpguru@gmail.com Module 1 M o d u l e 1 Understanding Hadoop This module covers an overview of big data, Hadoop, and the Hortonworks Data Platform. 1.1 Module
More informationNIST/ITL CSD Biometric Conformance Test Software on Apache Hadoop. September 2014. National Institute of Standards and Technology (NIST)
NIST/ITL CSD Biometric Conformance Test Software on Apache Hadoop September 2014 Dylan Yaga NIST/ITL CSD Lead Software Designer Fernando Podio NIST/ITL CSD Project Manager National Institute of Standards
More informationData Security in Hadoop
Data Security in Hadoop Eric Mizell Director, Solution Engineering Page 1 What is Data Security? Data Security for Hadoop allows you to administer a singular policy for authentication of users, authorize
More informationIntroduction to Big Data Training
Introduction to Big Data Training The quickest way to be introduce with NOSQL/BIG DATA offerings Learn and experience Big Data Solutions including Hadoop HDFS, Map Reduce, NoSQL DBs: Document Based DB
More informationCURSO: ADMINISTRADOR PARA APACHE HADOOP
CURSO: ADMINISTRADOR PARA APACHE HADOOP TEST DE EJEMPLO DEL EXÁMEN DE CERTIFICACIÓN www.formacionhadoop.com 1 Question: 1 A developer has submitted a long running MapReduce job with wrong data sets. You
More informationEnabling High performance Big Data platform with RDMA
Enabling High performance Big Data platform with RDMA Tong Liu HPC Advisory Council Oct 7 th, 2014 Shortcomings of Hadoop Administration tooling Performance Reliability SQL support Backup and recovery
More informationCloud based Holdfast Electronic Sports Game Platform
Case Study Cloud based Holdfast Electronic Sports Game Platform Intel and Holdfast work together to upgrade Holdfast Electronic Sports Game Platform with cloud technology Background Shanghai Holdfast Online
More informationIntelligent Business Operations
White Paper Intel Xeon Processor E5 Family Data Center Efficiency Financial Services Intelligent Business Operations Best Practices in Cash Supply Chain Management Executive Summary The purpose of any
More informationAddressing Open Source Big Data, Hadoop, and MapReduce limitations
Addressing Open Source Big Data, Hadoop, and MapReduce limitations 1 Agenda What is Big Data / Hadoop? Limitations of the existing hadoop distributions Going enterprise with Hadoop 2 How Big are Data?
More informationFrom Relational to Hadoop Part 1: Introduction to Hadoop. Gwen Shapira, Cloudera and Danil Zburivsky, Pythian
From Relational to Hadoop Part 1: Introduction to Hadoop Gwen Shapira, Cloudera and Danil Zburivsky, Pythian Tutorial Logistics 2 Got VM? 3 Grab a USB USB contains: Cloudera QuickStart VM Slides Exercises
More informationContents. Pentaho Corporation. Version 5.1. Copyright Page. New Features in Pentaho Data Integration 5.1. PDI Version 5.1 Minor Functionality Changes
Contents Pentaho Corporation Version 5.1 Copyright Page New Features in Pentaho Data Integration 5.1 PDI Version 5.1 Minor Functionality Changes Legal Notices https://help.pentaho.com/template:pentaho/controls/pdftocfooter
More informationWhat We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea
What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea Overview Riding Google App Engine Taming Hadoop Summary Riding
More informationTHE HADOOP DISTRIBUTED FILE SYSTEM
THE HADOOP DISTRIBUTED FILE SYSTEM Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Presented by Alexander Pokluda October 7, 2013 Outline Motivation and Overview of Hadoop Architecture,
More informationLarge scale processing using Hadoop. Ján Vaňo
Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine
More informationCOURSE CONTENT Big Data and Hadoop Training
COURSE CONTENT Big Data and Hadoop Training 1. Meet Hadoop Data! Data Storage and Analysis Comparison with Other Systems RDBMS Grid Computing Volunteer Computing A Brief History of Hadoop Apache Hadoop
More information