Hadoop, Easier than Ever

Etu™ Appliance: The Only Choice for Hadoop Platforms

Enjoy the beauty of Hadoop cluster deployment and simple management

Why is an Etu™ Appliance Cluster indispensable to a Hadooper?

In short: high performance, security, reliability, and easy management.

For developers and system administrators, dealing with Hadoop is well known to be a challenge or, more simply put, a pain in the neck.

Think about this. As a Hadoop-based application developer or a Hadoop system administrator, what is a reasonable amount of time for you to get a 10-node Hadoop cluster up and running? Three days, one day, three hours, or maybe one hour? Etu™ Appliance guarantees deployment in as little as 10+ minutes.

Imagine a product designed for running Hadoop jobs that lets you process Big Data on the same cluster scale with a five-fold leap in performance every day compared to current systems. Would it not make more sense to satisfy your users by replacing the existing application system with one that offers instantaneous access? Etu™ Appliance is a high-performance product offering three to twelve times the computing capacity of a self-installed Hadoop platform.

Are you still trying to figure out a solution for Hadoop Name Node High Availability? In reality, the job has been done, with a broader built-in HA that covers the whole cluster system: computing, storage, host, network services, security mechanisms, and network interfaces, all beyond a Hadooper's imagination of HA. Etu™ Appliance sets a new benchmark for the reliability of a Hadoop system.

When it comes to storing and computing Big Data, how does one get the full benefit of the multi-tenancy emphasized in cloud computing? Imagine authenticating and authorizing different users and applications to access segmented data. Etu™ Appliance incorporates tightly integrated Kerberos and LDAP to guarantee multi-tenant security.

As a business grows, a Hadoop cluster should ideally be able to scale out accordingly. Imagine you have a 50-node cluster that needs a system overhaul, parameter optimization, and software maintenance. How many hosts do you have to connect to, and how many command lines are needed to accomplish this task? Etu™ Appliance boosts first call resolution with one GUI and one URL.

Key Features

1. Highly Integrated Appliance
A Hadoop-based Big Data processing platform that integrates software and hardware, combining distributed storage with parallel computing. Etu™ Appliance is designed for simplicity and optimization, bringing together high performance, effective management, security, reliability, scalability, and openness.

2. High-Performance Operating System
Etu™ OS is specifically tailored for Hadoop Big Data clusters. It is the key to optimized operational effectiveness as well as simplified deployment and management.

3. One-Click Rapid Deployment
From bare metal to up and running with a comprehensive installed software stack and optimized configuration, an Etu™ Appliance cluster with more than 100 Worker Nodes can be automatically deployed and configured in 10+ minutes.

4. Full System High Availability
Full system HA, rather than Hadoop Name Node High Availability alone, comprehensively guards against single points of failure.

5. Multi-Tenant Security
Etu™ Appliance has a built-in multi-tenant identity and access control mechanism that provides a level of data security comparable to a public cloud.

6. Ad Hoc Network Design
Etu™ Appliance combines multi-interface bandwidth with adapter fault tolerance, allowing high-performance network transmission and high availability.

7. Human-Based Centralized Management
Etu™ Appliance offers a web-based graphical user interface for central management and settings across the nodes in a cluster.

8. Low-Risk Adoption
The minimal Etu™ Appliance cluster contains 1 Master Node and 2 Worker Nodes, which offers the flexibility of purchasing less than one full Hadoop rack, or even less than half.

9. Linear Scalability
Depending on the number of Worker Nodes, an Etu™ Appliance cluster can scale out by tens to hundreds of times.

10. Open and Value-Added Platform
The open Big Data processing platform enables a variety of applications to be deployed on it.

The differences between an Etu™ Appliance and a general Hadoop platform

Operating System
  Etu™ Appliance: Etu™ OS is specialized for running Hadoop jobs.
  General Hadoop platform: General Linux distributions.

Computing Performance
  Etu™ Appliance: Full system optimization delivers computing performance 3 to 12 times higher than a general Hadoop platform.
  General Hadoop platform: General system with non-calibrated computing performance.

Data Acquisition
  Etu™ Appliance: An Etu™ Log Collector can receive more than 60,000 UDP events per second (EPS) per node, with an accuracy rate above 99.99%.
  General Hadoop platform: General data collection performance without optimization.

Cluster Deployment
  Etu™ Appliance: One-click deployment. Deployment and setup of more than 100 Worker Nodes, from bare metal to up and running, finishes within 10+ minutes and takes the HA system architecture into account.
  General Hadoop platform: The operating system and cluster services must be installed manually on each node before the Hadoop Ecosystem software stack can be installed. Only then can the cluster be configured to run.

Distinctive Network Design
  Etu™ Appliance: Combines multi-interface bandwidth and allows for HA.
  General Hadoop platform: General network design lacks provisions for performance and fault tolerance.

High Availability Architecture
  Etu™ Appliance: The built-in full system HA is highly integrated, covering computing, storage, host, network services, security mechanisms, network interfaces, and more, to avoid a single point of failure in all aspects.
  General Hadoop platform: Generally, only a Hadoop Name Node HA mechanism is provided, and users must carry out many detailed steps, which can lead to a high failure rate.

Multi-tenant Security
  Etu™ Appliance: The built-in Kerberos and LDAP fully meet the authentication and authorization requirements of an enterprise-level multi-tenant platform.
  General Hadoop platform: Users must implement the Kerberos and LDAP integration themselves, which entails a lengthy and complicated process. In addition, the threshold for system integration is very high.

Cluster Management
  Etu™ Appliance: A web-based graphical user interface for central management and settings across the nodes in a cluster.
  General Hadoop platform: Root access to each node is needed to manage the host system.
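To put the Data Acquisition figures in perspective, the short Python sketch below turns the quoted per-node rate and accuracy into daily volumes. It assumes "accuracy rate" means the share of incoming events that are actually received, which the brochure does not define, and it treats both figures as lower bounds. The function names are illustrative only and are not part of any Etu™ software.

```python
# Figures quoted in the comparison above (both are stated lower bounds).
EPS_PER_NODE = 60_000          # events per second, per node
MAX_LOSS_PER_10_000 = 1        # 99.99% accuracy = at most 1 event in 10,000 missed
SECONDS_PER_DAY = 24 * 60 * 60


def daily_events(nodes: int, eps: int = EPS_PER_NODE) -> int:
    """Events received per day by `nodes` collectors running at the quoted rate."""
    return nodes * eps * SECONDS_PER_DAY


def max_missed_per_second(nodes: int, eps: int = EPS_PER_NODE) -> int:
    """Upper bound on missed events per second, assuming accuracy means
    the fraction of events received (an interpretation, not a stated fact)."""
    return nodes * eps * MAX_LOSS_PER_10_000 // 10_000


if __name__ == "__main__":
    print(daily_events(1))            # events per day for a single node
    print(max_missed_per_second(1))   # missed events per second for a single node
```

Integer arithmetic is used on purpose so the bound is exact rather than subject to floating-point rounding.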

How Etu™ Appliance Works

Etu has taken into consideration an enterprise's investment augmentation and acknowledges the concerns of efficiency and effectiveness. An Etu™ Appliance therefore lets your enterprise benefit immediately, whether the implementation is small-scale or large-scale. Because of the flexibility that Etu™ Appliance allows, the transition to large scale is efficient when the time comes for future expansion. An Etu™ Appliance cluster starts with just three nodes (one Master Node with two Worker Nodes) and grows by simply adding Worker Nodes to handle the increasing workload created by Hadoop Big Data. The scale-out operation requires no downtime.

Figure 1. Etu™ Appliance minimum package and scale-out architecture. Diagram labels: Minimum Package: 1+2 model; Switch; Master Node; Worker Nodes; Scale-out up to thousands of nodes. (Note: Switch is not included in Etu™ offering)

The integration of an Etu™ Appliance is painless and virtually seamless, with the flexibility to meet the expansion needs of any enterprise. Begin by investing in a minimum package and expand according to your business growth needs. A minimum package for an Etu™ Appliance consists of a cluster of three nodes (1 Master Node + 2 Worker Nodes). As your Hadoop Big Data processing workloads grow, we simply add Worker Nodes through scale-out expansion without interrupting running services.

Benefits

Enterprises that implement a Hadoop Big Data platform with an Etu™ Appliance can easily gain the following benefits:

1. Staff can each do their own Hadoop-related jobs and make progress together, which improves professional performance.
When it comes to storage and computing for multi-structured Big Data, neither developers nor system administrators are free from taxing Hadoop Ecosystem problems. These challenges range from software stacks that are hard to install, to difficulty managing the system and applications, to not knowing the right way to reach optimization. Etu™ Appliance saves time for developers and system administrators, which improves the performance of your professionals through efficiency.

2. More confidence in mission-critical Big Data computing.
Etu™ Appliance realizes Full System HA, giving enterprises peace of mind that their more crucial Big Data computing tasks can be shifted to a Hadoop platform that will continuously extract value from multi-structured Big Data.

3. A Big Data PaaS is ready for a full run.
The centralized web-based console of Etu™ Appliance functions as a PaaS user portal for enterprises and internet service providers that need an application platform solution for building both public and private clouds. Besides simplifying Hadoop cluster management, companies can achieve enterprise-level multi-tenant authentication and authorization, and meet major requirements in self-service, security, scalability, and stability.

4. Ability to access and compute accumulated data repeatedly, which can offer unlimited business value.
The platform is characterized by a high level of openness, allowing multiple applications to run on it. As time goes by, the data saved in an Etu™ Appliance cluster will increase the value of your business.

5. Performance doubles, TCO drops.
Performance optimization starts from the distinctive operating system and continues throughout the whole system. Etu™ Appliance complements Hadoop and takes advantage of features such as its parallel computing architecture and linear scalability. Compared to general Hadoop platforms, adopting an Etu™ Appliance cluster reduces the number of nodes needed for computing tasks, which significantly cuts an enterprise's CAPEX and management costs.

Etu™ Appliance Functionalities

Figure 2. Etu™ Appliance Software Architecture. Legend: Etu™ Module; Apache Hadoop Ecosystem Module. Components shown: Etu™ Console, Data Source, Data Processing Module, Data Store Module, Etu™ DataFlow, Etu™ Clusterware, Etu™ OS.

Software Module List

Etu™ OS
Etu™ Clusterware: Auto-deployment, Auto-configuration, Account, Security, High Availability
Etu™ Console: Application, Cluster, Table, File, Data Source
Etu™ Data Source: SNMP, FTP, Syslog
Etu™ DataFlow: Sqoop
Big Data Store: HDFS, Hive Meta Store, HBase
Big Data Processing: MapReduce, Pig, HiveQL, Mahout

Data Formats

Semi-structured data: such as log, XML, CSV, Meta Data, and other text files with pre-defined fields
Unstructured data: such as full text, web page, content, and various binary files*

* A specific processing module might be required to process particular file formats.

Etu™ Appliance 2 Edition Comparison Chart

S: Standard Edition
E: Enterprise Edition

Cluster

One-click Deployment
- Hadoop Ecosystem Deployment & Readiness Checks
- Operating System Deployment (1)
- Bare Metal Deployment for Built-in NTP/DNS/DHCP Services
- System Level Cluster Configuration Service
- Multiple NIC Bonding for Bandwidth Aggregation (2)
- Multiple NIC Bonding for Failover (2)

High Availability (3)
- Built-in Storage HA with Distributed File System
- Full System Failover Isolation Mechanism (4)
- Hadoop Name Node HA
- Hadoop Ecosystem HA (5)
- Hive Meta Store HA
- DNS/DHCP Service HA
- Deployment/Configuration Services HA
- LDAP/Kerberos Services HA

Configuration
- Role Instance Configuration and
- HDFS / MapReduce / HBase / Hive Configuration
- Etu™ Clusterware Configuration
- Operating System Kernel and System Configuration
- Automated Time Sync Configuration
- Automated DNS/DHCP/Host Configuration
- Automated SSH Key and Host CA
- Automated Network Configuration
- Automated Monitoring Configuration

Cluster Monitoring
- Proactive Health Checks
- Status and Health Summary
- Performance Monitoring
- Host Monitoring (CPU/RAM/HDD/NIC)
- JVM Metrics Monitoring
- SNMP Integration

Security
- Kerberos Principals and Keytabs Deployment
- Built-in Kerberos
- Built-in LDAP
- Automated Principal Creation
- Automated Keytabs Creation
- Automated User Environment Configuration

Console

Cluster
- Cluster Status
- Node
- Service
- User
- License
- Cluster Monitoring
- High Availability

Application
- Application Status
- Application Deployment
- Application Execution
- Application Scheduling

Data Source
- DataFlow
- FTP Service
- Syslog Service

HDFS
- HDFS Cluster Status
- HDFS File Browser

HBase
- HBase Cluster Status
- HBase Table

Notes:
1. Etu™ OS is applied as the operating system.
2. An extra NIC is required to implement Multiple NIC Bonding for Failover or Multiple NIC Bonding for Bandwidth Aggregation.
3. An extra Master Node is required to enable the Etu™ HA option. It is available in the Standard Edition as well as the Enterprise Edition.
4. Achieving the highest level of the Full System Failover Isolation Mechanism may require the user to purchase additional network interface cards for all the nodes and switches.
5. Hadoop Ecosystem HA consists of HDFS / MapReduce (Job Tracker) / HBase / Hive Thrift.

Etu™ Appliance Specifications By Role

Etu 1000M Series
  Role: Master Node
  CPU: 64-Bit 12 Core
  Memory: 48GB
  Hard Drive: K RPM SAS 300GB x 2 (RAID 1)
  Network Interface Card: 1 Gbit Ethernet Dual Port x 1 or 2
  Power: Dual Power
  User Data: N/A

Etu 1000W Series
  Role: Worker Node
  CPU: 64-Bit 12 Core
  Memory: 48GB
  Hard Drive: K RPM SATA 2TB x 4 or 3TB x 4
  Network Interface Card: 1 Gbit Ethernet Dual Port x 1 or 2
  Power: Single Power
  User Data: 4~40TB or 6~60TB (depends on compression ratio)

Etu 2000W Series
  Role: Worker Node
  CPU: 64-Bit 12 Core
  Memory: 48GB
  Hard Drive: K RPM SATA 2TB x 8 or 3TB x 8
  Network Interface Card: 1 Gbit Ethernet Dual Port x 1 or 2
  Power: Single Power
  User Data: 8~80TB or 12~120TB (depends on compression ratio)

Notes:
1. The Etu™ Appliance Specification Sheet is provided on request.
2. An Etu™ Appliance cluster includes at least 1 Master Node and 2 Worker Nodes.
3. An Etu™ Appliance cluster with HA includes at least 2 Master Nodes and 2 Worker Nodes.
4. User Data can be stored with more than two replicas if necessary, with the replicas placed on different Worker Nodes.
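The User Data ranges in the specifications can be reproduced with simple arithmetic. The Python sketch below is one reading that happens to match every published range: raw drive capacity divided by two replicas, multiplied by a compression ratio between 1x and 10x. The two-replica default and the 1x to 10x span are inferences from the figures and from the note on replicas, not values stated by Etu, and the class and function names are hypothetical. The cluster check encodes only the minimum node counts given in the notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerModel:
    """Drive layout of a Worker Node model, taken from the specifications."""
    name: str
    drives: int
    drive_tb: int

    @property
    def raw_tb(self) -> int:
        # Total raw disk capacity of one Worker Node, in TB.
        return self.drives * self.drive_tb


def user_data_tb(model: WorkerModel, compression_ratio: float = 1.0,
                 replicas: int = 2) -> float:
    """Estimated user data per node (assumed formula, see lead-in):
    raw capacity / replicas * compression ratio."""
    return model.raw_tb / replicas * compression_ratio


def is_valid_cluster(masters: int, workers: int, ha: bool = False) -> bool:
    """Minimum composition from the notes: 1 Master Node + 2 Worker Nodes,
    or 2 Master Nodes + 2 Worker Nodes when HA is enabled."""
    required_masters = 2 if ha else 1
    return masters >= required_masters and workers >= 2


if __name__ == "__main__":
    etu_1000w = WorkerModel("Etu 1000W Series", drives=4, drive_tb=2)
    low = user_data_tb(etu_1000w, compression_ratio=1)
    high = user_data_tb(etu_1000w, compression_ratio=10)
    print(f"{etu_1000w.name}: {low:g}~{high:g}TB")
    print(is_valid_cluster(masters=1, workers=2))
    print(is_valid_cluster(masters=1, workers=2, ha=True))
```

Raising `replicas` above two, as the notes allow, lowers the usable capacity per node proportionally, which is worth factoring in when sizing a scale-out.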

About Etu

Etu is a Big Data pioneer providing end-to-end Big Data solutions for enterprises in Asia. Etu is dedicated to developing Hadoop-based Big Data platform technology, particularly for processing and analytics. Cooperating with ISV/SI partners that have strong capabilities in application development and integration across various vertical markets, Etu is committed to helping customers discover, unlock, and connect the business value hidden in semi-structured and unstructured data. Our team members have large-scale Big Data processing experience in online services, and several of them have earned high-level Cloudera Certified Developer/Administrator for Apache Hadoop certifications.

Contact information

For more information about Etu Appliance, please visit Etu at our official website:

Trademark Disclaimer

Etu, Etu Appliance and Etu Recommender are trademarks of SYSTEX GROUP. All other brands and trademarks referenced herein are acknowledged to be trademarks or registered trademarks of their respective holders.


More information

WHITE PAPER September 2012. CA Nimsoft Monitor for Servers

WHITE PAPER September 2012. CA Nimsoft Monitor for Servers WHITE PAPER September 2012 CA Nimsoft Monitor for Servers Table of Contents CA Nimsoft Monitor for servers 3 solution overview CA Nimsoft Monitor service-centric 5 server monitoring CA Nimsoft Monitor

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2016 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social

More information

Testing 3Vs (Volume, Variety and Velocity) of Big Data

Testing 3Vs (Volume, Variety and Velocity) of Big Data Testing 3Vs (Volume, Variety and Velocity) of Big Data 1 A lot happens in the Digital World in 60 seconds 2 What is Big Data Big Data refers to data sets whose size is beyond the ability of commonly used

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

Fast, Low-Overhead Encryption for Apache Hadoop*

Fast, Low-Overhead Encryption for Apache Hadoop* Fast, Low-Overhead Encryption for Apache Hadoop* Solution Brief Intel Xeon Processors Intel Advanced Encryption Standard New Instructions (Intel AES-NI) The Intel Distribution for Apache Hadoop* software

More information

Using Red Hat Network Satellite Server to Manage Dell PowerEdge Servers

Using Red Hat Network Satellite Server to Manage Dell PowerEdge Servers Using Red Hat Network Satellite Server to Manage Dell PowerEdge Servers Enterprise Product Group (EPG) Dell White Paper By Todd Muirhead and Peter Lillian July 2004 Contents Executive Summary... 3 Introduction...

More information

International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 ISSN 2278-7763

International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 ISSN 2278-7763 International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 A Discussion on Testing Hadoop Applications Sevuga Perumal Chidambaram ABSTRACT The purpose of analysing

More information

Interactive data analytics drive insights

Interactive data analytics drive insights Big data Interactive data analytics drive insights Daniel Davis/Invodo/S&P. Screen images courtesy of Landmark Software and Services By Armando Acosta and Joey Jablonski The Apache Hadoop Big data has

More information

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here> s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline

More information

EMC s Enterprise Hadoop Solution. By Julie Lockner, Senior Analyst, and Terri McClure, Senior Analyst

EMC s Enterprise Hadoop Solution. By Julie Lockner, Senior Analyst, and Terri McClure, Senior Analyst White Paper EMC s Enterprise Hadoop Solution Isilon Scale-out NAS and Greenplum HD By Julie Lockner, Senior Analyst, and Terri McClure, Senior Analyst February 2012 This ESG White Paper was commissioned

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

How Bigtop Leveraged Docker for Build Automation and One-Click Hadoop Provisioning

How Bigtop Leveraged Docker for Build Automation and One-Click Hadoop Provisioning How Bigtop Leveraged Docker for Build Automation and One-Click Hadoop Provisioning Evans Ye Apache Big Data 2015 Budapest Who am I Apache Bigtop PMC member Software Engineer at Trend Micro Develop Big

More information

Deploying Cloudera CDH (Cloudera Distribution Including Apache Hadoop) with Emulex OneConnect OCe14000 Network Adapters

Deploying Cloudera CDH (Cloudera Distribution Including Apache Hadoop) with Emulex OneConnect OCe14000 Network Adapters Deploying Cloudera CDH (Cloudera Distribution Including Apache Hadoop) with Emulex OneConnect OCe14000 Network Adapters Table of Contents Introduction... Hardware requirements... Recommended Hadoop cluster

More information

CDH AND BUSINESS CONTINUITY:

CDH AND BUSINESS CONTINUITY: WHITE PAPER CDH AND BUSINESS CONTINUITY: An overview of the availability, data protection and disaster recovery features in Hadoop Abstract Using the sophisticated built-in capabilities of CDH for tunable

More information

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM 1. Introduction 1.1 Big Data Introduction What is Big Data Data Analytics Bigdata Challenges Technologies supported by big data 1.2 Hadoop Introduction

More information

Datasheet FUJITSU Integrated System PRIMEFLEX for Hadoop

Datasheet FUJITSU Integrated System PRIMEFLEX for Hadoop Datasheet FUJITSU Integrated System PRIMEFLEX for Hadoop is a powerful and scalable platform analyzing big data volumes at high velocity FUJITSU Integrated System PRIMEFLEX Your fast track to datacenter

More information

Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload

Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload Drive operational efficiency and lower data transformation costs with a Reference Architecture for an end-to-end optimization and offload

More information

Cisco Application Networking Manager Version 2.0

Cisco Application Networking Manager Version 2.0 Cisco Application Networking Manager Version 2.0 Cisco Application Networking Manager (ANM) software enables centralized configuration, operations, and monitoring of Cisco data center networking equipment

More information

IBM Software InfoSphere Guardium. Planning a data security and auditing deployment for Hadoop

IBM Software InfoSphere Guardium. Planning a data security and auditing deployment for Hadoop Planning a data security and auditing deployment for Hadoop 2 1 2 3 4 5 6 Introduction Architecture Plan Implement Operationalize Conclusion Key requirements for detecting data breaches and addressing

More information

IBM PureData System for Transactions. Technical Deep Dive. Jonathan Rossi, PureSystems Specialist rossij@us.ibm.com

IBM PureData System for Transactions. Technical Deep Dive. Jonathan Rossi, PureSystems Specialist rossij@us.ibm.com IBM expert integrated system Technical Deep Dive Maria N. Schwenger, PureSystems Specialist schwenge@us.ibm.com Jonathan Rossi, PureSystems Specialist rossij@us.ibm.com IBM PureData System for Transactions

More information

Dell In-Memory Appliance for Cloudera Enterprise

Dell In-Memory Appliance for Cloudera Enterprise Dell In-Memory Appliance for Cloudera Enterprise Hadoop Overview, Customer Evolution and Dell In-Memory Product Details Author: Armando Acosta Hadoop Product Manager/Subject Matter Expert Armando_Acosta@Dell.com/

More information

IBM Platform Computing : infrastructure management for HPC solutions on OpenPOWER Jing Li, Software Development Manager IBM

IBM Platform Computing : infrastructure management for HPC solutions on OpenPOWER Jing Li, Software Development Manager IBM IBM Platform Computing : infrastructure management for HPC solutions on OpenPOWER Jing Li, Software Development Manager IBM #OpenPOWERSummit Join the conversation at #OpenPOWERSummit 1 Scale-out and Cloud

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi Getting Started with Hadoop Raanan Dagan Paul Tibaldi What is Apache Hadoop? Hadoop is a platform for data storage and processing that is Scalable Fault tolerant Open source CORE HADOOP COMPONENTS Hadoop

More information

XpoLog Center Suite Data Sheet

XpoLog Center Suite Data Sheet XpoLog Center Suite Data Sheet General XpoLog is a data analysis and management platform for Applications IT data. Business applications rely on a dynamic heterogeneous applications infrastructure, such

More information

Simplifying Big Data Deployments in Cloud Environments with Mellanox Interconnects and QualiSystems Orchestration Solutions

Simplifying Big Data Deployments in Cloud Environments with Mellanox Interconnects and QualiSystems Orchestration Solutions Simplifying Big Data Deployments in Cloud Environments with Mellanox Interconnects and QualiSystems Orchestration Solutions 64% of organizations were investing or planning to invest on Big Data technology

More information

Global Headquarters: 5 Speen Street Framingham, MA 01701 USA P.508.872.8200 F.508.935.4015 www.idc.com

Global Headquarters: 5 Speen Street Framingham, MA 01701 USA P.508.872.8200 F.508.935.4015 www.idc.com Global Headquarters: 5 Speen Street Framingham, MA 01701 USA P.508.872.8200 F.508.935.4015 www.idc.com W H I T E P A P E R O r a c l e V i r t u a l N e t w o r k i n g D e l i v e r i n g F a b r i c

More information

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013 Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software SC13, November, 2013 Agenda Abstract Opportunity: HPC Adoption of Big Data Analytics on Apache

More information

COURSE CONTENT Big Data and Hadoop Training

COURSE CONTENT Big Data and Hadoop Training COURSE CONTENT Big Data and Hadoop Training 1. Meet Hadoop Data! Data Storage and Analysis Comparison with Other Systems RDBMS Grid Computing Volunteer Computing A Brief History of Hadoop Apache Hadoop

More information

Data Center Op+miza+on

Data Center Op+miza+on Data Center Op+miza+on Sept 2014 Jitender Sunke VP Applications, ITC Holdings Ajay Arora Sr. Director, Centroid Systems Justin Youngs Principal Architect, Oracle 1 Agenda! Introductions! Oracle VCA An

More information

Hadoop implementation of MapReduce computational model. Ján Vaňo

Hadoop implementation of MapReduce computational model. Ján Vaňo Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed

More information

How Cisco IT Built Big Data Platform to Transform Data Management

How Cisco IT Built Big Data Platform to Transform Data Management Cisco IT Case Study August 2013 Big Data Analytics How Cisco IT Built Big Data Platform to Transform Data Management EXECUTIVE SUMMARY CHALLENGE Unlock the business value of large data sets, including

More information

Constructing a Data Lake: Hadoop and Oracle Database United!

Constructing a Data Lake: Hadoop and Oracle Database United! Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.

More information

Cisco Data Preparation

Cisco Data Preparation Data Sheet Cisco Data Preparation Unleash your business analysts to develop the insights that drive better business outcomes, sooner, from all your data. As self-service business intelligence (BI) and

More information

1000-Channel IP System Architecture for DSS

1000-Channel IP System Architecture for DSS Solution Blueprint Intel Core i5 Processor Intel Core i7 Processor Intel Xeon Processor Intel Digital Security Surveillance 1000-Channel IP System Architecture for DSS NUUO*, Qsan*, and Intel deliver a

More information

I/O Considerations in Big Data Analytics

I/O Considerations in Big Data Analytics Library of Congress I/O Considerations in Big Data Analytics 26 September 2011 Marshall Presser Federal Field CTO EMC, Data Computing Division 1 Paradigms in Big Data Structured (relational) data Very

More information

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics ESSENTIALS EMC ISILON Use the industry's first and only scale-out NAS solution with native Hadoop

More information

IRON Big Data Appliance Platform for Hadoop

IRON Big Data Appliance Platform for Hadoop IRON HDPOD Big Data Appliance Commodity Hadoop Cluster Platforms for Enterprises IRON Big Data Appliance Platform for Hadoop IRON Networks Big Data Appliance HDPOD is a comprehensive Hadoop Big Data platform,

More information

Astaro Deployment Guide High Availability Options Clustering and Hot Standby

Astaro Deployment Guide High Availability Options Clustering and Hot Standby Connect With Confidence Astaro Deployment Guide Clustering and Hot Standby Table of Contents Introduction... 2 Active/Passive HA (Hot Standby)... 2 Active/Active HA (Cluster)... 2 Astaro s HA Act as One...

More information

IBM InfoSphere BigInsights Enterprise Edition

IBM InfoSphere BigInsights Enterprise Edition IBM InfoSphere BigInsights Enterprise Edition Efficiently manage and mine big data for valuable insights Highlights Advanced analytics for structured, semi-structured and unstructured data Professional-grade

More information

Integrated Storage Solutions ISS Series

Integrated Storage Solutions ISS Series N E T W O R K e d s t o r a g e Integrated Storage Solutions ISS Series scalable NAS and iscsi SAN Product Family Affordable High Performance Enterprise Features Utmost Reliability and Flexibility NAS

More information

modular Storage Solutions MSS Series

modular Storage Solutions MSS Series N E T W O R K e d s t o r a g e modular Storage Solutions MSS Series NAS and iscsi SAN Product Family High Performance Enterprise Features Easily Scalable Utmost Reliability and Flexibility NAS & iscsi

More information

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14 Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases Lecture 14 Big Data Management IV: Big-data Infrastructures (Background, IO, From NFS to HFDS) Chapter 14-15: Abideboul

More information

Private cloud computing advances

Private cloud computing advances Building robust private cloud services infrastructures By Brian Gautreau and Gong Wang Private clouds optimize utilization and management of IT resources to heighten availability. Microsoft Private Cloud

More information

Benchmarking Hadoop & HBase on Violin

Benchmarking Hadoop & HBase on Violin Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages

More information