EMC SOLUTION FOR AGILE AND ROBUST ANALYTICS ON HADOOP DATA LAKE WITH PIVOTAL HDB
ABSTRACT

As companies increasingly adopt data lakes as a platform for storing data from a variety of sources, the need for simple management and protection grows, and increased analytic capabilities are needed to meet competitive demands. With breakthrough performance and SQL query optimization, Pivotal HDB supports a rich SQL dialect, including complex query and join operations. EMC Isilon Data Lake Storage provides a simple, scalable, and efficient repository that stores and protects massive amounts of unstructured data in the data lake. Together, this enterprise data lake business analytics platform allows organizations to efficiently manage and protect their data lake and rapidly develop predictive analytics to unlock the power of their data.

EMC WHITE PAPER
To learn more about how EMC products, services, and solutions can help solve your business and IT challenges, contact your local representative or authorized reseller, or explore and compare products in the EMC Store.

Copyright 2016 EMC Corporation. All Rights Reserved. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. The information in this publication is provided "as is." EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. Part Number H
TABLE OF CONTENTS

- Executive Summary
- Introduction
- Challenges of Traditional Hadoop Deployments
- EMC Data Lake Isilon Storage Solution
- Challenges of Developing Business Analytics on HDFS
- The Advantages of Pivotal HDB
- Pivotal HDB with Isilon: A Winning Combination
- Gaining the Most from Pivotal HDB and EMC Isilon HDFS
- Summary
Executive Summary

Today, every business must operate like a fast-moving, innovative software company to survive. The development of innovative software begins with data, including advanced analytics and machine learning on that data. For the new-generation applications of today, this data resides within the Hadoop data lake. Hadoop has drastically changed the economics and capabilities of capturing and storing today's unrelenting data growth. But as these data lakes rapidly expand with new and potentially revolutionary data, making use of the data quickly becomes a challenge. The difficulty of managing and protecting the data lake, and the need for a flexible, efficient, cost-saving multi-tenant architecture, become clear. Additionally, to go beyond just storing data to understanding the dark data in the lake, organizations need to be able to rapidly integrate SQL analytics with exceptional query performance.

EMC Isilon Data Lake Storage provides a simple, scalable, and efficient repository that stores massive amounts of unstructured data in the data lake. It enables organizations to easily and quickly store current data, scale capacity, tune performance, and ensure data protection as their data lake grows. This all-inclusive data lake storage helps lower storage costs through efficient storage utilization, by eliminating storage silos, and by lowering storage management costs for migration, security, and protection.

Pivotal HDB fulfills the competitive business demand for business insights and predictive analytics. With breakthrough performance, Pivotal HDB supports a rich SQL dialect, including complex query and join operations. Pivotal HDB also incorporates a cutting-edge, cost-based SQL query optimizer (Pivotal Query Optimizer) that features dynamic pipelining technology.

Audience

This white paper is intended for business decision makers, IT managers, architects, and implementers.
With the EMC Isilon Data Lake Storage Solution and Pivotal HDB, enterprises can build a robust storage infrastructure and rapidly develop predictive analytics to truly unlock the power of the data lake.

Introduction

EMC Isilon Data Lake Solutions for Hadoop combine EMC Isilon network-attached storage (NAS) with integrated support for Hadoop. Isilon is the only data lake platform natively integrated with the Hadoop Distributed File System (HDFS). Using HDFS as an over-the-wire protocol, customers can deploy a powerful, efficient, and flexible big data storage and analytics ecosystem. Isilon's in-place data analytics approach allows organizations to eliminate the time and resources required to replicate big data into a separate Hadoop infrastructure. The additional benefits of managing a single version of the data source include minimized data file security risks and better control of the data for governance.

Pivotal HDB is an elastic SQL query engine, built on Pivotal Greenplum Database and open-source PostgreSQL, that operates natively in Hadoop and combines exceptional massively parallel processing (MPP) analytics performance with robust ANSI SQL compliance. Its robust SQL compliance, powerful cost-based query optimization capabilities, and support for advanced analytics enable companies to implement high-performance analytics solutions on data sets of any size, much more rapidly than with traditional Hadoop SQL tools. The Hadoop Native SQL provided by Pivotal HDB helps companies unleash the power of Hadoop and drive significant business change.

Pivotal HDB with Isilon: An Integrated Solution for Agile and Responsive Hadoop Analytics

By combining Pivotal HDB with EMC Isilon, a customer who wants to unlock advanced systems of insight from their data sources can run predictive analytics with higher accuracy across the entire dataset, directly on an enterprise-level, centralized Hadoop platform.
Additionally, organizations can execute near-real-time, ad-hoc queries at scale and complete analytics tasks in seconds or minutes, not hours or days. Using ANSI-compliant Pivotal HDB, developers can rapidly model advanced statistical functions and algorithms using MADlib to spot relationships, patterns, and trends with speed and at scale, all within the data lake. Using HDB functionality, developers can interact with petabyte-range data sets through the familiarity of a complete, standards-compliant SQL interface. This winning combination is a cost-saving, flexible, enterprise-ready, multi-tenant architecture that provides consistent data protection, security, and efficient operations across all Hadoop architectures, and quickly unlocks business insights with exceptional performance. The solution provides full SQL functionality and advanced query capabilities that derive new value from the data lake while optimizing, protecting, and securing the value currently residing in Hadoop repositories.
Challenges of Traditional Hadoop Deployments

One of the top challenges data center and Hadoop administrators face today is managing the rapidly expanding and evolving data lake, particularly when implementing a traditional Hadoop scale-out architecture on commodity servers. The primary challenges of managing a traditional Hadoop scale-out architecture include:

- Hard to deploy and operate.
- Poor utilization of storage and CPU.
- Inefficient and underutilized direct-attached storage (DAS), compounded by Hadoop mirroring of three times or more, which can multiply data center costs and is management-intensive.
- Time- and resource-consuming data staging: manually ingesting large-scale data sets into a separate Hadoop environment can significantly delay the benefits or insight gained from the Hadoop analytics effort.
- Lack of multi-tenancy: traditional Hadoop architectures are isolated on specifically allocated hardware.
- Lack of enterprise-level data protection options, including snapshot backup and recovery and data replication capabilities for disaster recovery.

EMC Data Lake Isilon Storage Solution

The EMC Data Lake Isilon network-attached storage (NAS) platform addresses these IT challenges of the traditional Hadoop data lake. Powered by the distributed EMC Isilon OneFS operating system, an EMC Data Lake Isilon cluster delivers a scalable pool of storage with a global namespace, advanced and efficient data protection, and flexibility not available in a traditional scale-out Hadoop architecture.

Data Protection and Efficient Use of Resources

For data protection and high availability, traditional Hadoop DAS architectures require data replication of three times or more, and the Hadoop distribution contains no native replication.
An EMC Isilon cluster, with the OneFS file system, uses erasure coding to protect data at greater than 80 percent storage efficiency, in contrast to the 33 percent storage efficiency of a traditional Hadoop distributed file system (HDFS) deployment. Additionally, Isilon's SmartDedupe can deduplicate data in the data lake, making HDFS storage even more efficient. The highly resilient Isilon OneFS file system eliminates the single-point-of-failure risk of traditional Hadoop deployments. The EMC Data Lake Isilon Storage platform allows queries to be processed in parallel across one or more segment instances, each with dedicated CPU, memory, and disk at the compute layer. Isilon solutions also offer robust security options, including role-based access control (RBAC), SEC 17a-4-compliant write-once-read-many (WORM) data protection, file system auditing, and data-at-rest encryption to help organizations address regulatory and compliance requirements.

Unmatched Flexibility within the Data Lake

The EMC Data Lake Isilon Storage solution supports multiple instances of Apache Hadoop distributions from different vendors simultaneously, as well as multiple versions of HDFS. This allows organizations to use the specific analytics tools of any Hadoop distribution.

Lower Costs

Isilon's native HDFS integration means organizations can avoid investing in a separate Hadoop infrastructure and eliminate the time and expense of moving large data sets. The efficiency and ease of use of Isilon solutions help lower storage costs and simplify management.

Challenges of Developing Business Analytics on HDFS

Hadoop is a technology well suited to tackling big data problems. The fundamental principle of the Hadoop architecture is to bring analysis to the data, rather than moving the data to a system that can analyze it.
Instead of carefully filtering and transforming the source data through a heavy ETL process (for instance, into the rows of a relational database), the original data source, with few or no modifications, is ingested into a central repository.

The Challenge of Traditional Hadoop Analytics

Two key technologies allow users of Hadoop to retain and analyze data: HDFS and MapReduce. HDFS is a simple but extremely powerful distributed file system that stores data reliably at significant scale. MapReduce is a parallel programming framework that integrates with HDFS. It allows users to express data analysis algorithms in terms of a small number of
functions and operators: chiefly, a map function and a reduce function. While MapReduce is convenient for scheduled analysis or manipulation of data stored on HDFS, it is not suitable for interactive use, as it requires extensive Java programming and developers who understand how to solve data problems in MapReduce terms.

Why Hive and Impala Are Insufficient

Given the limitations of MapReduce, companies turned to Apache Hive and Cloudera Impala for solutions. These are SQL-like query engines that compile a limited SQL dialect to MapReduce. While they do address some of the expressive shortcomings of MapReduce, performance remains an issue. Furthermore, and most importantly, Impala and Apache Hive do not support all 99 of the standard TPC-DS queries.

The Advantages of Pivotal HDB

Pivotal HDB supports low-latency analytic SQL queries, coupled with massively parallel machine-learning capabilities, to shorten data-driven innovation cycles for the enterprise. Pivotal HDB enables discovery-based analysis of large data sets and rapid, iterative development of data analytics applications that apply deep machine learning. Leveraging Pivotal's parallel database technology, it consistently performs tens to hundreds of times faster than other Hadoop query engines in the market.

Massively Parallel Processing (MPP) with the Data Lake

MPP systems are distributed systems consisting of a cluster of independent nodes. MPP databases excel at structured data processing because of the architecture's fast query processing, embedded automatic query optimization, and industry-standard SQL interfaces to BI tools. Pivotal HDB brings the performance advantages of an MPP SQL engine into the Hadoop HDFS central repository, giving customers the processing power of a relational database within a Hadoop repository without the strict schema requirements.
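As an illustration of the kind of relational workload this enables, a standard join-and-aggregate query runs unchanged in Pivotal HDB and is executed in parallel across all segment instances. The star-schema table and column names below are hypothetical, not from this testing effort:

```sql
-- Hypothetical star-schema query; HDB plans and runs it across all segments.
SELECT d.sale_month,
       s.region,
       SUM(f.sale_amount) AS total_sales
FROM   fact_sales f
JOIN   dim_date   d ON f.date_key  = d.date_key
JOIN   dim_store  s ON f.store_key = s.store_key
WHERE  d.sale_year = 2015
GROUP  BY d.sale_month, s.region
ORDER  BY total_sales DESC;
```

The Pivotal Query Optimizer chooses join order and data movement between segments automatically; no MapReduce code is required.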
Exceptional Hadoop Native SQL Performance

Pivotal HDB provides superior performance compared to current open-source SQL-on-Hadoop analytical tools, achieving fast performance for complex and advanced data analytics. With a high-speed interconnect for continuous pipelining of data processing, Pivotal HDB can deliver near-real-time latency. Pivotal HDB runs the set of TPC-DS queries roughly twice as fast as Impala, and by design it finishes most queries in a fraction of the time taken by Apache Hive. Pivotal HDB beats Impala by an overall 454 percent in performance (as determined by measuring and comparing the geometric mean across the TPC-DS queries). Compared to Hive, Pivotal HDB improves performance by 344 percent on complex queries.

Full ANSI SQL Compliance

Data consistency is a real issue for application developers. Pivotal HDB is fully ANSI SQL-92, SQL-99, and SQL-2003 compliant, including OLAP extensions. Using Pivotal HDB, organizations can leverage existing SQL skills to ramp up and execute quickly within the Hadoop data lake, with no compatibility risks for SQL developers or SQL BI tools and applications. With full support for query roll-ups, dynamic partitions, and joins, enterprises can produce business-critical analytics directly and natively within HDFS.

External Tables: Pivotal Extension Framework (PXF)

PXF enables SQL querying of data in Hadoop components such as HBase, Hive, and other distributed data file types. These queries execute in a single, zero-materialization, fully parallel workflow. PXF also uses the Pivotal HDB advanced query optimizer and executor to run analytics on these external data sources, and it connects Hadoop-based components to facilitate data joins, such as between Pivotal HDB tables and HBase tables. PXF integrates seamlessly when using Isilon as the HDFS storage layer and is the fastest way to load data or explore large datasets.
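A sketch of the PXF external-table pattern described above, assuming an HBase table named orders and a PXF service endpoint on its default port; the host name, table names, and columns are illustrative, not from the paper:

```sql
-- Expose an HBase table to HDB through PXF (endpoint and names are assumptions).
CREATE EXTERNAL TABLE ext_orders (
    recordkey   TEXT,     -- HBase row key
    "cf:amount" NUMERIC   -- HBase column, named as family:qualifier
)
LOCATION ('pxf://pxf-host:51200/orders?PROFILE=HBase')
FORMAT 'CUSTOM' (formatter='pxfwritable_import');

-- The external table can then be joined with a native HDB table:
SELECT h.customer_id, e."cf:amount"
FROM   hdb_customers h
JOIN   ext_orders    e ON e.recordkey = h.order_key;
```

Because PXF streams the external rows in parallel, no intermediate copy of the HBase data is materialized in HDFS.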
PXF also integrates with HCatalog to query Hive tables directly, and gives users added flexibility with HBase and Avro data files through the Pivotal HDB PXF services.

Integrated MADlib Machine Learning

Machine learning algorithms allow organizations to identify patterns and trends in their big data sets, and enable high-value predictions that can guide better decisions and smart actions in near real-time, without human intervention. Apache MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of machine learning, mathematical, and statistical methods and is fully integrated with the Pivotal HDB Hadoop Native SQL platform. MADlib uses the MPP architecture's full compute power to process very large data sets, whereas other products are limited by the amount of data that can be loaded into memory on a single node.
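For example, MADlib's linear-regression training function is invoked directly from SQL; the houses table and its columns below are hypothetical:

```sql
-- Train a linear-regression model in-database with Apache MADlib
-- (source table and column names are illustrative).
SELECT madlib.linregr_train(
    'houses',                        -- source table
    'houses_model',                  -- output table for the fitted model
    'price',                         -- dependent variable
    'ARRAY[1, size_sqft, bedrooms]'  -- independent variables (1 = intercept)
);

-- Inspect the fitted coefficients and goodness of fit:
SELECT coef, r2 FROM houses_model;
```

Training runs where the data lives, across all segments, so model fitting scales with the cluster rather than with the memory of a single node.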
Pivotal HDB with Isilon: A Winning Combination

This comprehensive enterprise data lake business analytics platform provides a scalable, shared data storage infrastructure with multi-protocol access to shared data and enterprise features such as backup, replication, snapshots, file system audits, and data-at-rest encryption. Combined with ANSI-compliant SQL and integrated machine learning, it offers unique capabilities and advantages over direct-attached storage (DAS), which are outlined in the table below.

Table 1: Traditional Hadoop Environment vs. EMC Isilon with Pivotal HDB

Feature                                  | Traditional DAS Hadoop with Hive | Hadoop with Pivotal HDB on Isilon
Data Type                                | Files          | Files
Protection Overhead                      | 200%           | 20%
NameNode Redundancy                      | Active/Passive | Node-to-Node
De-Duplication                           | No             | Yes
File Edit Capability                     | No             | Yes
NFS (v3, v4)/SMB/HTTP/FTP                | No             | Yes
Simultaneous Multi-Protocol              | No             | Yes
Independent Scaling (Storage/Compute)    | No             | Yes
File/Object-Level Access Control Lists   | No             | Yes
Multi-Tenancy                            | No             | Yes
Simultaneous Hadoop Distributions        | No             | Yes
Hadoop Distributions Supported           | 1              | Any
Data Tiering                             | No             | Yes
WORM (SEC 17a-4)                         | No             | Yes
POSIX Compliance                         | No             | Yes
Snapshots                                | No             | Yes
Encryption                               | No             | Yes
ANSI SQL-92, SQL-99, SQL-2003 Compliance | No             | Yes
Integrated MADlib Machine Learning       | No             | Yes
Fully ACID Compliant                     | No             | Partially

Gaining the Most from Pivotal HDB and EMC Isilon HDFS

To help our customers and partners get the most out of this proven solution, Pivotal's Platform Engineering team tested all the common Pivotal HDB functionality used by customers with EMC Isilon as the HDFS layer. This section describes the key findings and lists some of the parameters used during the testing effort.

Environment and Test Overview

The test environment consisted of an EMC Isilon scale-out NAS platform with 16 commodity server nodes, 8 storage nodes, and typical compute, memory, network, and disk equipment. The test used the industry-standard 99 queries for TPC-DS benchmarking. NOTE: Testing was performed by Pivotal only.
The testing performed and results collected were not audited by or submitted to TPC.
Isilon Configuration

This section details the configuration parameters used to optimize the EMC Isilon X-series cluster used in the testing effort.

File Pool Data Access Patterns

A file pool policy is a way to assign filtering rules and operations that are both system- and user-defined. These data access settings are configured to optimize data access for the type of application accessing it. Protection settings and I/O optimization settings for file types can be set based on the specifications of the file pool filtering policy. EMC Isilon has three different access patterns available at the file pool level: concurrent, streaming, and random. Concurrent optimizes for current load on the cluster with many simultaneous clients; streaming optimizes for high-speed streaming of a single file; and random optimizes for unpredictable access to a file, with no cache prefetching. Multiple iterations of the TPC-DS query set were run, and the best results were produced using streaming as the access pattern on Isilon with Pivotal HDB.

Block Size

When selecting the correct block size for Isilon, Pivotal Hadoop Distribution (PHD), and HDB, it is important to set Isilon's HDFS daemon (isi_hdfs_d), HDFS on the Pivotal HD cluster, and the Pivotal HDB block size to the same value. The best query runtimes were achieved when block sizes were set to 128 MB or 512 MB. For Pivotal HD, the Apache Ambari admin UI can be used to make this change; for EMC Isilon, this change can only be applied using the CLI.

Balancing Traffic

To get balanced throughput out of the Isilon cluster, separate the cluster into two IP pools: one for DataNode traffic and the other for NameNode traffic. For the DataNode IP pool, configure three IP addresses per network interface in each Isilon node, as shown in the EMC Isilon OneFS Management UI below:

Figure 1: EMC Isilon OneFS Management UI: IP configuration

Round robin is recommended for the pool connection policy.
The IP allocation method needs to be static so that, in case of an Isilon node or network interface failure, another Isilon node can service requests to the failed DataNode IP address in the pool; this provides redundancy and load balancing at the DataNode level. Using the isi hdfs racks command, you can create a rack that will use this new pool for its DataNodes, as shown below:

isi hdfs racks create --client-ip-ranges= --ip-pools=pool1 --rack=/rack0

In the above command, supply the actual IP range of the compute environment. A separate IP pool for NameNode traffic also needs to be created for each 10Gb network interface per Isilon node in the
Isilon cluster, as shown below:

Figure 2: EMC Isilon OneFS Management UI: IP Address Pools

The connection policy for the NameNode pool needs to be round robin so that each request goes to a different NameNode IP address. The IP allocation for the NameNode pool should be dynamic, as SmartConnect itself provides the redundancy needed in case of node or network interface failures.

Pivotal HDB Configuration Parameters

This section details the configuration parameters used to optimize Pivotal HDB for the testing effort.

Data Distribution

Pivotal HDB 2.0 supports distributing table data either randomly or by a specific key. We found that selecting RANDOM distribution produced the best results when running Pivotal HDB 2.0 with EMC Isilon as its HDFS storage layer.

Data Format

Pivotal HDB supports multiple formats for creating table data. We compared append-only tables with Parquet tables and found that Parquet tables with Snappy compression produced the best results when running Pivotal HDB with Isilon as its HDFS storage layer. Additionally, we verified that the truncate functionality works as expected in Pivotal HDB when running with Isilon as its HDFS storage layer.

Table Partitions

Partitioning data in Pivotal HDB has significant advantages. With it, queries can read only specific partitions of data, avoiding an expensive scan of the entire table when looking for specific values in a large table. Partitioning improves runtimes when EMC Isilon is the HDFS storage layer. We compared partitioning large tables annually, quarterly, monthly, weekly, and daily in our test scenarios, and found that quarterly partitions (every 90 days) or monthly partitions (every 30 days) isolated table scans to specific partitions, reduced the amount of data scanned, and sped up requested results.
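A minimal sketch of monthly partitioning and the partition-eliminating query it enables, using illustrative table names and dates rather than the test schema:

```sql
-- Monthly range-partitioned Parquet table (names and dates are illustrative).
CREATE TABLE sales_monthly (LIKE sales)
WITH (appendonly=true, orientation=parquet, compresstype=snappy)
DISTRIBUTED RANDOMLY
PARTITION BY RANGE (sale_date)
    (START ('2015-01-01') END ('2016-01-01') EVERY (INTERVAL '1 month'));

-- A predicate on the partition key lets the planner scan only March's partition:
SELECT SUM(amount)
FROM   sales_monthly
WHERE  sale_date >= '2015-03-01' AND sale_date < '2015-04-01';
```

Because the WHERE clause bounds the partition key, the optimizer eliminates the other eleven partitions from the scan entirely.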
Concurrency

The testing effort ran the entire 99-query TPC-DS test suite sequentially and monitored it comparatively with 5, 10, 15, and 20 users running concurrently. Runtimes grew linearly as users were added. Pivotal HDB in the EMC Data Lake Isilon configuration distributed the TPC-DS query load evenly across the available resources without running out of resources. To fine-tune the environment further, Pivotal HDB can leverage resource queues at the group or user level; depending on workload, additional resources can be granted to specific users and groups to schedule resources efficiently. Below are the Pivotal HDB and Isilon configuration parameters that produced optimum results.

Table 2: Summary: EMC Isilon and Pivotal HDB tuning recommendations

Component   | Configuration    | Action | Notes
Isilon      | Access Pattern   | isi filepool default-policy modify --data-access-pattern=streaming | Streaming
Isilon      | Block Size       | isi hdfs settings modify --default-block-size=128M | 128M or 512M
Pivotal HDB | Block Size       | Ambari UI: edit the HDFS block size | Match Isilon block size
Pivotal HDB | Block Size       | Edit the dfs.default.blocksize property in /usr/local/hdb/etc/hdfs-client.xml | Match Isilon block size
Isilon      | Thread Count     | isi hdfs settings modify --server-threads=auto | Auto or 64
Isilon      | DataNode Traffic | OneFS Management UI | 3 IP addresses per interface, load balanced round robin; IP allocation: static
Isilon      | NameNode Traffic | OneFS Management UI | Load balanced round robin; IP allocation: dynamic
Pivotal HDB | Data Format      | Example: CREATE TABLE tpch.region WITH (appendonly=true, orientation=parquet, compresstype=snappy) AS SELECT * FROM ext_tpch.region DISTRIBUTED BY (R_REGIONKEY); | Parquet with Snappy compression
Pivotal HDB | Partitioning     | Example: CREATE TABLE p_orders (LIKE orders) WITH (appendonly=true, compresstype=snappy, orientation=parquet) DISTRIBUTED BY (o_orderkey) PARTITION BY RANGE (o_orderdate) (START (' ') END (' ') EVERY (interval '1 year')); | More partitions equate to faster queries but slower load times
Lab Testing Environment

The lab environment consisted of eight compute servers (two masters and six workers), each with ten 900 GB SAS drives for local scratch/spill space and 64 GB of memory, running Pivotal HDB version 2.0 on Pivotal HD 3.0. Pivotal HDB was configured to use spill space on each of its local drives and to run ten Pivotal HDB segment instances per node. The configuration options included random distribution for tables, Parquet data format, Snappy compression, and monthly partitioning.

Networking was configured as a 10Gb storage network, with both the compute and Isilon clusters connected to the same switch. The compute rack has two 10Gb internal switches, configured in an MLAG configuration using two ports from the internal switches to the storage network. This configuration allows the Isilon cluster to be a shared resource on the 10Gb network; data can be accessed via HDFS or any other Isilon-supported protocol.

Storage was provided by four 410 nodes in the Isilon cluster, each with two 800 GB SSDs and thirty-four 1 TB SATA drives, running OneFS v8.0. The Isilon storage configuration used SSDs for data and metadata; the NameNode pool was set to dynamic, the DataNode pool was set to static, and streaming access was enabled.

Summary

The advantages of using Pivotal HDB within the EMC Data Lake Isilon Storage Solution are numerous. From fast data ingestion with EMC Data Lake Isilon Storage's native HDFS to rapid SQL development with Pivotal HDB, the list of benefits to the big data lake environment is impressive. Enterprises can leverage existing SQL skillsets and avoid additional training with the fully ANSI SQL compliant analytic environment. Organizations gain the full advantages of massive data lake scalability, Hadoop distribution flexibility, and data protection and encryption for all their big data assets.
Hadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and
IBM TSM DISASTER RECOVERY BEST PRACTICES WITH EMC DATA DOMAIN DEDUPLICATION STORAGE
White Paper IBM TSM DISASTER RECOVERY BEST PRACTICES WITH EMC DATA DOMAIN DEDUPLICATION STORAGE Abstract This white paper focuses on recovery of an IBM Tivoli Storage Manager (TSM) server and explores
Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing
Data-Intensive Programming Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Lecturer: Timo Aaltonen University Lecturer [email protected] Assistants: Henri Terho and Antti
An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics
An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,
Big Data Management and Security
Big Data Management and Security Audit Concerns and Business Risks Tami Frankenfield Sr. Director, Analytics and Enterprise Data Mercury Insurance What is Big Data? Velocity + Volume + Variety = Value
White. Paper. EMC Isilon: A Scalable Storage Platform for Big Data. April 2014
White Paper EMC Isilon: A Scalable Storage Platform for Big Data By Nik Rouda, Senior Analyst and Terri McClure, Senior Analyst April 2014 This ESG White Paper was commissioned by EMC Isilon and is distributed
An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database
An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct
Big Data With Hadoop
With Saurabh Singh [email protected] The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials
Please give me your feedback
Please give me your feedback Session BB4089 Speaker Claude Lorenson, Ph. D and Wendy Harms Use the mobile app to complete a session survey 1. Access My schedule 2. Click on this session 3. Go to Rate &
Hadoop & Spark Using Amazon EMR
Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?
Microsoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010
Microsoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010 Better Together Writer: Bill Baer, Technical Product Manager, SharePoint Product Group Technical Reviewers: Steve Peschka,
Isilon OneFS. Version 7.2.1. OneFS Migration Tools Guide
Isilon OneFS Version 7.2.1 OneFS Migration Tools Guide Copyright 2015 EMC Corporation. All rights reserved. Published in USA. Published July, 2015 EMC believes the information in this publication is accurate
How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time
SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first
Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage
White Paper Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage A Benchmark Report August 211 Background Objectivity/DB uses a powerful distributed processing architecture to manage
Fast, Low-Overhead Encryption for Apache Hadoop*
Fast, Low-Overhead Encryption for Apache Hadoop* Solution Brief Intel Xeon Processors Intel Advanced Encryption Standard New Instructions (Intel AES-NI) The Intel Distribution for Apache Hadoop* software
Isilon OneFS. Version 7.2. OneFS Migration Tools Guide
Isilon OneFS Version 7.2 OneFS Migration Tools Guide Copyright 2014 EMC Corporation. All rights reserved. Published in USA. Published November, 2014 EMC believes the information in this publication is
Introduction to NetApp Infinite Volume
Technical Report Introduction to NetApp Infinite Volume Sandra Moulton, Reena Gupta, NetApp April 2013 TR-4037 Summary This document provides an overview of NetApp Infinite Volume, a new innovation in
Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
THE EMC ISILON SCALE-OUT DATA LAKE
THE EMC ISILON SCALE-OUT DATA LAKE Key capabilities ABSTRACT This white paper provides an introduction to the EMC Isilon scale-out data lake as the key enabler to store, manage, and protect unstructured
EMC ISILON SCALE-OUT STORAGE PRODUCT FAMILY
SCALE-OUT STORAGE PRODUCT FAMILY Unstructured data storage made simple ESSENTIALS Simple storage management designed for ease of use Massive scalability of capacity and performance Unmatched efficiency
EMC GREENPLUM DATABASE
EMC GREENPLUM DATABASE Driving the future of data warehousing and analytics Essentials A shared-nothing, massively parallel processing (MPP) architecture supports extreme performance on commodity infrastructure
I/O Considerations in Big Data Analytics
Library of Congress I/O Considerations in Big Data Analytics 26 September 2011 Marshall Presser Federal Field CTO EMC, Data Computing Division 1 Paradigms in Big Data Structured (relational) data Very
EMC ISILON SCALE-OUT NAS FOR IN-PLACE HADOOP DATA ANALYTICS
White Paper EMC ISILON SCALE-OUT NAS FOR IN-PLACE HADOOP DATA ANALYTICS Abstract This white paper shows that storing data in EMC Isilon scale-out network-attached storage optimizes data management for
Information Architecture
The Bloor Group Actian and The Big Data Information Architecture WHITE PAPER The Actian Big Data Information Architecture Actian and The Big Data Information Architecture Originally founded in 2005 to
Dell In-Memory Appliance for Cloudera Enterprise
Dell In-Memory Appliance for Cloudera Enterprise Hadoop Overview, Customer Evolution and Dell In-Memory Product Details Author: Armando Acosta Hadoop Product Manager/Subject Matter Expert [email protected]/
EXPLORATION TECHNOLOGY REQUIRES A RADICAL CHANGE IN DATA ANALYSIS
EXPLORATION TECHNOLOGY REQUIRES A RADICAL CHANGE IN DATA ANALYSIS EMC Isilon solutions for oil and gas EMC PERSPECTIVE TABLE OF CONTENTS INTRODUCTION: THE HUNT FOR MORE RESOURCES... 3 KEEPING PACE WITH
Using Attunity Replicate with Greenplum Database Using Attunity Replicate for data migration and Change Data Capture to the Greenplum Database
White Paper Using Attunity Replicate with Greenplum Database Using Attunity Replicate for data migration and Change Data Capture to the Greenplum Database Abstract This white paper explores the technology
EMC ISILON SCALE-OUT STORAGE PRODUCT FAMILY
SCALE-OUT STORAGE PRODUCT FAMILY Storage made simple ESSENTIALS Simple storage designed for ease of use Massive scalability with easy, grow-as-you-go flexibility World s fastest-performing NAS Unmatched
Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, [email protected] Assistant Professor, Information
BUILDING A SCALABLE BIG DATA INFRASTRUCTURE FOR DYNAMIC WORKFLOWS
BUILDING A SCALABLE BIG DATA INFRASTRUCTURE FOR DYNAMIC WORKFLOWS ESSENTIALS Executive Summary Big Data is placing new demands on IT infrastructures. The challenge is how to meet growing performance demands
CSE-E5430 Scalable Cloud Computing Lecture 2
CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University [email protected] 14.9-2015 1/36 Google MapReduce A scalable batch processing
Luncheon Webinar Series May 13, 2013
Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration
CitusDB Architecture for Real-Time Big Data
CitusDB Architecture for Real-Time Big Data CitusDB Highlights Empowers real-time Big Data using PostgreSQL Scales out PostgreSQL to support up to hundreds of terabytes of data Fast parallel processing
Accelerating Hadoop MapReduce Using an In-Memory Data Grid
Accelerating Hadoop MapReduce Using an In-Memory Data Grid By David L. Brinker and William L. Bain, ScaleOut Software, Inc. 2013 ScaleOut Software, Inc. 12/27/2012 H adoop has been widely embraced for
SECURE, ENTERPRISE FILE SYNC AND SHARE WITH EMC SYNCPLICITY UTILIZING EMC ISILON, EMC ATMOS, AND EMC VNX
White Paper SECURE, ENTERPRISE FILE SYNC AND SHARE WITH EMC SYNCPLICITY UTILIZING EMC ISILON, EMC ATMOS, AND EMC VNX Abstract This white paper explains the benefits to the extended enterprise of the on-
IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look
IBM BigInsights Has Potential If It Lives Up To Its Promise By Prakash Sukumar, Principal Consultant at iolap, Inc. IBM released Hadoop-based InfoSphere BigInsights in May 2013. There are already Hadoop-based
Agenda. Big Data & Hadoop ViPR HDFS Pivotal Big Data Suite & ViPR HDFS ViON Customer Feedback #EMCVIPR
1 Agenda Big Data & Hadoop ViPR HDFS Pivotal Big Data Suite & ViPR HDFS ViON Customer Feedback 2 A World of Connected Devices Need a new data management architecture for Internet of Things 21% the % of
SQL Server 2012 Performance White Paper
Published: April 2012 Applies to: SQL Server 2012 Copyright The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication.
Hadoop: Embracing future hardware
Hadoop: Embracing future hardware Suresh Srinivas @suresh_m_s Page 1 About Me Architect & Founder at Hortonworks Long time Apache Hadoop committer and PMC member Designed and developed many key Hadoop
EMC PERFORMANCE OPTIMIZATION FOR MICROSOFT FAST SEARCH SERVER 2010 FOR SHAREPOINT
Reference Architecture EMC PERFORMANCE OPTIMIZATION FOR MICROSOFT FAST SEARCH SERVER 2010 FOR SHAREPOINT Optimize scalability and performance of FAST Search Server 2010 for SharePoint Validate virtualization
The BIG Data Era has. your storage! Bratislava, Slovakia, 21st March 2013
The BIG Data Era has arrived Re-invent your storage! Bratislava, Slovakia, 21st March 2013 Luka Topic Regional Manager East Europe EMC Isilon Storage Division [email protected] 1 What is Big Data? 2 EXABYTES
Greenplum Database. Getting Started with Big Data Analytics. Ofir Manor Pre Sales Technical Architect, EMC Greenplum
Greenplum Database Getting Started with Big Data Analytics Ofir Manor Pre Sales Technical Architect, EMC Greenplum 1 Agenda Introduction to Greenplum Greenplum Database Architecture Flexible Database Configuration
Big Data - Infrastructure Considerations
April 2014, HAPPIEST MINDS TECHNOLOGIES Big Data - Infrastructure Considerations Author Anand Veeramani / Deepak Shivamurthy SHARING. MINDFUL. INTEGRITY. LEARNING. EXCELLENCE. SOCIAL RESPONSIBILITY. Copyright
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2016 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and
Maximizing Hadoop Performance and Storage Capacity with AltraHD TM
Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created
Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview
Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce
Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>
s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline
WHITE PAPER. www.fusionstorm.com. Get Ready for Big Data:
WHitE PaPER: Easing the Way to the cloud: 1 WHITE PAPER Get Ready for Big Data: How Scale-Out NaS Delivers the Scalability, Performance, Resilience and manageability that Big Data Environments Demand 2
Proact whitepaper on Big Data
Proact whitepaper on Big Data Summary Big Data is not a definite term. Even if it sounds like just another buzz word, it manifests some interesting opportunities for organisations with the skill, resources
Oracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
Data movement for globally deployed Big Data Hadoop architectures
Data movement for globally deployed Big Data Hadoop architectures Scott Rudenstein VP Technical Services November 2015 WANdisco Background WANdisco: Wide Area Network Distributed Computing " Enterprise
Big data management with IBM General Parallel File System
Big data management with IBM General Parallel File System Optimize storage management and boost your return on investment Highlights Handles the explosive growth of structured and unstructured data Offers
Implementation of Hadoop Distributed File System Protocol on OneFS Tanuj Khurana EMC Isilon Storage Division
Implementation of Hadoop Distributed File System Protocol on OneFS Tanuj Khurana EMC Isilon Storage Division Outline HDFS Overview OneFS Overview HDFS protocol on OneFS HDFS protocol server implementation
How To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
Splice Machine: SQL-on-Hadoop Evaluation Guide www.splicemachine.com
REPORT Splice Machine: SQL-on-Hadoop Evaluation Guide www.splicemachine.com The content of this evaluation guide, including the ideas and concepts contained within, are the property of Splice Machine,
Hadoop Architecture. Part 1
Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,
INCREASING EFFICIENCY WITH EASY AND COMPREHENSIVE STORAGE MANAGEMENT
INCREASING EFFICIENCY WITH EASY AND COMPREHENSIVE STORAGE MANAGEMENT UNPRECEDENTED OBSERVABILITY, COST-SAVING PERFORMANCE ACCELERATION, AND SUPERIOR DATA PROTECTION KEY FEATURES Unprecedented observability
WHITEPAPER. A Technical Perspective on the Talena Data Availability Management Solution
WHITEPAPER A Technical Perspective on the Talena Data Availability Management Solution BIG DATA TECHNOLOGY LANDSCAPE Over the past decade, the emergence of social media, mobile, and cloud technologies
Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data
Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give
THE HADOOP DISTRIBUTED FILE SYSTEM
THE HADOOP DISTRIBUTED FILE SYSTEM Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Presented by Alexander Pokluda October 7, 2013 Outline Motivation and Overview of Hadoop Architecture,
Benchmarking Hadoop & HBase on Violin
Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages
HadoopTM Analytics DDN
DDN Solution Brief Accelerate> HadoopTM Analytics with the SFA Big Data Platform Organizations that need to extract value from all data can leverage the award winning SFA platform to really accelerate
EMC XTREMIO EXECUTIVE OVERVIEW
EMC XTREMIO EXECUTIVE OVERVIEW COMPANY BACKGROUND XtremIO develops enterprise data storage systems based completely on random access media such as flash solid-state drives (SSDs). By leveraging the underlying
Simple. Extensible. Open.
White Paper Simple. Extensible. Open. Unleash the Value of Data with EMC ViPR Global Data Services Abstract The following paper opens with the evolution of enterprise storage infrastructure in the era
IBM Netezza High Capacity Appliance
IBM Netezza High Capacity Appliance Petascale Data Archival, Analysis and Disaster Recovery Solutions IBM Netezza High Capacity Appliance Highlights: Allows querying and analysis of deep archival data
Interactive data analytics drive insights
Big data Interactive data analytics drive insights Daniel Davis/Invodo/S&P. Screen images courtesy of Landmark Software and Services By Armando Acosta and Joey Jablonski The Apache Hadoop Big data has
EMC Data Domain Boost for Oracle Recovery Manager (RMAN)
White Paper EMC Data Domain Boost for Oracle Recovery Manager (RMAN) Abstract EMC delivers Database Administrators (DBAs) complete control of Oracle backup, recovery, and offsite disaster recovery with
Platfora Big Data Analytics
Platfora Big Data Analytics ISV Partner Solution Case Study and Cisco Unified Computing System Platfora, the leading enterprise big data analytics platform built natively on Hadoop and Spark, delivers
CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)
CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model
