How Cisco IT Built Big Data Platform to Transform Data Management
Cisco IT Case Study, August 2013: Big Data Analytics

EXECUTIVE SUMMARY

CHALLENGE
- Unlock the business value of large data sets, including structured and unstructured information
- Provide service-level agreements (SLAs) for internal customers using big data analytics services
- Support multiple internal users on the same platform

SOLUTION
- Implemented an enterprise Hadoop platform on Cisco UCS Common Platform Architecture (CPA) for Big Data, a complete infrastructure solution including compute, storage, connectivity, and unified management
- Automated job scheduling and process orchestration using Cisco Tidal Enterprise Scheduler (TES) as an alternative to Oozie

RESULTS
- Analyzed service sales opportunities in one-tenth the time, at one-tenth the cost
- $40 million in incremental service bookings in the current fiscal year as a result of this initiative
- Implemented a multitenant enterprise platform while delivering immediate business value

LESSONS LEARNED
- Cisco UCS reduces complexity, improves agility, and radically improves cost of ownership for Hadoop-based applications
- A library of Hive and Pig user-defined functions (UDFs) increases developer productivity
- Cisco TES simplifies job scheduling and process orchestration
- Build internal Hadoop skills
- Educate internal users about opportunities to use big data analytics to improve data processing and decision making

NEXT STEPS
- Enable NoSQL database and advanced analytics capabilities on the same platform
- Drive adoption of the platform across different business functions

Enterprise Hadoop architecture, built on Cisco UCS Common Platform Architecture (CPA) for Big Data, unlocks hidden business intelligence.

Challenge

Cisco is the worldwide leader in networking that transforms how people connect, communicate, and collaborate. Cisco IT manages 38 global data centers comprising 334,000 square feet.

© 2013 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public.
Approximately 85 percent of applications in newer data centers are virtualized, and IT is working toward a goal of 95 percent virtualization. At Cisco, very large datasets about customers, products, and network activity represent hidden business intelligence. The same is true of terabytes of unstructured data such as web logs, video, documents, and images. To unlock the business intelligence hidden in globally distributed big data, Cisco IT chose Hadoop, an open-source software framework that supports data-intensive, distributed applications. "Hadoop behaves like an affordable supercomputing platform," says Piyush Bhargava, a Cisco IT distinguished engineer who focuses on big data programs. "It moves compute to where the data is stored, which mitigates the disk I/O bottleneck and provides almost linear scalability. Hadoop would enable us to consolidate the islands of data scattered throughout the enterprise."

To offer big data analytics services to Cisco business teams, Cisco IT first needed to design and implement an enterprise platform that could support appropriate service-level agreements (SLAs) for availability and performance. "Our challenge was adapting the open-source Hadoop platform for the enterprise," says Bhargava. Technical requirements for the big data architecture included:

- Open-source components
- Scalability and enterprise-class availability
- Multitenant support, so that multiple Cisco teams could use the same platform at the same time
- Overcoming disk I/O speed limitations to accelerate performance
- Integration with IT support processes

Solution

The Cisco IT Hadoop platform is built using the Cisco UCS Common Platform Architecture (CPA) for Big Data. The high-level architecture of the solution is shown in Figure 1.

Figure 1. Cisco IT Hadoop Platform (Cisco UCS 6248UP Fabric Interconnects per domain; Cisco Nexus 2232PP 10GE Fabric Extenders per rack; three nodes apiece for ZooKeeper, CLDB, WebServer, and JobTracker; file server and TaskTracker run across all nodes)

The Cisco IT Hadoop Platform is designed to provide high performance in a multitenant environment, anticipating that internal users will continually find more use cases for big data analytics. "Cisco UCS CPA for Big Data provides the capabilities we need to use big data analytics for business advantage, including high performance, scalability, and ease of management," says Jag Kahlon, Cisco IT architect.

Linearly Scalable Hardware with Very Large Onboard Storage Capacity

The compute building block of the Cisco IT Hadoop Platform is the Cisco UCS C240 M3 Rack Server, a 2RU server powered by two Intel Xeon E-series processors, 256 GB of RAM, and 24 TB of local storage. Of the 24 TB, the Hadoop Distributed File System (HDFS) can use 22 TB; the remaining 2 TB is reserved for the operating system. "Cisco UCS C-Series Servers provide high-performance access to local storage, the biggest factor in Hadoop performance," says Virendra Singh, Cisco IT architect. The current architecture comprises four racks, each containing 16 server nodes and supporting 384 TB of raw storage per rack. "This configuration can scale to 160 servers in a single management domain, supporting 3.8 petabytes of raw storage capacity," says Kahlon.

Low-Latency, Lossless Network Connectivity

Cisco UCS 6200 Series Fabric Interconnects provide high-speed, low-latency connectivity for the servers and, through Cisco UCS Manager, centralized management for all connected devices. Deployed in redundant pairs, they offer full redundancy, active-active performance, and exceptional scalability for the large numbers of nodes typical in big data clusters. Each rack connects to the fabric interconnects through a redundant pair of Cisco Nexus 2232PP Fabric Extenders, which behave like remote line cards.

Simple Management

Cisco IT server administrators manage all elements of the Cisco UCS, including servers, storage access, networking, and virtualization, from a single Cisco UCS Manager interface. "Cisco UCS Manager significantly simplifies management of our Hadoop platform," says Kahlon. "UCS Manager will help us manage larger clusters as our platform grows, without increasing staffing." Using Cisco UCS Manager service profiles saved time and effort for server administrators by making it unnecessary to manually configure each server. Service profiles also eliminated configuration errors that could cause downtime.

Open, Enterprise-Class Hadoop Distribution

Cisco IT uses the MapR Distribution for Apache Hadoop, which speeds up MapReduce jobs with an optimized shuffle algorithm, direct access to the disk, built-in compression, and code written in C++ rather than Java. "Hadoop complements rather than replaces Cisco IT's traditional data processing tools, such as Oracle and Teradata," Singh says. "Its unique value is to process unstructured data and very large data sets far more quickly and at far less cost."
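The MapReduce model that the optimized shuffle accelerates can be sketched in a few lines. The following toy word-count job is written in the style of a Hadoop Streaming mapper and reducer in Python; it is a generic illustration for readers new to Hadoop, not one of Cisco IT's actual jobs:

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit (word, 1) pairs. Hadoop runs mappers on the
    nodes where the data blocks reside, moving compute to the data."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    """Shuffle phase: group intermediate pairs by key (here, a simple sort)."""
    return groupby(sorted(pairs), key=lambda kv: kv[0])

def reducer(grouped):
    """Reduce phase: sum the counts for each word."""
    for word, group in grouped:
        yield word, sum(count for _, count in group)

logs = ["ERROR disk full", "INFO rebalance done", "ERROR disk full"]
counts = dict(reducer(shuffle(mapper(logs))))
print(counts["error"])  # 2
```

In a real cluster the shuffle step is the expensive part, because intermediate pairs move between nodes; that is the stage MapR's optimized shuffle algorithm targets.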
Hadoop Distributed File System (HDFS) aggregates the storage on all Cisco UCS C240 M3 servers in the cluster to create one large logical unit, then splits data into smaller chunks to accelerate processing by eliminating time-consuming extract, transform, and load (ETL) operations. "Processing can continue even if a node fails, because Hadoop makes multiple copies of every data element, distributing them across several servers in the cluster," says Hari Shankar, Cisco IT architect. "Even if a node fails, there is no data loss. Hadoop senses the node failure and automatically creates another copy of the data, distributing it across the remaining servers." Total data volume is no larger than it would be without replication, because HDFS automatically compresses the data.

Cisco Tidal Enterprise Scheduler (TES)

For job scheduling and process orchestration, Cisco IT uses Cisco TES as a friendlier alternative to Oozie, the native Hadoop scheduler. Built-in Cisco TES connectors to Hadoop components eliminate manual steps such as writing Sqoop code to download data to HDFS and executing a command to load data into Hive. "Using Cisco TES for job scheduling saves hours on each job compared to Oozie, because reducing the number of programming steps means less time needed for debugging," says Singh. Another advantage of Cisco TES is that it operates on mobile devices, enabling Cisco end users to execute and manage big data jobs from anywhere.

Looking Inside a Cisco IT Hadoop Job

The first big data analytics program in production at Cisco helps to increase revenues by identifying hidden opportunities for partners to sell services.
"Previously, we used traditional data warehousing techniques to analyze the install base, which identified opportunities for the next four quarters," says Srini Nagapuri, Cisco IT project manager. "But analysis took 50 hours, so we could only generate reports once a week." The other limitation of the old architecture was the lack of a single source of truth for opportunity data. Instead, service opportunity information was spread out across multiple data stores, causing confusion for partners and the Cisco partner support organization. The new big data analytics solution harnesses the power of Hadoop on the Cisco UCS CPA for Big Data to process 25 percent more data in 10 percent of the time. The data foundation includes the following:

- Cisco Technical Services contracts that will be ready for renewal or will expire within five calendar quarters
- Opportunities to activate, upgrade, or upsell software subscriptions within five quarters
- Business rules and a management interface to identify new types of opportunity data
- Partner performance measurements, expressed as the opportunity-to-bookings conversion ratio

Figure 2 and Table 1 show the physical architecture, and Figure 3 shows the logical architecture.
Figure 2. Physical Architecture for Hadoop Platform to Identify Partner Sales Opportunities

Table 1. Software Components in Cisco IT's Big Data Analytics Platform
- MapReduce: Distributed computing framework. Data is processed on the same Cisco UCS server where it resides, avoiding the latency of accessing data over the network.
- Hive: SQL-like interface to support analysis and summarization of large data sets.
- Cisco Tidal Enterprise Scheduler (TES): Workload automation, job scheduling, event management, and process orchestration.
- Pig: Data flow language. Enables Cisco IT to process big data without writing MapReduce programs.
- HBase: Columnar database built on top of HDFS for low-latency read and write operations.
- Sqoop: Tool to import and export data between traditional databases and HDFS.
- Flume: Tool to capture and load log data to HDFS in real time.
- ZooKeeper: Coordination of distributed processes to provide high availability.
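The first item in the data foundation, contracts renewing or expiring within five calendar quarters, can be made concrete with a small sketch. This is our illustration only; the function names and quarter arithmetic are assumptions, not Cisco's actual business rules:

```python
from datetime import date

def quarter_index(d: date) -> int:
    """Number calendar quarters absolutely so they compare across years."""
    return d.year * 4 + (d.month - 1) // 3

def expires_within_quarters(expiry: date, today: date, horizon: int = 5) -> bool:
    """True if a contract expires within `horizon` calendar quarters of today."""
    return 0 <= quarter_index(expiry) - quarter_index(today) <= horizon

today = date(2013, 8, 1)
assert expires_within_quarters(date(2014, 9, 30), today)      # 4 quarters out
assert not expires_within_quarters(date(2015, 6, 30), today)  # 7 quarters out
```

Running a rule like this daily over every active contract, rather than weekly over a warehouse extract, is what the Hadoop platform made practical.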
Figure 3. Cisco Hadoop Logical Architecture (components shown include client job submission and read-only job status tracking; a load-balanced MySQL server hosting the Hive metastore and metrics databases; Cisco TES for job orchestration and scheduling; Flume and Sqoop feeds from RDBMS OLTP/DSS/DW sources; and a Hadoop cluster running CLDB, ZooKeeper, MapR-FS, task/job trackers, and the WebServer)

Results

Cisco IT has introduced multiple big data analytics programs, all of them operating on the Cisco UCS Common Platform Architecture (CPA) for Big Data.

Increased Revenues from Partner Sales

The Cisco Partner Annuity Initiative program is in production. The enterprise Hadoop platform accelerated processing time for identifying partner services sales opportunities from 50 hours to 6 hours, and identifies opportunities for the next five calendar quarters instead of four. It also lets partners and Cisco employees dynamically change the criteria for identifying opportunities. "With our Hadoop architecture, analysis of partner sales opportunities completes in approximately one-tenth the time it did on our traditional data analysis architecture, and at one-tenth the cost," says Bhargava.

Business benefits of the Cisco Partner Annuity Initiative include:

- Generating an anticipated US$40 million in incremental revenue from partners in FY13: "The solution processes 1.5 billion records daily, and we identified new service opportunities the same day we placed the system in production," says Nagapuri. Cisco is on track to reach the revenue goal.
- Improving the experience for Cisco partners, customers, and employees: Consolidating to a single source of truth helps to avoid confusion about available service opportunities.
- Creating the foundation for other big data analytics projects: Moving the customer install base and service contracts to the Hadoop platform will provide more value in the future, because other Cisco business teams can use it for their own initiatives.

Increased Productivity by Making Intellectual Capital Easier to Find

Many of the 68,000 employees at Cisco are knowledge workers who tend to search for content on company websites throughout the day. But most of the content is not tagged with all relevant keywords, which makes searches take longer. "People relate to content in more ways than the original classification," says Singh. In addition, employees might not realize content already exists, leading them to invest time and money recreating content that is already available. To make intellectual capital easier to find, Cisco IT is replacing static, manual tagging with dynamic tagging based on user feedback. The program uses machine-learning techniques to examine usage patterns and also acts on user suggestions for new tags. The content auto-tagging program is currently in proof of concept, and Cisco IT is creating a Cisco Smart Business Architecture (SBA) for other companies.

Measured Adoption of Collaboration Applications and Assessed Business Value

Organizations that make significant investments in collaboration technology generally want to measure adoption and assess business value. The Organization Network Analytics program is currently a proof of concept. Its goal is to measure collaboration within the Cisco enterprise, develop a benchmark, and present the information on an intuitive executive dashboard (Figure 4).
"Our idea is to identify deviations from best practices and to measure the effectiveness of change management," Singh says.

Figure 4. Collaboration Benchmark: Usage Analysis Sample Report (compares Org. X against the benchmark and best practice across communication methods such as phone, IM, audio conferencing, social software, web conferencing, desktop video, and immersive video)
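A deviation report like the one in Figure 4 boils down to comparing each organization's per-tool usage against a benchmark. The sketch below is entirely our illustration; the tool names, counts, and threshold are invented for the example:

```python
# Toy benchmark comparison for collaboration-tool adoption.
# All numbers and tool names are invented for illustration.
org_x_usage = {"phone": 820, "im": 640, "web_conf": 90, "desktop_video": 12}
benchmark = {"phone": 700, "im": 600, "web_conf": 240, "desktop_video": 60}

def deviations(usage, benchmark, threshold=0.5):
    """Flag tools whose usage falls below `threshold` of the benchmark."""
    return sorted(tool for tool, expected in benchmark.items()
                  if usage.get(tool, 0) < threshold * expected)

# Org X under-uses web conferencing and desktop video relative to the benchmark.
print(deviations(org_x_usage, benchmark))  # ['desktop_video', 'web_conf']
```

The hard part in practice is not this comparison but producing the usage counts, which is where the log analysis described next comes in.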
The Hadoop platform analyzes logs from collaboration tools such as Cisco Unified Communications, Cisco TelePresence, Cisco WebEx, Cisco WebEx Social, and Cisco Jabber to reveal preferred communication methods and organizational dynamics. When business users studying collaboration enter analysis criteria, the program creates an interactive visualization of the social network.

Next Steps

Cisco IT continues to introduce different big data analytics programs and add more types of analytics. Plans include:

- Identifying root causes of technical issues: The knowledge that support engineers acquire can remain hidden in case notes. "We expect that mining case notes for root causes will unlock the value of this unstructured data, accelerating time to resolution for other customers," says Nagapuri. The same information can contribute to better upstream systems, processes, and products.
- Analyzing service requests, an untapped source of business intelligence about product usage, reliability, and customer sentiment: "A deeper understanding of product usage will help Cisco engineers optimize products," Singh says. The plan is to mash up data from service requests with quality data and service data. The underlying big data analytics techniques include text analytics, entity extraction, correlation, sentiment analysis, alerting, and machine learning.
- Supporting use cases where NoSQL provides improved performance or scalability compared to traditional relational databases.
- Scaling the architecture by adding another 60 nodes: Cisco IT is deciding whether to add the nodes to the same Cisco UCS cluster or build another cluster that connects to the first using Cisco Nexus switches.

Lessons Learned

Cisco IT shares the following observations with other organizations that are planning big data analytics programs.

Technology

- Hive is best suited for structured data processing, but has limited SQL support.
- Sqoop scales well for large data loads.
- Network File System (NFS) saves time and effort for data loads.
- Cisco TES simplifies job scheduling and process orchestration, and accelerates debugging.
- Creating a library of user-defined functions (UDFs) for Hive and Pig helps to increase developer productivity.
- Once internal users realize that IT can offer big data analytics, demand tends to grow very quickly.
- Using Cisco UCS Common Platform Architecture (CPA) for Big Data, Cisco IT built a scalable Hadoop platform that can support up to 160 servers in a single switching domain.

Organization

- Build internal Hadoop skills. "People keep identifying new use cases for big data analytics, and building internal skills to implement the use cases will help us keep up with demand," says Singh.
- Educate internal users that they can now analyze unstructured data (webpages, documents, and so on) in addition to databases.
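One lightweight way to build the kind of reusable function library mentioned above is Hive's TRANSFORM clause, which pipes rows through an external script (Hive and Pig UDFs proper are written in Java). The script below is a hypothetical example of ours; the contract-ID column and its format are invented, not Cisco's schema:

```python
import sys

def normalize_contract_id(raw):
    """Hypothetical reusable helper: canonicalize a contract-ID column."""
    return raw.strip().upper().replace("-", "")

def transform(rows):
    """Read tab-separated rows (as Hive's TRANSFORM clause streams them
    over stdin) and emit rows with the first column normalized."""
    for line in rows:
        cols = line.rstrip("\n").split("\t")
        cols[0] = normalize_contract_id(cols[0])
        yield "\t".join(cols)

if __name__ == "__main__":
    # Invoked from Hive along the lines of:
    #   SELECT TRANSFORM (contract_id, expiry) USING 'normalize.py' ...
    for row in transform(sys.stdin):
        print(row)
```

Collecting helpers like `normalize_contract_id` in one shared library is what keeps every team from rewriting the same cleanup logic in each job.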
For More Information

For Cisco IT case studies on a variety of business solutions, visit Cisco on Cisco: Inside Cisco IT.
Cisco UCS CPA for Big Data: blogs.cisco.com/datacenter/cpa/
Cisco Tidal Enterprise Scheduler:

Note

This publication describes how Cisco has benefited from the deployment of its own products. Many factors may have contributed to the results and benefits described; Cisco does not guarantee comparable results elsewhere. CISCO PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING THE IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some jurisdictions do not allow disclaimer of express or implied warranties; therefore, this disclaimer may not apply to you.
MapR Enterprise Edition & Enterprise Database Edition
MapR Enterprise Edition & Enterprise Database Edition Reference Architecture A PSSC Labs Reference Architecture Guide June 2015 Introduction PSSC Labs continues to bring innovative compute server and cluster
IT@Intel How Intel IT Successfully Migrated to Cloudera Apache Hadoop*
White Paper April 2015 IT@Intel How Intel IT Successfully Migrated to Cloudera Apache Hadoop* From our original experience with Apache Hadoop software, Intel IT identified new opportunities to reduce IT
Real-Time Big Data Analytics SAP HANA with the Intel Distribution for Apache Hadoop software
Real-Time Big Data Analytics with the Intel Distribution for Apache Hadoop software Executive Summary is already helping businesses extract value out of Big Data by enabling real-time analysis of diverse
International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 ISSN 2278-7763
International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 A Discussion on Testing Hadoop Applications Sevuga Perumal Chidambaram ABSTRACT The purpose of analysing
Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.
Big Data Hadoop Administration and Developer Course This course is designed to understand and implement the concepts of Big data and Hadoop. This will cover right from setting up Hadoop environment in
Comprehensive Analytics on the Hortonworks Data Platform
Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and
W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract
W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the
Big Data Analytics Platform @ Nokia
Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform
Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,[email protected]
Data Warehousing and Analytics Infrastructure at Facebook Ashish Thusoo & Dhruba Borthakur athusoo,[email protected] Overview Challenges in a Fast Growing & Dynamic Environment Data Flow Architecture,
A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani
A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to
Unified Computing Systems
Unified Computing Systems Cisco Unified Computing Systems simplify your data center architecture; reduce the number of devices to purchase, deploy, and maintain; and improve speed and agility. Cisco Unified
Big Data on Microsoft Platform
Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4
Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014
Forecast of Big Data Trends Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014 Big Data transforms Business 2 Data created every minute Source http://mashable.com/2012/06/22/data-created-every-minute/
The Future of Data Management with Hadoop and the Enterprise Data Hub
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah Cofounder & CTO, Cloudera, Inc. Twitter: @awadallah 1 2 Cloudera Snapshot Founded 2008, by former employees of Employees
IBM System x reference architecture solutions for big data
IBM System x reference architecture solutions for big data Easy-to-implement hardware, software and services for analyzing data at rest and data in motion Highlights Accelerates time-to-value with scalable,
How To Build A Cisco Ukcsob420 M3 Blade Server
Data Sheet Cisco UCS B420 M3 Blade Server Product Overview The Cisco Unified Computing System (Cisco UCS ) combines Cisco UCS B-Series Blade Servers and C-Series Rack Servers with networking and storage
Complete Java Classes Hadoop Syllabus Contact No: 8888022204
1) Introduction to BigData & Hadoop What is Big Data? Why all industries are talking about Big Data? What are the issues in Big Data? Storage What are the challenges for storing big data? Processing What
End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ
End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,
WHITE PAPER USING CLOUDERA TO IMPROVE DATA PROCESSING
WHITE PAPER USING CLOUDERA TO IMPROVE DATA PROCESSING Using Cloudera to Improve Data Processing CLOUDERA WHITE PAPER 2 Table of Contents What is Data Processing? 3 Challenges 4 Flexibility and Data Quality
Move Data from Oracle to Hadoop and Gain New Business Insights
Move Data from Oracle to Hadoop and Gain New Business Insights Written by Lenka Vanek, senior director of engineering, Dell Software Abstract Today, the majority of data for transaction processing resides
Big Data: Beyond the Hype
Big Data: Beyond the Hype Why Big Data Matters to You WHITE PAPER Big Data: Beyond the Hype Why Big Data Matters to You By DataStax Corporation October 2011 Table of Contents Introduction...4 Big Data
Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84
Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics
Introduction to Hadoop. New York Oracle User Group Vikas Sawhney
Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop
Peers Techno log ies Pv t. L td. HADOOP
Page 1 Peers Techno log ies Pv t. L td. Course Brochure Overview Hadoop is a Open Source from Apache, which provides reliable storage and faster process by using the Hadoop distibution file system and
IBM InfoSphere BigInsights Enterprise Edition
IBM InfoSphere BigInsights Enterprise Edition Efficiently manage and mine big data for valuable insights Highlights Advanced analytics for structured, semi-structured and unstructured data Professional-grade
HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM
HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM 1. Introduction 1.1 Big Data Introduction What is Big Data Data Analytics Bigdata Challenges Technologies supported by big data 1.2 Hadoop Introduction
Microsoft SQL Server 2012 with Hadoop
Microsoft SQL Server 2012 with Hadoop Debarchan Sarkar Chapter No. 1 "Introduction to Big Data and Hadoop" In this package, you will find: A Biography of the author of the book A preview chapter from the
A Brief Outline on Bigdata Hadoop
A Brief Outline on Bigdata Hadoop Twinkle Gupta 1, Shruti Dixit 2 RGPV, Department of Computer Science and Engineering, Acropolis Institute of Technology and Research, Indore, India Abstract- Bigdata is
An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database
An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct
MANAGEMENT AND ORCHESTRATION WORKFLOW AUTOMATION FOR VBLOCK INFRASTRUCTURE PLATFORMS
VCE Word Template Table of Contents www.vce.com MANAGEMENT AND ORCHESTRATION WORKFLOW AUTOMATION FOR VBLOCK INFRASTRUCTURE PLATFORMS January 2012 VCE Authors: Changbin Gong: Lead Solution Architect Michael
An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise
An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise Solutions Group The following is intended to outline our
I/O Considerations in Big Data Analytics
Library of Congress I/O Considerations in Big Data Analytics 26 September 2011 Marshall Presser Federal Field CTO EMC, Data Computing Division 1 Paradigms in Big Data Structured (relational) data Very
How to Enhance Traditional BI Architecture to Leverage Big Data
B I G D ATA How to Enhance Traditional BI Architecture to Leverage Big Data Contents Executive Summary... 1 Traditional BI - DataStack 2.0 Architecture... 2 Benefits of Traditional BI - DataStack 2.0...
Maximizing Hadoop Performance and Storage Capacity with AltraHD TM
Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created
Next-Generation Cloud Analytics with Amazon Redshift
Next-Generation Cloud Analytics with Amazon Redshift What s inside Introduction Why Amazon Redshift is Great for Analytics Cloud Data Warehousing Strategies for Relational Databases Analyzing Fast, Transactional
A Platform Built for Server Virtualization: Cisco Unified Computing System
A Platform Built for Server Virtualization: Cisco Unified Computing System What You Will Learn This document discusses how the core features of the Cisco Unified Computing System contribute to the ease
Big Data for Investment Research Management
IDT Partners www.idtpartners.com Big Data for Investment Research Management Discover how IDT Partners helps Financial Services, Market Research, and Investment Management firms turn big data into actionable
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2016 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and
MySQL and Hadoop. Percona Live 2014 Chris Schneider
MySQL and Hadoop Percona Live 2014 Chris Schneider About Me Chris Schneider, Database Architect @ Groupon Spent the last 10 years building MySQL architecture for multiple companies Worked with Hadoop for
OnX Big Data Reference Architecture
OnX Big Data Reference Architecture Knowledge is Power when it comes to Business Strategy The business landscape of decision-making is converging during a period in which: > Data is considered by most
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics
