Testing Spark: Best Practices
|
|
|
- Avis Cobb
- 10 years ago
- Views:
Transcription
1 Testing Spark: Best Practices Anupama Shetty Neil Marshall Senior SDET, Analytics, Ooyala Inc SDET, Analytics, Ooyala Inc Spark Summit 2014
2 Agenda - Anu 1. Application 2. Test 3. Best Overview Batch mode Streaming mode with kafka Overview Test environment setup Unit testing spark applications Integration testing spark applications Practices Code coverage support with scoverage and scct Auto build trigger using jenkins hook via github
3 Agenda - Neil 4. Performance testing of Spark Architecture & technology overview Performance testing setup & run Result analysis Best practices
4 Company Overview Founded in employees worldwide Global footprint of 200M unique users in 130 countries Ooyala works with the most successful broadcasts and media companies in the world Reach, measure, monetize video business Cross-device video analytics and monetization products and services
5 Application Overview Analytics ETL pipeline service Receives 5B+ player generated events such as plays, displays on a daily basis. Computed metrics include player conversion rate, video conversion rate and engagement metrics. Third party services used are Spark 1.0 used to process player generated big data. Kafka with Zookeeper as our message queue CDH5 HDFS as our intermediate storage file system
6 Spark based Log Processor details Supports two input data formats Json Thrift Batch Mode Support Uses Spark Context Consumes input data via a text file Streaming Mode Support Uses Spark streaming context Consumes data via kafka stream
7 Test pipeline setup Player simulation done using Watir (ruby gem based on Selenium). Kafka(with zookeeper) setup as local virtual machine using vagrant. VMs can be monitored using VirtualBox. Spark cluster run in local mode.
8 Unit test setup - Spark in Batch mode Spark cluster setup for testing Build your spark application jar using `sbt assembly ` Create config with spark.jar set to application jar and spark.master to local var config = ConfigFactory parsestring """spark.jar = "target/scala-2.10 /SparkLogProcessor.jar",spark.master = "local" """ Store local spark directory path for spark context creation val sparkdir = <path to local spark directory> + spark-0.9.0incubating-bin-hadoop2/assembly/target/scala-2.10/spark-assembly_ incubating-hadoop2.2.0.jar").mkString Creating spark context var sc: SparkContext = new SparkContext("local", getclass.getsimplename, sparkdir, List(config.getString("spark.jar")))
9 Test Setup for batch mode using Spark Context Before block After block Scala test framework FunSpec is used with ShouldMatchers (for assertions) and BeforeAndAfter (for setup/teardown).
10 Kafka setup for spark streaming Bring up Kafka virtual machine using Vagrantfile with following command `vagrant up kafkavm` Configure Kafka Create topic `bin/kafka-create-topic.sh --zookeeper "localhost:2181" --topic "thrift_pings"` Consume messages using `bin/kafka-console-consumer.sh --zookeeper "localhost:2181" --topic "thrift_pings" --group "testthrift" &>/tmp/thrift-consumer-msgs.log &`
11 Testing streaming mode with Spark Streaming Context
12 Test After block and assertion block for spark streaming mode After Block Test Assertion
13 Testing best practices - Code Coverage Tracking code coverage with Scoverage and/or Scct Enable fork = true to avoid spark exceptions caused by spark context conflicts. SCCT configurations ScctPlugin.instrumentSettings parallelexecution in ScctTest := false fork in ScctTest := true Command to run it - `sbt scct:test ` Scoverage configurations ScoverageSbtPlugin.instrumentSettings ScoverageSbtPlugin.ScoverageKeys.excludedPackages in ScoverageSbtPlugin.scoverage := ".*benchmark.*;.*util.* parallelexecution in ScoverageSbtPlugin.scoverageTest := false fork in ScoverageSbtPlugin.scoverageTest := true Command to run it - `sbt scoverage:test `
14 Testing best practices - Jenkins auto test build trigger Requires enabling 'github-webhook' on github repo settings page. Requires admin access for the repo. Jenkins job should be configured with corresponding github repo via GitHub Project field. Test jenkins hook by triggering a test run from github repo. "Github pull request builder" can be used while configuring jenkins job to auto publish test results on github pull requests after every test run. This also lets you rerun failed tests via github pull request.
15 What is a performance testing? A practice striving to build performance into the implementation, design and architecture of a system. Determine how a system performs in terms of responsiveness and stability under a particular workload. Can serve to investigate, measure, validate or verify other quality attributes of a system, such as scalability, reliability and resource usage.
16 What is a Gatling? Stress test tool
17 Why is Gatling selected over other Perf Test tools as JMeter? Powerful scripting using Scala Akka + Netty Run multiple scenarios in one simulation Scenarios = code + DSL Graphical reports with clear & concise graphs
18 How does Gatling work with Spark Access Web applications / services
19 Develop & setup a simple perf test example A perf test will run against spark-jobserver for word counts.
20 What is a spark jobserver? Provides a RESTful interface for submitting and managing Apache Spark jobs, jars and job contexts Scala CDH5/Hadoop Spark For more depths on jobserver, see Evan Chan & Kelvin Chu s Spark Query Service presentation.
21 Steps to set up & run Spark-jobserver Clone spark-jobserver from git-hub $ git clone Install SBT and type sbt in the sparkjobserver repo $ sbt From SBT shell, simply type re-start > re-start
22 Steps to package & upload a jar to the jobserver Package the test jar of the word count example $ sbt job-server-tests/package Upload the jar to the jobserver $ curl localhost:8090/jars/test
23 Run a request against the jobserver $ curl -d "input.string = a b c a b see" ' appname=test&classpath=spark.jobserver.wordcountexample&sync=true' { "status": "OK", "result": { "a": 2, "b": 2, "c": 1, "see": 1 } }
24 Source code of Word Count Example
25 Script Gatling for the Word Count Example Scenario defines steps that Gatling does during a runtime:
26 Script Gatling for the Word Count Example Setup puts users and scenarios as workflows plus assertions together in a performance test simulation Inject 10 users in 10 seconds into scenarios in 2 cycles Ensure successful requests greater than 80%
27 Test Results in Terminal Window
28 Gatling Graph - Indicator
29 Gatling Graph - Active Sessions
30 Best Practices on Performance Tests Run performance tests on Jenkins Set up baselines for any of performance tests with different scenarios & users
31 Any Questions?
32 References Contact Info: Anupama Shetty: Neil Marshall: References:
Spark Job Server. Evan Chan and Kelvin Chu. Date
Spark Job Server Evan Chan and Kelvin Chu Date Overview Why We Needed a Job Server Created at Ooyala in 2013 Our vision for Spark is as a multi-team big data service What gets repeated by every team: Bastion
Productionizing a 24/7 Spark Streaming Service on YARN
Productionizing a 24/7 Spark Streaming Service on YARN Issac Buenrostro, Arup Malakar Spark Summit 2014 July 1, 2014 About Ooyala Cross-device video analytics and monetization products and services Founded
Talend Real-Time Big Data Sandbox. Big Data Insights Cookbook
Talend Real-Time Big Data Talend Real-Time Big Data Overview of Real-time Big Data Pre-requisites to run Setup & Talend License Talend Real-Time Big Data Big Data Setup & About this cookbook What is the
Automating Big Data Benchmarking for Different Architectures with ALOJA
www.bsc.es Jan 2016 Automating Big Data Benchmarking for Different Architectures with ALOJA Nicolas Poggi, Postdoc Researcher Agenda 1. Intro on Hadoop performance 1. Current scenario and problematic 2.
Data Pipeline with Kafka
Data Pipeline with Kafka Peerapat Asoktummarungsri AGODA Senior Software Engineer Agoda.com Contributor Thai Java User Group (THJUG.com) Contributor Agile66 AGENDA Big Data & Data Pipeline Kafka Introduction
Wisdom from Crowds of Machines
Wisdom from Crowds of Machines Analytics and Big Data Summit September 19, 2013 Chetan Conikee Irfan Ahmad About Us CloudPhysics' mission is to discover the underlying principles that govern systems behavior
CI Pipeline with Docker 2015-02-27
CI Pipeline with Docker 2015-02-27 Juho Mäkinen, Technical Operations, Unity Technologies Finland http://www.juhonkoti.net http://github.com/garo Overview 1. Scale on how we use Docker 2. Overview on the
Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015
Pulsar Realtime Analytics At Scale Tony Ng April 14, 2015 Big Data Trends Bigger data volumes More data sources DBs, logs, behavioral & business event streams, sensors Faster analysis Next day to hours
STREAM ANALYTIX. Industry s only Multi-Engine Streaming Analytics Platform
STREAM ANALYTIX Industry s only Multi-Engine Streaming Analytics Platform One Platform for All Create real-time streaming data analytics applications in minutes with a powerful visual editor Get a wide
Cloudera Manager Training: Hands-On Exercises
201408 Cloudera Manager Training: Hands-On Exercises General Notes... 2 In- Class Preparation: Accessing Your Cluster... 3 Self- Study Preparation: Creating Your Cluster... 4 Hands- On Exercise: Working
RHadoop Installation Guide for Red Hat Enterprise Linux
RHadoop Installation Guide for Red Hat Enterprise Linux Version 2.0.2 Update 2 Revolution R, Revolution R Enterprise, and Revolution Analytics are trademarks of Revolution Analytics. All other trademarks
CactoScale Guide User Guide. Athanasios Tsitsipas (UULM), Papazachos Zafeirios (QUB), Sakil Barbhuiya (QUB)
CactoScale Guide User Guide Athanasios Tsitsipas (UULM), Papazachos Zafeirios (QUB), Sakil Barbhuiya (QUB) Version History Version Date Change Author 0.1 12/10/2014 Initial version Athanasios Tsitsipas(UULM)
Big Data Analytics with Spark and Oscar BAO. Tamas Jambor, Lead Data Scientist at Massive Analytic
Big Data Analytics with Spark and Oscar BAO Tamas Jambor, Lead Data Scientist at Massive Analytic About me Building a scalable Machine Learning platform at MA Worked in Big Data and Data Science in the
DevOps. Building a Continuous Delivery Pipeline
DevOps Building a Continuous Delivery Pipeline Who Am I Bobby Warner Founder & President @bobbywarner What is the goal? Infrastructure as Code Write code to describe our infrastructure Never manually execute
Platform as a Service and Container Clouds
John Rofrano Senior Technical Staff Member, Cloud Automation Services, IBM Research [email protected] or [email protected] Platform as a Service and Container Clouds using IBM Bluemix and Docker for Cloud
CloudBees Continuous Integration and Test with Appvance Enterprise 7.0.1. August 28, 2013 Frank Cohen, [email protected], (408) 364-5508
CloudBees Continuous Integration and Test with Appvance Enterprise 7.0.1 August 28, 2013 Frank Cohen, [email protected], (408) 364-5508 The Missing Agile CI Results Database Extends CloudBees Jenkins
Openbus Documentation
Openbus Documentation Release 1 Produban February 17, 2014 Contents i ii An open source architecture able to process the massive amount of events that occur in a banking IT Infraestructure. Contents:
Building a logging pipeline with Open Source tools. Iñigo Ortiz de Urbina Cazenave
Building a logging pipeline with Open Source tools Iñigo Ortiz de Urbina Cazenave NLUUG Utrecht - Netherlands 28 May 2015 whoami; 2 Iñigo Ortiz de Urbina Cazenave Systems Engineer whoami; groups; 3 Iñigo
vcenter Operations Management Pack for SAP HANA Installation and Configuration Guide
vcenter Operations Management Pack for SAP HANA Installation and Configuration Guide This document supports the version of each product listed and supports all subsequent versions until a new edition replaces
Survey of the Benchmark Systems and Testing Frameworks For Tachyon-Perf
Survey of the Benchmark Systems and Testing Frameworks For Tachyon-Perf Rong Gu,Qianhao Dong 2014/09/05 0. Introduction As we want to have a performance framework for Tachyon, we need to consider two aspects
Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview
Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce
Hadoop & Spark Using Amazon EMR
Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?
How To Use A Data Center With A Data Farm On A Microsoft Server On A Linux Server On An Ipad Or Ipad (Ortero) On A Cheap Computer (Orropera) On An Uniden (Orran)
Day with Development Master Class Big Data Management System DW & Big Data Global Leaders Program Jean-Pierre Dijcks Big Data Product Management Server Technologies Part 1 Part 2 Foundation and Architecture
Assignment # 1 (Cloud Computing Security)
Assignment # 1 (Cloud Computing Security) Group Members: Abdullah Abid Zeeshan Qaiser M. Umar Hayat Table of Contents Windows Azure Introduction... 4 Windows Azure Services... 4 1. Compute... 4 a) Virtual
Important Notice. (c) 2010-2013 Cloudera, Inc. All rights reserved.
Hue 2 User Guide Important Notice (c) 2010-2013 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, and any other product or service names or slogans contained in this document
Building a Continuous Integration Pipeline with Docker
Building a Continuous Integration Pipeline with Docker August 2015 Table of Contents Overview 3 Architectural Overview and Required Components 3 Architectural Components 3 Workflow 4 Environment Prerequisites
Windmill. Automated Testing for Web Applications
Windmill Automated Testing for Web Applications Demo! Requirements Quickly build regression tests Easily debug tests Run single test on all target browsers Easily fit in to continuous integration Other
Jenkins: The Definitive Guide
Jenkins: The Definitive Guide John Ferguson Smart O'REILLY8 Beijing Cambridge Farnham Koln Sebastopol Tokyo Table of Contents Foreword xiii Preface xv 1. Introducing Jenkins 1 Introduction 1 Continuous
Version Control Your Jenkins Jobs with Jenkins Job Builder
Version Control Your Jenkins Jobs with Jenkins Job Builder Abstract Wayne Warren [email protected] Puppet Labs uses Jenkins to automate building and testing software. While we do derive benefit from
Chase Wu New Jersey Ins0tute of Technology
CS 698: Special Topics in Big Data Chapter 4. Big Data Analytics Platforms Chase Wu New Jersey Ins0tute of Technology Some of the slides have been provided through the courtesy of Dr. Ching-Yung Lin at
Kaltura On-Prem Evaluation Package - Getting Started
Kaltura On-Prem Evaluation Package - Getting Started Thank you for your interest in the Kaltura On-Prem Online Video Platform (OVP). Before you get started with your Kaltura On-Prem evaluation, a Kaltura
Advantages and Disadvantages of Application Network Marketing Systems
Application Deployment Softwaretechnik II 2014/15 Thomas Kowark Outline Options for Application Hosting Automating Environment Setup Deployment Scripting Application Monitoring Continuous Deployment and
Shark Installation Guide Week 3 Report. Ankush Arora
Shark Installation Guide Week 3 Report Ankush Arora Last Updated: May 31,2014 CONTENTS Contents 1 Introduction 1 1.1 Shark..................................... 1 1.2 Apache Spark.................................
Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013
Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software SC13, November, 2013 Agenda Abstract Opportunity: HPC Adoption of Big Data Analytics on Apache
Continuous Delivery for Alfresco Solutions. Satisfied customers and happy developers with!! Continuous Delivery!
Continuous Delivery for Alfresco Solutions Satisfied customers and happy developers with!! Continuous Delivery! About me Roeland Hofkens #rhofkens [email protected] http://opensource.westernacher.com
Load Testing How To. Load Testing Overview
Load Testing How To The process of load testing a web application can be a daunting task for someone new to QA Wizard Pro or to load testing in general. This How To walks you through planning, recording,
CHEF IN THE CLOUD AND ON THE GROUND
CHEF IN THE CLOUD AND ON THE GROUND Michael T. Nygard Relevance [email protected] @mtnygard Infrastructure As Code Infrastructure As Code Chef Infrastructure As Code Chef Development Models
Prepared By : Manoj Kumar Joshi & Vikas Sawhney
Prepared By : Manoj Kumar Joshi & Vikas Sawhney General Agenda Introduction to Hadoop Architecture Acknowledgement Thanks to all the authors who left their selfexplanatory images on the internet. Thanks
www.basho.com Technical Overview Simple, Scalable, Object Storage Software
www.basho.com Technical Overview Simple, Scalable, Object Storage Software Table of Contents Table of Contents... 1 Introduction & Overview... 1 Architecture... 2 How it Works... 2 APIs and Interfaces...
Creating Big Data Applications with Spring XD
Creating Big Data Applications with Spring XD Thomas Darimont @thomasdarimont THE FASTEST PATH TO NEW BUSINESS VALUE Journey Introduction Concepts Applications Outlook 3 Unless otherwise indicated, these
Dry Dock Documentation
Dry Dock Documentation Release 0.6.11 Taylor "Nekroze" Lawson December 19, 2014 Contents 1 Features 3 2 TODO 5 2.1 Contents:................................................. 5 2.2 Feedback.................................................
Solution White Paper Connect Hadoop to the Enterprise
Solution White Paper Connect Hadoop to the Enterprise Streamline workflow automation with BMC Control-M Application Integrator Table of Contents 1 EXECUTIVE SUMMARY 2 INTRODUCTION THE UNDERLYING CONCEPT
Workshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
Informatica Data Director Performance
Informatica Data Director Performance 2011 Informatica Abstract A variety of performance and stress tests are run on the Informatica Data Director to ensure performance and scalability for a wide variety
Deploying Hadoop with Manager
Deploying Hadoop with Manager SUSE Big Data Made Easier Peter Linnell / Sales Engineer [email protected] Alejandro Bonilla / Sales Engineer [email protected] 2 Hadoop Core Components 3 Typical Hadoop Distribution
The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn
The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn Presented by :- Ishank Kumar Aakash Patel Vishnu Dev Yadav CONTENT Abstract Introduction Related work The Ecosystem Ingress
Directions for VMware Ready Testing for Application Software
Directions for VMware Ready Testing for Application Software Introduction To be awarded the VMware ready logo for your product requires a modest amount of engineering work, assuming that the pre-requisites
Real-Time Analytical Processing (RTAP) Using the Spark Stack. Jason Dai [email protected] Intel Software and Services Group
Real-Time Analytical Processing (RTAP) Using the Spark Stack Jason Dai [email protected] Intel Software and Services Group Project Overview Research & open source projects initiated by AMPLab in UC Berkeley
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, [email protected] Assistant Professor, Information
The objective of this lab is to learn how to set up an environment for running distributed Hadoop applications.
Lab 9: Hadoop Development The objective of this lab is to learn how to set up an environment for running distributed Hadoop applications. Introduction Hadoop can be run in one of three modes: Standalone
Building a data analytics platform with Hadoop, Python and R
Building a data analytics platform with Hadoop, Python and R Agenda Me Sanoma Past Present Future 3 18 November 2013 /me @skieft Software architect for Sanoma Managing the data and search team Focus on
Five standard procedures for building the android system. Figure1. Procedures for building android embedded systems
Standard Operating Procedures for Android Embedded Systems Anupama M. Kulkarni, Shang-Yang Chang, Ying-Dar Lin National Chiao Tung University, Hsinchu, Taiwan November 2012 Android is considered to be
Peers Techno log ies Pv t. L td. HADOOP
Page 1 Peers Techno log ies Pv t. L td. Course Brochure Overview Hadoop is a Open Source from Apache, which provides reliable storage and faster process by using the Hadoop distibution file system and
Using GitHub for Rally Apps (Mac Version)
Using GitHub for Rally Apps (Mac Version) SOURCE DOCUMENT (must have a rallydev.com email address to access and edit) Introduction Rally has a working relationship with GitHub to enable customer collaboration
Ali Ghodsi Head of PM and Engineering Databricks
Making Big Data Simple Ali Ghodsi Head of PM and Engineering Databricks Big Data is Hard: A Big Data Project Tasks Tasks Build a Hadoop cluster Challenges Clusters hard to setup and manage Build a data
Cloudera Navigator Installation and User Guide
Cloudera Navigator Installation and User Guide Important Notice (c) 2010-2013 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, and any other product or service names or
Big Graph Analytics on Neo4j with Apache Spark. Michael Hunger Original work by Kenny Bastani Berlin Buzzwords, Open Stage
Big Graph Analytics on Neo4j with Apache Spark Michael Hunger Original work by Kenny Bastani Berlin Buzzwords, Open Stage My background I only make it to the Open Stages :) Probably because Apache Neo4j
Adobe Summit 2015 Lab 712: Building Mobile Apps: A PhoneGap Enterprise Introduction for Developers
Adobe Summit 2015 Lab 712: Building Mobile Apps: A PhoneGap Enterprise Introduction for Developers 1 Table of Contents INTRODUCTION MODULE 1 AEM & PHONEGAP ENTERPRISE INTRODUCTION LESSON 1- AEM BASICS
SQL Server Instance-Level Benchmarks with DVDStore
SQL Server Instance-Level Benchmarks with DVDStore Dell developed a synthetic benchmark tool back that can run benchmark tests against SQL Server, Oracle, MySQL, and PostgreSQL installations. It is open-sourced
Sonatype CLM Enforcement Points - Continuous Integration (CI) Sonatype CLM Enforcement Points - Continuous Integration (CI)
Sonatype CLM Enforcement Points - Continuous Integration (CI) i Sonatype CLM Enforcement Points - Continuous Integration (CI) Sonatype CLM Enforcement Points - Continuous Integration (CI) ii Contents 1
Cloudera Distributed Hadoop (CDH) Installation and Configuration on Virtual Box
Cloudera Distributed Hadoop (CDH) Installation and Configuration on Virtual Box By Kavya Mugadur W1014808 1 Table of contents 1.What is CDH? 2. Hadoop Basics 3. Ways to install CDH 4. Installation and
Real Time Data Processing using Spark Streaming
Real Time Data Processing using Spark Streaming Hari Shreedharan, Software Engineer @ Cloudera Committer/PMC Member, Apache Flume Committer, Apache Sqoop Contributor, Apache Spark Author, Using Flume (O
Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming. by Dibyendu Bhattacharya
Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming by Dibyendu Bhattacharya Pearson : What We Do? We are building a scalable, reliable cloud-based learning platform providing services
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University
How to Run Spark Application
How to Run Spark Application Junghoon Kang Contents 1 Intro 2 2 How to Install Spark on a Local Machine? 2 2.1 On Ubuntu 14.04.................................... 2 3 How to Run Spark Application on a
Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015
Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document
HiBench Introduction. Carson Wang ([email protected]) Software & Services Group
HiBench Introduction Carson Wang ([email protected]) Agenda Background Workloads Configurations Benchmark Report Tuning Guide Background WHY Why we need big data benchmarking systems? WHAT What is
Cray XC30 Hadoop Platform Jonathan (Bill) Sparks Howard Pritchard Martha Dumler
Cray XC30 Hadoop Platform Jonathan (Bill) Sparks Howard Pritchard Martha Dumler Safe Harbor Statement This presentation may contain forward-looking statements that are based on our current expectations.
Finding the Needle in a Big Data Haystack. Wolfgang Hoschek (@whoschek) JAX 2014
Finding the Needle in a Big Data Haystack Wolfgang Hoschek (@whoschek) JAX 2014 1 About Wolfgang Software Engineer @ Cloudera Search Platform Team Previously CERN, Lawrence Berkeley National Laboratory,
Deploying and Managing SolrCloud in the Cloud ApacheCon, April 8, 2014 Timothy Potter. Search Discover Analyze
Deploying and Managing SolrCloud in the Cloud ApacheCon, April 8, 2014 Timothy Potter Search Discover Analyze My SolrCloud Experience Currently, working on scaling up to a 200+ node deployment at LucidWorks
Putting It All Together. Vagrant Drush Version Control
Putting It All Together Vagrant Drush Version Control Vagrant Most Drupal developers now work on OSX. The Vagarant provisioning scripts may not work on Windows without subtle changes. If supplied, read
SUCCESFUL TESTING THE CONTINUOUS DELIVERY PROCESS
SUCCESFUL TESTING THE CONTINUOUS DELIVERY PROCESS @huibschoots & @mieldonkers INTRODUCTION Huib Schoots Tester @huibschoots Miel Donkers Developer @mieldonkers TYPICAL Experience with Continuous Delivery?
Distributed Computing and Big Data: Hadoop and MapReduce
Distributed Computing and Big Data: Hadoop and MapReduce Bill Keenan, Director Terry Heinze, Architect Thomson Reuters Research & Development Agenda R&D Overview Hadoop and MapReduce Overview Use Case:
SAIP 2012 Performance Engineering
SAIP 2012 Performance Engineering Author: Jens Edlef Møller ([email protected]) Instructions for installation, setup and use of tools. Introduction For the project assignment a number of tools will be used.
Technical Note: Configure SparkR to use TIBCO Enterprise Runtime for R
Technical Note: Configure SparkR to use TIBCO Enterprise Runtime for R Software Release 3.2 May 2015 Two-Second Advantage Configure SparkR to use TIBCO Enterprise Runtime for R 1 The SparkR package is
Summer Internship 2013 Group No.4-Enhancement of JMeter Week 1-Report-1 27/5/2013 Naman Choudhary
Summer Internship 2013 Group No.4-Enhancement of JMeter Week 1-Report-1 27/5/2013 Naman Choudhary For the first week I was given two papers to study. The first one was Web Service Testing Tools: A Comparative
Case Study - I. Industry: Social Networking Website Technology : J2EE AJAX, Spring, MySQL, Weblogic, Windows Server 2008.
Case Study - I Industry: Social Networking Website Technology : J2EE AJAX, Spring, MySQL, Weblogic, Windows Server 2008 Challenges The scalability of the database servers to execute batch processes under
Workshop: From Zero. Budapest DW Forum 2014
Workshop: From Zero to _ Budapest DW Forum 2014 Agenda today 1. Some setup before we start 2. (Back to the) introduction 3. Our workshop today 4. Part 2: a simple Scalding job on EMR Some setup before
Real-time Big Data Analytics with Storm
Ron Bodkin Founder & CEO, Think Big June 2013 Real-time Big Data Analytics with Storm Leading Provider of Data Science and Engineering Services Accelerating Your Time to Value IMAGINE Strategy and Roadmap
Unified Batch & Stream Processing Platform
Unified Batch & Stream Processing Platform Himanshu Bari Director Product Management Most Big Data Use Cases Are About Improving/Re-write EXISTING solutions To KNOWN problems Current Solutions Were Built
How To Create A Data Visualization With Apache Spark And Zeppelin 2.5.3.5
Big Data Visualization using Apache Spark and Zeppelin Prajod Vettiyattil, Software Architect, Wipro Agenda Big Data and Ecosystem tools Apache Spark Apache Zeppelin Data Visualization Combining Spark
Managing large clusters resources
Managing large clusters resources ID2210 Gautier Berthou (SICS) Big Processing with No Locality Job( /crawler/bot/jd.io/1 ) submi t Workflow Manager Compute Grid Node Job This doesn t scale. Bandwidth
Chef Patterns at Bloomberg Scale HADOOP INFRASTRUCTURE TEAM. https://github.com/bloomberg/chef-bach Freenode: #chef-bach
CHEF PATTERNS AT BLOOMBERG SCALE HADOOP INFRASTRUCTURE TEAM https://github.com/bloomberg/chef-bach Freenode: #chef-bach BLOOMBERG CLUSTERS 2 APPLICATION SPECIFIC Hadoop, Kafka ENVIRONMENT SPECIFIC Networking,
WebLogic Server Foundation Topology, Configuration and Administration
WebLogic Server Foundation Topology, Configuration and Administration Duško Vukmanović Senior Sales Consultant Agenda Topology Domain Server Admin Server Managed Server Cluster Node
Big Data Analytics with Cassandra, Spark & MLLib
Big Data Analytics with Cassandra, Spark & MLLib Matthias Niehoff AGENDA Spark Basics In A Cluster Cassandra Spark Connector Use Cases Spark Streaming Spark SQL Spark MLLib Live Demo CQL QUERYING LANGUAGE
CS380 Final Project Evaluating the Scalability of Hadoop in a Real and Virtual Environment
CS380 Final Project Evaluating the Scalability of Hadoop in a Real and Virtual Environment James Devine December 15, 2008 Abstract Mapreduce has been a very successful computational technique that has
Hudson Continous Integration Server. Stefan Saasen, [email protected]
Hudson Continous Integration Server Stefan Saasen, [email protected] Continous Integration Software development practice Members of a team integrate their work frequently Each integration is verified by
An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database
An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct
FEEG6002 - Applied Programming 3 - Version Control and Git II
FEEG6002 - Applied Programming 3 - Version Control and Git II Sam Sinayoko 2015-10-16 1 / 26 Outline Learning outcomes Working with a single repository (review) Working with multiple versions of a repository
HDP Hadoop From concept to deployment.
HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some
RHadoop and MapR. Accessing Enterprise- Grade Hadoop from R. Version 2.0 (14.March.2014)
RHadoop and MapR Accessing Enterprise- Grade Hadoop from R Version 2.0 (14.March.2014) Table of Contents Introduction... 3 Environment... 3 R... 3 Special Installation Notes... 4 Install R... 5 Install
Writing Standalone Spark Programs
Writing Standalone Spark Programs Matei Zaharia UC Berkeley www.spark- project.org UC BERKELEY Outline Setting up for Spark development Example: PageRank PageRank in Java Testing and debugging Building
DEPLOYING EMC DOCUMENTUM BUSINESS ACTIVITY MONITOR SERVER ON IBM WEBSPHERE APPLICATION SERVER CLUSTER
White Paper DEPLOYING EMC DOCUMENTUM BUSINESS ACTIVITY MONITOR SERVER ON IBM WEBSPHERE APPLICATION SERVER CLUSTER Abstract This white paper describes the process of deploying EMC Documentum Business Activity
SOA Solutions & Middleware Testing: White Paper
SOA Solutions & Middleware Testing: White Paper Version 1.1 (December 06, 2013) Table of Contents Introduction... 03 Solutions Testing (Beta Testing)... 03 1. Solutions Testing Methods... 03 1.1 End-to-End
