Write Once, Run Anywhere Pat McDonough

Size: px
Start display at page:

Download "Write Once, Run Anywhere Pat McDonough"

Transcription

1 Write Once, Run Anywhere Pat McDonough

2 Write Once, Run Anywhere

3 Write Once, Run Anywhere You Might Have Heard This Before!

4 Java, According to Wikipedia

5 Java, According to Wikipedia Java is a computer programming language specifically designed to have as few implementation dependencies as possible. It is intended to let application developers "write once, run anywhere" (WORA)

6 Java & WORA in the First Decade Java Client Applications Apps with GUIs (AWT or Swing) could be deployed to any OS with a JVM

7 Java & WORA in the First Decade Java Client Applications Apps with GUIs (AWT or Swing) could be deployed to any OS with a JVM Neat! but not all that useful - people don t want non-native GUIs

8 Java & WORA in the First Decade

9 Java & WORA in the First Decade Applets A way to deliver rich GUIs to many different platforms through the browser [Insert Ugly Applet Here]

10 Java & WORA in the First Decade Applets A way to deliver rich GUIs to many different platforms through the browser [Insert Ugly Applet Here] Neat!

11 Java & WORA in the First Decade Applets A way to deliver rich GUIs to many different platforms through the browser [Insert Ugly Applet Here] Neat! but basically ended at producing many gimmicky website animations

12 Java & WORA in the First Decade Back-end Applications! Windows Desktop for an IDE Unix Server for Production Neat! And actually useful too

13 Java & WORA in the First Back-end Java starts to formalize around standards > J2EE Decade Core libraries, deployment formats, etc. Vendors Offer J2EE App Servers Ironically, this immediately lead to no more WORA Specific App Servers required a specific SDK or even a specific IDE

14 Java & WORA in the First Decade Fixing WORA on the back-end: Fall back to the Least Common Denominator > Servlets (usually via Tomcat) Spring comes about to dominate as the SDK of choice for Java back-end applications specifically designed to have as few implementation dependencies as possible

15 So yes, you ve heard this before

16 So yes, you ve heard this before Which examples apply to the state of Big Data Ecosystem?

17 Important Changes Since Then Vendor Standards Open Source Data has overwhelmed us Distributed Systems Are The New Standard (specifically, Data Parallel systems)

18 Big Data Platforms Are Everywhere Now But Where Are the Big Data Applications? Big Data Applications don t exist very far beyond connecting ODBC/JDBC or simple ETL integrations Why?! Too many disparate systems to piece together Complicated matrix of compile-time and runtime dependencies across distributions i.e. each distribution effectively has it s own SDK

19 The Big Data Ecosystem Needs a Common SDK

20 The Big Data Ecosystem Needs a Common SDK Apache Spark is the answer

21 Spark An SDK for Big Data Applications SQL MLlib Streaming GraphX Core

22 Spark An SDK for Big Data Applications SQL MLlib Streaming GraphX Core Unified System With Libraries to Build a Complete Solution! Full-featured Programming Environment

23 Spark An SDK for Big Data Applications SQL MLlib Streaming GraphX Core Unified System With Libraries to Build a Complete Solution! Full-featured Programming Environment Single, Consistent Interface for Developers to Write Against! Runtimes available on several platforms

24 Develop Big Data Applications Python/Scala/Java SQL MLlib Streaming GraphX Dependencies Core Your Application

25 Develop Big Data Applications SQL MLlib Streaming GraphX Python/Scala/Java Dependencies Your Application Spark APIs Core Develop Applications using your preferred language, using existing libraries, using Spark s Public APIs (SparkContext, RDDs)

26 Work With Data SQL MLlib Streaming GraphX Python/Scala/Java Dependencies Core Your Application Data HDFS* Local S3 JDBC Cassandra

27 Work With Data Python/Scala/Java Dependencies Your Application Spark APIs SQL MLlib Streaming GraphX Core Spark Internals Care For Scheduling Data Operations Data Access & Scheduling Data HDFS* Local S3 JDBC Cassandra

28 Run Your Applications Python/Scala/Java SQL MLlib Streaming GraphX Dependencies Core Your Application YARN Mesos Spark Standalone Cluster

29 Run Your Applications Python/Scala/Java SQL MLlib Streaming GraphX Dependencies Core Your Application Submit Your Application and the Spark Runtime to a Cluster Manager YARN Mesos Spark Standalone Cluster

30 The Complete Picture Python/Scala/Java SQL MLlib Streaming GraphX Dependencies Core Your Application YARN Mesos Spark Standalone Clusters Data HDFS* Local S3 JDBC Cassandra

31 The Complete Picture Python/Scala/Java SQL MLlib Streaming GraphX Dependencies Core Your Application Spark Abstracts Runtime Dependencies from Developers YARN Mesos Spark Standalone Clusters Data HDFS* Local S3 JDBC Cassandra

32 How Spark Handles Hadoop Dependencies The Spark library is compiled with compatibility to a specific Hadoop version SQL MLlib Streaming GraphX At runtime, Spark uses reflection to find any Hadoop classes it needs Core Examples: # Apache Hadoop 2.2.X mvn -Pyarn -Phadoop-2.2 \ -Dhadoop.version=2.2.0 \ -DskipTests clean package # CDH with MapReduce v1 mvn -Dhadoop.version= mr1-cdh DskipTests \ clean package

33 How Spark Handles Hadoop Dependencies The Spark library is compiled with compatibility to a specific Hadoop version SQL MLlib Streaming GraphX At runtime, Spark uses reflection to find any Hadoop classes it needs Examples: Hadoop Client Core # Apache Hadoop 2.2.X mvn -Pyarn -Phadoop-2.2 \ -Dhadoop.version=2.2.0 \ -DskipTests clean package # CDH with MapReduce v1 mvn -Dhadoop.version= mr1-cdh DskipTests \ clean package

34 Spark Support Included on Big Data Platforms While this build process is very easy, it s even easier to have the runtime pre-built Platform support also indicates stronger integration testing, supported, and integrated management tools SQL MLlib Streaming GraphX Hadoop Client Core

35 Spark 1.0

36 Spark-Submit Spark-submit provides a consistent manner to launch jobs regardless of which platform Includes an important clean-up to make configurations more consistent # Run on a Spark standalone cluster./bin/spark-submit \ --class org.apache.spark.examples.sparkpi \ --master spark:// :7077 \ --executor-memory 20G \ --total-executor-cores 100 \ /path/to/examples.jar \ 1000! # Run on a YARN cluster export HADOOP_CONF_DIR=XXX./bin/spark-submit \ --class org.apache.spark.examples.sparkpi \ --master yarn-cluster \ --executor-memory 20G \ --num-executors 50 \ /path/to/examples.jar \ 1000

37 Spark SQL We actually wrestled with the name a bit because it s not only about SQL SQL is actually not the only developer interface - there is also a DSL SparkSQL introduces SchemaRDDs and an Optimizer (Catalyst) This provides a deeper integration for any structured data val sqlcontext = new org.apache.spark.sql.sqlcontext(sc) import sqlcontext._ val people: RDD[Person] =... // An RDD of case class objects! // The following is the same as // SELECT name FROM people WHERE age >= 10 AND age <= 19' val teenagers = people.where('age >= 10).where('age <= 19).select('name)

38 Databricks Is Committed to Growing Apache Spark s Developer Ecosystem Developer Training, Online Materials, Free Resources Strong Commitment to Open Source Certification Programs

39 We re Hiring! Evangelists Trainers Solutions Architects Software Engineers

Unified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia

Unified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model to support more types of processing

More information

Databricks. A Primer

Databricks. A Primer Databricks A Primer Who is Databricks? Databricks was founded by the team behind Apache Spark, the most active open source project in the big data ecosystem today. Our mission at Databricks is to dramatically

More information

How To Create A Data Visualization With Apache Spark And Zeppelin 2.5.3.5

How To Create A Data Visualization With Apache Spark And Zeppelin 2.5.3.5 Big Data Visualization using Apache Spark and Zeppelin Prajod Vettiyattil, Software Architect, Wipro Agenda Big Data and Ecosystem tools Apache Spark Apache Zeppelin Data Visualization Combining Spark

More information

Databricks. A Primer

Databricks. A Primer Databricks A Primer Who is Databricks? Databricks vision is to empower anyone to easily build and deploy advanced analytics solutions. The company was founded by the team who created Apache Spark, a powerful

More information

Big Data Analytics. Lucas Rego Drumond

Big Data Analytics. Lucas Rego Drumond Big Data Analytics Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Apache Spark Apache Spark 1

More information

Moving From Hadoop to Spark

Moving From Hadoop to Spark + Moving From Hadoop to Spark Sujee Maniyam Founder / Principal @ www.elephantscale.com sujee@elephantscale.com Bay Area ACM meetup (2015-02-23) + HI, Featured in Hadoop Weekly #109 + About Me : Sujee

More information

Architectures for massive data management

Architectures for massive data management Architectures for massive data management Apache Spark Albert Bifet albert.bifet@telecom-paristech.fr October 20, 2015 Spark Motivation Apache Spark Figure: IBM and Apache Spark What is Apache Spark Apache

More information

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning

More information

Unified Big Data Analytics Pipeline. 连 城 lian@databricks.com

Unified Big Data Analytics Pipeline. 连 城 lian@databricks.com Unified Big Data Analytics Pipeline 连 城 lian@databricks.com What is A fast and general engine for large-scale data processing An open source implementation of Resilient Distributed Datasets (RDD) Has an

More information

Shark Installation Guide Week 3 Report. Ankush Arora

Shark Installation Guide Week 3 Report. Ankush Arora Shark Installation Guide Week 3 Report Ankush Arora Last Updated: May 31,2014 CONTENTS Contents 1 Introduction 1 1.1 Shark..................................... 1 1.2 Apache Spark.................................

More information

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce

More information

Ali Ghodsi Head of PM and Engineering Databricks

Ali Ghodsi Head of PM and Engineering Databricks Making Big Data Simple Ali Ghodsi Head of PM and Engineering Databricks Big Data is Hard: A Big Data Project Tasks Tasks Build a Hadoop cluster Challenges Clusters hard to setup and manage Build a data

More information

CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University CS 555: DISTRIBUTED SYSTEMS [SPARK] Shrideep Pallickara Computer Science Colorado State University Frequently asked questions from the previous class survey Streaming Significance of minimum delays? Interleaving

More information

Embracing Spark as the Scalable Data Analytics Platform for the Enterprise

Embracing Spark as the Scalable Data Analytics Platform for the Enterprise Embracing Spark as the Scalable Data Analytics Platform for the Enterprise Matthew J. Glickman GS.com/Engineering Spark Summit East 2015 1 How did we get here today? Strata + Hadoop World October 2014

More information

Apache Spark : Fast and Easy Data Processing Sujee Maniyam Elephant Scale LLC sujee@elephantscale.com http://elephantscale.com

Apache Spark : Fast and Easy Data Processing Sujee Maniyam Elephant Scale LLC sujee@elephantscale.com http://elephantscale.com Apache Spark : Fast and Easy Data Processing Sujee Maniyam Elephant Scale LLC sujee@elephantscale.com http://elephantscale.com Spark Fast & Expressive Cluster computing engine Compatible with Hadoop Came

More information

Big Data Research in the AMPLab: BDAS and Beyond

Big Data Research in the AMPLab: BDAS and Beyond Big Data Research in the AMPLab: BDAS and Beyond Michael Franklin UC Berkeley 1 st Spark Summit December 2, 2013 UC BERKELEY AMPLab: Collaborative Big Data Research Launched: January 2011, 6 year planned

More information

Big Data Analytics with Spark and Oscar BAO. Tamas Jambor, Lead Data Scientist at Massive Analytic

Big Data Analytics with Spark and Oscar BAO. Tamas Jambor, Lead Data Scientist at Massive Analytic Big Data Analytics with Spark and Oscar BAO Tamas Jambor, Lead Data Scientist at Massive Analytic About me Building a scalable Machine Learning platform at MA Worked in Big Data and Data Science in the

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

Functional Query Optimization with" SQL

Functional Query Optimization with SQL " Functional Query Optimization with" SQL Michael Armbrust @michaelarmbrust spark.apache.org What is Apache Spark? Fast and general cluster computing system interoperable with Hadoop Improves efficiency

More information

How Companies are! Using Spark

How Companies are! Using Spark How Companies are! Using Spark And where the Edge in Big Data will be Matei Zaharia History Decreasing storage costs have led to an explosion of big data Commodity cluster software, like Hadoop, has made

More information

Beyond Hadoop with Apache Spark and BDAS

Beyond Hadoop with Apache Spark and BDAS Beyond Hadoop with Apache Spark and BDAS Khanderao Kand Principal Technologist, Guavus 12 April GITPRO World 2014 Palo Alto, CA Credit: Some stajsjcs and content came from presentajons from publicly shared

More information

Technical White Paper The Excel Reporting Solution for Java

Technical White Paper The Excel Reporting Solution for Java Technical White Paper The Excel Reporting Solution for Java Using Actuate e.spreadsheet Engine as a foundation for web-based reporting applications, Java developers can greatly enhance the productivity

More information

Spark Application Carousel. Spark Summit East 2015

Spark Application Carousel. Spark Summit East 2015 Spark Application Carousel Spark Summit East 2015 About Today s Talk About Me: Vida Ha - Solutions Engineer at Databricks. Goal: For beginning/early intermediate Spark Developers. Motivate you to start

More information

LAB 2 SPARK / D-STREAM PROGRAMMING SCIENTIFIC APPLICATIONS FOR IOT WORKSHOP

LAB 2 SPARK / D-STREAM PROGRAMMING SCIENTIFIC APPLICATIONS FOR IOT WORKSHOP LAB 2 SPARK / D-STREAM PROGRAMMING SCIENTIFIC APPLICATIONS FOR IOT WORKSHOP ICTP, Trieste, March 24th 2015 The objectives of this session are: Understand the Spark RDD programming model Familiarize with

More information

This is a brief tutorial that explains the basics of Spark SQL programming.

This is a brief tutorial that explains the basics of Spark SQL programming. About the Tutorial Apache Spark is a lightning-fast cluster computing designed for fast computation. It was built on top of Hadoop MapReduce and it extends the MapReduce model to efficiently use more types

More information

Hadoop2, Spark Big Data, real time, machine learning & use cases. Cédric Carbone Twitter : @carbone

Hadoop2, Spark Big Data, real time, machine learning & use cases. Cédric Carbone Twitter : @carbone Hadoop2, Spark Big Data, real time, machine learning & use cases Cédric Carbone Twitter : @carbone Agenda Map Reduce Hadoop v1 limits Hadoop v2 and YARN Apache Spark Streaming : Spark vs Storm Machine

More information

HiBench Introduction. Carson Wang (carson.wang@intel.com) Software & Services Group

HiBench Introduction. Carson Wang (carson.wang@intel.com) Software & Services Group HiBench Introduction Carson Wang (carson.wang@intel.com) Agenda Background Workloads Configurations Benchmark Report Tuning Guide Background WHY Why we need big data benchmarking systems? WHAT What is

More information

Apache Flink Next-gen data analysis. Kostas Tzoumas ktzoumas@apache.org @kostas_tzoumas

Apache Flink Next-gen data analysis. Kostas Tzoumas ktzoumas@apache.org @kostas_tzoumas Apache Flink Next-gen data analysis Kostas Tzoumas ktzoumas@apache.org @kostas_tzoumas What is Flink Project undergoing incubation in the Apache Software Foundation Originating from the Stratosphere research

More information

BIG DATA APPLICATIONS

BIG DATA APPLICATIONS BIG DATA ANALYTICS USING HADOOP AND SPARK ON HATHI Boyu Zhang Research Computing ITaP BIG DATA APPLICATIONS Big data has become one of the most important aspects in scientific computing and business analytics

More information

1. The orange button 2. Audio Type 3. Close apps 4. Enlarge my screen 5. Headphones 6. Questions Pane. SparkR 2

1. The orange button 2. Audio Type 3. Close apps 4. Enlarge my screen 5. Headphones 6. Questions Pane. SparkR 2 SparkR 1. The orange button 2. Audio Type 3. Close apps 4. Enlarge my screen 5. Headphones 6. Questions Pane SparkR 2 Lecture slides and/or video will be made available within one week Live Demonstration

More information

Spark and the Big Data Library

Spark and the Big Data Library Spark and the Big Data Library Reza Zadeh Thanks to Matei Zaharia Problem Data growing faster than processing speeds Only solution is to parallelize on large clusters» Wide use in both enterprises and

More information

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future

More information

Open Source Technologies on Microsoft Azure

Open Source Technologies on Microsoft Azure Open Source Technologies on Microsoft Azure A Survey @DChappellAssoc Copyright 2014 Chappell & Associates The Main Idea i Open source technologies are a fundamental part of Microsoft Azure The Big Questions

More information

Apache Ignite TM (Incubating) - In- Memory Data Fabric Fast Data Meets Open Source

Apache Ignite TM (Incubating) - In- Memory Data Fabric Fast Data Meets Open Source Apache Ignite TM (Incubating) - In- Memory Data Fabric Fast Data Meets Open Source DMITRIY SETRAKYAN Founder, PPMC http://www.ignite.incubator.apache.org @apacheignite @dsetrakyan Agenda About In- Memory

More information

Streaming items through a cluster with Spark Streaming

Streaming items through a cluster with Spark Streaming Streaming items through a cluster with Spark Streaming Tathagata TD Das @tathadas CME 323: Distributed Algorithms and Optimization Stanford, May 6, 2015 Who am I? > Project Management Committee (PMC) member

More information

Introduction to Spark

Introduction to Spark Introduction to Spark Shannon Quinn (with thanks to Paco Nathan and Databricks) Quick Demo Quick Demo API Hooks Scala / Java All Java libraries *.jar http://www.scala- lang.org Python Anaconda: https://

More information

Dell In-Memory Appliance for Cloudera Enterprise

Dell In-Memory Appliance for Cloudera Enterprise Dell In-Memory Appliance for Cloudera Enterprise Hadoop Overview, Customer Evolution and Dell In-Memory Product Details Author: Armando Acosta Hadoop Product Manager/Subject Matter Expert Armando_Acosta@Dell.com/

More information

System requirements. Java SE Runtime Environment(JRE) 7 (32bit) Java SE Runtime Environment(JRE) 6 (64bit) Java SE Runtime Environment(JRE) 7 (64bit)

System requirements. Java SE Runtime Environment(JRE) 7 (32bit) Java SE Runtime Environment(JRE) 6 (64bit) Java SE Runtime Environment(JRE) 7 (64bit) Hitachi Solutions Geographical Information System Client Below conditions are system requirements for Hitachi Solutions Geographical Information System Client. 1/5 Hitachi Solutions Geographical Information

More information

The Flink Big Data Analytics Platform. Marton Balassi, Gyula Fora" {mbalassi, gyfora}@apache.org

The Flink Big Data Analytics Platform. Marton Balassi, Gyula Fora {mbalassi, gyfora}@apache.org The Flink Big Data Analytics Platform Marton Balassi, Gyula Fora" {mbalassi, gyfora}@apache.org What is Apache Flink? Open Source Started in 2009 by the Berlin-based database research groups In the Apache

More information

Brave New World: Hadoop vs. Spark

Brave New World: Hadoop vs. Spark Brave New World: Hadoop vs. Spark Dr. Kurt Stockinger Associate Professor of Computer Science Director of Studies in Data Science Zurich University of Applied Sciences Datalab Seminar, Zurich, Oct. 7,

More information

Big Data Rethink Algos and Architecture. Scott Marsh Manager R&D Personal Lines Auto Pricing

Big Data Rethink Algos and Architecture. Scott Marsh Manager R&D Personal Lines Auto Pricing Big Data Rethink Algos and Architecture Scott Marsh Manager R&D Personal Lines Auto Pricing Agenda History Map Reduce Algorithms History Google talks about their solutions to their problems Map Reduce:

More information

Crystal Reports for Eclipse

Crystal Reports for Eclipse Crystal Reports for Eclipse Table of Contents 1 Creating a Crystal Reports Web Application...2 2 Designing a Report off the Xtreme Embedded Derby Database... 11 3 Running a Crystal Reports Web Application...

More information

Spark ΕΡΓΑΣΤΗΡΙΟ 10. Prepared by George Nikolaides 4/19/2015 1

Spark ΕΡΓΑΣΤΗΡΙΟ 10. Prepared by George Nikolaides 4/19/2015 1 Spark ΕΡΓΑΣΤΗΡΙΟ 10 Prepared by George Nikolaides 4/19/2015 1 Introduction to Apache Spark Another cluster computing framework Developed in the AMPLab at UC Berkeley Started in 2009 Open-sourced in 2010

More information

Real Time Data Processing using Spark Streaming

Real Time Data Processing using Spark Streaming Real Time Data Processing using Spark Streaming Hari Shreedharan, Software Engineer @ Cloudera Committer/PMC Member, Apache Flume Committer, Apache Sqoop Contributor, Apache Spark Author, Using Flume (O

More information

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University

More information

Tableau Spark SQL Setup Instructions

Tableau Spark SQL Setup Instructions Tableau Spark SQL Setup Instructions 1. Prerequisites 2. Configuring Hive 3. Configuring Spark & Hive 4. Starting the Spark Service and the Spark Thrift Server 5. Connecting Tableau to Spark SQL 5A. Install

More information

Big Data Frameworks: Scala and Spark Tutorial

Big Data Frameworks: Scala and Spark Tutorial Big Data Frameworks: Scala and Spark Tutorial 13.03.2015 Eemil Lagerspetz, Ella Peltonen Professor Sasu Tarkoma These slides: http://is.gd/bigdatascala www.cs.helsinki.fi Functional Programming Functional

More information

Big Data Processing. Patrick Wendell Databricks

Big Data Processing. Patrick Wendell Databricks Big Data Processing Patrick Wendell Databricks About me Committer and PMC member of Apache Spark Former PhD student at Berkeley Left Berkeley to help found Databricks Now managing open source work at Databricks

More information

Big Data Analytics Hadoop and Spark

Big Data Analytics Hadoop and Spark Big Data Analytics Hadoop and Spark Shelly Garion, Ph.D. IBM Research Haifa 1 What is Big Data? 2 What is Big Data? Big data usually includes data sets with sizes beyond the ability of commonly used software

More information

PHP vs. Java. In this paper, I am not discussing following two issues since each is currently hotly debated in various communities:

PHP vs. Java. In this paper, I am not discussing following two issues since each is currently hotly debated in various communities: PHP vs. Java *This document reflects my opinion about PHP and Java. I have written this without any references. Let me know if there is a technical error. --Hasari Tosun It isn't correct to compare Java

More information

Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to

More information

Roadmap Talend : découvrez les futures fonctionnalités de Talend

Roadmap Talend : découvrez les futures fonctionnalités de Talend Roadmap Talend : découvrez les futures fonctionnalités de Talend Cédric Carbone Talend Connect 9 octobre 2014 Talend 2014 1 Connecting the Data-Driven Enterprise Talend 2014 2 Agenda Agenda Why a Unified

More information

Internet Engineering: Web Application Architecture. Ali Kamandi Sharif University of Technology kamandi@ce.sharif.edu Fall 2007

Internet Engineering: Web Application Architecture. Ali Kamandi Sharif University of Technology kamandi@ce.sharif.edu Fall 2007 Internet Engineering: Web Application Architecture Ali Kamandi Sharif University of Technology kamandi@ce.sharif.edu Fall 2007 Centralized Architecture mainframe terminals terminals 2 Two Tier Application

More information

Hadoop & Spark Using Amazon EMR

Hadoop & Spark Using Amazon EMR Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

Apache Spark and the future of big data applica5ons. Eric Baldeschwieler

Apache Spark and the future of big data applica5ons. Eric Baldeschwieler Apache Spark and the future of big data applica5ons Eric Baldeschwieler Who is Eric14? Big data veteran (since 1996) Databricks Tech Advisor Twitter handle: @jeric14 Previously CTO/CEO of Hortonworks Yahoo

More information

BEST WEB PROGRAMMING LANGUAGES TO LEARN ON YOUR OWN TIME

BEST WEB PROGRAMMING LANGUAGES TO LEARN ON YOUR OWN TIME BEST WEB PROGRAMMING LANGUAGES TO LEARN ON YOUR OWN TIME System Analysis and Design S.Mohammad Taheri S.Hamed Moghimi Fall 92 1 CHOOSE A PROGRAMMING LANGUAGE FOR THE PROJECT 2 CHOOSE A PROGRAMMING LANGUAGE

More information

What s Cooking in KNIME

What s Cooking in KNIME What s Cooking in KNIME Thomas Gabriel Copyright 2015 KNIME.com AG Agenda Querying NoSQL Databases Database Improvements & Big Data Copyright 2015 KNIME.com AG 2 Querying NoSQL Databases MongoDB & CouchDB

More information

Integrate Master Data with Big Data using Oracle Table Access for Hadoop

Integrate Master Data with Big Data using Oracle Table Access for Hadoop Integrate Master Data with Big Data using Oracle Table Access for Hadoop Kuassi Mensah Oracle Corporation Redwood Shores, CA, USA Keywords: Hadoop, BigData, Hive SQL, Spark SQL, HCatalog, StorageHandler

More information

In-memory data pipeline and warehouse at scale using Spark, Spark SQL, Tachyon and Parquet

In-memory data pipeline and warehouse at scale using Spark, Spark SQL, Tachyon and Parquet In-memory data pipeline and warehouse at scale using Spark, Spark SQL, Tachyon and Parquet Ema Iancuta iorhian@gmail.com Radu Chilom radu.chilom@gmail.com Buzzwords Berlin - 2015 Big data analytics / machine

More information

Certification Study Guide. MapR Certified Spark Developer Study Guide

Certification Study Guide. MapR Certified Spark Developer Study Guide Certification Study Guide MapR Certified Spark Developer Study Guide 1 CONTENTS About MapR Study Guides... 3 MapR Certified Spark Developer (MCSD)... 3 SECTION 1 WHAT S ON THE EXAM?... 5 1. Load and Inspect

More information

CloudStack and Big Data. Sebastien Goasguen @sebgoa May 22nd 2013 LinuxTag, Berlin

CloudStack and Big Data. Sebastien Goasguen @sebgoa May 22nd 2013 LinuxTag, Berlin CloudStack and Big Data Sebastien Goasguen @sebgoa May 22nd 2013 LinuxTag, Berlin Google trends Start of Clouds Cloud computing trending down, while Big Data is booming. Virtualization BigData on the Trigger

More information

NetBeans IDE Field Guide

NetBeans IDE Field Guide NetBeans IDE Field Guide Copyright 2005 Sun Microsystems, Inc. All rights reserved. Table of Contents Introduction to J2EE Development in NetBeans IDE...1 Configuring the IDE for J2EE Development...2 Getting

More information

STREAM ANALYTIX. Industry s only Multi-Engine Streaming Analytics Platform

STREAM ANALYTIX. Industry s only Multi-Engine Streaming Analytics Platform STREAM ANALYTIX Industry s only Multi-Engine Streaming Analytics Platform One Platform for All Create real-time streaming data analytics applications in minutes with a powerful visual editor Get a wide

More information

Apache Spark 11/10/15. Context. Reminder. Context. What is Spark? A GrowingStack

Apache Spark 11/10/15. Context. Reminder. Context. What is Spark? A GrowingStack Apache Spark Document Analysis Course (Fall 2015 - Scott Sanner) Zahra Iman Some slides from (Matei Zaharia, UC Berkeley / MIT& Harold Liu) Reminder SparkConf JavaSpark RDD: Resilient Distributed Datasets

More information

WHAT S NEW IN SAS 9.4

WHAT S NEW IN SAS 9.4 WHAT S NEW IN SAS 9.4 PLATFORM, HPA & SAS GRID COMPUTING MICHAEL GODDARD CHIEF ARCHITECT SAS INSTITUTE, NEW ZEALAND SAS 9.4 WHAT S NEW IN THE PLATFORM Platform update SAS Grid Computing update Hadoop support

More information

Supported Platforms. HP Vertica Analytic Database. Software Version: 7.0.x

Supported Platforms. HP Vertica Analytic Database. Software Version: 7.0.x HP Vertica Analytic Database Software Version: 7.0.x Document Release Date: 5/7/2014 Legal Notices Warranty The only warranties for HP products and services are set forth in the express warranty statements

More information

1 Building, Deploying and Testing DPES application

1 Building, Deploying and Testing DPES application 1 Building, Deploying and Testing DPES application This chapter provides updated instructions for accessing the sources code, developing, building and deploying the DPES application in the user environment.

More information

Fahim Uddin http://fahim.cooperativecorner.com email@fahim.cooperativecorner.com. 1. Java SDK

Fahim Uddin http://fahim.cooperativecorner.com email@fahim.cooperativecorner.com. 1. Java SDK PREPARING YOUR MACHINES WITH NECESSARY TOOLS FOR ANDROID DEVELOPMENT SEPTEMBER, 2012 Fahim Uddin http://fahim.cooperativecorner.com email@fahim.cooperativecorner.com Android SDK makes use of the Java SE

More information

Machine- Learning Summer School - 2015

Machine- Learning Summer School - 2015 Machine- Learning Summer School - 2015 Big Data Programming David Franke Vast.com hbp://www.cs.utexas.edu/~dfranke/ Goals for Today Issues to address when you have big data Understand two popular big data

More information

Spark in Action. Fast Big Data Analytics using Scala. Matei Zaharia. www.spark- project.org. University of California, Berkeley UC BERKELEY

Spark in Action. Fast Big Data Analytics using Scala. Matei Zaharia. www.spark- project.org. University of California, Berkeley UC BERKELEY Spark in Action Fast Big Data Analytics using Scala Matei Zaharia University of California, Berkeley www.spark- project.org UC BERKELEY My Background Grad student in the AMP Lab at UC Berkeley» 50- person

More information

Workshop on Hadoop with Big Data

Workshop on Hadoop with Big Data Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly

More information

Learning. Spark LIGHTNING-FAST DATA ANALYTICS. Holden Karau, Andy Konwinski, Patrick Wendell & Matei Zaharia

Learning. Spark LIGHTNING-FAST DATA ANALYTICS. Holden Karau, Andy Konwinski, Patrick Wendell & Matei Zaharia Compliments of Learning Spark LIGHTNING-FAST DATA ANALYTICS Holden Karau, Andy Konwinski, Patrick Wendell & Matei Zaharia Bring Your Big Data to Life Big Data Integration and Analytics Learn how to power

More information

Distributed DataFrame on Spark: Simplifying Big Data For The Rest Of Us

Distributed DataFrame on Spark: Simplifying Big Data For The Rest Of Us DATA INTELLIGENCE FOR ALL Distributed DataFrame on Spark: Simplifying Big Data For The Rest Of Us Christopher Nguyen, PhD Co-Founder & CEO Agenda 1. Challenges & Motivation 2. DDF Overview 3. DDF Design

More information

BIG DATA SOLUTION DATA SHEET

BIG DATA SOLUTION DATA SHEET BIG DATA SOLUTION DATA SHEET Highlight. DATA SHEET HGrid247 BIG DATA SOLUTION Exploring your BIG DATA, get some deeper insight. It is possible! Another approach to access your BIG DATA with the latest

More information

Native Connectivity to Big Data Sources in MSTR 10

Native Connectivity to Big Data Sources in MSTR 10 Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single

More information

What s next for the Berkeley Data Analytics Stack?

What s next for the Berkeley Data Analytics Stack? What s next for the Berkeley Data Analytics Stack? Michael Franklin June 30th 2014 Spark Summit San Francisco UC BERKELEY AMPLab: Collaborative Big Data Research 60+ Students, Postdocs, Faculty and Staff

More information

JAVA WEB START OVERVIEW

JAVA WEB START OVERVIEW JAVA WEB START OVERVIEW White Paper May 2005 Sun Microsystems, Inc. Table of Contents Table of Contents 1 Introduction................................................................. 1 2 A Java Web Start

More information

Big Data Analytics - Accelerated. stream-horizon.com

Big Data Analytics - Accelerated. stream-horizon.com Big Data Analytics - Accelerated stream-horizon.com Legacy ETL platforms & conventional Data Integration approach Unable to meet latency & data throughput demands of Big Data integration challenges Based

More information

Apache Spark and Distributed Programming

Apache Spark and Distributed Programming Apache Spark and Distributed Programming Concurrent Programming Keijo Heljanko Department of Computer Science University School of Science November 25th, 2015 Slides by Keijo Heljanko Apache Spark Apache

More information

What Is the Java TM 2 Platform, Enterprise Edition?

What Is the Java TM 2 Platform, Enterprise Edition? Page 1 de 9 What Is the Java TM 2 Platform, Enterprise Edition? This document provides an introduction to the features and benefits of the Java 2 platform, Enterprise Edition. Overview Enterprises today

More information

Big Data Management and Security

Big Data Management and Security Big Data Management and Security Audit Concerns and Business Risks Tami Frankenfield Sr. Director, Analytics and Enterprise Data Mercury Insurance What is Big Data? Velocity + Volume + Variety = Value

More information

the missing log collector Treasure Data, Inc. Muga Nishizawa

the missing log collector Treasure Data, Inc. Muga Nishizawa the missing log collector Treasure Data, Inc. Muga Nishizawa Muga Nishizawa (@muga_nishizawa) Chief Software Architect, Treasure Data Treasure Data Overview Founded to deliver big data analytics in days

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

Introduction to Big Data! with Apache Spark" UC#BERKELEY#

Introduction to Big Data! with Apache Spark UC#BERKELEY# Introduction to Big Data! with Apache Spark" UC#BERKELEY# This Lecture" The Big Data Problem" Hardware for Big Data" Distributing Work" Handling Failures and Slow Machines" Map Reduce and Complex Jobs"

More information

The Inside Scoop on Hadoop

The Inside Scoop on Hadoop The Inside Scoop on Hadoop Orion Gebremedhin National Solutions Director BI & Big Data, Neudesic LLC. VTSP Microsoft Corp. Orion.Gebremedhin@Neudesic.COM B-orgebr@Microsoft.com @OrionGM The Inside Scoop

More information

Why Spark on Hadoop Matters

Why Spark on Hadoop Matters Why Spark on Hadoop Matters MC Srivas, CTO and Founder, MapR Technologies Apache Spark Summit - July 1, 2014 1 MapR Overview Top Ranked Exponential Growth 500+ Customers Cloud Leaders 3X bookings Q1 13

More information

Productionizing a 24/7 Spark Streaming Service on YARN

Productionizing a 24/7 Spark Streaming Service on YARN Productionizing a 24/7 Spark Streaming Service on YARN Issac Buenrostro, Arup Malakar Spark Summit 2014 July 1, 2014 About Ooyala Cross-device video analytics and monetization products and services Founded

More information

MATLAB in Business Critical Applications Arvind Hosagrahara Principal Technical Consultant Arvind.Hosagrahara@mathworks.

MATLAB in Business Critical Applications Arvind Hosagrahara Principal Technical Consultant Arvind.Hosagrahara@mathworks. MATLAB in Business Critical Applications Arvind Hosagrahara Principal Technical Consultant Arvind.Hosagrahara@mathworks.com 310-819-3970 2014 The MathWorks, Inc. 1 Outline Problem Statement The Big Picture

More information

Overview. The Android operating system is like a cake consisting of various layers.

Overview. The Android operating system is like a cake consisting of various layers. The Android Stack Overview The Android operating system is like a cake consisting of various layers. Each layer has its own characteristics and purpose but the layers are not always cleanly separated and

More information

Developing modular Java applications

Developing modular Java applications Developing modular Java applications Julien Dubois France Regional Director SpringSource Julien Dubois France Regional Director, SpringSource Book author :«Spring par la pratique» (Eyrolles, 2006) new

More information

HDP Hadoop From concept to deployment.

HDP Hadoop From concept to deployment. HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some

More information

Karmjeet Kahlon Director Global z Systems Software Sales

Karmjeet Kahlon Director Global z Systems Software Sales Karmjeet Kahlon Director Global z Systems Software Sales The market is moving, forcing businesses to transform Explosion in transaction growth driven by mobility and the Internet of Things Analytics is

More information

Leveraging the Power of SOLR with SPARK. Johannes Weigend QAware GmbH Germany pache Big Data Europe September 2015

Leveraging the Power of SOLR with SPARK. Johannes Weigend QAware GmbH Germany pache Big Data Europe September 2015 Leveraging the Power of SOLR with SPARK Johannes Weigend QAware GmbH Germany pache Big Data Europe September 2015 Welcome Johannes Weigend - CTO QAware GmbH - Software architect / developer - 25 years

More information

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture. Big Data Hadoop Administration and Developer Course This course is designed to understand and implement the concepts of Big data and Hadoop. This will cover right from setting up Hadoop environment in

More information

Enable BI, Reporting, and ETL Integration with Your App

Enable BI, Reporting, and ETL Integration with Your App Enable BI, Reporting, and ETL Integration with Your App Using OpenAccess for customized data connectivity Brad Wright Data Connectivity & Integration Progress Software Agenda Challenges of Data Integration

More information

Apache Jakarta Tomcat

Apache Jakarta Tomcat Apache Jakarta Tomcat 20041058 Suh, Junho Road Map 1 Tomcat Overview What we need to make more dynamic web documents? Server that supports JSP, ASP, database etc We concentrates on Something that support

More information

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Hadoop MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Understanding Hadoop Understanding Hadoop What's Hadoop about? Apache Hadoop project (started 2008) downloadable open-source software library (current

More information

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Highly competitive enterprises are increasingly finding ways to maximize and accelerate

More information

Next-Gen Big Data Analytics using the Spark stack

Next-Gen Big Data Analytics using the Spark stack Next-Gen Big Data Analytics using the Spark stack Jason Dai Chief Architect of Big Data Technologies Software and Services Group, Intel Agenda Overview Apache Spark stack Next-gen big data analytics Our

More information