Creating Big Data Applications with Spring XD
- Buddy Edwards
- 8 years ago
1 Creating Big Data Applications with Spring XD Thomas
2 THE FASTEST PATH TO NEW BUSINESS VALUE
3 Journey
- Introduction
- Concepts
- Applications
- Outlook
Unless otherwise indicated, these slides are © Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license:
4 Introduction
5 Spring XD - Overview
- Platform for Big Data Applications
- Ingestion, Processing, Movement, Analytics
- Stream and Batch Processing
- Scalable Distributed Runtime
- Support for Deep Analytics
- Proven Spring Technologies
6 Spring XD - Why yet another Big Data Platform?
- Alternative to frameworks like Flume, Oozie, Sqoop, Storm
- Just one platform instead of many
- Common things easy, complex things possible
- Complementary to many technologies
  - Big SQL / MPP Databases - Impala, HAWQ
  - Stream Processing - Apache Spark
  - NoSQL DataStores - Cassandra, MongoDB
7 Spring XD - one-stop shop for developing and deploying Big Data Apps
- XD = eXtreme Data
8 Spring XD - 10,000 Foot View
[Diagram. Labels: shell (>_), REST, Spring XD Runtime, Streams, taps, Jobs, ingest, workflow, export, bidirectional, RDBMS, Redis, Compute, NoSQL, HDFS, Predictive Modelling (R, SAS)]
9 Spring XD - Easy to Set Up and Run
- Store incoming HTTP data into HDFS
10 Spring XD - Easy to Set Up and Run
1. Install via package manager / unzip
2. Start
    $ xd-singlenode
    $ xd-shell
3. Define
    xd:> stream create ingest --definition "http | hdfs"
4. Run
    xd:> stream deploy ingest
Yes, writing HTTP data to HDFS can be that simple!
11 Core Concepts
12 Spring XD - Core Concepts
- Runtime
- Modules
- Streams
- Taps
- Analytics
- Jobs
- Extensibility
- Deployment
13 Spring XD - Runtime
- Hosts Stream Processing, Batch Workflows and Analytics
- Manages Component Distribution
- Communication via MessageBus
- Additional Services
  - Configuration / Cluster State: ZooKeeper
  - Analytics: Redis, In-Memory
  - Message Bus: Redis, RabbitMQ, Kafka, Local
14 Spring XD - Instance Types
- XD-Admin
  - Assigns Modules to Containers
  - Manages Cluster
  - Failover & HA
- XD-Container
  - Loads / Executes Modules
  - Connects to Data Bus
- Standalone, YARN, Cloud Foundry
[Diagram. Labels: XD UI, XD Shell, XD Admin (several instances, one Leader), XD Container (each running several modules), Batch Job State DB, Analytics Repository, ZK, Kafka/RabbitMQ/Redis]
15 Spring XD - Runtime Modes
[Diagram: two runtime modes, single-node standalone (Development) and multi-node distributed (Production). Labels: XD Admin, XD Container, Module, ZK, DB, MB, JVM]
16 Spring XD - Distributed Runtime
[Diagram. Labels: XDA, XDC, deploy, ZooKeeper, bind, Message Bus, time, log]
XDA = XD Admin, XDC = XD Container
17 Modules
18 Modules
- Unit of execution
- Source, Sink, Processor, Jobs
- Defined in XML or JVM Language
  - Spring config file with Spring Bean Definitions
- Can have Parameters
- 50+ already included in XD
- Define new Modules via Composition
19 Modules - Overview
- Source (#20): HTTP, SFTP, Tail, File, Mail, Syslog, TCP / TCP Client, Reactor IP, JMS, RabbitMQ, Time, MQTT, Mongo, Kafka, JDBC, GemFire (CQ, Source), Twitter (Search, Stream), Stdout Capture
- Processor (#13): Filter, Transform, Splitter, Aggregator, HTTP Client, Shell Command, Script, Groovy, Python, Java, JPMML-Evaluator, JSON-to-Tuple, Object-to-JSON
- Sink (#20): Log, File, JDBC, TCP, MQTT, Mongo, Mail, Null Sink, Redis, RabbitMQ, HDFS, HDFS Dataset, Shell Command, GemFire Server, Splunk Server, Dynamic Router, Counter + 1, Gauge
20 Streams
21 Streams
- Programming model for real-time processing
- How data is collected, processed, and stored or forwarded
- DSL analogous to Unix Pipes and Filters
  - Source | Processor (0..*) | Sink
- Data is pumped through the MessageBus
- Spring Integration Components
[Diagram: a Stream made of Source, Processor and Sink connected via the Message Bus]
22 Streams - Example
Transform payload incoming from HTTP to uppercase and send to log:
    stream create test1 --definition "http | transform --expression=payload.toUpperCase() | log" --deploy
Annotations: Source (http), Processor (transform), Option (--expression), Sink (log)
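The pipes-and-filters model behind the uppercase example can be sketched in a few lines of Python. This is only an illustration of the idea (one source, zero or more processors, one sink), not Spring XD code; the name `run_stream` and the in-memory lists standing in for the HTTP source and the log sink are assumptions made for the sketch.

```python
from typing import Callable, Iterable, List


def run_stream(source: Iterable[str],
               processors: List[Callable[[str], str]],
               sink: Callable[[str], None]) -> None:
    """Pump each payload from the source through every processor, then into the sink."""
    for payload in source:
        for process in processors:
            payload = process(payload)
        sink(payload)


# Stand-ins for "http | transform | log": a list plays the HTTP source,
# str.upper plays the transform processor, and a list collects the log output.
log_lines: List[str] = []
run_stream(["hello", "spring xd"], [str.upper], log_lines.append)
print(log_lines)
```

With an empty processor list the payloads reach the sink unchanged, which mirrors the "Processor 0..*" part of the DSL.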
23 Taps
24 Taps
- Special type of Stream
- Consume data along the processing pipeline
- Original stream stays unaffected
- Collect metrics and perform analytics
[Diagram: a Stream (Source, Processor, Sink on the Message Bus) with a Tap feeding a second Processor and Sink]
25 Taps - Example
First create the stream:
    stream create test1 --definition "http | transform --expression=payload.toUpperCase() | log" --deploy
Then create the tap on the transform stage, add a prefix and send to log:
    stream create test1tap --definition "tap:stream:test1.transform > transform --expression='tapped: '+payload | log" --deploy
Annotations: Tap Source (tap:stream:test1.transform), Redirection (>)
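A tap can be pictured as a second consumer that receives a copy of what one stage emits while the original stream carries on untouched. The Python sketch below illustrates that behaviour only; `run_stream_with_tap`, `tap_after` and the list-based sinks are hypothetical names for this example and are not part of Spring XD.

```python
from typing import Callable, Iterable, List


def run_stream_with_tap(source: Iterable[str],
                        processors: List[Callable[[str], str]],
                        sink: Callable[[str], None],
                        tap_after: int,
                        tap_sink: Callable[[str], None]) -> None:
    """Run the main stream and hand the output of one stage to a tap as well."""
    for payload in source:
        for index, process in enumerate(processors):
            payload = process(payload)
            if index == tap_after:
                # The tap sees the stage output; the main payload is not modified.
                tap_sink(payload)
        sink(payload)


main_log: List[str] = []
tap_log: List[str] = []

# Tap the output of stage 0 (the uppercase transform) and add a prefix,
# similar in spirit to the test1tap definition above.
run_stream_with_tap(
    ["hello"],
    [str.upper],
    main_log.append,
    tap_after=0,
    tap_sink=lambda payload: tap_log.append("tapped: " + payload),
)
print(main_log, tap_log)
```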
26 Analytics
27 Analytics
- Counters
  - Simple Counter - how many tweets?
  - Field Value Counter - how many for tag=#java?
  - Aggregate Counter - how many tweets for #java per time interval?
- Gauges
  - Gauge - what was the last seen value?
  - Rich Gauge - what was the last seen value/avg/min/max?
- Backed by Redis or In-Memory via Spring Data Repositories
- Accessible via XD-Shell and REST API on XD-Admin
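To make the difference between the counter and gauge types concrete, here is a small in-memory Python sketch. It is an assumption-laden illustration, not how Spring XD implements its metrics: the `RichGauge` class, the sample tweets (timestamp in seconds, tag) and the one-minute bucket size are all invented for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


class RichGauge:
    """Tracks the last value plus running average, minimum and maximum."""

    def __init__(self) -> None:
        self.last: Optional[float] = None
        self.minimum: Optional[float] = None
        self.maximum: Optional[float] = None
        self.count = 0
        self.total = 0.0

    def record(self, value: float) -> None:
        self.last = value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)
        self.count += 1
        self.total += value

    @property
    def average(self) -> float:
        return self.total / self.count if self.count else 0.0


# Hypothetical sample data: (timestamp in seconds, hashtag).
tweets: List[Tuple[int, str]] = [(0, "java"), (10, "java"), (70, "scala"), (130, "java")]

simple_counter = 0                                     # how many tweets?
field_value_counter: Counter = Counter()               # how many per tag?
aggregate_counter: Dict[int, int] = defaultdict(int)   # #java tweets per minute

for timestamp, tag in tweets:
    simple_counter += 1
    field_value_counter[tag] += 1
    if tag == "java":
        aggregate_counter[timestamp // 60] += 1

gauge = RichGauge()
for reading in [3.0, 1.0, 5.0]:
    gauge.record(reading)

print(simple_counter, dict(field_value_counter), dict(aggregate_counter))
print(gauge.last, gauge.average, gauge.minimum, gauge.maximum)
```

A plain gauge corresponds to keeping only `last`; the rich gauge adds the running statistics.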
28 Advanced Analytics
- Processor Modules
  - Python: numpy, pandas, scikit-learn, NLTK, SimpleCV
  - Shell: R-Project rscript, OpenCV
  - Java / Groovy
- PMML Processor Module
  - Predictive Model Markup Language
  - Description of Parameterised Data Mining Models
  - Allows Predictive Models to be operationalised
  - Real-time evaluation and scoring
29 Jobs
30 Jobs
- Programming Model for Batch Processing
- Create, Schedule, Execute and Monitor
- Spring Batch and Spring Hadoop Components
- Jobs (#5): CSV to JDBC, FTP to HDFS, JDBC to HDFS, HDFS to JDBC, HDFS to MongoDB
31 Jobs - Example
Create job from existing job definition:
    job create --name "helloworld-job" --definition "helloworld" --deploy
Run job once:
    job launch --name "helloworld-job"
Run job periodically:
    stream create --name "hw-cron" --definition "trigger --cron='0/5 * * * * *' > queue:job:helloworld-job" --deploy
32 Management
33 Spring XD - Shell
- CLI based on Spring Shell
- Manages Streams, Jobs, Analytics and Deployment
- Completion / Assist
- Many built-in commands (try help)
- Started via xd-shell
34 Spring XD - Admin UI
- Management interface accessible from the XD-Admin node
- XD-ADMIN:9393/admin-ui
35 Spring XD - REST Interface
- Accessible from the XD-Admin node
- Used by XD-Shell and Admin-UI
36 Extensibility
37 Extensibility
- Custom Modules
  - Source, Sink, Processor, Job
  - Spring Integration, Spring Batch
  - E.g. to wrap a Java Library
  - Upload new modules via XD-Shell / REST
- Register custom Spring Expression Language Aliases
  - from java.lang.Double.parseDouble(payload.sensorvalue)
  - to #parseDouble(payload.sensorvalue)
- Scripts
  - Collection of XD commands
  - Automation
38 Deployment
39 Deployment
- deploy or --deploy
    stream deploy firststream
    stream create secondstream --deploy
- Deployment Manifest
  - Customize via --properties Parameter
  - Control # of Module Instances
  - Define Target Server or Group
  - Direct Binding
  - Stream Data Partitioning
40 Deployment Manifest - Module Count
    stream deploy --properties module.http.count=2,module.worker.count=4,module.hdfs.count=3
[Diagram: stream http | worker | hdfs deployed as 2 http, 4 worker and 3 hdfs module instances]
41 Deployment Manifest - Module Placement
    stream deploy --properties module.http.count=2,module.worker.count=4,module.hdfs.count=3,module.http.criteria=group.contains('WEB')
Container started with a group:
    xd/bin/xd-container --groups="web"
[Diagram: stream http | worker | hdfs with the same module counts; the http instances run on the container labelled WEB]
42 Deployment Manifest - Data Partitioning
    stream deploy --properties module.worker.count=4,module.http.producer.partitionKeyExpression=payload.customerid
partition := hash(payload.customerid) % worker.count
[Diagram: stream http | worker | hdfs; output of the http instances is partitioned across the worker instances]
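The partition formula above, hash(key) modulo the number of worker instances, can be tried out with a short Python sketch. The choice of CRC32 as the hash function is an assumption made so that results are stable across runs (Python's built-in hash() is randomised for strings); it is not the hash Spring XD uses, and `partition_for` and the sample payloads are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def partition_for(key: str, worker_count: int) -> int:
    """Map a partition key to a worker index: hash(key) % worker_count."""
    return zlib.crc32(key.encode("utf-8")) % worker_count


WORKER_COUNT = 4  # matches module.worker.count=4 in the manifest above

# Hypothetical payloads; only the customerid field matters for routing.
payloads = [
    {"customerid": "alice", "amount": 10},
    {"customerid": "bob", "amount": 5},
    {"customerid": "alice", "amount": 7},
]

assignments: Dict[int, List[dict]] = defaultdict(list)
for payload in payloads:
    assignments[partition_for(payload["customerid"], WORKER_COUNT)].append(payload)

for partition, items in sorted(assignments.items()):
    print(partition, items)
```

The useful property is that every payload with the same customerid lands on the same worker instance, so per-customer state can stay local to that instance.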
43 Applications
44 Spring XD - Measuring Live Usage for a Major Sports League
- Measuring live video usage through mobile applications
45 Spring XD - IoT Connected Car
- Journey and Range Prediction
46 Spring XD - Smartgrid
- ACM Distributed Event Based Systems 2014
- Scalable, Real-Time Analytics, High Volume Sensor Data
- Short-Term Load Forecasting in a Power Grid
- Sensor Data from Smart Plugs
- Stream Components
  - Sensor Data Ingestion
  - Data Aggregation
  - Load Prediction
- Demo Analytics via REST
47 What's next?
48 Roadmap and beyond
- Custom Modules in HDFS
- More OOTB Modules
- Web based Editor for Streams & Jobs
- Apache Ambari Support
- Security Enhancements
- Spring XD on Pivotal Cloud Foundry
- GA Release Planned for May
49 Learn more
- Project
- GitHub
- Wiki
- Samples
- Modules
- JIRA
- Stackoverflow
50 Spring XD - Takeaway
- Increased Productivity through out-of-the-box components
- Unified runtime for both Real-time and Batch use cases
- Scalable, Distributed and Fault Tolerant Runtime
- Closed Loop Analytics through online (stream) and offline (batch) data
- Data Ingestion, Processing, Movement, Analytics
- Swiss-army knife of data movement and data pipelines
- Repeatable turnkey solution for next generation data-centric use cases
51 Learn More. Stay Connected.
- Twitter: twitter.com/springcentral
- YouTube: spring.io/video
- LinkedIn: spring.io/linkedin
- Google Plus: spring.io/gplus
52 Backup Slides
53 Lambda Architecture
54 Lambda Architecture
55 Lambda Architecture - Spring XD
[Diagram. Labels: Ingest, Data Lake, Batch Layer, Batch Processing, Workflow Orchestration, Batch Views, Predictive Analytics, Speed Layer, Stream Processing, Real-time Views, Serving Layer, Export, Analytics; technologies shown: Spring XD, GemFire, HAWQ, Spring Boot]
56 Predictive Models
- Model
  - Parameterised Algorithm
- Model Building
  - Derive a parameterised algorithm from the data
  - Slow process
  - Usually large data volume -> done offline as a batch process
- Model Scoring
  - Use the model to predict new information
  - Fast process
  - Can be done as part of stream processing
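The split between slow, offline model building and fast, per-message scoring can be illustrated with a deliberately tiny model. The sketch below fits a straight line by ordinary least squares and then scores new values one at a time; the linear model, the function names `build_model` and `score`, and the sample data are assumptions chosen for brevity and are not taken from the slides.

```python
from typing import List, Tuple

Model = Tuple[float, float]  # (slope, intercept): the "parameterised algorithm"


def build_model(xs: List[float], ys: List[float]) -> Model:
    """Model building: derive the parameters from historical data (batch step)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def score(model: Model, x: float) -> float:
    """Model scoring: apply the fixed parameters to one new value (stream step)."""
    slope, intercept = model
    return slope * x + intercept


# Offline: build once from a (hypothetical) historical data set.
model = build_model([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])

# Online: score each incoming value as it arrives.
predictions = [score(model, x) for x in [5.0, 6.0]]
print(model, predictions)
```

Scoring needs only the stored parameters, which is why it fits inside a stream processor while building stays in a batch job.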
57 PMML
- Predictive Model Markup Language
- Open Standard maintained by the Data Mining Group (DMG)
- XML based DSL for predictive models
- Can be interpreted
- 15 Model Types (Naive Bayes, General Regression, Neural Networks, etc.)
- First Version (1999)
- Current Version
- Lingua Franca for Predictive Models
- Bridge the Gap between Data Scientists and Engineers
58 Anatomy of a PMML Model
- Predictive Model
  - Algorithm description(s)
  - Parameterisation (trained model)
- Pre Processing
- Post Processing
  - Transform model output
  - Thresholds / Business rules
Source: PMML in Action, 2nd Edition, 2012, p. 7.
59 Predictive Analytics with Spring XD
- XD Module analytic-pmml
- Introduced in Spring XD M6 (April 2014)
- Real-time evaluation and scoring
- Based on JPMML-Evaluator
- Wide range of Model types
- spring-xd-modules/analytics-ml-pmml on GitHub
More informationHADOOP IN ENTERPRISE FUTURE-PROOF YOUR BIG DATA INVESTMENTS WITH CASCADING. Supreet Oberoi Nov. 4-6, 2014 Big Data Expo Santa Clara
DRIVING INNOVATION THROUGH DATA HADOOP IN ENTERPRISE FUTURE-PROOF YOUR BIG DATA INVESTMENTS WITH CASCADING Supreet Oberoi Nov. 4-6, 2014 Big Data Expo Santa Clara ABOUT ME I am a Data Engineer, not a Data
More informationTraining Catalog. Summer 2015 Training Catalog. Apache Hadoop Training from the Experts. Apache Hadoop Training From the Experts
Training Catalog Apache Hadoop Training from the Experts Summer 2015 Training Catalog Apache Hadoop Training From the Experts September 2015 provides an immersive and valuable real world experience In
More informationGigaSpaces Real-Time Analytics for Big Data
GigaSpaces Real-Time Analytics for Big Data GigaSpaces makes it easy to build and deploy large-scale real-time analytics systems Rapidly increasing use of large-scale and location-aware social media and
More informationReal-Time Analytics on Large Datasets: Predictive Models for Online Targeted Advertising
Real-Time Analytics on Large Datasets: Predictive Models for Online Targeted Advertising Open Data Partners and AdReady April 2012 1 Executive Summary AdReady is working to develop and deploy sophisticated
More informationSearch and Real-Time Analytics on Big Data
Search and Real-Time Analytics on Big Data Sewook Wee, Ryan Tabora, Jason Rutherglen Accenture & Think Big Analytics Strata New York October, 2012 Big Data: data becomes your core asset. It realizes its
More informationCapitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes
Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Highly competitive enterprises are increasingly finding ways to maximize and accelerate
More informationBuilding Data-Driven Internet of Things (IoT) Applications
Building Data-Driven Internet of Things (IoT) Applications A four-step primer IOT DEMANDS NEW APPLICATIONS Automated homes. Connected cars. Smart cities. The Internet of Things (IoT) will forever change
More informationBIG DATA HADOOP TRAINING
BIG DATA HADOOP TRAINING DURATION 40hrs AVAILABLE BATCHES WEEKDAYS (7.00AM TO 8.30AM) & WEEKENDS (10AM TO 1PM) MODE OF TRAINING AVAILABLE ONLINE INSTRUCTOR LED CLASSROOM TRAINING (MARATHAHALLI, BANGALORE)
More informationCloud3DView: Gamifying Data Center Management
Cloud3DView: Gamifying Data Center Management Yonggang Wen Assistant Professor School of Computer Engineering Nanyang Technological University ygwen@ntu.edu.sg November 26, 2013 School of Computer Engineering
More informationAligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap
Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap 3 key strategic advantages, and a realistic roadmap for what you really need, and when 2012, Cognizant Topics to be discussed
More informationCertified Big Data and Apache Hadoop Developer VS-1221
Certified Big Data and Apache Hadoop Developer VS-1221 Certified Big Data and Apache Hadoop Developer Certification Code VS-1221 Vskills certification for Big Data and Apache Hadoop Developer Certification
More informationPerformance Testing of Big Data Applications
Paper submitted for STC 2013 Performance Testing of Big Data Applications Author: Mustafa Batterywala: Performance Architect Impetus Technologies mbatterywala@impetus.co.in Shirish Bhale: Director of Engineering
More informationStreaming items through a cluster with Spark Streaming
Streaming items through a cluster with Spark Streaming Tathagata TD Das @tathadas CME 323: Distributed Algorithms and Optimization Stanford, May 6, 2015 Who am I? > Project Management Committee (PMC) member
More informationBig Data Storage Challenges for the Industrial Internet of Things
Big Data Storage Challenges for the Industrial Internet of Things Shyam V Nath Diwakar Kasibhotla SDC September, 2014 Agenda Introduction to IoT and Industrial Internet Industrial & Sensor Data Big Data
More informationCloudera Manager Training: Hands-On Exercises
201408 Cloudera Manager Training: Hands-On Exercises General Notes... 2 In- Class Preparation: Accessing Your Cluster... 3 Self- Study Preparation: Creating Your Cluster... 4 Hands- On Exercise: Working
More informationData Services Advisory
Data Services Advisory Modern Datastores An Introduction Created by: Strategy and Transformation Services Modified Date: 8/27/2014 Classification: DRAFT SAFE HARBOR STATEMENT This presentation contains
More informationReal Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA
Real Time Fraud Detection With Sequence Mining on Big Data Platform Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA Open Source Big Data Eco System Query (NOSQL) : Cassandra,
More information