Trend Micro Big Data Platform and Apache Bigtop. 葉 祐 欣 (Evans Ye) Big Data Conference 2015
1 Trend Micro Big Data Platform and Apache Bigtop 葉 祐 欣 (Evans Ye) Big Data Conference 2015
2 Who am I: Apache Bigtop PMC member; Apache Big Data Europe 2015 speaker; Software Engineer @ Trend Micro; develops big data apps & infra; has some experience in Hadoop, HBase, Pig, Spark, Kafka, Fluentd, Akka, and Docker
3 Outline Quick Intro to Bigtop Trend Micro Big Data Platform Mission-specific Platform Big Data Landscape (3p) Bigtop 1.1 Release (6p)
4 Quick Intro to Bigtop
5 Linux Distributions
6 Hadoop Distributions
7 Hadoop Distributions: We're fully open sourced!
8 How do I add patches?
9
10 From source code to packages Bigtop Packaging
11 Bigtop feature set Packaging Testing Deployment Virtualization for you to easily build your own Big Data Stack
12 Supported components
13 One click to build packages
$ git clone
$ docker run \
    --rm \
    --volume `pwd`/bigtop:/bigtop \
    --workdir /bigtop \
    bigtop/slaves:trunk-centos-7 \
    bash -l -c './gradlew rpm'
14 $ ./gradlew tasks
15 Easy to do CI ci.bigtop.apache.org
16 RPM/DEB packages
17 One click Hadoop provisioning: $ ./docker-hadoop.sh -c 3
18 One click Hadoop provisioning: bigtop/deploy image on Docker Hub. $ ./docker-hadoop.sh -c 3
19 One click Hadoop provisioning: bigtop/deploy image on Docker Hub, with puppet apply run in each of the three containers. $ ./docker-hadoop.sh -c 3 (Just google "bigtop provisioner".)
20 Should I use Bigtop?
21 If you want to build your own customised Big Data Stack
22 Curves ahead
23 Pros & cons. Bigtop: you need a talented Hadoop team; self-service (troubleshoot, find solutions, develop patches); add any patch at any time you want (additional effort); choose any version of any component you want (additional effort). Vendors (Hortonworks, Cloudera, etc.): better support, since they're the ones who write the code! $
24 Trend Micro Big Data Platform
25 Trend Micro Hadoop (TMH): uses Bigtop as the basis for our internal custom distribution of Hadoop; applies community and private patches to upstream projects for business and operational needs; the newest release, TMH7, is based on Bigtop 1.0 SNAPSHOT
26 Working with community made our life easier Knowing community status made TMH7 release based on Bigtop 1.0 SNAPSHOT possible
27 Working with community made our life easier Knowing community status made TMH7 release based on Bigtop 1.0 SNAPSHOT possible Contribute Bigtop Provisioner, packaging code, puppet recipes, bugfixes, CI infra, anything!
28 Working with community made our life easier Leverage Bigtop smoke tests and integration tests with Bigtop Provisioner to evaluate TMH7
29 Working with community made our life easier Leverage Bigtop smoke tests and integration tests with Bigtop Provisioner to evaluate TMH7 Contribute feedback, evaluation, use case through Production level adoption
30 Trend Micro Big Data Stack Powered by Bigtop In-house Apps App A App B App C App D APIs and Interfaces Processing Engine Storage Kerberos Ad-hoc Query UDFs Pig MapReduce Hadoop HDFS Wuji Oozie HBase Solr Cloud Resource Management Hadoop YARN Deployment Hadooppet (prod) Hadoocker (dev)
31 Hadooppet: Puppet recipes to deploy and manage the TMH Big Data Platform; HDFS, YARN, HA auto-configured; Kerberos, LDAP auto-configured; Kerberos cross-realm authentication auto-configured (for distcp to run across secured clusters)
32
33 Hadoocker: a DevOps toolkit for Hadoop app developers to develop and test their code on images preloaded with the Big Data Stack -> dev & test env without deployment -> supports end-to-end CI tests. Also a Hadoop env for apps to test against a new Hadoop distribution
34 Docker based dev & test env (diagram): internal Docker registry; Hadoop server (TMH7); Hadoop client; Hadoop app; RESTful APIs; sample data; ./execute.sh; hadoop fs -put
35 Docker based dev & test env (diagram): internal Docker registry; Oozie (Wuji); Hadoop server (TMH7); Hadoop client; dependency service; Hadoop app; RESTful APIs; sample data; ./execute.sh; Solr; hadoop fs -put
36 Mission-specific Platform
37 Use case: real-time streaming data flows in; look up external info as data flows in; detect threats/malicious activities on streaming data; correlate with other historical data (batch query) to gather more info; can also run batch detections by specifying arbitrary start and end times; support investigation down to the raw log level
38 Lambda Architecture
39 receiver
40 buffer receiver
41 receiver transformation, lookup ext info buffer
42 receiver transformation, lookup ext info buffer streaming batch
43 receiver transformation, lookup ext info buffer streaming batch
44 Kafka: high-throughput, distributed publish-subscribe messaging system; supports multiple consumers attached to a topic; configurable partitions (shards) and replication factor; load-balances within the same consumer group, where each message is consumed only once
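The consumer-group behaviour on this slide can be illustrated with a small simulation. This is only a sketch of the idea, not the Kafka client API: the function names, the round-robin assignment, and the integer message keys are all assumptions made for the example.

```python
from collections import defaultdict


def assign_partitions(partitions, consumers):
    """Round-robin the partitions over the consumers of one group (hypothetical helper)."""
    assignment = defaultdict(list)
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return dict(assignment)


def deliver(messages, groups, num_partitions):
    """Return {group: {consumer: [values]}} for (key, value) messages.

    Every group sees every message (fan-out); inside a group each
    partition has exactly one owner, so each message is consumed once.
    """
    partitions = list(range(num_partitions))
    result = {}
    for group, consumers in groups.items():
        owner = {
            p: c
            for c, owned in assign_partitions(partitions, consumers).items()
            for p in owned
        }
        received = {c: [] for c in consumers}
        for key, value in messages:
            received[owner[key % num_partitions]].append(value)
        result[group] = received
    return result


messages = [(k, f"m{k}") for k in range(6)]
groups = {"detectors": ["d1", "d2"], "archiver": ["a1"]}
out = deliver(messages, groups, num_partitions=3)
```

Here the two consumers in "detectors" split the six messages between them, while the single "archiver" consumer receives all six.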
45 Cassandra: distributed NoSQL key-value storage, no SPOF; super fast on writes, suitable for data that keeps coming in; decent read performance if you design it right (build the data model around your queries); Spark Cassandra Connector; configurable C/A trade-off (CAP theorem): choose A over C for availability, and vice versa. Reference: Dynamo: Amazon's Highly Available Key-value Store
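The configurable consistency/availability trade-off can be made concrete with the usual quorum rule for Dynamo-style stores: with N replicas, reads that wait for R replicas and writes that wait for W replicas are guaranteed to overlap when R + W > N. The helper names below are hypothetical, and the sketch ignores details such as hinted handoff.

```python
def quorum(n):
    """Majority of n replicas."""
    return n // 2 + 1


def is_strongly_consistent(n, r, w):
    """True when every read set must overlap every write set."""
    return r + w > n


def tolerated_failures(n, r, w):
    """Replicas that can be down while reads and writes both still succeed."""
    return n - max(r, w)


n = 3
# Favour availability: read ONE / write ONE.
one_one = (is_strongly_consistent(n, 1, 1), tolerated_failures(n, 1, 1))
# Favour consistency: read QUORUM / write QUORUM.
q = quorum(n)
quorum_quorum = (is_strongly_consistent(n, q, q), tolerated_failures(n, q, q))
```

With three replicas, ONE/ONE survives two replica failures but may return stale data, while QUORUM/QUORUM is consistent and survives one failure.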
46 Spark: fast, distributed, in-memory processing engine; one system for streaming and batch workloads (Spark Streaming)
47 Akka: high performance concurrency framework for Java and Scala; actor model for message-driven processing; asynchronous by design to achieve high throughput; each message is handled in a single-threaded context (no locks or synchronization needed); let-it-crash model for fault tolerance and a self-healing system; clustering mechanism to scale out. Reference: The Road to Akka Cluster, and Beyond
48 Akka Streams: a DSL library for streaming computation on Akka; Source -> Flow -> Sink; a Materializer transforms each step into an Actor; back-pressure enabled by default. Reference: The Reactive Manifesto
49 No back-pressure: Source (fast!!!) -> Sink (slow)
50 No back-pressure: Source (fast!!!) -> Sink (slow)
51 With back-pressure: Source (fast!!!) -> Sink (slow)
52 With back-pressure: Source (fast!!!) -> Sink (slow); demand is signalled upstream (request 3, request 3)
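The "request 3" idea on these slides is demand-driven flow control: the slow sink tells the source how many elements it can take, and the source never emits more than that. A minimal sketch follows; the `Source` and `SlowSink` classes are hypothetical stand-ins, not Akka Streams classes.

```python
from collections import deque


class Source:
    """A fast producer that only emits what has been requested."""

    def __init__(self, items):
        self._items = deque(items)

    def pull(self, demand):
        """Emit at most `demand` elements, never more than was requested."""
        batch = []
        while self._items and len(batch) < demand:
            batch.append(self._items.popleft())
        return batch


class SlowSink:
    """A consumer that signals a fixed demand upstream before each batch."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.processed = []
        self.max_in_flight = 0

    def run(self, source):
        while True:
            batch = source.pull(self.batch_size)  # the "request 3" signal
            if not batch:
                break
            self.max_in_flight = max(self.max_in_flight, len(batch))
            self.processed.extend(batch)


sink = SlowSink(batch_size=3)
sink.run(Source(range(10)))
```

Because the source is pulled rather than pushing, the number of elements in flight is bounded by the sink's demand, so a fast source cannot overflow a slow sink.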
53 Data pipeline with Akka Streams: scale up using balance and merge (balance -> worker, worker, worker -> merge). Source:
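The balance/merge shape can be sketched with plain threads and queues: one shared input queue that workers compete for (balance) and one shared output queue that they all write to (merge). This is an analogy to the Akka Streams graph on the slide, not Akka code; `run_pipeline` and its parameters are hypothetical.

```python
import queue
import threading


def run_pipeline(items, worker_fn, num_workers=3):
    """Fan items out to `num_workers` workers and merge their results."""
    inbox = queue.Queue()   # balance: each item goes to whichever worker is free
    outbox = queue.Queue()  # merge: all workers write into one stream
    stop = object()         # sentinel telling a worker to shut down

    def worker():
        while True:
            item = inbox.get()
            if item is stop:
                break
            outbox.put(worker_fn(item))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item in items:
        inbox.put(item)
    for _ in threads:
        inbox.put(stop)
    for t in threads:
        t.join()

    results = []
    while not outbox.empty():
        results.append(outbox.get())
    return results


results = run_pipeline(range(10), lambda x: x * x)
```

As with a merge stage, the output order is not guaranteed, so downstream steps should not rely on it.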
54 Data pipeline with Akka Streams Scale out using docker $ docker-compose scale pipeline=3
55 Reactive Kafka Akka Streams wrapper for Kafka Commit processed offset back into Kafka Provide at-least-once delivery guarantee
56 Message delivery guarantee. Actor model: at-most-once. Akka Persistence: at-least-once (persists a log to external storage, like a WAL). Reactive Kafka: at-least-once + back-pressure (writes the offset back into Kafka). At-least-once + idempotent writes = exactly-once
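The last line of this slide can be shown with a toy example: under at-least-once delivery some messages arrive twice, and a write keyed by message id absorbs the duplicates while an append does not. The message shape, the `redeliver_every` parameter, and the dict-based store are assumptions for illustration only.

```python
def at_least_once(messages, redeliver_every=2):
    """Yield every message, repeating some to mimic redelivery after a failure."""
    for i, msg in enumerate(messages):
        yield msg
        if i % redeliver_every == 0:
            yield msg  # duplicate delivery


def append_write(store, msg):
    """Non-idempotent write: a duplicate message produces a duplicate row."""
    store.setdefault("rows", []).append(msg["value"])


def idempotent_write(store, msg):
    """Idempotent write: upsert keyed by message id, so replays are harmless."""
    store[msg["id"]] = msg["value"]


messages = [{"id": i, "value": i * 10} for i in range(4)]
naive, keyed = {}, {}
for m in at_least_once(messages):
    append_write(naive, m)
    idempotent_write(keyed, m)
```

The appended store ends up with six rows for four messages, while the keyed store holds exactly one entry per message, which is the effect the slide calls exactly-once.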
57 Recap: SDACK Stack. Spark: both streaming and batch analytics. Docker: resource management (fine for one app). Akka: fine-grained, elastic data pipelines. Cassandra: batch queries. Kafka: durable buffer, fan-out to multiple consumers
58 Your mileage may vary
59 We're still evolving
60 Remember this:
61 The SMACK Stack Toolbox for wide variety of data processing scenarios
62 SMACK Stack Spark: fast and general engine for large-scale data processing Mesos: cluster resource management system Akka: toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications Cassandra: distributed, highly available database designed to handle large amounts of data across datacenters Kafka: high-throughput, low-latency distributed pub-sub messaging system for real-time data feeds Source:
63 Reference Spark Summit Europe 2015 Streaming Analytics with Spark, Kafka, Cassandra, and Akka (Helena Edelson) Big Data AW Meetup SMACK Architectures (Anton Kirillov)
64 Big Data Landscape
65 Big Data moving trend: memory is faster than SSD/disk, and is getting cheaper. In-Memory Computing & Fast Data: Spark (in-memory batch/streaming engine); Flink (in-memory streaming/batch engine); Ignite (in-memory data fabric); Geode (incubating; in-memory database)
66 Off-Heap, Off-Heap, Off-Heap: off-heap storage is JVM process memory outside of the heap, allocated and managed using native calls. Its size is not limited by the JVM heap (only by physical memory), and it is not subject to GC, which essentially removes long GC pauses. Used by: Project Tungsten, Flink, Ignite, Geode, HBase
67 (Some) Apache Big Data APIs and Interfaces Processing Engine Storage Resource Management Flink ML, Gelly Flink Components Streaming, MLlib, GraphX Spark Pig Hadoop HDFS Hadoop YARN Hive Tez Hadoop Distribution } Phoenix Trafodion Bigtop HBase Slider Mesos Ambari Hadoop Management messaging system in memory data grid search engine NoSQL Kafka Ignite Geode Solr Cassandra
68 Bigtop 1.1 Release Jan, 2016, I expect
69 Bigtop 1.1 Release Hadoop Spark Hive Pig Oozie Flume Zeppelin Ignite Hadoop Phoenix Hue Crunch 0.12, 24 components included!
70
71 Hadoop 2.6: heterogeneous storage (SSD + hard drive); placement policies (all_ssd, hot, warm, cold); archival storage (cost saving). HDFS-7285 (Hadoop 3.0): erasure coding to cut storage overhead from 3X to 1.5X
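The 3X to 1.5X figure follows from simple arithmetic. The sketch below assumes a Reed-Solomon layout with 6 data units and 3 parity units, which yields the 1.5X on the slide; the slide itself does not name the scheme, and the function names are hypothetical.

```python
def replication_overhead(replicas):
    """Raw bytes stored per logical byte with n-way replication."""
    return float(replicas)


def erasure_coding_overhead(data_units, parity_units):
    """Raw bytes stored per logical byte with a (data, parity) erasure code."""
    return (data_units + parity_units) / data_units


logical_tb = 100
replicated_tb = logical_tb * replication_overhead(3)       # 3-way replication
erasure_tb = logical_tb * erasure_coding_overhead(6, 3)    # assumed RS(6,3)
```

Under these assumptions, 100 TB of data needs 300 TB of raw disk with replication and 150 TB with erasure coding, and either scheme can lose any 2 units of a block group without losing data (the assumed RS layout tolerates 3).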
72 Hadoop 2.7 Transparent encryption (encryption zone) Available in 2.6 Known issue: Encryption is sometimes done incorrectly (HADOOP-11343) Fixed in HDFS2015_Past_present_future.pdf
73 Rising star: Flink Streaming dataflow engine Treat batch computing as fixed length streaming Exactly-once by distributed snapshotting Event time handling by watermarks
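Event-time handling with watermarks can be sketched as follows. This is a deliberately simplified model, not Flink's API: it assumes tumbling windows, a watermark that trails the largest timestamp seen by a fixed `max_out_of_orderness`, and late events that are simply dropped. All names are hypothetical.

```python
def tumbling_window_counts(events, window_size, max_out_of_orderness):
    """Count events per tumbling event-time window.

    `events` is a list of integer event timestamps in arrival order.
    Returns (fired, dropped, still_open), where `fired` is a list of
    (window_start, count) pairs in the order the windows were closed.
    """
    open_windows = {}
    fired = []
    dropped = []
    watermark = float("-inf")
    for ts in events:
        start = ts - ts % window_size
        if start + window_size <= watermark:
            dropped.append(ts)  # its window has already fired: late event
        else:
            open_windows[start] = open_windows.get(start, 0) + 1
        # The watermark asserts that no event older than it is still expected.
        watermark = max(watermark, ts - max_out_of_orderness)
        for s in sorted(open_windows):
            if s + window_size <= watermark:
                fired.append((s, open_windows.pop(s)))
    return fired, dropped, open_windows


fired, dropped, still_open = tumbling_window_counts(
    [1, 4, 3, 12, 8, 15, 2, 27], window_size=10, max_out_of_orderness=5
)
```

In this run the out-of-order event 8 still lands in window [0, 10) because the watermark has not passed 10 yet, while the event 2 that arrives after the window fired is dropped.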
74 Bigtop Roadmap Integrate and package Apache Flink Re-implement Bigtop Provisioner using docker-machine, compose, swarm Deploy containers on multiple hosts Support any kind of base image for deployment
75 Wrap up
76 Wrap up. Hadoop Distribution: choose Bigtop if you want more control. The SMACK Stack: toolbox for a wide variety of data processing scenarios. Big Data Landscape: in-memory, off-heap solutions are hot
77 Thank you! Questions?
