How To Write A Trusted Analytics Platform (Tap)

Size: px
Start display at page:

Download "How To Write A Trusted Analytics Platform (Tap)"

Transcription

1 Trusted Analytics Platform (TAP) TAP Technical Brief October 2015

2 TAP Technical Brief Overview Trusted Analytics Platform (TAP) is open source software, optimized for performance and security, that accelerates the creation of Cloud-native applications driven by Big Data Analytics. The project brings together many proven and familiar open-source components, integrating them in one vibrant communities behind these open source projects. Purpose In many cases, enterprises struggle to deliver tangible value from Big Data Analytics. The time required to adapt essential technologies and extend them into a custom solution is simply too long. These challenges span data, analytics and application development. TAP has been designed, from the ground up, to reduce these complexities. It provides a shared environment for advanced analytics on Big Data, making it easier for developers to collaborate with data scientists on both public and private Cloud infrastructures with hardware-enhanced performance and security optimizations. TAP was designed with three types of users in mind: data scientists, application developers and system operators. data scientists by eliminating many of the current limitations associated with analytics at scale and help data scientists with the most common tasks. Data Sampling Selecting representative feature/data subset Data Preparation Transform and fuse data sets Model Select/Train Manually select from multiple algorithms Model Validation Simulate at scale resolve issues Model Deployment Deploy into scoring engine, track drift Horizontal Solutions Service discovery and binding Using TAP, data scientists are able to build and train models for Big Data using familiar interfaces like ipython notebooks, RStudio, H2O Flow or Eclipse IDE. TAP provides libraries for graph, deep learning and classical machine learning algorithms. Each algorithm is open source and almost all algorithms are parallelized and can be executed on a distributed processing system, such as Apache Spark or Apache Hadoop with, in some cases, CPU and GPU hardware acceleration.

3 Connectors Processors Stores Analytics Message Brokers & Queues Kadka, RabbitMQ MQTT, WS, REST... Distributed stream & batch data Hadoop, Spark GearPump Optimized data stores HDFS, HBase, PostgressSQL, MySQL, Cassandra, etc... Model deployment, scoring, search, charts, dashboards Spark, Impala, H2O... Key components in TAP for working with data Application developers gain immediate access to a polyglot application platform based on open source Cloud runtimes, combined with dynamically bindable services and expressive APIs, allow application developers to greatly reduce their development time and simplify integration with all the data analytical capabilities developed by data scientist. Many common ingestion protocols, like MQTT, WS and REST as well as message queues based on Apache Kafka and RabbitMQ, are implemented in TAP and allow application developers to easily connect to data processing services, such as Apache Spark. TAP also includes native framework, based on Akka called Gearpump, which delivers distributed and low-latency data processing capabilities. A screenshot of TAP s console marketplace shows some of the available services and data stores For system operators, TAP eliminates many of the complexities normally associated with a secure and scalable Big Data platform. Using the simple administration interface and an open framework for services, system

4 Platform Architecture TAP is a multi-tenant platform designed to ease and accelerate the delivery of end-to-end analytical applications. Its loosely-coupled, layered architecture solution delivery. The Data Layer combines the essential components of any Big Data management system: Distributed File Systems and Distributed Processing Frameworks. Additionally, the Data Layer has a range of data stores each optimized for a particular structure: Key-Value, Relational Database Management Systems, Document-Oriented, Columnar, Graph, Time Series and others. While the number of data stores supported in TAP continues to grow, the exact composition Provisioning, Management, Monitoring Application Layer Custom Apps Built-in Apps Analytics Layer Engines/Frameworks Algorithms, Models & Pipelines Data Layer Distributed Processing Scalable Data Persistence APIs The Analytics Layer includes analytics tools and the API server that translates function calls from the analytics tools to the supporting algorithms for data wrangling and machine learning. Its plugin architecture allows system operators to expand TAP capabilities and automatically expose user-accessible APIs for newly added functions. In this layer, TAP also leverages many of the technologies from its partners, including H2O and Based on Cloud Foundry, the Application Layer exposes various runtimes, messaging frameworks or connectors for dynamic service binding, service brokers for ease of access and a service catalog to enable service discovery. Additionally, the Application Layer supports an extension construct called buildpacks. When developers publish their applications, Cloud Foundry automatically detects which buildpack is required and installs it on the Droplet Execution Agent (DEA) where the application needs to run. At every layer of the platform architecture, performance optimizations have been incorporated to maximize the speed of analytic operations. In parallel, security enhancements, from silicon up, ensure data and operations are protected.

5 Performance Optimizations Data Analytics Acceleration Library (DAAL). extends DAAL to persist the analytical models in its metastore to deliver a consistent end-to-end pipeline for data scientists with broad support for number of Machine Learning algorithms. TAP can also integrate Intel s Math Kernel Library (MKL). MKL accelerates math processing routines that increase application performance and reduce development time. It includes highly-vectorized and threaded Linear Algebra, Fast Fourier Transforms (FFT), Vector Math and Statistics functions. These functions automatically scale on Intel processor architectures by selecting the best code path for each processor generation. Security TAP inherits and extends the OAuth 2.0 authentication mechanisms enabled in Cloud Foundry (CF) by integrating them with the Kerberos authentication mechanisms in Hadoop. TAP integrates these two authentication frameworks using Apache Kerby, which allows a CF application to authenticate its users via OAuth2.0 and convert the OAuth token into a service ticket that is accepted by Kerberos. As a result, the user gets a single-sign on experience across the Data and Application Layers and each one of their action within TAP is performed within the contexts of that user. This deep context awareness allows TAP to audit each activity whether performed in analytical tools or within custom application. Deployment Model TAP supports two deployment models. Organizations that have already deployed Apache Hadoop in their data center (virtualized or on bare metal) can reuse the existing cluster or choose TAP to deploy it. SRV APP TAP APP DATA Hadoop CloudFoundry OTHER CloudFoundry TAP TAP IaaS Hadoop IaaS Physical Hardware Physical Hardware

6 From a networking standpoint, TAP deploys into two subnets. The components within a subnet are isolated and all communication are routed through trusted APIs. Control Accec ss MQ Clus ter Manager Node s Service Nodes Stat e Clus te r Wo rker No des System Operator Command Line or Browser Jump Box (SSH) Data Subnet IOT Devices Gateway or Direct Application User Web, Mobile or Desktop Data Scientist & App Builder ipython, RStudio, H 2O Proxy (MQT QTT) T) Proxy (HTT TTP) API (HTTP) App Controll er Health Manager User Services Run-time App Subnet TAP Deployment Recipes the common ways the platform can be extended. These recipes are reproducible, in many cases with minimal amount of development. Each recipe comprises of number of smaller microservices whose behaviors can be easily customized by the developer. The following diagram demonstrates one such recipe showing for hospital patient re-admission. Data Sources TCP/IP-based Protocols (REST, WS, XMPP, MQTT/AMQP/DDS) Unique Secure Endpoint HAProxy Gateway Go App Message Queue Kafka Intel Analytics Toolkit for Hadoop* Data Store HDFS Model API Re-admission Visualization & Dashboard App

7 TAP Adoption and Contributions TAP has been released as an open source project, including many popular and reliable components supported by a large community. The project, its code and documentation, including a Getting Started Guide with beginning steps for each user, can be found at To learn how a growing number of organizations support, use and are contributing to the TAP project to accelerate their Big Data Analytics application development across a broad range of use cases, visit

IBM Bluemix. The Digital Innovation Platform. Simon Moser (smoser@de.ibm.com) @mosersd

IBM Bluemix. The Digital Innovation Platform. Simon Moser (smoser@de.ibm.com) @mosersd IBM Bluemix The Digital Innovation Platform Simon Moser (smoser@de.ibm.com) @mosersd Who am I? - Senior Technical Staff Member at IBM Research & Development Lab in Böblingen, Germany - Bluemix Application

More information

QuickSpecs. HP Helion Development Platform. Overview

QuickSpecs. HP Helion Development Platform. Overview Overview What is it? The is a service that enables developers to rapidly develop, deploy and scale applications across a mix of public and private clouds. We provide support for applications developed

More information

STREAM ANALYTIX. Industry s only Multi-Engine Streaming Analytics Platform

STREAM ANALYTIX. Industry s only Multi-Engine Streaming Analytics Platform STREAM ANALYTIX Industry s only Multi-Engine Streaming Analytics Platform One Platform for All Create real-time streaming data analytics applications in minutes with a powerful visual editor Get a wide

More information

Dell* In-Memory Appliance for Cloudera* Enterprise

Dell* In-Memory Appliance for Cloudera* Enterprise Built with Intel Dell* In-Memory Appliance for Cloudera* Enterprise Find out what faster big data analytics can do for your business The need for speed in all things related to big data is an enormous

More information

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84 Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics

More information

Hadoop & Spark Using Amazon EMR

Hadoop & Spark Using Amazon EMR Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?

More information

CLOUD TECH SOLUTION AT INTEL INFORMATION TECHNOLOGY ICApp Platform as a Service

CLOUD TECH SOLUTION AT INTEL INFORMATION TECHNOLOGY ICApp Platform as a Service CLOUD TECH SOLUTION AT INTEL INFORMATION TECHNOLOGY ICApp Platform as a Service Open Data Center Alliance, Inc. 3855 SW 153 rd Dr. Beaverton, OR 97003 USA Phone +1 503-619-2368 Fax: +1 503-644-6708 Email:

More information

Upcoming Announcements

Upcoming Announcements Enterprise Hadoop Enterprise Hadoop Jeff Markham Technical Director, APAC jmarkham@hortonworks.com Page 1 Upcoming Announcements April 2 Hortonworks Platform 2.1 A continued focus on innovation within

More information

Big Data Management and Security

Big Data Management and Security Big Data Management and Security Audit Concerns and Business Risks Tami Frankenfield Sr. Director, Analytics and Enterprise Data Mercury Insurance What is Big Data? Velocity + Volume + Variety = Value

More information

Unified Batch & Stream Processing Platform

Unified Batch & Stream Processing Platform Unified Batch & Stream Processing Platform Himanshu Bari Director Product Management Most Big Data Use Cases Are About Improving/Re-write EXISTING solutions To KNOWN problems Current Solutions Were Built

More information

Bluemix: The Open Platform as a Service

Bluemix: The Open Platform as a Service Jia Tan (tanjia@cn.ibm.com) Senior Software Architect IBM China Software Development Lab Apr 2014 Bluemix: The Open Platform as a Service 2013 IBM Corporation New models of product & service innovation

More information

Big Data Approaches. Making Sense of Big Data. Ian Crosland. Jan 2016

Big Data Approaches. Making Sense of Big Data. Ian Crosland. Jan 2016 Big Data Approaches Making Sense of Big Data Ian Crosland Jan 2016 Accelerate Big Data ROI Even firms that are investing in Big Data are still struggling to get the most from it. Make Big Data Accessible

More information

Databricks. A Primer

Databricks. A Primer Databricks A Primer Who is Databricks? Databricks vision is to empower anyone to easily build and deploy advanced analytics solutions. The company was founded by the team who created Apache Spark, a powerful

More information

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics ESSENTIALS EMC ISILON Use the industry's first and only scale-out NAS solution with native Hadoop

More information

Databricks. A Primer

Databricks. A Primer Databricks A Primer Who is Databricks? Databricks was founded by the team behind Apache Spark, the most active open source project in the big data ecosystem today. Our mission at Databricks is to dramatically

More information

From Spark to Ignition:

From Spark to Ignition: From Spark to Ignition: Fueling Your Business on Real-Time Analytics Eric Frenkiel, MemSQL CEO June 29, 2015 San Francisco, CA What s in Store For This Presentation? 1. MemSQL: A real-time database for

More information

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform

More information

ANALYTICS CENTER LEARNING PROGRAM

ANALYTICS CENTER LEARNING PROGRAM Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals

More information

Interactive data analytics drive insights

Interactive data analytics drive insights Big data Interactive data analytics drive insights Daniel Davis/Invodo/S&P. Screen images courtesy of Landmark Software and Services By Armando Acosta and Joey Jablonski The Apache Hadoop Big data has

More information

SAP HANA Cloud Platform. Technical Overview Uwe Heinz

SAP HANA Cloud Platform. Technical Overview Uwe Heinz SAP HANA Cloud Platform Technical Overview Uwe Heinz The way into the Cloud 75% New IT invests 2016 are cloud based Source: IDC 2013 Report: New Enterprise IT Spend 80% New IT invests Driven by users Source:

More information

Sentinet for BizTalk Server SENTINET

Sentinet for BizTalk Server SENTINET Sentinet for BizTalk Server SENTINET Sentinet for BizTalk Server 1 Contents Introduction... 2 Sentinet Benefits... 3 SOA and APIs Repository... 4 Security... 4 Mediation and Virtualization... 5 Authentication

More information

SEIZE THE DATA. 2015 SEIZE THE DATA. 2015

SEIZE THE DATA. 2015 SEIZE THE DATA. 2015 1 Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. BIG DATA CONFERENCE 2015 Boston August 10-13 Predicting and reducing deforestation

More information

Private Cloud Management

Private Cloud Management Private Cloud Management Speaker Systems Engineer Unified Data Center & Cloud Team Germany Juni 2016 Agenda Cisco Enterprise Cloud Suite Two Speeds of Applications DevOps Starting Point into PaaS Cloud

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Highly competitive enterprises are increasingly finding ways to maximize and accelerate

More information

w w w. u l t i m u m t e c h n o l o g i e s. c o m Infrastructure-as-a-Service on the OpenStack platform

w w w. u l t i m u m t e c h n o l o g i e s. c o m Infrastructure-as-a-Service on the OpenStack platform w w w. u l t i m u m t e c h n o l o g i e s. c o m Infrastructure-as-a-Service on the OpenStack platform http://www.ulticloud.com http://www.openstack.org Introduction to OpenStack 1. What OpenStack is

More information

WHITEPAPER. A Technical Perspective on the Talena Data Availability Management Solution

WHITEPAPER. A Technical Perspective on the Talena Data Availability Management Solution WHITEPAPER A Technical Perspective on the Talena Data Availability Management Solution BIG DATA TECHNOLOGY LANDSCAPE Over the past decade, the emergence of social media, mobile, and cloud technologies

More information

Raising Abstractions for the Software Defined Business

Raising Abstractions for the Software Defined Business Smart Process is Smart Business Raising Abstractions for the Software Defined Business Presented to GoTo Chicago, May 12, 2015 Dave Duggal, Managing Director dave@enterpriseweb.com Bill Malyk, Chief System

More information

Pilot-Streaming: Design Considerations for a Stream Processing Framework for High- Performance Computing

Pilot-Streaming: Design Considerations for a Stream Processing Framework for High- Performance Computing Pilot-Streaming: Design Considerations for a Stream Processing Framework for High- Performance Computing Andre Luckow, Peter M. Kasson, Shantenu Jha STREAMING 2016, 03/23/2016 RADICAL, Rutgers, http://radical.rutgers.edu

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Towards Smart and Intelligent SDN Controller

Towards Smart and Intelligent SDN Controller Towards Smart and Intelligent SDN Controller - Through the Generic, Extensible, and Elastic Time Series Data Repository (TSDR) YuLing Chen, Dell Inc. Rajesh Narayanan, Dell Inc. Sharon Aicler, Cisco Systems

More information

Data Governance in the Hadoop Data Lake. Michael Lang May 2015

Data Governance in the Hadoop Data Lake. Michael Lang May 2015 Data Governance in the Hadoop Data Lake Michael Lang May 2015 Introduction Product Manager for Teradata Loom Joined Teradata as part of acquisition of Revelytix, original developer of Loom VP of Sales

More information

Zend and IBM: Bringing the power of PHP applications to the enterprise

Zend and IBM: Bringing the power of PHP applications to the enterprise Zend and IBM: Bringing the power of PHP applications to the enterprise A high-performance PHP platform that helps enterprises improve and accelerate web and mobile application development Highlights: Leverages

More information

Creating Big Data Applications with Spring XD

Creating Big Data Applications with Spring XD Creating Big Data Applications with Spring XD Thomas Darimont @thomasdarimont THE FASTEST PATH TO NEW BUSINESS VALUE Journey Introduction Concepts Applications Outlook 3 Unless otherwise indicated, these

More information

Open Source for Cloud Infrastructure

Open Source for Cloud Infrastructure Open Source for Cloud Infrastructure June 29, 2012 Jackson He General Manager, Intel APAC R&D Ltd. Cloud is Here and Expanding More users, more devices, more data & traffic, expanding usages >3B 15B Connected

More information

HYBRID CLOUD SUPPORT FOR LARGE SCALE ANALYTICS AND WEB PROCESSING. Navraj Chohan, Anand Gupta, Chris Bunch, Kowshik Prakasam, and Chandra Krintz

HYBRID CLOUD SUPPORT FOR LARGE SCALE ANALYTICS AND WEB PROCESSING. Navraj Chohan, Anand Gupta, Chris Bunch, Kowshik Prakasam, and Chandra Krintz HYBRID CLOUD SUPPORT FOR LARGE SCALE ANALYTICS AND WEB PROCESSING Navraj Chohan, Anand Gupta, Chris Bunch, Kowshik Prakasam, and Chandra Krintz Overview Google App Engine (GAE) GAE Analytics Libraries

More information

EMC Enterprise Hybrid Cloud 2.5, Federation Software-Defined Data Center Edition

EMC Enterprise Hybrid Cloud 2.5, Federation Software-Defined Data Center Edition Solution Guide EMC Enterprise Hybrid Cloud 2.5, Federation Software-Defined Data Center Edition Pivotal CF Platform as a Service Solution Guide EMC Solutions Abstract This Solution Guide describes the

More information

Safe Harbor Statement

Safe Harbor Statement Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment

More information

Talend Real-Time Big Data Sandbox. Big Data Insights Cookbook

Talend Real-Time Big Data Sandbox. Big Data Insights Cookbook Talend Real-Time Big Data Talend Real-Time Big Data Overview of Real-time Big Data Pre-requisites to run Setup & Talend License Talend Real-Time Big Data Big Data Setup & About this cookbook What is the

More information

Cisco Integration Platform

Cisco Integration Platform Data Sheet Cisco Integration Platform The Cisco Integration Platform fuels new business agility and innovation by linking data and services from any application - inside the enterprise and out. Product

More information

What s Happening to the Mainframe? Mobile? Social? Cloud? Big Data?

What s Happening to the Mainframe? Mobile? Social? Cloud? Big Data? December, 2014 What s Happening to the Mainframe? Mobile? Social? Cloud? Big Data? Glenn Anderson IBM Lab Services and Training Today s mainframe is a hybrid system z/os Linux on Sys z DB2 Analytics Accelerator

More information

Using Patterns with WMBv8 and IIBv9

Using Patterns with WMBv8 and IIBv9 Ben Thompson IBM Integration Bus Architect bthomps@uk.ibm.com Using Patterns with WMBv8 and IIBv9 Patterns What is a Pattern, and why do I care? Pattern Example File Record Distribution to WMQ Pattern

More information

3 Reasons Enterprises Struggle with Storm & Spark Streaming and Adopt DataTorrent RTS

3 Reasons Enterprises Struggle with Storm & Spark Streaming and Adopt DataTorrent RTS . 3 Reasons Enterprises Struggle with Storm & Spark Streaming and Adopt DataTorrent RTS Deliver fast actionable business insights for data scientists, rapid application creation for developers and enterprise-grade

More information

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon.

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon. Building Scalable Big Data Infrastructure Using Open Source Software Sam William sampd@stumbleupon. What is StumbleUpon? Help users find content they did not expect to find The best way to discover new

More information

WEBAPP PATTERN FOR APACHE TOMCAT - USER GUIDE

WEBAPP PATTERN FOR APACHE TOMCAT - USER GUIDE WEBAPP PATTERN FOR APACHE TOMCAT - USER GUIDE Contents 1. Pattern Overview... 3 Features 3 Getting started with the Web Application Pattern... 3 Accepting the Web Application Pattern license agreement...

More information

ORACLE MOBILE APPLICATION FRAMEWORK DATA SHEET

ORACLE MOBILE APPLICATION FRAMEWORK DATA SHEET ORACLE MOBILE APPLICATION FRAMEWORK DATA SHEET PRODUCTIVE ENTERPRISE MOBILE APPLICATIONS DEVELOPMENT KEY FEATURES Visual and declarative development Mobile optimized user experience Simplified access to

More information

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack HIGHLIGHTS Real-Time Results Elasticsearch on Cisco UCS enables a deeper

More information

Delivering secure, real-time business insights for the Industrial world

Delivering secure, real-time business insights for the Industrial world Delivering secure, real-time business insights for the Industrial world Arnaud Mathieu: Program Director, Internet of Things Dev., IBM amathieu@us.ibm.com @arnomath 1 We are on the threshold of massive

More information

NCTA Cloud Operations

NCTA Cloud Operations NCTA Cloud Operations 093018 Lesson 1: Cloud Operations Topic A: Overview of Cloud Computing Solutions Identify the core concepts of cloud computing. Operations Terminology Identify the terminology used

More information

CRITEO INTERNSHIP PROGRAM 2015/2016

CRITEO INTERNSHIP PROGRAM 2015/2016 CRITEO INTERNSHIP PROGRAM 2015/2016 A. List of topics PLATFORM Topic 1: Build an API and a web interface on top of it to manage the back-end of our third party demand component. Challenge(s): Working with

More information

Case Study : 3 different hadoop cluster deployments

Case Study : 3 different hadoop cluster deployments Case Study : 3 different hadoop cluster deployments Lee moon soo moon@nflabs.com HDFS as a Storage Last 4 years, our HDFS clusters, stored Customer 1500 TB+ data safely served 375,000 TB+ data to customer

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

YARN Apache Hadoop Next Generation Compute Platform

YARN Apache Hadoop Next Generation Compute Platform YARN Apache Hadoop Next Generation Compute Platform Bikas Saha @bikassaha Hortonworks Inc. 2013 Page 1 Apache Hadoop & YARN Apache Hadoop De facto Big Data open source platform Running for about 5 years

More information

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social

More information

TRAINING PROGRAM ON BIGDATA/HADOOP

TRAINING PROGRAM ON BIGDATA/HADOOP Course: Training on Bigdata/Hadoop with Hands-on Course Duration / Dates / Time: 4 Days / 24th - 27th June 2015 / 9:30-17:30 Hrs Venue: Eagle Photonics Pvt Ltd First Floor, Plot No 31, Sector 19C, Vashi,

More information

CF & IoT Protocol Support

CF & IoT Protocol Support CF & IoT Protocol Support Atul Kshirsagar Senior Engineer, GE Software Dedicated Committer, CF Diego Project May 11, 2015 Imagination at work Agenda Protocol landscape in Industrial application Multi protocol

More information

API Management: Powered by SOA Software Dedicated Cloud

API Management: Powered by SOA Software Dedicated Cloud Software Dedicated Cloud The Challenge Smartphones, mobility and the IoT are changing the way users consume digital information. They re changing the expectations and experience of customers interacting

More information

Pluribus Netvisor Solution Brief

Pluribus Netvisor Solution Brief Pluribus Netvisor Solution Brief Freedom Architecture Overview The Pluribus Freedom architecture presents a unique combination of switch, compute, storage and bare- metal hypervisor OS technologies, and

More information

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University

More information

Scalable Architecture on Amazon AWS Cloud

Scalable Architecture on Amazon AWS Cloud Scalable Architecture on Amazon AWS Cloud Kalpak Shah Founder & CEO, Clogeny Technologies kalpak@clogeny.com 1 * http://www.rightscale.com/products/cloud-computing-uses/scalable-website.php 2 Architect

More information

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

More information

Accelerating Enterprise Big Data Success. Tim Stevens, VP of Business and Corporate Development Cloudera

Accelerating Enterprise Big Data Success. Tim Stevens, VP of Business and Corporate Development Cloudera Accelerating Enterprise Big Data Success Tim Stevens, VP of Business and Corporate Development Cloudera 1 Big Opportunity: Extract value from data Revenue Growth x = 50 Billion 35 ZB Cost Savings Margin

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

Xeon+FPGA Platform for the Data Center

Xeon+FPGA Platform for the Data Center Xeon+FPGA Platform for the Data Center ISCA/CARL 2015 PK Gupta, Director of Cloud Platform Technology, DCG/CPG Overview Data Center and Workloads Xeon+FPGA Accelerator Platform Applications and Eco-system

More information

PROPOSAL To Develop an Enterprise Scale Disease Modeling Web Portal For Ascel Bio Updated March 2015

PROPOSAL To Develop an Enterprise Scale Disease Modeling Web Portal For Ascel Bio Updated March 2015 Enterprise Scale Disease Modeling Web Portal PROPOSAL To Develop an Enterprise Scale Disease Modeling Web Portal For Ascel Bio Updated March 2015 i Last Updated: 5/8/2015 4:13 PM3/5/2015 10:00 AM Enterprise

More information

How to choose the right PaaS Platform?

How to choose the right PaaS Platform? How to choose the right PaaS Platform? Rajagopalan. S Senior Solution Architect Wipro Technologies 1 The Problem Which one is suitable for your Enterprise? How do you identify that? 2 Agenda PaaS Landscape

More information

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future

More information

Data Governance in the Hadoop Data Lake. Kiran Kamreddy May 2015

Data Governance in the Hadoop Data Lake. Kiran Kamreddy May 2015 Data Governance in the Hadoop Data Lake Kiran Kamreddy May 2015 One Data Lake: Many Definitions A centralized repository of raw data into which many data-producing streams flow and from which downstream

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2016 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

Hadoop in the Hybrid Cloud

Hadoop in the Hybrid Cloud Presented by Hortonworks and Microsoft Introduction An increasing number of enterprises are either currently using or are planning to use cloud deployment models to expand their IT infrastructure. Big

More information

Professional Hadoop Solutions

Professional Hadoop Solutions Brochure More information from http://www.researchandmarkets.com/reports/2542488/ Professional Hadoop Solutions Description: The go-to guidebook for deploying Big Data solutions with Hadoop Today's enterprise

More information

CitusDB Architecture for Real-Time Big Data

CitusDB Architecture for Real-Time Big Data CitusDB Architecture for Real-Time Big Data CitusDB Highlights Empowers real-time Big Data using PostgreSQL Scales out PostgreSQL to support up to hundreds of terabytes of data Fast parallel processing

More information

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...

More information

The Future of Data Management

The Future of Data Management The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

More information

Apache Sentry. Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com

Apache Sentry. Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com Apache Sentry Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com Agenda Various aspects of data security Apache Sentry for authorization Key concepts of Apache Sentry Sentry features Sentry architecture

More information

PaaS - Platform as a Service Google App Engine

PaaS - Platform as a Service Google App Engine PaaS - Platform as a Service Google App Engine Pelle Jakovits 14 April, 2015, Tartu Outline Introduction to PaaS Google Cloud Google AppEngine DEMO - Creating applications Available Google Services Costs

More information

How Solace Message Routers Reduce the Cost of IT Infrastructure

How Solace Message Routers Reduce the Cost of IT Infrastructure How Message Routers Reduce the Cost of IT Infrastructure This paper explains how s innovative solution can significantly reduce the total cost of ownership of your messaging middleware platform and IT

More information

Integrating Mobile apps with your Enterprise

Integrating Mobile apps with your Enterprise Integrating Mobile apps with your Enterprise Jonathan Marshall marshalj@uk.ibm.com @jmarshall1 Agenda Mobile apps and the enterprise Integrating mobile apps with Enterprise Applications Mobile apps and

More information

WELCOME TO Open Source Enterprise Architecture

WELCOME TO Open Source Enterprise Architecture WELCOME TO Open Source Enterprise Architecture WELCOME TO An overview of Open Source Enterprise Architecture In the integration domain Who we are Fredrik Hilmersson Petter Nordlander Why Open Source Integration

More information

NephOS A Licensed End-to-end IaaS Cloud Software Stack for Enterprise or OEM On-premise Use.

NephOS A Licensed End-to-end IaaS Cloud Software Stack for Enterprise or OEM On-premise Use. NephOS A Licensed End-to-end IaaS Cloud Software Stack for Enterprise or OEM On-premise Use. Benefits High performance architecture Advanced security and reliability Increased operational efficiency More

More information

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013 Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software SC13, November, 2013 Agenda Abstract Opportunity: HPC Adoption of Big Data Analytics on Apache

More information

Dominik Wagenknecht Accenture

Dominik Wagenknecht Accenture Dominik Wagenknecht Accenture Improving Mainframe Performance with Hadoop October 17, 2014 Organizers General Partner Top Media Partner Media Partner Supporters About me Dominik Wagenknecht Accenture Vienna

More information

BASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS

BASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS WHITEPAPER BASHO DATA PLATFORM BASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS INTRODUCTION Big Data applications and the Internet of Things (IoT) are changing and often improving our

More information

Cloud Security with Stackato

Cloud Security with Stackato Cloud Security with Stackato 1 Survey after survey identifies security as the primary concern potential users have with respect to cloud computing. Use of an external computing environment raises issues

More information

NoSQL Data Base Basics

NoSQL Data Base Basics NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS

More information

Actian SQL in Hadoop Buyer s Guide

Actian SQL in Hadoop Buyer s Guide Actian SQL in Hadoop Buyer s Guide Contents Introduction: Big Data and Hadoop... 3 SQL on Hadoop Benefits... 4 Approaches to SQL on Hadoop... 4 The Top 10 SQL in Hadoop Capabilities... 5 SQL in Hadoop

More information

Cloud computing - Architecting in the cloud

Cloud computing - Architecting in the cloud Cloud computing - Architecting in the cloud anna.ruokonen@tut.fi 1 Outline Cloud computing What is? Levels of cloud computing: IaaS, PaaS, SaaS Moving to the cloud? Architecting in the cloud Best practices

More information

Qlik Sense Enabling the New Enterprise

Qlik Sense Enabling the New Enterprise Technical Brief Qlik Sense Enabling the New Enterprise Generations of Business Intelligence The evolution of the BI market can be described as a series of disruptions. Each change occurred when a technology

More information

Contents. Pentaho Corporation. Version 5.1. Copyright Page. New Features in Pentaho Data Integration 5.1. PDI Version 5.1 Minor Functionality Changes

Contents. Pentaho Corporation. Version 5.1. Copyright Page. New Features in Pentaho Data Integration 5.1. PDI Version 5.1 Minor Functionality Changes Contents Pentaho Corporation Version 5.1 Copyright Page New Features in Pentaho Data Integration 5.1 PDI Version 5.1 Minor Functionality Changes Legal Notices https://help.pentaho.com/template:pentaho/controls/pdftocfooter

More information

ORACLE MOBILE SUITE. Complete Mobile Development Solution. Cross Device Solution. Shared Services Infrastructure for Mobility

ORACLE MOBILE SUITE. Complete Mobile Development Solution. Cross Device Solution. Shared Services Infrastructure for Mobility ORACLE MOBILE SUITE COMPLETE MOBILE DEVELOPMENT AND DEPLOYMENT PLATFORM KEY FEATURES Productivity boosting mobile development framework Cross device/os deployment Lightweight and robust enterprise service

More information

Introduction to WebSphere Process Server and WebSphere Enterprise Service Bus

Introduction to WebSphere Process Server and WebSphere Enterprise Service Bus Introduction to WebSphere Process Server and WebSphere Enterprise Service Bus Course materials may not be reproduced in whole or in part without the prior written permission of IBM. 4.0.3 Unit objectives

More information

Unlocking the True Value of Hadoop with Open Data Science

Unlocking the True Value of Hadoop with Open Data Science Unlocking the True Value of Hadoop with Open Data Science Kristopher Overholt Solution Architect Big Data Tech 2016 MinneAnalytics June 7, 2016 Overview Overview of Open Data Science Python and the Big

More information

Building the Internet of Things Jim Green - CTO, Data & Analytics Business Group, Cisco Systems

Building the Internet of Things Jim Green - CTO, Data & Analytics Business Group, Cisco Systems Building the Internet of Things Jim Green - CTO, Data & Analytics Business Group, Cisco Systems Brian McCarson Sr. Principal Engineer & Sr. System Architect, Internet of Things Group, Intel Corp Mac Devine

More information

HDP Hadoop From concept to deployment.

HDP Hadoop From concept to deployment. HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some

More information

BIG DATA ANALYTICS For REAL TIME SYSTEM

BIG DATA ANALYTICS For REAL TIME SYSTEM BIG DATA ANALYTICS For REAL TIME SYSTEM Where does big data come from? Big Data is often boiled down to three main varieties: Transactional data these include data from invoices, payment orders, storage

More information

Apigee Insights Increase marketing effectiveness and customer satisfaction with API-driven adaptive apps

Apigee Insights Increase marketing effectiveness and customer satisfaction with API-driven adaptive apps White provides GRASP-powered big data predictive analytics that increases marketing effectiveness and customer satisfaction with API-driven adaptive apps that anticipate, learn, and adapt to deliver contextual,

More information

Information Architecture

Information Architecture The Bloor Group Actian and The Big Data Information Architecture WHITE PAPER The Actian Big Data Information Architecture Actian and The Big Data Information Architecture Originally founded in 2005 to

More information

... ... PEPPERDATA OVERVIEW AND DIFFERENTIATORS ... ... ... ... ...

... ... PEPPERDATA OVERVIEW AND DIFFERENTIATORS ... ... ... ... ... ..................................... WHITEPAPER PEPPERDATA OVERVIEW AND DIFFERENTIATORS INTRODUCTION Prospective customers will often pose the question, How is Pepperdata different from tools like Ganglia,

More information

SIMPLE MACHINE HEURISTIC INTELLIGENT AGENT FRAMEWORK

SIMPLE MACHINE HEURISTIC INTELLIGENT AGENT FRAMEWORK SIMPLE MACHINE HEURISTIC INTELLIGENT AGENT FRAMEWORK Simple Machine Heuristic (SMH) Intelligent Agent (IA) Framework Tuesday, November 20, 2011 Randall Mora, David Harris, Wyn Hack Avum, Inc. Outline Solution

More information

BIRT ihub 3. 2013 Actuate Customer Days. Wow that looks good! Jeff Morris & Mark Gamble

BIRT ihub 3. 2013 Actuate Customer Days. Wow that looks good! Jeff Morris & Mark Gamble BIRT ihub 3 Wow that looks good! Jeff Morris & Mark Gamble SF Nov7 - UK Nov12 - DE Nov13 - FR Nov14 - SG Nov19 - JP Nov22 - NY Dec4 2013 Actuate Customer Days Actuate BIRT ihub 3 Focus Areas Simplified,

More information