Monitor Your Key Performance Indicators using WSO2 Business Activity Monitor

Published on WSO2 Inc
By kasunw.wso2.com
Created :07
Applies To: WSO2 BAM Version and above

Table of Content

  Introduction [1]
  BAM architecture [2]
  Use case [3]
  KPIs for this use case [4]
  Collect information [5]
  BAM data-agent (Java API) [6]
  Non Java Data-agent [7]
  REST API [8]
  BAM data receiver [9]
  Cassandra explorer [10]
  Data Analysis [11]
  Writing a hive script for analyzing captured data [12]
  Visualization [13]

Introduction

Key performance indicators (KPIs) are essential for an organization to measure its performance, or its success in progressing toward its goals. It is also crucial to choose the right KPIs, with a clear understanding of what is important for the organization to measure, in order to improve business productivity. WSO2 Business Activity Monitor (BAM) has an extremely flexible framework that lets you model your own key performance indicators to suit your monitoring requirements. In this article I explain how you can build your own KPIs using WSO2 Business Activity Monitor.

BAM architecture

The following figure shows the BAM architecture, which is what makes this flexibility possible.

Data that needs to be monitored passes through the modules above, and the data flow is as follows. The system that you are going to monitor hosts the BAM data agents, which capture the required data and send it to BAM. In addition to the data agents, users can send data to BAM through the REST API. The data receiver in BAM receives this data and stores it in the Cassandra data store. The analyzer engine runs analytics written in the Hive language according to the monitoring requirement, and the summarized data can be persisted into an RDBMS data store. In the presentation layer, the summarized data is fetched from the RDBMS and shown in dashboards and reports.

Use case

I am going to illustrate the capability of defining KPIs with a use case. Suppose you host a news website and need to monitor the traffic that comes to it, and analyze that traffic to understand how you can enhance your website and marketing strategies to increase profit.

KPIs for this use case

No of unique visitors per day
How many visitors does it take for your website to achieve its goals? Your profit will be proportional to the number of visitors to your website. If this number grows day by day, you are making progress.

Most popular news category
The most popular category depends on several factors (time, incidents, gossip, events, etc.). Hence analyzing your traffic and identifying the news categories that are currently in demand is essential for promoting your content in those categories and increasing profit.

Number of people from different locations
This helps you identify the regions that have a low number of visitors and take action to improve the traffic from those areas.

Now I am going to illustrate, step by step, how to use BAM to define the above KPIs.

Collect information

Website traffic information can be captured and published into BAM in the following ways:

  Using the Java API
  Using a client generated from the Thrift IDL
  Using the REST API

BAM data-agent (Java API)

BAM provides a high-performance, low-latency, load-balancing (between multiple receivers), non-blocking and multi-threaded API for sending large amounts of business events over various transports (TCP, HTTP) using Apache Thrift. This API is provided as a Java SDK, and you can use it to publish captured data from your Java-based system to BAM for analysis. In addition, these data agents are compatible with WSO2 CEP, which can be used for real-time analysis.

Custom data-agent

Below is a simple asynchronous data agent that I have written for publishing web traffic information to BAM. It will help you understand how the agent API is used. For more information about writing a data publisher for BAM, you can refer to this article [1].

    import org.apache.log4j.Logger;
    import org.wso2.carbon.databridge.agent.thrift.Agent;
    import org.wso2.carbon.databridge.agent.thrift.AsyncDataPublisher;
    import org.wso2.carbon.databridge.agent.thrift.conf.AgentConfiguration;
    import org.wso2.carbon.databridge.agent.thrift.exception.AgentException;
    import org.wso2.carbon.databridge.commons.Event;

    public class DataAgent {
        private static Logger logger = Logger.getLogger(DataAgent.class);
        public static final String ONLINE_NEWS_STATS_DATA_STREAM = "online_news_stats";
        public static final String VERSION = "1.0.0";

        // Sample rows in the form username,region,news,tag,timestamp.
        // Lines ending in "..." are cut off at the page margin in the source,
        // and the timestamp values are not recoverable.
        public static final String DATA =
                "Kasun,Colombo,Sri Lanka vs Australia 3rd ODI,Sports, ...
                "Amal,Kaluthara,Businessman killed in Expressway accident,acci...
                "Kamal,Colombo,Navy intelligence investigating boat escape,mil...
                "Kalum,Mathara,No leadership change - JVP,Politics, \...
                "Nuwan,Galle,Marginal improvement at this year's O/L exam resu...
                "Sampath,Mathara,Marginal improvement at this year's O/L exam ...
                "Chamath,Mathara,Attempts to replace Minister Sirisena: UNP,Po...
                "Prabath,Colombo,WikiLeaks; LTTE could have threatened Karunan...
                "Amila,Kandy,Private bus owners to launch one-day strike,trans...
                "Budhdhika,Colombo,ICC introduces new 'No ball' playing condit...
                "Rangana,Nuwara Eliya,Annual horse racing event in Nuwara Eliy...
                "Gamini,Colombo,O/L 2012 results released,education, ...
                "Tharindu,Colombo,O/L best results,education, \n" +
                "Janaka,Kaluthara,O/L best results,education, \n" +
                "Anuranga,Jaffna,Sri Lanka vs Australia 3rd ODI,Sports, ...
                "Kasun,Colombo,Marginal improvement at this year's O/L exam re...
                "Ranga,Kandy,Private bus owners to launch one-day strike,trans...
                "Denis,Kaluthara,ICC introduces new 'No ball' playing conditio...
                "Harsha,Galle,No leadership change - JVP,Politics, \n...

        public static void main(String[] args) {
            AgentConfiguration agentConfiguration = new AgentConfiguration();
            // Trust store used for the secure connection to BAM
            System.setProperty("javax.net.ssl.trustStore", "/opt/wso2bam-2.2.0/reposit...
            System.setProperty("javax.net.ssl.trustStorePassword", "wso2carbon");
            Agent agent = new Agent(agentConfiguration);

            // Using the asynchronous data publisher
            AsyncDataPublisher asyncDataPublisher = new AsyncDataPublisher("tcp://127...

            // Stream definition: one metadata attribute and five payload attributes
            String streamDefinition = "{" +
                    " 'name':'" + ONLINE_NEWS_STATS_DATA_STREAM + "'," +
                    " 'version':'" + VERSION + "'," +
                    " 'nickName': 'News stats'," +
                    " 'description': 'Stats of the news web site'," +
                    " 'metaData':[" +
                    " {'name':'publisherip','type':'STRING'}" +
                    " ]," +
                    " 'payloadData':[" +
                    " {'name':'username','type':'STRING'}," +
                    " {'name':'region','type':'STRING'}," +
                    " {'name':'news','type':'STRING'}," +
                    " {'name':'tag','type':'STRING'}," +
                    " {'name':'timestamp','type':'LONG'}" +
                    " ]" +
                    "}";
            asyncDataPublisher.addStreamDefinition(streamDefinition, ONLINE_NEWS_STATS...
            publishEvents(asyncDataPublisher);
        }

        // Splits DATA into rows and publishes one event per row.
        private static void publishEvents(AsyncDataPublisher dataPublisher) {
            String[] dataRow = DATA.split("\n");
            for (String row : dataRow) {
                String[] data = row.split(",");
                Object[] payload = new Object[]{
                        data[0], data[1], data[2], data[3], Long.parseLong(data[4])};
                // The metadata value (publisher IP) is blank in the source.
                Event event = eventObject(null, new Object[]{" "}, payload);
                try {
                    dataPublisher.publish(ONLINE_NEWS_STATS_DATA_STREAM, VERSION, event);
                } catch (AgentException e) {
                    logger.error("failed to publish event", e);
                }
            }
        }

        // Builds an Event from its correlation, meta and payload data.
        private static Event eventObject(Object[] correlationData, Object[] metaData,
                                         Object[] payloadData) {
            Event event = new Event();
            event.setCorrelationData(correlationData);
            event.setMetaData(metaData);
            event.setPayloadData(payloadData);
            return event;
        }
    }

Non Java Data-agent

The Thrift IDL can be used to generate Thrift clients in different languages to publish data. In this case developers have to implement asynchronous data publishing themselves, in a way similar to how it is implemented in the Java API.

Languages supported by Thrift:

  C++, C#, Cocoa, D, Delphi, Erlang, Haskell, Java, OCaml, Perl, PHP, Python, Ruby, Smalltalk

REST API

You can capture the website traffic information from your system and use the BAM REST API to publish that data to BAM over the HTTP transport.

BAM data receiver

Data sent from the above data agents is received by the data receiver, which uses Thrift and internal optimization techniques to achieve very high throughput. The received data is written directly to Cassandra, a high-performance, scalable big data store. The data receiver persists events into a column family whose name equals the stream name. The BAM data receiver can also be used to share events with WSO2 CEP for real-time analysis.

Cassandra explorer

The Cassandra explorer provides a graphical user interface for viewing the data in a Cassandra column family. Now let's see the data published from our sample client. Go to Home > Manage > Cassandra Explorer > Connect to Cluster and type the connection details as below. You can see all the keyspaces and column families listed in the Cassandra explorer UI.

Then go to online_news_stats in EVENT_KS. You can view the data by opening the column family. Once the data has been published successfully, we can do the analysis.

Data Analysis

You can run data analytics on the captured data by using the BAM analytics framework, which is powered by Apache Hadoop so that data processing can be scaled out over a large number of processing nodes to handle large data volumes. WSO2 BAM provides an easy way to run map/reduce operations on Hadoop by using Apache Hive, which offers a simple way to query and manage large datasets residing in distributed storage using a SQL-like language called HiveQL.

KPIs

The KPIs that we are going to analyze are listed below.

  No of unique visitors per day
  Most popular news category
  No of people from different regions
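
To make the three KPIs above concrete, here is a minimal in-memory sketch in plain Java of what each one computes over rows shaped like the payload of the online_news_stats stream (username, region, news, tag, timestamp). It is only an illustration under stated assumptions: the class and method names (KpiSketch, Visit, uniqueVisitorsPerDay and so on) and the sample rows are hypothetical, the timestamp is assumed to be epoch milliseconds, days are taken in UTC, and tags are compared case-insensitively. In BAM itself this aggregation is done by Hive scripts over the data in Cassandra, not by code like this.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration: computes the three KPIs in memory.
class KpiSketch {

    // One traffic event: username, region, news, tag, timestamp.
    static final class Visit {
        final String username, region, news, tag;
        final long timestamp; // assumed to be epoch milliseconds

        Visit(String username, String region, String news, String tag, long timestamp) {
            this.username = username;
            this.region = region;
            this.news = news;
            this.tag = tag;
            this.timestamp = timestamp;
        }
    }

    // Made-up rows in the form username,region,news,tag,timestamp.
    static final List<String> SAMPLE_ROWS = Arrays.asList(
            "Kasun,Colombo,Sri Lanka vs Australia 3rd ODI,Sports,1356998400000",
            "Amal,Kaluthara,O/L best results,education,1357002000000",
            "Kamal,Colombo,O/L best results,education,1357005600000",
            "Nuwan,Galle,O/L best results,Education,1357084800000",
            "Kasun,Colombo,No leadership change - JVP,Politics,1357088400000");

    // Parses every comma-separated row into a Visit.
    static List<Visit> parseAll(List<String> rows) {
        List<Visit> visits = new ArrayList<>();
        for (String row : rows) {
            String[] f = row.split(",");
            visits.add(new Visit(f[0].trim(), f[1].trim(), f[2].trim(), f[3].trim(),
                    Long.parseLong(f[4].trim())));
        }
        return visits;
    }

    // Collapses each group of usernames to the number of distinct users in it.
    private static Map<String, Integer> sizes(Map<String, Set<String>> groups) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : groups.entrySet()) {
            out.put(e.getKey(), e.getValue().size());
        }
        return out;
    }

    // KPI 1: distinct usernames per UTC calendar day (keyed by ISO date).
    static Map<String, Integer> uniqueVisitorsPerDay(List<Visit> visits) {
        Map<String, Set<String>> byDay = new TreeMap<>();
        for (Visit v : visits) {
            String day = Instant.ofEpochMilli(v.timestamp)
                    .atZone(ZoneOffset.UTC).toLocalDate().toString();
            byDay.computeIfAbsent(day, k -> new HashSet<>()).add(v.username);
        }
        return sizes(byDay);
    }

    // KPI 2: the tag with the most visits; ties go to the alphabetically first tag.
    static String mostPopularCategory(List<Visit> visits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Visit v : visits) {
            counts.merge(v.tag.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        String best = null;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (best == null || e.getValue() > counts.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }

    // KPI 3: distinct usernames per region.
    static Map<String, Integer> visitorsPerRegion(List<Visit> visits) {
        Map<String, Set<String>> byRegion = new TreeMap<>();
        for (Visit v : visits) {
            byRegion.computeIfAbsent(v.region, k -> new HashSet<>()).add(v.username);
        }
        return sizes(byRegion);
    }

    public static void main(String[] args) {
        List<Visit> visits = parseAll(SAMPLE_ROWS);
        System.out.println("Unique visitors per day: " + uniqueVisitorsPerDay(visits));
        System.out.println("Most popular category:   " + mostPopularCategory(visits));
        System.out.println("Visitors per region:     " + visitorsPerRegion(visits));
    }
}
```

Counting distinct usernames rather than raw events is what separates "unique visitors" from page hits: in the made-up rows Kasun appears on both days and is counted once per day. Whether a repeat visitor should count once per region or once per visit is a modelling decision; this sketch counts distinct users.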

Mr. Apichon Witayangkurn apichon@iis.u-tokyo.ac.jp Department of Civil Engineering The University of Tokyo

Mr. Apichon Witayangkurn apichon@iis.u-tokyo.ac.jp Department of Civil Engineering The University of Tokyo Sensor Network Messaging Service Hive/Hadoop Mr. Apichon Witayangkurn apichon@iis.u-tokyo.ac.jp Department of Civil Engineering The University of Tokyo Contents 1 Introduction 2 What & Why Sensor Network

More information

Assignment # 1 (Cloud Computing Security)

Assignment # 1 (Cloud Computing Security) Assignment # 1 (Cloud Computing Security) Group Members: Abdullah Abid Zeeshan Qaiser M. Umar Hayat Table of Contents Windows Azure Introduction... 4 Windows Azure Services... 4 1. Compute... 4 a) Virtual

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Apache Stratos (incubating) 4.0.0-M5 Installation Guide

Apache Stratos (incubating) 4.0.0-M5 Installation Guide Apache Stratos (incubating) 4.0.0-M5 Installation Guide 1. Prerequisites 2. Product Configuration 2.1 Message Broker Configuration 2.2 Load Balancer Configuration 2.3 Cloud Controller Configuration 2.4

More information

Talend Real-Time Big Data Sandbox. Big Data Insights Cookbook

Talend Real-Time Big Data Sandbox. Big Data Insights Cookbook Talend Real-Time Big Data Talend Real-Time Big Data Overview of Real-time Big Data Pre-requisites to run Setup & Talend License Talend Real-Time Big Data Big Data Setup & About this cookbook What is the

More information

Join the Lean Wave. Asanka Abeysinghe Director, Solutions Architecture. WSO2, Inc. Friday, July 22, 11

Join the Lean Wave. Asanka Abeysinghe Director, Solutions Architecture. WSO2, Inc. Friday, July 22, 11 Join the Lean Wave Asanka Abeysinghe Director, Solutions Architecture. WSO2, Inc. 1 Asanka Abeysinghe 10 + years industry experience working on projects ranging from desktop, web applications through to

More information

Open Source for Cloud Infrastructure

Open Source for Cloud Infrastructure Open Source for Cloud Infrastructure June 29, 2012 Jackson He General Manager, Intel APAC R&D Ltd. Cloud is Here and Expanding More users, more devices, more data & traffic, expanding usages >3B 15B Connected

More information

Hadoop. Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware.

Hadoop. Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware. Hadoop Source Alessandro Rezzani, Big Data - Architettura, tecnologie e metodi per l utilizzo di grandi basi di dati, Apogeo Education, ottobre 2013 wikipedia Hadoop Apache Hadoop is an open-source software

More information

Networks and Services

Networks and Services Networks and Services Dr. Mohamed Abdelwahab Saleh IET-Networks, GUC Fall 2015 TOC 1 Infrastructure as a Service 2 Platform as a Service 3 Software as a Service Infrastructure as a Service Definition Infrastructure

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

Open Source Technologies on Microsoft Azure

Open Source Technologies on Microsoft Azure Open Source Technologies on Microsoft Azure A Survey @DChappellAssoc Copyright 2014 Chappell & Associates The Main Idea i Open source technologies are a fundamental part of Microsoft Azure The Big Questions

More information

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon.

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon. Building Scalable Big Data Infrastructure Using Open Source Software Sam William sampd@stumbleupon. What is StumbleUpon? Help users find content they did not expect to find The best way to discover new

More information

Apache Stratos Building a PaaS using OSGi and Equinox. Paul Fremantle CTO and Co- Founder, WSO2 CommiCer, Apache Stratos

Apache Stratos Building a PaaS using OSGi and Equinox. Paul Fremantle CTO and Co- Founder, WSO2 CommiCer, Apache Stratos Apache Stratos Building a PaaS using OSGi and Equinox Paul Fremantle CTO and Co- Founder, WSO2 CommiCer, Apache Stratos @pzfreo #wso2 #apache paul@wso2.com pzf@apache.org 1 About me CTO and Co- Founder

More information

Hadoop 只 支 援 用 Java 開 發 嘛? Is Hadoop only support Java? 總 不 能 全 部 都 重 新 設 計 吧? 如 何 與 舊 系 統 相 容? Can Hadoop work with existing software?

Hadoop 只 支 援 用 Java 開 發 嘛? Is Hadoop only support Java? 總 不 能 全 部 都 重 新 設 計 吧? 如 何 與 舊 系 統 相 容? Can Hadoop work with existing software? Hadoop 只 支 援 用 Java 開 發 嘛? Is Hadoop only support Java? 總 不 能 全 部 都 重 新 設 計 吧? 如 何 與 舊 系 統 相 容? Can Hadoop work with existing software? 可 以 跟 資 料 庫 結 合 嘛? Can Hadoop work with Databases? 開 發 者 們 有 聽 到

More information

ISSN: 2321-7782 (Online) Volume 3, Issue 4, April 2015 International Journal of Advance Research in Computer Science and Management Studies

ISSN: 2321-7782 (Online) Volume 3, Issue 4, April 2015 International Journal of Advance Research in Computer Science and Management Studies ISSN: 2321-7782 (Online) Volume 3, Issue 4, April 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

Sentimental Analysis using Hadoop Phase 2: Week 2

Sentimental Analysis using Hadoop Phase 2: Week 2 Sentimental Analysis using Hadoop Phase 2: Week 2 MARKET / INDUSTRY, FUTURE SCOPE BY ANKUR UPRIT The key value type basically, uses a hash table in which there exists a unique key and a pointer to a particular

More information

Big data blue print for cloud architecture

Big data blue print for cloud architecture Big data blue print for cloud architecture -COGNIZANT Image Area Prabhu Inbarajan Srinivasan Thiruvengadathan Muralicharan Gurumoorthy Praveen Codur 2012, Cognizant Next 30 minutes Big Data / Cloud challenges

More information

How To Handle Big Data With A Data Scientist

How To Handle Big Data With A Data Scientist III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

Amazon Kinesis and Apache Storm

Amazon Kinesis and Apache Storm Amazon Kinesis and Apache Storm Building a Real-Time Sliding-Window Dashboard over Streaming Data Rahul Bhartia October 2014 Contents Contents Abstract Introduction Reference Architecture Amazon Kinesis

More information

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning

More information

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future

More information

Processing of massive data: MapReduce. 2. Hadoop. New Trends In Distributed Systems MSc Software and Systems

Processing of massive data: MapReduce. 2. Hadoop. New Trends In Distributed Systems MSc Software and Systems Processing of massive data: MapReduce 2. Hadoop 1 MapReduce Implementations Google were the first that applied MapReduce for big data analysis Their idea was introduced in their seminal paper MapReduce:

More information

Time series IoT data ingestion into Cassandra using Kaa

Time series IoT data ingestion into Cassandra using Kaa Time series IoT data ingestion into Cassandra using Kaa Andrew Shvayka ashvayka@cybervisiontech.com Agenda Data ingestion challenges Why Kaa? Why Cassandra? Reference architecture overview Hands-on Sandbox

More information

Creating a universe on Hive with Hortonworks HDP 2.0

Creating a universe on Hive with Hortonworks HDP 2.0 Creating a universe on Hive with Hortonworks HDP 2.0 Learn how to create an SAP BusinessObjects Universe on top of Apache Hive 2 using the Hortonworks HDP 2.0 distribution Author(s): Company: Ajay Singh

More information

What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea

What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea Overview Riding Google App Engine Taming Hadoop Summary Riding

More information

WSO2 Message Broker. Scalable persistent Messaging System

WSO2 Message Broker. Scalable persistent Messaging System WSO2 Message Broker Scalable persistent Messaging System Outline Messaging Scalable Messaging Distributed Message Brokers WSO2 MB Architecture o Distributed Pub/sub architecture o Distributed Queues architecture

More information

Mitra Innovation Leverages WSO2's Open Source Middleware to Build BIM Exchange Platform

Mitra Innovation Leverages WSO2's Open Source Middleware to Build BIM Exchange Platform Mitra Innovation Leverages WSO2's Open Source Middleware to Build BIM Exchange Platform May 2015 Contents 1. Introduction... 3 2. What is BIM... 3 2.1. History of BIM... 3 2.2. Why Implement BIM... 4 2.3.

More information

Qsoft Inc www.qsoft-inc.com

Qsoft Inc www.qsoft-inc.com Big Data & Hadoop Qsoft Inc www.qsoft-inc.com Course Topics 1 2 3 4 5 6 Week 1: Introduction to Big Data, Hadoop Architecture and HDFS Week 2: Setting up Hadoop Cluster Week 3: MapReduce Part 1 Week 4:

More information

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform

More information

How To Create A Data Visualization With Apache Spark And Zeppelin 2.5.3.5

How To Create A Data Visualization With Apache Spark And Zeppelin 2.5.3.5 Big Data Visualization using Apache Spark and Zeppelin Prajod Vettiyattil, Software Architect, Wipro Agenda Big Data and Ecosystem tools Apache Spark Apache Zeppelin Data Visualization Combining Spark

More information

Data Mining, Predictive Analytics with Microsoft Analysis Services and Excel PowerPivot

Data Mining, Predictive Analytics with Microsoft Analysis Services and Excel PowerPivot www.etidaho.com (208) 327-0768 Data Mining, Predictive Analytics with Microsoft Analysis Services and Excel PowerPivot 3 Days About this Course This course is designed for the end users and analysts that

More information

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,

More information

Data processing goes big

Data processing goes big Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,

More information

Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to

More information

SIF 3: A NEW BEGINNING

SIF 3: A NEW BEGINNING SIF 3: A NEW BEGINNING The SIF Implementation Specification Defines common data formats and rules of interaction and architecture, and is made up of two parts: SIF Infrastructure Implementation Specification

More information

Certified Cloud Computing Professional VS-1067

Certified Cloud Computing Professional VS-1067 Certified Cloud Computing Professional VS-1067 Certified Cloud Computing Professional Certification Code VS-1067 Vskills Cloud Computing Professional assesses the candidate for a company s cloud computing

More information

Business Intelligence for Big Data

Business Intelligence for Big Data Business Intelligence for Big Data Will Gorman, Vice President, Engineering May, 2011 2010, Pentaho. All Rights Reserved. www.pentaho.com. What is BI? Business Intelligence = reports, dashboards, analysis,

More information

A Brief Introduction to Apache Tez

A Brief Introduction to Apache Tez A Brief Introduction to Apache Tez Introduction It is a fact that data is basically the new currency of the modern business world. Companies that effectively maximize the value of their data (extract value

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

the missing log collector Treasure Data, Inc. Muga Nishizawa

the missing log collector Treasure Data, Inc. Muga Nishizawa the missing log collector Treasure Data, Inc. Muga Nishizawa Muga Nishizawa (@muga_nishizawa) Chief Software Architect, Treasure Data Treasure Data Overview Founded to deliver big data analytics in days

More information

Architecting Open source solutions on Azure. Nicholas Dritsas Senior Director, Microsoft Singapore

Architecting Open source solutions on Azure. Nicholas Dritsas Senior Director, Microsoft Singapore Learn. Connect. Explore. Architecting Open source solutions on Azure Nicholas Dritsas Senior Director, Microsoft Singapore Agenda Developing OSS Apps on Azure Customer case with OSS Apps Hadoop on Azure

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com Agenda The rise of Big Data & Hadoop MySQL in the Big Data Lifecycle MySQL Solutions for Big Data Q&A

More information

Big Data Analytics on Cab Company s Customer Dataset Using Hive and Tableau

Big Data Analytics on Cab Company s Customer Dataset Using Hive and Tableau Big Data Analytics on Cab Company s Customer Dataset Using Hive and Tableau Dipesh Bhawnani 1, Ashish Sanwlani 2, Haresh Ahuja 3, Dimple Bohra 4 1,2,3,4 Vivekanand Education, India Abstract Project focuses

More information

Architectures for massive data management

Architectures for massive data management Architectures for massive data management Apache Kafka, Samza, Storm Albert Bifet albert.bifet@telecom-paristech.fr October 20, 2015 Stream Engine Motivation Digital Universe EMC Digital Universe with

More information

Transforming the Telecoms Business using Big Data and Analytics

Transforming the Telecoms Business using Big Data and Analytics Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe

More information

(55042A) SharePoint 2013 Business Intelligence

(55042A) SharePoint 2013 Business Intelligence (55042A) SharePoint 2013 Business Intelligence OBJECTIVE This three-day instructor-led course provides students with the necessary knowledge to work with all the associated SharePoint business intelligence

More information

Cloud Computing. Adam Barker

Cloud Computing. Adam Barker Cloud Computing Adam Barker 1 Overview Introduction to Cloud computing Enabling technologies Different types of cloud: IaaS, PaaS and SaaS Cloud terminology Interacting with a cloud: management consoles

More information

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview
