Architecting Open Source Solutions on Azure. Nicholas Dritsas, Senior Director, Microsoft Singapore



Learn. Connect. Explore.

Agenda
- Developing OSS Apps on Azure
- Customer case with OSS Apps
- Hadoop on Azure
- Customer cases using Hadoop on Azure

Flexible

Open Source & Azure
- Android, iOS & Node.js backend via Azure Mobile Services
- Java, Ruby SDKs via Linux VM, Engine Yard & Oracle
- Websites for PHP, Node.js, Python & App Gallery
- MySQL via ClearDB, MongoDB via MongoLab, Hadoop
- From Linux VMs via Image Gallery & VMDepot

Configuration

Category             | Example Technologies                    | What It Provides                                       | Example Use Case
Key/value stores     | Redis, Microsoft Azure Tables and Cache | Fast access to large amounts of simply structured data | Online shopping cart
Column family stores | Cassandra, HBase                        | Fast access to large amounts of more structured data   | A table storing web pages
Document databases   | MongoDB, CouchDB                        | Scalable store for JSON documents                      | Persistent store for Node.js application
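To make the difference between the store types listed above more concrete, here is a minimal sketch that uses plain Python dictionaries and lists as stand-ins. No real Redis or MongoDB client is involved, and the names `cart_key`, `add_to_cart`, `insert_document` and `find_documents` are hypothetical helpers chosen for illustration only.

```python
import json

# Key/value store stand-in: one opaque value per key, looked up by key only.
kv_store = {}

def cart_key(user_id):
    """Build the single key under which a user's cart is stored."""
    return f"cart:{user_id}"

def add_to_cart(user_id, sku, qty):
    """Read the whole cart value, update it, and write it back under the same key."""
    cart = json.loads(kv_store.get(cart_key(user_id), "{}"))
    cart[sku] = cart.get(sku, 0) + qty
    kv_store[cart_key(user_id)] = json.dumps(cart)

def get_cart(user_id):
    return json.loads(kv_store.get(cart_key(user_id), "{}"))

# Document store stand-in: JSON-like documents that can be queried by field.
documents = []

def insert_document(doc):
    documents.append(dict(doc))

def find_documents(**criteria):
    """Return every document whose fields match all the given criteria."""
    return [d for d in documents
            if all(d.get(k) == v for k, v in criteria.items())]

add_to_cart("u1", "sku-42", 2)
add_to_cart("u1", "sku-42", 1)
insert_document({"type": "session", "user": "u1", "page": "/checkout"})
insert_document({"type": "session", "user": "u2", "page": "/home"})

print(get_cart("u1"))
print(find_documents(user="u2"))
```

The key/value half can only fetch a cart by its key, which suits the shopping-cart use case; the document half can filter on any field, which is closer to what a Node.js application would expect from a persistent JSON store.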

Migrating an end to end airline online system to Azure

Background
FlyAir has very aggressive growth plans. It expects high growth rates and needs to plan for systems that can support them. The current systems are based on OSS: CentOS/Ubuntu Linux running PHP and MySQL. FlyAir's system consists of four main areas:
- B2C: the main web page and consumer interaction for booking or managing flights directly
- B2T: support for travel agencies, where the majority of the revenue comes from
- B2M: support for mobile users
- B2B: corporate accounts

Migration process
We moved all four systems from on-premises to Azure in a few weeks. The system is hosted in the Singapore data center and consists of a number of Large/Extra Large Ubuntu/CentOS VMs that host PHP for the front end and MySQL for the back end. HA is achieved using Azure Load Balancer, VM availability sets and MySQL replication. A site-to-site VPN was established using a Cisco device to support connectivity to on-premises LOB systems, plus a ticketing interface to Amadeus (centralized ticketing system).

Infrastructure view of B2C

Current state and futures
The system has been stable and performing well since November 2013. FlyAir plans to add a DR site in the Hong Kong data center and to use Traffic Manager and Resource Groups to manage the failover/failback process. SCOM and New Relic are used to monitor the sites and manage alerts and resource warnings.

Azure HDInsight

HDInsight Supports Hive
SQL-like queries on Hadoop data in HDInsight
- HDInsight provides an easy-to-use graphical query interface for Hive
- HiveQL is a SQL-like language (subset of SQL)
- Hive structures include well-understood database concepts such as tables, rows, columns and partitions
- Queries are compiled into MapReduce jobs that are executed on Hadoop
Dramatic performance gains with Stinger/Tez
- Stinger is a Microsoft, Hortonworks and OSS-driven initiative to bring interactive queries to Hive
- Brings query execution engine technology from Microsoft SQL Server to Hive
- Performance gains of up to 100x
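As a rough illustration of what "compiled into MapReduce jobs" means, the sketch below walks a hypothetical HiveQL query, `SELECT country, COUNT(*) FROM page_views GROUP BY country`, through map, shuffle and reduce phases in plain Python. The table name, the rows and the function names are assumptions made for this example; it is a conceptual model, not the plan Hive actually generates.

```python
from collections import defaultdict

# Hypothetical rows of a page_views table.
rows = [
    {"page": "/home", "country": "SG"},
    {"page": "/book", "country": "HK"},
    {"page": "/home", "country": "SG"},
]

def map_phase(rows):
    """Emit one (group key, 1) pair per input row, as a mapper would."""
    for row in rows:
        yield row["country"], 1

def shuffle(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the values for each key, which implements COUNT(*) per group."""
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(rows)))
print(counts)
```

The GROUP BY column becomes the key emitted by the mapper, and the aggregate function becomes the reducer, which is why grouping queries map so naturally onto MapReduce.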

HDInsight Supports HBase
NoSQL database on data in HDInsight
[Diagram: Coordination, HMaster, Name Node and Job Tracker, with four each of Region Server, Data Node and Task Tracker]
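HBase keeps rows sorted by row key and splits a table into regions, each covering a contiguous key range and served by one region server. The sketch below shows that routing idea with a binary search over region start keys. The boundaries and server names are hypothetical, and a real client finds regions through HBase's own metadata rather than a hard-coded list.

```python
import bisect

# Hypothetical region boundaries: region i covers keys from its start key
# (inclusive) up to the next region's start key (exclusive).
region_start_keys = ["", "g", "n", "t"]
region_servers = [
    "regionserver-1",
    "regionserver-2",
    "regionserver-3",
    "regionserver-4",
]

def locate_region(row_key):
    """Return the server hosting the region whose key range contains row_key."""
    index = bisect.bisect_right(region_start_keys, row_key) - 1
    return region_servers[index]

for key in ["apple", "grape", "zebra"]:
    print(key, "->", locate_region(key))
```

Because rows are range-partitioned rather than hashed, a scan over adjacent row keys usually touches a single region server, which is one reason row-key design matters in HBase.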

HDInsight Supports Mahout
Machine learning library

HDInsight Supports Storm (coming Q4 CY2014)
Stream analytics for near-real-time processing

Connect Cloud Hadoop With On-premise

Scenarios For Deploying Hadoop As Hybrid

Hadoop customer cases 1. Data Broker Company

Company Profile
Who is the customer
- The customer is a Seattle-based cloud software company focused exclusively on opening access to government data
- SaaS government public set platform accessible via web, mobile and RESTful interfaces
Product details
- Open Data Platform
- GovStat insights and analytics
- API Foundry

Business Problem
Project Milestones
- M1: Migration of the Open Data Platform to Azure with 4-6 design validation customers. Scale down and ramp up as needed. Support and escalation path defined for PFE. ~150 cores and 1.5 TB of data to be served for this phase.
- M2: Support up to 100 customers. DR, monitoring and alerting enhancements, compliance validation against FISMA/FedRAMP. OData integration, Windows 8 .NET application, Windows Phone .NET application, SQL IS integration for willing customers, Windows Azure Marketplace integration and localization.
- M3: IS integration completion post GA, OData enhancements, HDInsight integration, Office 365 integration and PaaS transition study. 10 months after M2.

Catalog
- Published Search API
- DCAT API
- Search over: metadata, dataset contents
- Filters based on: view/visualization type, category, tags, geography
- Sorting over catalog
- Dataset view on catalog

Views
- Four basic visualizations: tabular, maps, charts, calendars
- Operations: export (CSV, JSON, XLSX, XML/RDF); Group By, Filter, Order By; SoQL requests; create derived views
- Dataset-only operations: upsert, append, replace; CSV upload
- Can be embedded using the Data Player
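The Group By, Filter and Order By operations above can be expressed as a SoQL request. The sketch below builds such a request URL, assuming the publicly documented SODA convention of `$select`, `$where`, `$group`, `$order` and `$limit` query parameters on a `/resource/<dataset id>.json` endpoint. The domain `data.example.gov`, the dataset id `abcd-1234`, the column names and the helper `build_soql_url` are all hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_soql_url(domain, dataset_id, select,
                   where=None, group=None, order=None, limit=None):
    """Build a SODA resource URL carrying a SoQL query as $-prefixed parameters."""
    params = {"$select": select}
    if where:
        params["$where"] = where
    if group:
        params["$group"] = group
    if order:
        params["$order"] = order
    if limit is not None:
        params["$limit"] = str(limit)
    # urlencode percent-encodes the "$" prefix and the query text.
    return f"https://{domain}/resource/{dataset_id}.json?{urlencode(params)}"

url = build_soql_url(
    domain="data.example.gov",       # hypothetical portal
    dataset_id="abcd-1234",          # hypothetical dataset id
    select="category, count(*) AS total",
    where="year = 2014",
    group="category",
    order="total DESC",
    limit=10,
)

# Decode the query string again to show the parameters a server would see.
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded)
```

Keeping the query in URL parameters means a filtered, grouped view can be requested with a plain HTTP GET from any client, which fits the web, mobile and REST access described for the platform.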

The Solution Architecture
Technology Landscape:
- ~120 cores of Ubuntu VMs in production. ~50 VMs each in staging and production environment.
- Standard 3-tier web application architecture
- Web tier is a Ruby on Rails (RoR) MVC application
- Application tier is Java deployed on Jetty, a servlet container
- REST API access to app layer: JAX-RS with Jersey, SODA API
- Data tier is primarily PostgreSQL
- NoSQL options for monitoring, central service, rate limiting cache, aggregate cache: deploys Redis, Cassandra, MongoDB
- Lucene-based Orester service for search
- ZooKeeper and ActiveMQ for coordination service, messaging, inter-process synchronization, discovery of services
- Miscellaneous for GeoServer, monitoring, alerting
- Deployment via Chef with azure-knife driver
- PureFTP for FTP uploads

High Level Component Architecture

High Level Role + Dataflow

Hadoop customer cases 2. Phone tracking and service company

Company 2 provides technology protection services for mobile phones, consumer electronics and home appliance devices.
- Mobile telemetry scenario (uni-directional): data published from protected mobile devices
- Goal is to predict, detect and potentially mitigate failure conditions
- Business driver is improving the customer claim experience: predicting customer escalation during a claim (self-service to agent), etc.
- Target of 6k events/second (36M/day)
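The two throughput figures on this slide can be related with simple arithmetic. The sketch below uses only the slide's numbers. Reading 6k events/second as a peak rate rather than a sustained one is an interpretation, not something the slide states.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

target_events_per_second = 6_000
stated_events_per_day = 36_000_000

# Volume if the target rate were sustained for a full day.
sustained_daily_volume = target_events_per_second * SECONDS_PER_DAY

# Average rate implied by the stated daily volume.
average_rate_for_stated_volume = stated_events_per_day / SECONDS_PER_DAY

# Time needed to ingest the stated daily volume at the target rate.
seconds_at_target_rate = stated_events_per_day / target_events_per_second

print(f"Sustained 6k/s for a day: {sustained_daily_volume:,} events")
print(f"Average rate for 36M/day: {average_rate_for_stated_volume:.1f} events/s")
print(f"36M events at 6k/s take:  {seconds_at_target_rate / 60:.0f} minutes")
```

Under that reading, the ingestion tier would be sized for bursts of about 6,000 events per second while the storage and analytics tiers plan for roughly 36 million events per day.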

Project Overview

Business use cases

[Architecture diagram. Labels shown: Backup & device telemetry; Call-Center and Support-Site logs; CRM Data; On-Premises; Anonymize & Synchronize; Ingestion Svc; Web Role(s); Event Broker; Kafka; Blob Spooler; Azure Storage; Orchestration (MDP); Insight; Predictive Maint. Scoring (Cloud ML); Customer Sat Scoring (Cloud ML); Model (Re)Training (Cloud ML); Model Publishing; Alerting; Troubleshooting; Operational Dashboard; Usage Reports & Analytics; Curated Data Sets for Self Service; Descriptive Analytics; Data Exploration]
