Learn. Connect. Explore.
Architecting Open Source Solutions on Azure
Nicholas Dritsas, Senior Director, Microsoft Singapore
Agenda
- Developing OSS Apps on Azure
- Customer case with OSS Apps
- Hadoop on Azure
- Customer cases using Hadoop on Azure
Flexible
Open Source & Azure
- Android, iOS & Node.js backend via Azure Mobile Services
- Java, Ruby SDKs via Linux VM, Engine Yard & Oracle
- Websites for PHP, Node.js, Python & App Gallery
- MySQL via ClearDB, MongoDB via MongoLab, Hadoop
- From Linux VMs via Image Gallery & VMDepot
Configuration
Store type           | Example Technologies                    | What It Provides                                        | Example Use Case
---------------------|-----------------------------------------|---------------------------------------------------------|-----------------------------------------
Key/value stores     | Redis, Microsoft Azure Tables and Cache | Fast access to large amounts of simply structured data  | Online shopping cart
Column family stores | Cassandra, HBase                        | Fast access to large amounts of more structured data    | A table storing web pages
Document databases   | MongoDB, CouchDB                        | Scalable store for JSON documents                       | Persistent store for Node.js application
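As a rough illustration of how the three store types differ in shape, the sketch below models each one with plain Python structures. It does not use any real database client; the names `cart_store`, `web_table`, `doc_store` and `find_documents`, and all sample values, are made up for illustration.

```python
# In-memory stand-ins for the three store types; no real database is used.

# Key/value store: an opaque value looked up by a single key (shopping cart).
cart_store = {}
cart_store["cart:user42"] = ["sku-1001", "sku-2002"]

# Column family store: row key -> column family -> column -> value (web pages).
web_table = {
    "com.example/index": {
        "contents": {"html": "<html>...</html>"},
        "anchor": {"com.other/page": "Example link"},
    }
}

# Document database: self-describing JSON-like documents, queried by field.
doc_store = [
    {"_id": 1, "type": "session", "user": "user42", "active": True},
    {"_id": 2, "type": "session", "user": "user7", "active": False},
]


def find_documents(docs, **criteria):
    """Return the documents whose fields match every given criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


if __name__ == "__main__":
    print(cart_store["cart:user42"])
    print(web_table["com.example/index"]["anchor"])
    print(find_documents(doc_store, active=True))
```

The key/value store can only be read by its key, the column family store is addressed by row key and then column, and the document store can be filtered on any field.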
Migrating an end-to-end airline online system to Azure
Background
FlyAir has aggressive growth plans and expects high growth rates, so it needs to plan for better systems. The current systems are based on OSS: CentOS/Ubuntu Linux running PHP and MySQL. FlyAir's system consists of four main areas:
- B2C: hosts the main web page and consumer interaction for booking or managing flights directly
- B2T: supports the travel agencies, where the majority of the revenue comes from
- B2M: mobile user support
- B2B: corporate accounts
Migration process
- We moved all four systems from on-premises to Azure in a few weeks.
- The system is hosted in the Singapore data center and consists of a number of Large/Extra Large Ubuntu/CentOS VMs that host PHP for the front end and MySQL for the back end.
- HA is achieved using Azure Load Balancer, VM availability sets and MySQL replication.
- A site-to-site VPN was established using a Cisco device to support connectivity to on-premises LOB systems, plus a ticketing interface to Amadeus (centralized ticketing system).
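To make the load-balancing part of the HA setup concrete, here is a minimal sketch of round-robin distribution over front-end VMs that skips instances marked unhealthy. It is a toy model, not the Azure Load Balancer API; the class name, method names and VM names (`web1`, `web2`, `web3`) are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy load balancer: rotate over backends, skipping unhealthy ones."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        """Record a failed health probe for this backend."""
        self.healthy.discard(backend)

    def mark_up(self, backend):
        """Record a successful health probe for a known backend."""
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self):
        """Return the next healthy backend in rotation."""
        # Try each backend at most once per call.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["web1", "web2", "web3"])
    lb.mark_down("web2")  # simulate one front-end VM failing its probe
    print([lb.next_backend() for _ in range(4)])
```

With one VM down, traffic continues to flow to the remaining instances, which is the behavior the load balancer and availability set combination is meant to provide.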
Infrastructure view of B2C
Current state and futures
- The system has been running stably and performing well since November 2013.
- FlyAir plans to add a DR site in the Hong Kong data center and use Traffic Manager and Resource Groups to manage the failover/failback process.
- SCOM and New Relic tools are used to monitor the sites and manage alerts and resource warnings.
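The planned failover/failback between the Singapore site and a Hong Kong DR site can be pictured as priority-based routing: traffic goes to the healthy endpoint with the lowest priority number. The sketch below illustrates only that idea; it is not the Traffic Manager API, and the function name, endpoint names and priorities are assumptions.

```python
def pick_endpoint(endpoints, health):
    """Return the healthy endpoint with the lowest priority number.

    endpoints: dict mapping endpoint name -> priority (1 = primary).
    health:    dict mapping endpoint name -> True if its probe succeeded.
    Returns None when no endpoint is healthy.
    """
    for name, _priority in sorted(endpoints.items(), key=lambda kv: kv[1]):
        if health.get(name, False):
            return name
    return None


if __name__ == "__main__":
    endpoints = {"singapore": 1, "hong-kong": 2}  # hypothetical priorities
    print(pick_endpoint(endpoints, {"singapore": True, "hong-kong": True}))
    print(pick_endpoint(endpoints, {"singapore": False, "hong-kong": True}))
```

Failback needs no extra logic in this model: once the primary reports healthy again, it is selected on the next lookup.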
Azure HDInsight
HDInsight Supports Hive
SQL-like queries on Hadoop data in HDInsight
- HDInsight provides easy-to-use graphical query interface for Hive
- HiveQL is a SQL-like language (subset of SQL)
- Hive structures include well-understood database concepts such as tables, rows, columns, partitions
- Compiled into MapReduce jobs that are executed on Hadoop
- Dramatic performance gains with Stinger/Tez
  - Stinger is a Microsoft, Hortonworks and OSS driven initiative to bring interactive queries with Hive
  - Brings query execution engine technology from Microsoft SQL Server to Hive
  - Performance gains up to 100x
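Since Hive queries are compiled into MapReduce jobs, it can help to see what a simple aggregate looks like when split into map, shuffle and reduce steps. The sketch below is a conceptual, single-process illustration in plain Python, not the plan Hive actually generates; the table name `page_views` and column `category` are hypothetical.

```python
from collections import defaultdict

# Conceptual query (hypothetical table and column names):
#   SELECT category, COUNT(*) FROM page_views GROUP BY category;


def map_phase(rows):
    """Emit a (key, 1) pair for every input row."""
    for row in rows:
        yield row["category"], 1


def shuffle_phase(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values for each key to produce the per-group count."""
    return {key: sum(values) for key, values in groups.items()}


def run_query(rows):
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    rows = [{"category": "news"}, {"category": "sports"}, {"category": "news"}]
    print(run_query(rows))
```

On a real cluster the map and reduce steps run in parallel across nodes and the shuffle moves data over the network, which is where much of the batch latency comes from.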
HDInsight Supports HBase
NoSQL database on data in HDInsight
[Diagram labels: Coordination; HMaster, Name Node, Job Tracker; Region Server (x4), Data Node (x4), Task Tracker (x4)]
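The Region Servers on this slide each serve a slice of a table: HBase splits a table into regions by contiguous, sorted row-key range. A minimal sketch of routing a row key to the server holding its region follows; the class name, split keys and server names are hypothetical, and real clients resolve this through HBase's own metadata rather than a local list.

```python
from bisect import bisect_right


class RegionMap:
    """Toy lookup from row key to the server holding its key range."""

    def __init__(self, split_keys, servers):
        # N split keys define N + 1 regions: (-inf, k1), [k1, k2), ..., [kN, +inf).
        if len(servers) != len(split_keys) + 1:
            raise ValueError("need exactly one server per region")
        self.split_keys = sorted(split_keys)
        self.servers = list(servers)

    def server_for(self, row_key):
        """Return the server whose region contains row_key."""
        return self.servers[bisect_right(self.split_keys, row_key)]


if __name__ == "__main__":
    regions = RegionMap(["g", "n", "t"], ["rs1", "rs2", "rs3", "rs4"])
    for key in ["apple", "grape", "zebra"]:
        print(key, "->", regions.server_for(key))
```

Because regions are defined by sorted key ranges, rows with adjacent keys land on the same server, which is what makes range scans efficient.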
HDInsight Supports Mahout Machine learning library
HDInsight Supports Storm
Coming Q4, CY2014
Stream analytics for near-real-time processing
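Near-real-time stream analytics often comes down to maintaining aggregates over a moving time window as events arrive. The sketch below shows a sliding-window count in plain Python; it does not use Storm's API, and the class name and event keys are made up. It assumes events arrive with non-decreasing timestamps.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Count events per key over the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, key) in arrival order
        self._counts = Counter()

    def add(self, timestamp, key):
        """Record one event, then drop events that left the window."""
        self._events.append((timestamp, key))
        self._counts[key] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # The window covers (now - window_seconds, now].
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_key = self._events.popleft()
            self._counts[old_key] -= 1
            if self._counts[old_key] == 0:
                del self._counts[old_key]

    def counts(self):
        """Return the current per-key counts as a plain dict."""
        return dict(self._counts)


if __name__ == "__main__":
    window = SlidingWindowCounter(window_seconds=10)
    for ts, key in [(0, "error"), (5, "ok"), (9, "error"), (12, "ok")]:
        window.add(ts, key)
    print(window.counts())
```

Each event updates the aggregate as it arrives, in contrast to a batch job that recomputes over stored data.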
Connect Cloud Hadoop With On-premise
Scenarios For Deploying Hadoop As Hybrid
Hadoop customer cases 1. Data Broker Company
Company Profile
Who is the customer
- A Seattle-based cloud software company, focused exclusively on opening access to government data
- SaaS government public set platform accessible via web, mobile, and RESTful interfaces
Product details
- Open Data Platform
- GovStat insights and analytics
- API Foundry
Business Problem
Project Milestones
- M1: migration of the Open Data Platform to Azure with 4-6 design validation customers. Scale down and ramp up as needed. Support and escalation path defined for PFE. ~150 cores and 1.5 TB of data to be served for this phase.
- M2: support up to 100 customers. DR, monitoring and alerting enhancements, compliance validation against FISMA/FedRAMP. OData integration, Windows 8 .NET application, Windows Phone .NET application, SQL IS integration for willing customers, Windows Azure Marketplace integration and localization.
- M3: IS integration completion post GA, OData enhancements, HDInsight integration, Office 365 integration and PaaS transition study. 10 months after M2.
Catalog
- Published Search API
- DCAT API
- Search over:
  - Metadata
  - Dataset contents
- Filters based on:
  - View/Visualization type
  - Category
  - Tags
  - Geography
- Sorting over catalog
- Dataset view on Catalog
Views
- Four basic visualizations
  - Tabular
  - Maps
  - Charts
  - Calendars
- Operations
  - Export (CSV, JSON, XLSX, XML/RDF)
  - Group By, Filter, Order By
  - SoQL Requests
  - Create Derived Views
- Dataset Only Operations:
  - Upsert, Append, Replace
  - CSV upload
- Can be embedded using the Data Player
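The Group By, Filter and Order By operations map naturally onto SoQL query parameters such as `$select`, `$where`, `$group`, `$order` and `$limit`. The sketch below only builds a request URL with the standard library and sends nothing; the helper name, the domain `data.example.gov`, the dataset id `abcd-1234` and the `/resource/<id>.json` path layout are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_soql_url(domain, dataset_id, select=None, where=None,
                   group=None, order=None, limit=None):
    """Build a query URL carrying SoQL clauses as `$`-prefixed parameters."""
    params = {}
    if select:
        params["$select"] = select
    if where:
        params["$where"] = where
    if group:
        params["$group"] = group
    if order:
        params["$order"] = order
    if limit is not None:
        params["$limit"] = str(limit)
    # Assumed endpoint layout; check the platform's API docs for the real one.
    base = f"https://{domain}/resource/{dataset_id}.json"
    return f"{base}?{urlencode(params)}" if params else base


if __name__ == "__main__":
    url = build_soql_url(
        "data.example.gov", "abcd-1234",  # hypothetical domain and dataset id
        select="category, count(*)",
        group="category",
        order="category",
        limit=10,
    )
    print(url)
    print(parse_qs(urlsplit(url).query))  # decoded view of the parameters
```

`urlencode` percent-encodes the `$`, commas and parentheses, so the clauses can be written as readable strings and still produce a valid URL.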
The Solution Architecture
Technology Landscape:
- ~120 cores of Ubuntu VMs in production. ~50 VMs each in the staging and production environments.
- Standard 3-tier web application architecture
  - Web tier is a RoR MVC application
  - Application tier is Java deployed on Jetty, a servlet container
    - REST API access to app layer. JAX-RS with Jersey
    - SODA API
  - Data tier is primarily PostgreSQL
- NoSQL options for monitoring, central service, rate limiting cache, aggregate cache
  - Deploys Redis, Cassandra, MongoDB for NoSQL
- Lucene based Orester service for search
- Zookeeper and ActiveMQ for coordination service, messaging, inter process synchronization, discovery of services
- Miscellaneous for GeoServer, Monitoring, Alerting
- Deployment via Chef with azure-knife driver
- PureFTP for ftp uploads
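One of the NoSQL uses listed above is a rate limiting cache. The slides do not describe the algorithm, so the sketch below shows one common approach, a fixed-window counter per client, with an in-memory dict standing in for the cache. The class name, limits and client ids are hypothetical, and the current time is passed in explicitly to keep the example deterministic.

```python
class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client in each fixed time window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        # (client_id, window_index) -> request count. A dict stands in for the
        # cache; old windows are never evicted here, whereas a cache with key
        # expiry would drop them automatically.
        self._counters = {}

    def allow(self, client_id, now):
        """Return True and count the request if the client is under its limit."""
        window_index = int(now // self.window_seconds)
        key = (client_id, window_index)
        count = self._counters.get(key, 0)
        if count >= self.limit:
            return False
        self._counters[key] = count + 1
        return True


if __name__ == "__main__":
    limiter = FixedWindowRateLimiter(limit=2, window_seconds=60)
    for now in (0, 10, 59, 60):
        print(now, limiter.allow("client-a", now))
```

Keeping the counters in a shared cache rather than in each application process lets every API server enforce the same limit for a given client.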
High Level Component Architecture
High Level Role + Dataflow
Hadoop customer cases 2. Phone tracking and service company
Company 2 provides technology protection services for mobile phones, consumer electronics, and home appliance devices.
- Mobile telemetry scenario (uni-directional); data published from protected mobile devices
- Goal is to predict, detect and potentially mitigate failure conditions
- Business driver is improving customer claim experience; predicting customer escalation during claim (self-service to agent), etc.
- 6k events/second target (36M / day)
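The two throughput figures are worth checking against each other. The slide does not say how they relate; one reading, assumed here, is that 6k events/second is a peak target while 36M/day is the expected daily volume. Under that assumption, 36M events/day averages about 417 events/second, so the peak target is roughly 14.4 times the average, whereas 6k events/second sustained for a full day would be 518.4M events.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(events_per_day):
    """Average events per second implied by a daily volume."""
    return events_per_day / SECONDS_PER_DAY


def daily_volume(events_per_second):
    """Events per day if a per-second rate were sustained all day."""
    return events_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    avg = average_rate(36_000_000)
    print(f"average rate for 36M/day: {avg:.0f} events/s")
    print(f"6k/s sustained for a day: {daily_volume(6_000):,} events")
    print(f"peak-to-average ratio: {6_000 / avg:.1f}x")
```

Sizing the ingestion tier for the peak rate while sizing storage for the daily volume is consistent with both numbers.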
Project Overview
Business use cases
[Architecture diagram labels: Blob Spooler; Predictive Maint. Scoring Cloud ML; Ingestion Svc; Event Broker; Insight; Backup & device telemetry; Web Role(s); Kafka; Alerting; Troubleshooting; Operational Dashboard; Customer Sat Scoring Cloud ML; Call-Center and Support-Site logs; Orchestration (MDP); Insight; CRM Data On-Premises; Anonymize & Synchronize; Azure Storage; Model (Re)Training (Cloud ML); Model Publishing; Usage Reports & Analytics; Curated Data Sets for Self Service; Descriptive Analytics; Data Exploration]