Open Source Technologies on Microsoft Azure: A Survey
@DChappellAssoc
Copyright 2014 Chappell & Associates
The Main Idea
Open source technologies are a fundamental part of Microsoft Azure
The Big Questions: Commitment?
Is Microsoft really serious about supporting open source on Azure?
Answer: Yes
- Its competitors do this
- Cloud platforms sell cycles, storage, and bandwidth. Who cares what software uses these?
The Big Questions: Customers?
Do customers care about open source on Azure?
Answer: Yes
- For running Linux workloads
- For open source development on Linux and Windows
You can now sell projects that include open source solutions
Open Source on Microsoft Azure
Two categories
- Compute: IaaS, PaaS, and mobile services
- Data: SQL, NoSQL, and big data analytics
Compute: IaaS, PaaS, and Mobile Services
Microsoft Azure Technologies: Compute (1)
- Infrastructure as a Service (IaaS): Virtual Machines. Applications run in VMs created from gallery or user-supplied VHDs
- Platform as a Service (PaaS): Cloud Services. Applications run in web roles (with IIS) and worker roles
- Platform as a Service (PaaS): Web Sites. Applications run on IIS
Microsoft Azure Technologies: Compute (2)
- Mobile Backend as a Service (MBaaS): Mobile Services. Authentication, notifications, custom logic, ...
Microsoft Azure Compute: IaaS
Virtual Machines
- VMs are created from VHDs
- Gallery VHDs: Windows Server images provided by Microsoft; Linux images provided by partners
- User-supplied VHDs: Windows Server and Linux images provided by customers
Linux Images: Examples from the Gallery
- Canonical Ubuntu
- Oracle Linux
- SUSE Linux Enterprise: targets enterprises
- openSUSE Linux: community distro
- CentOS by OpenLogic: binary compatible with Red Hat Enterprise Linux
Microsoft provides forum-based support for all
Microsoft Azure Compute: PaaS
Cloud Services
- Provides a pre-built, managed environment for running Windows applications
- Applications run in web roles and worker roles
- Base images are Windows Server
- Can install open source software in Windows Server VMs
Microsoft Azure Compute: PaaS
Web Sites
- Custom web applications: code and HTML deployed via FTP, WebDeploy, TFS, or Git
- Open source web applications: installed from the Web App Gallery
- Applications run on IIS in VMs; base images are Windows Server
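Microsoft Azure Web Sites
Example applications and frameworks in the Web App Gallery
- WordPress
  What it provides: Content management system/blogging
  Technology foundation: PHP/MySQL
- Joomla
  What it provides: Content management system
  Technology foundation: PHP/MySQL (and others)
- MediaWiki
  What it provides: Wiki package
  Technology foundation: PHP/MySQL (and others)
- Apache Tomcat
  What it provides: Web server/servlet container
  Technology foundation: Java
- Django
  What it provides: Web framework
  Technology foundation: Python
- Express
  What it provides: Web framework
  Technology foundation: JavaScript/Node.js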
Microsoft Azure Web Sites
Supported open source development environments
- PHP: Scripting language and environment for web development
- Python: General-purpose dynamic programming language
- Node.js: Environment for web development with JavaScript
.NET and Java applications are also supported
Node.js
Basics
- Node.js can run in Virtual Machines, Cloud Services, and Web Sites
- Client: JavaScript client code runs on the web browser's JavaScript engine
- Server: JavaScript server code runs on Node.js in a VM on Microsoft Azure
- Server code is written as a set of event handlers, each dispatched by the Node.js event loop
- Node.js uses the V8 JavaScript engine, created by Google for the Chrome browser
- Client and server exchange HTML/JSON
Same programming language on client and server
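The event-handler model can be illustrated with a toy loop. This is a minimal sketch in Python, not Node.js itself: the `EventLoop` class and the event names `request` and `response` are hypothetical, chosen only to show handlers being dispatched one at a time from a queue.

```python
from collections import deque


class EventLoop:
    """Toy single-threaded loop: queue events, dispatch each to its handlers."""

    def __init__(self):
        self.handlers = {}    # event name -> list of handler functions
        self.queue = deque()  # pending (event name, payload) pairs

    def on(self, name, handler):
        self.handlers.setdefault(name, []).append(handler)

    def emit(self, name, payload=None):
        self.queue.append((name, payload))

    def run(self):
        # Dispatch one event at a time until no work remains.
        while self.queue:
            name, payload = self.queue.popleft()
            for handler in self.handlers.get(name, []):
                handler(payload)


log = []
loop = EventLoop()


def handle_request(path):
    log.append("got " + path)
    # Emitting from inside a handler queues work for a later turn of the loop.
    loop.emit("response", path)


def handle_response(path):
    log.append("sent " + path)


loop.on("request", handle_request)
loop.on("response", handle_response)
loop.emit("request", "/a")
loop.emit("request", "/b")
loop.run()
```

Both requests are accepted before either response is sent, which is the interleaving a single-threaded event loop produces.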
Node.js
Development tools
- In a web browser: Cloud9 IDE; Visual Studio Online "Monaco"
- On Windows: WebMatrix; Node.js Tools for Visual Studio
- Works with Node.js on Microsoft Azure Web Sites (today)
- JavaScript server code runs on Node.js and the V8 JavaScript engine in a VM on Microsoft Azure
- Many useful modules are available as open source, e.g., Express
Microsoft Azure Compute: MBaaS
Mobile Services
- User authentication: using identities from Microsoft, Google, etc.
- Notifications: using services from Microsoft or Apple
- Data access
- Custom code: built with Node.js or .NET
- Clients connect via REST/JSON: Windows Store apps, Windows Phone apps, iOS apps, Android apps
Microsoft provides SDKs for all
Which Compute Technology Should You Choose?
Comparing the open source options
- Microsoft Azure Virtual Machines (IaaS)
  Pros: Linux and Windows support; complete control of the VM
  Cons: Complete responsibility for VM management
- Microsoft Azure Cloud Services (PaaS)
  Pros: Least amount of required management
  Cons: Windows only; must reinstall OSS software each time a VM starts
- Microsoft Azure Web Sites (PaaS)
  Pros: Easiest to use; built-in support for many options
  Cons: Windows only; using more than the options Microsoft provides takes extra work
Use Mobile Services to support mobile apps
Data: SQL, NoSQL, and Big Data Analytics
Microsoft Azure Technologies: Data
Operational data
- SQL technologies
  Relational database (SQL Database, SQL Server, Oracle, MySQL, ...)
- NoSQL technologies
  Key/value store (Tables, Redis, ...)
  Column family store (HBase, Cassandra, ...)
  Document store (DocumentDB, MongoDB, ...)
  Graph database (Neo4j, ...)
Analytical data
- SQL technologies
  Relational reporting (SQL Server, Oracle, MySQL, ...)
  Relational analytics (SQL Server, Oracle, MySQL, ...)
- NoSQL technologies
  Big data analytics (HDInsight, Hadoop)
Legend: each technology is provided by Microsoft Azure, runs in Microsoft Azure Virtual Machines, or runs in Microsoft Azure Virtual Machines and is open source
The Relational Data Model
A quick summary
- An application sends SQL queries to a database
- A database holds tables (relations), each defined by a schema
- MySQL, etc. can run in a Microsoft Azure Virtual Machines VM
- ClearDB provides MySQL as a managed service on Microsoft Azure
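As an illustration of schema, tables, and a SQL query, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for MySQL. The `customers` and `orders` tables and their contents are hypothetical.

```python
import sqlite3

# In-memory database used only for illustration; a real deployment would
# connect to MySQL or another relational server instead.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.0);
""")

# One query joins two tables through the relationship the schema defines.
rows = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
```

The fixed schema is what lets a single query combine data from tables written by different parts of an application.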
Why Use a NoSQL Technology on Microsoft Azure?
- To scale for lots of users and lots of data
  Pros: NoSQL technologies can offer more scalability than relational databases
  Cons: Often lose some benefits of relational databases, e.g., secondary indexes, full transactions
- To work better with different data formats, e.g., JSON
  Pros: Avoiding object/relational mapping makes code easier to write
  Cons: Persistent data designed for a single application is harder to share; limited BI tools
- To work with data in a more flexible way
  Pros: NoSQL technologies don't have fixed schemas
  Cons: Fixed schemas help prevent errors; data often isn't normalized
- To analyze lots of data in parallel
  Pros: Hadoop has a large and growing ecosystem of tools and people
  Cons: Moving lots of on-premises data to Microsoft Azure can take time
Key/Value Stores
Example: Redis
- An application looks up a value by key (e.g., B3)
- The database spreads keys (A1, A2, A3, B1, B2, B3, C1, C2, C3) across shards
- Each key maps to a value (String, List, Set, Hash)
- NoSQL technologies are typically deployed in Microsoft Azure Linux VMs
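The idea of spreading keys across shards can be sketched as follows. This is a generic hash-and-modulo scheme written for illustration, not the partitioning scheme Redis itself uses; the `ShardedStore` class is hypothetical.

```python
import hashlib


class ShardedStore:
    """Toy key/value store that places each key on one of several shards."""

    def __init__(self, shard_count=3):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash, so the same key always maps to the same shard.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.shards)
        return self.shards[index]

    def set(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        return self._shard_for(key).get(key, default)


store = ShardedStore()
for key in ["A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2", "C3"]:
    store.set(key, "value for " + key)

found = store.get("B3")
missing = store.get("Z9")
```

A lookup touches only the shard that owns the key, which is why this model scales out by adding shards.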
Comparing Operational NoSQL Technologies: Key/value stores
- Key/value stores
  Example technologies: Redis, Tables
  What it provides: Fast access to large amounts of simply structured data
  Example use case: Online shopping cart
Column Family Stores
Example: Cassandra
- An application accesses a keyspace, which contains column families
- Each column family has a name and holds rows, each identified by a row key
- Each row holds columns, each with a column name and a value
- Columns store multiple timestamped versions of a value
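The data model on this slide can be sketched with nested dictionaries. This is a toy in-memory structure, not Cassandra's API; the `ColumnFamily` class, the `pages` column family, and the counter standing in for real timestamps are all assumptions made for illustration.

```python
import itertools


class ColumnFamily:
    """Toy column family: row key -> column name -> timestamped versions."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self._clock = itertools.count(1)  # stand-in for real timestamps

    def put(self, row_key, column, value):
        versions = self.rows.setdefault(row_key, {}).setdefault(column, [])
        versions.append((next(self._clock), value))

    def get(self, row_key, column):
        # Return the most recently written value, or None if absent.
        versions = self.rows.get(row_key, {}).get(column)
        return versions[-1][1] if versions else None

    def history(self, row_key, column):
        return list(self.rows.get(row_key, {}).get(column, []))


pages = ColumnFamily("pages")
pages.put("example.com/", "html", "<p>v1</p>")
pages.put("example.com/", "html", "<p>v2</p>")
pages.put("example.com/", "title", "Home")

latest = pages.get("example.com/", "html")
versions = pages.history("example.com/", "html")
```

Rows in the same column family need not share the same set of columns, which is what makes the model more flexible than a fixed relational schema.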
Comparing Operational NoSQL Technologies: Column family stores
- Key/value stores
  Example technologies: Redis, Tables
  What it provides: Fast access to large amounts of simply structured data
  Example use case: Online shopping cart
- Column family stores
  Example technologies: Cassandra, HBase
  What it provides: Fast access to large amounts of more structured data
  Example use case: A table storing web pages
Document Databases
Example: MongoDB
- A database contains collections; each collection holds JSON documents
- Collections are spread across shards
- An application's query targets a specific collection
- Can create indexes on multiple keys
- MongoLab provides a managed database service based on MongoDB for Microsoft Azure
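Querying a collection of schema-free documents, with indexes on more than one key, can be sketched like this. It is a toy in-memory model, not the MongoDB API; the `Collection` class and the sample documents are hypothetical.

```python
class Collection:
    """Toy document collection with optional per-key indexes."""

    def __init__(self):
        self.documents = []
        self.indexes = {}  # key -> {value -> [positions in self.documents]}

    def create_index(self, key):
        index = {}
        for position, doc in enumerate(self.documents):
            if key in doc:
                index.setdefault(doc[key], []).append(position)
        self.indexes[key] = index

    def insert(self, doc):
        self.documents.append(doc)
        position = len(self.documents) - 1
        for key, index in self.indexes.items():
            if key in doc:
                index.setdefault(doc[key], []).append(position)

    def find(self, query):
        # Narrow candidates with the first indexed key, otherwise scan all.
        candidates = range(len(self.documents))
        for key, value in query.items():
            if key in self.indexes:
                candidates = self.indexes[key].get(value, [])
                break
        return [
            self.documents[i]
            for i in candidates
            if all(self.documents[i].get(k) == v for k, v in query.items())
        ]


users = Collection()
users.create_index("city")
users.create_index("name")
# Documents in one collection need not share the same fields.
users.insert({"name": "Ada", "city": "London", "langs": ["en"]})
users.insert({"name": "Lin", "city": "Seattle"})
users.insert({"name": "Sam", "city": "London", "age": 30})

londoners = [doc["name"] for doc in users.find({"city": "London"})]
thirty = users.find({"city": "London", "age": 30})
```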
Comparing Operational NoSQL Technologies: Document databases
- Key/value stores
  Example technologies: Redis, Tables
  What it provides: Fast access to large amounts of simply structured data
  Example use case: Online shopping cart
- Column family stores
  Example technologies: Cassandra, HBase
  What it provides: Fast access to large amounts of more structured data
  Example use case: A table storing web pages
- Document databases
  Example technologies: MongoDB, DocumentDB
  What it provides: Scalable store for JSON documents
  Example use case: Persistent store for Node.js application
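JSON Storage for JavaScript Applications
The complete picture
- Client (PC/phone/tablet): a JavaScript application in a web browser, or native apps
- Server (VM on Microsoft Azure): JavaScript server code running on Node.js
- Database (VM on Microsoft Azure): a MongoDB collection of JSON documents
- Client and server exchange JSON; the server queries MongoDB and gets JSON back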
Graph Database
Example: Neo4j
- A graph consists of nodes (e.g., A, B, D, X, Y), relationships, and properties
- Relationships are first-class data items
- An application queries the graph and gets back matching nodes (e.g., B, D)
- Hard to shard, so not intended for massive scale
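Treating relationships as first-class data items can be sketched as follows. This is a toy in-memory graph, not Neo4j's API or query language; the `Graph` and `Relationship` classes, the `KNOWS` relationship type, and the `since` property are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    """A relationship is stored as its own item and can carry properties."""
    start: str
    kind: str
    end: str
    properties: dict = field(default_factory=dict)


class Graph:
    def __init__(self):
        self.nodes = {}          # node name -> properties
        self.relationships = []  # list of Relationship objects

    def add_node(self, name, **properties):
        self.nodes[name] = properties

    def relate(self, start, kind, end, **properties):
        rel = Relationship(start, kind, end, properties)
        self.relationships.append(rel)
        return rel

    def neighbors(self, name, kind):
        return [r.end for r in self.relationships
                if r.start == name and r.kind == kind]


graph = Graph()
for name in ["A", "B", "D", "X", "Y"]:
    graph.add_node(name)
first = graph.relate("A", "KNOWS", "B", since=2012)
graph.relate("B", "KNOWS", "D")
graph.relate("A", "KNOWS", "X")

direct = graph.neighbors("A", "KNOWS")
# A two-hop traversal: nodes known by the nodes that A knows.
two_hops = [far for near in direct for far in graph.neighbors(near, "KNOWS")]
```

Traversals like the two-hop query follow relationships directly rather than joining tables, which is the kind of access a graph database is built for.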
Comparing Operational NoSQL Technologies: Graph databases
- Key/value stores
  Example technologies: Redis, Tables
  What it provides: Fast access to large amounts of simply structured data
  Example use case: Online shopping cart
- Column family stores
  Example technologies: Cassandra, HBase
  What it provides: Fast access to large amounts of more structured data
  Example use case: A table storing web pages
- Document databases
  Example technologies: MongoDB, DocumentDB
  What it provides: Scalable store for JSON documents
  Example use case: Persistent store for Node.js application
- Graph databases
  Example technologies: Neo4j
  What it provides: Fast access to data organized into graphs
  Example use case: Social graph
Big Data Analytics
Some members of the Hadoop technology family
- Hadoop Distributed File System (HDFS): Allows storing and accessing very large binary files across a cluster of commodity servers and disk drives
- Hadoop MapReduce: Supports applications that process large amounts of analytical data in parallel. Data is typically stored in HDFS
- Hive/Pig: Tools for querying, transforming, and analyzing data. Both generate MapReduce jobs
- HBase: Column family store built on HDFS. Designed for operational data, not analytical data
Hadoop 2.0 adds YARN, supporting frameworks other than MapReduce
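The MapReduce programming model can be illustrated with a word count. This is a single-process sketch in plain Python, not Hadoop's API; on a real cluster the map and reduce steps would run in parallel across many machines. The function names and input strings are hypothetical.

```python
from collections import defaultdict


def map_phase(record):
    # Emit a (key, value) pair for every word in one input record.
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine all values for one key into a single result.
    return key, sum(values)


def run_job(records):
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = run_job(["open source on Azure", "Open data on Azure"])
```

Because each record is mapped independently and each key is reduced independently, both phases can be split across a cluster.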
Big Data Analytics
Example: A Hadoop cluster on Microsoft Azure Virtual Machines
- Hive, Pig, ... generate Hadoop MapReduce jobs
- Hive provides HiveQL, a SQL-like query language
- Microsoft allows submitting HiveQL queries from Excel
- Job logic runs in parallel across the cluster, next to the data
- Data is stored in the Hadoop Distributed File System (HDFS)
Big Data Analytics
Example: HDInsight
- Excel, Hive, Pig, ... generate Hadoop MapReduce jobs
- Job logic runs in parallel in VMs
- Data is stored in Microsoft Azure Blobs and accessed through the HDFS API
- Using HDInsight will typically make more sense than building your own Hadoop cluster on Microsoft Azure
Conclusion
Open source technologies are a fundamental part of Microsoft Azure
- Microsoft is serious about open source on Microsoft Azure
- Compute: Support for many open source languages, frameworks, and applications
- Data: The ability to use relational and NoSQL technologies
Not even Microsoft can ignore the open source world
About the Speaker
David Chappell is Principal of Chappell & Associates (www.davidchappell.com) in San Francisco, California. Through his speaking, writing, and consulting, he helps people around the world understand, use, and make better decisions about new technology. David has been the keynote speaker for more than a hundred events and conferences on five continents, and his seminars have been attended by tens of thousands of business and IT leaders, architects, and developers in forty-five countries. His books have been published in a dozen languages and used regularly in courses at MIT, ETH Zurich, and other universities. In his consulting practice, he has helped clients such as Hewlett-Packard, IBM, Microsoft, Stanford University, and Target Corporation adopt new technologies, market new products, and educate their customers and staff. Earlier in his career, David wrote networking software, chaired a U.S. national standards working group, and played keyboards with the Peabody-award-winning Children's Radio Theater. He holds a B.S. in Economics and an M.S. in Computer Science, both from the University of Wisconsin-Madison.