3 Case Studies of NoSQL and Java Apps in the Real World




Eugene Ciurana
geecon@ciurana.eu - pr3d4t0r ##java, irc.freenode.net
This presentation is available from: http://ciurana.eu/geecon-2011

About Eugene...
- 15+ years building mission-critical, high-availability systems
- 15+ years of Java work
- Open source evangelist
- MapReduce + Hadoop early adopter
- VP of R&D at badoo.com - largest social network in Europe (120M subscribers worldwide!)
- State of the art main line of business at the largest companies in the world - not a web guy!

Very Important! Please Ask Questions! (don't be shy)

What Is NoSQL?
Database...
- Horizontally scalable
- Non-relational
- Built-in application support
- Custom file system designed for supporting NoSQL operations
- Best for non-OLTP applications
- Unstructured data
- Lower cost than RDBMS

NoSQL Topology
[Diagram: layered topology, top to bottom]
- Consumer
- Nodes
- Virtual file system: logical table management, load balancing, garbage collection (HDFS, GridFS, Hypertable)
- Tablet servers: Tablet Server 0, Tablet Server 1, ... Tablet Server n
- Distributed file system: FS 0, FS 1, FS 2, ... FS n

Areas of Application
- Document storage and management
- Object databases
- Graph databases
- Key/value stores
- Eventually consistent key/value stores
- Financial modeling
- Click stream analytics
- Simulations
- Protein folding
- Distributed sorting or grepping

Brewer's CAP Theorem
Pick any two: Consistency (C), Availability (A), Partition tolerance (P).
Data models in the legend: relational, key-value, column-oriented, document-oriented.
- C + A: RDBMSs (Oracle, MySQL), Aster Data, Greenplum, Vertica
- C + P: MongoDB, Terrastore, Datastore, Hypertable, HBase, Redis, Berkeley DB, MemcacheDB, Scalaris
- A + P: Dynamo, Voldemort, Tokyo Cabinet, KAI, Cassandra, SimpleDB, CouchDB, Riak

Three NoSQL Systems
- MongoDB
  - Horizontally scalable
  - Document-oriented database
  - No JOIN operations, no row level locking
- GigaSpaces XAP
  - Data grid for replacing application servers
  - Event processing model
  - Front-end to various data stores (SQL and NoSQL)
- Hadoop/Hive/HBase
  - MapReduce framework foundation
  - Optimized for fast search and retrieval
  - Batch model for indexing and processing

MongoDB
- Document-oriented storage
- Querying via JavaScript or custom APIs for all major programming languages
- In-place updates for atomicity
- Any attribute in a document can be indexed
- Built-in MapReduce
- Built-in caching
- BSON ("binary JSON") document format
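To make "any attribute in a document can be indexed" and "in-place updates" concrete, here is a minimal in-memory sketch. It is a toy written for illustration only: the class `TinyDocumentStore` and its methods are hypothetical and are not MongoDB's API or storage engine. It assumes indexed values are hashable.

```python
from collections import defaultdict


class TinyDocumentStore:
    """Toy in-memory document collection (hypothetical; not MongoDB's API)."""

    def __init__(self):
        self._docs = {}        # _id -> document (a plain dict)
        self._indexes = {}     # field name -> {value -> set of _ids}
        self._next_id = 1

    def insert(self, doc):
        """Store a copy of the document and register it in existing indexes."""
        doc = dict(doc)
        doc["_id"] = self._next_id
        self._next_id += 1
        self._docs[doc["_id"]] = doc
        for field, index in self._indexes.items():
            if field in doc:
                index[doc[field]].add(doc["_id"])
        return doc["_id"]

    def ensure_index(self, field):
        """Build an index on any attribute, covering documents already stored."""
        index = defaultdict(set)
        for _id, doc in self._docs.items():
            if field in doc:
                index[doc[field]].add(_id)
        self._indexes[field] = index

    def find(self, field, value):
        """Use the index when one exists, otherwise scan every document."""
        if field in self._indexes:
            ids = self._indexes[field].get(value, set())
            return [self._docs[i] for i in sorted(ids)]
        return [d for d in self._docs.values() if d.get(field) == value]

    def update(self, _id, changes):
        """Modify the stored document in place and keep indexes consistent."""
        doc = self._docs[_id]
        for field, new_value in changes.items():
            index = self._indexes.get(field)
            if index is not None and field in doc:
                index[doc[field]].discard(_id)
            doc[field] = new_value
            if index is not None:
                index[new_value].add(_id)


store = TinyDocumentStore()
alice = store.insert({"name": "Alice", "city": "Krakow"})
store.insert({"name": "Bob", "city": "London", "tags": ["java"]})  # extra field is fine
store.ensure_index("city")
store.update(alice, {"city": "London"})
names = [d["name"] for d in store.find("city", "London")]
```

The two documents do not share a schema, and the index on `city` is added after the data exists, which is the flexibility the slide refers to.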

MongoDB
[Diagram: a consumer connects to a MongoDB server (master), with fail-over to a MongoDB server (slave). Each server runs a mongod database daemon and a mongos sharding daemon on top of its own data storage.]

GigaSpaces XAP
- Data persistence
- Distributed processing
- Caching
- Multi-language support
- NoSQL operations:
  - SQLQuery - SQL-like syntax
  - Persistency - RDBMS through wrapper
  - memcached
- Task execution and marshalling

GigaSpaces XAP
[Diagram: XAP architecture stack]
- Application frameworks: Java, C++, .Net, Groovy, Mule, Spring, JEE, Jetty
- XAP Management and Monitoring
- XAP Deployment Virtualization
- XAP Middleware Virtualization (Virtualized Clustering Layer)
- Data stores: RDBMS, Memcache DB, MongoDB

Hadoop and HBase
- HDFS - distributed high performance file system
  - Runs on top of ext3, HFS+, whatever
  - Alternatives: AWS S3, CloudStore, others
- MapReduce - framework for running jobs
  - Java or anything that works with stdin, stdout
- Chukwa - large log analysis framework (not very popular)
- Hive - Data warehousing, ETL, and SQL-like language
- HBase - Column-oriented NoSQL database
- Pig - flat file data analysis
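The "anything that works with stdin, stdout" point can be sketched as a mapper and a reducer that exchange tab-separated `key<TAB>value` lines, the convention used by Hadoop Streaming. The sketch below is an assumption-laden illustration: the log format (`LEVEL message`) is made up, and the functions take any iterable of lines so they can run locally. In a real streaming job each function would sit in its own script reading `sys.stdin` and printing to stdout.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'LEVEL<TAB>1' for each log line shaped like 'LEVEL message...'."""
    for line in lines:
        parts = line.split(None, 1)
        if parts:                      # skip blank lines
            yield f"{parts[0]}\t1"


def reducer(sorted_lines):
    """Sum the counts per key; the input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


# Local simulation: sorted() stands in for the framework's sort/shuffle phase.
log = ["ERROR disk full", "INFO started", "ERROR timeout", ""]
shuffled = sorted(mapper(log))
result = list(reducer(shuffled))
```

Because the reducer relies on `groupby`, it only works when equal keys arrive next to each other, which is what the sort between the two phases guarantees.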

Hadoop and HBase
[Diagram: Hadoop stack showing Hive, Chukwa, Pig, ZooKeeper, MapReduce, HBase, and Sqoop on top of HDFS, which is backed by multiple disks.]

Case Study 1

Case Study 1: Large FI Stock Trades
- Stock trading system is based on a large commercial database
- It can store only up to 4 weeks of trades
  - Otherwise it's too expensive
- Inability to run long-term forecasting or trend analysis
- Robust, Java-based
- Mule-based - all messaging going through ESB
- Message playback log

Case Study 1: Large FI Stock Trades
- Siphon trades as they fly by through the ESB
- Copy every trade to HDFS
- Use MapReduce to break the data down for analysis
- Commit initial analysis to HBase
- Run queries and further mine data through HBase and MapReduce
- Data mining and presentation using WEKA
- Forecasting accuracy increased by 11.3% in the first 180 days of operation for commodity markets
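As a rough idea of what "use MapReduce to break the data down for analysis" could look like, the sketch below runs the map, shuffle, and reduce phases in a single process to compute a volume-weighted average price per symbol. Everything specific here is an assumption for illustration: the CSV trade layout (`timestamp,symbol,price,quantity`), the symbols, and the choice of metric are not taken from the presentation.

```python
from collections import defaultdict


def map_trade(line):
    """Map one CSV trade record to (symbol, (notional, quantity))."""
    _timestamp, symbol, price, quantity = line.strip().split(",")
    qty = int(quantity)
    return symbol, (float(price) * qty, qty)


def shuffle(pairs):
    """Group mapped values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_vwap(symbol, values):
    """Reduce all values for one symbol to its volume-weighted average price."""
    notional = sum(n for n, _ in values)
    volume = sum(q for _, q in values)
    return symbol, notional / volume


# Hypothetical trades, standing in for records copied from the ESB to HDFS.
trades = [
    "2011-05-12T09:30:00,ACME,10.00,100",
    "2011-05-12T09:30:01,ACME,12.00,300",
    "2011-05-12T09:30:02,INIT,50.00,10",
]
grouped = shuffle(map(map_trade, trades))
vwap = dict(reduce_vwap(symbol, values) for symbol, values in grouped.items())
```

On a cluster the same map and reduce functions would run in parallel over file splits, and the per-symbol results are the kind of initial analysis that could then be committed to a store such as HBase.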

Case Study 2

Large SaaS
[Diagram: system architecture]
- Service consumers: end users (browser, RSS, Outlook, CWS, EWS), internal end users, reporting
- Service providers: various service providers throughout the Internet; some are public, some are partners
- Integration: heavy web services; some XML, some custom
- Internal service providers: Main App, CRM, Client Relationships App, Dispatcher, Search, Netezza, Lucene, Rich Docs (GridFS), Static Files (S3)
- Custom queuing system: queue, update, query, reply
- Firewall
- Legend (connection types): HTTP, SOAP, custom RPC, ODBC/JDBC, direct/API

Large SaaS
[Diagram: system architecture built around a Mule ESB]
- Service consumers: end users (browser, RSS, Outlook, CWS, EWS)
- Service providers: various service providers throughout the Internet; some are public, some are partners
- Cloud firewall and corporate firewall
- New system acquisition (.Net, PHP, etc.)
- Internal service providers, Tomcat app container: Main App (zone instance), Client Relations (zone manager), Dispatcher, new apps
- Mule ESB container: services, message routing, and transformations
- Services: Main App Services, Client Relations Services, Dispatcher Services, Reporting, Search, cron services, other new services, enterprise services
- Supporting components: OpenMQ, memcached, Static Files (S3), Rich Documents (GridFS), local DBs and other resources, databases
- Legend (connection types): HTTP; web services (SOAP, REST, JMS, other); JDBC; direct/API/any

Large SaaS
[Diagram: internal services on a cloud-based MapReduce/NoSQL infrastructure]
- External service or consumer
- Internal services, Tomcat app container: Main App (zone instance), Client Relations (zone manager), Dispatcher, new apps
- Mule ESB container: services, message routing, and transformations
- Services: Main App Services, Client Relations Services, Dispatcher Services, Reporting, Search, cron services, other new services
- Supporting components: OpenMQ, memcached, Static Files (S3), Rich Documents (GridFS), Pig, Hive, databases
- Storage: HDFS, GridFS, data warehouse
- Compute: Hadoop, DB cluster, computational network
- Cloud-based MapReduce/NoSQL infrastructure - expand and contract capacity as needed

Case Study 3

SOBA Labs
[Diagram: SOBA deployment]
- REST SOBA interface - implementation is transparent to caller! (http://soba.myserver.com/manage/resource)
- Consumers: Ubuntu Landscape, other consumer
- Hosts shown: sobadb (192.168.0.42), sobaengine (localhost), other consumer (192.168.0.42)
- Back-end APIs: EC2 web services API, Xen XML-RPC API
- Amazon EC2: end-user app (ami-322ec65b), Python SOBA Agent
- Xen host: end-user app (ami-322ec65b), Oracle (vm_uuid: b220c8db), Python SOBA Agent
- Firewall

SOBA Labs
[Diagram: SOBA Engine and its integrations]
- Mule-based SOBA Engine abstracts provisioning, configuration, and monitoring through web services
  - Java and Python
  - Web services interface (REST, JSON)
  - SOBA data: MongoDB
  - Config data (Puppet?)
- Callers: Canonical Landscape (web services, JSON), other applications (REST, JSON - easy integration!), Python native applications (SOBA Engine Python API, dict - easy integration!)
- DRY interface - Don't Repeat Yourself!
- Targets:
  - Amazon EC2 API (EC2 web services API, XML, EC2 Query; EC2 data): Ubuntu Server, puppet, facter, SOBA Agent
  - Xen Server API (Xen XML-RPC API, XML): Ubuntu Server, puppet, facter, SOBA Agent
  - Rackspace Cloud Servers API: Ubuntu Server, puppet, facter, SOBA Agent
  - Ensemble Agent
- SOBA Agent: Python, dict, JSON over REST
- Provisioning, configuration, or monitoring via SOBA is the same regardless of target: same API call, same data payload, same data format, etc. Implementation is abstracted from the caller!
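The "same API call, same data payload, same data format" idea can be sketched as one engine entry point that accepts a JSON payload and dispatches to interchangeable back ends. All names below (`Provider`, `FakeEC2`, `FakeXen`, `Engine.manage`) are hypothetical stand-ins and do not reflect SOBA's actual code; the fake providers make no real EC2 or Xen calls.

```python
import json
from abc import ABC, abstractmethod


class Provider(ABC):
    """Hypothetical back end hidden behind the common interface."""

    @abstractmethod
    def provision(self, payload: dict) -> dict:
        ...


class FakeEC2(Provider):
    def provision(self, payload):
        # A real adapter would translate the payload into EC2 API calls here.
        return {"target": "ec2", "image": payload["image"], "state": "running"}


class FakeXen(Provider):
    def provision(self, payload):
        # A real adapter would talk to the Xen XML-RPC API here.
        return {"target": "xen", "image": payload["image"], "state": "running"}


class Engine:
    """Single entry point: JSON in, JSON out, whatever the target is."""

    def __init__(self, providers):
        self._providers = providers

    def manage(self, target: str, request_json: str) -> str:
        payload = json.loads(request_json)
        result = self._providers[target].provision(payload)
        return json.dumps(result, sort_keys=True)


engine = Engine({"ec2": FakeEC2(), "xen": FakeXen()})
request = json.dumps({"image": "ami-322ec65b"})   # identical payload for both targets
replies = [json.loads(engine.manage(target, request)) for target in ("ec2", "xen")]
```

The caller only changes the target name; the request and the shape of the reply stay the same, which is the abstraction the slide describes.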

Plug - Know Any High Caliber Coders?
badoo.com is hiring!
- Top talent - we're very demanding
- PHP, MySQL developers and sr. developers
- Java with a Business Intelligence twist for Pentaho and Hadoop
- Mobile: Android, iOS, BlackBerry, WAP, JME
- QA sr. lead - highly technical, web, web services, and mobile
- 2,000 referral bonus for you if we hire your friend!
  - Paid 90 days after hiring (trial period ends)
- If your friend can legally work in Russia or the UK, but doesn't live in Moscow or London, we'll work out relocation
- Contact: geecon@ciurana.eu
- Contact: jobs@corp.badoo.com

Q&A
Comments? Anything else?
Eugene Ciurana
geecon@ciurana.eu - pr3d4t0r ##java, irc.freenode.net
http://ciurana.eu/scalablesystems
Twitter: ciurana