Eugene Ciurana
geecon@ciurana.eu - pr3d4t0r ##java, irc.freenode.net
3 Case Studies of NoSQL and Java Apps in the Real World
This presentation is available from: http://ciurana.eu/geecon-2011
About Eugene...
- 15+ years building mission-critical, high-availability systems
- 15+ years of Java work
- Open source evangelist
- MapReduce + Hadoop early adopter
- VP of R&D at badoo.com - largest social network in Europe (120M subscribers worldwide!)
- State-of-the-art main line of business at the largest companies in the world - not a web guy!
Very Important! Please Ask Questions! (don't be shy)
What Is NoSQL?
- Database...
- Horizontally scalable
- Non-relational
- Built-in application support
- Custom file system designed for supporting NoSQL operations
- Best for non-OLTP applications
- Unstructured data
- Lower cost than RDBMS
NoSQL Topology
Diagram, in the order shown:
- Consumer
- Nodes (four shown)
- Virtual file system: logical table management, load balancing, garbage collection (HDFS, GridFS, Hypertable)
- Tablet servers 0, 1, ... n
- Distributed file system: FS 0, FS 1, FS 2, ... FS n
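The routing step between the consumer and the tablet servers can be sketched as follows. This is a minimal sketch that assumes range partitioning of row keys in the style of Bigtable-like stores; the slide does not say how keys map to tablet servers. The class name TabletLocator and the server names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy tablet locator: each tablet server owns a contiguous range of row keys. */
class TabletLocator {
    // Maps the first row key of each range to the (hypothetical) server that owns it.
    private final TreeMap<String, String> rangeStarts = new TreeMap<>();

    /** Registers a server as owner of all keys from startKey up to the next registered start key. */
    public void assign(String startKey, String server) {
        rangeStarts.put(startKey, server);
    }

    /** Returns the server whose range has the greatest start key that is <= rowKey. */
    public String locate(String rowKey) {
        Map.Entry<String, String> owner = rangeStarts.floorEntry(rowKey);
        if (owner == null) {
            throw new IllegalArgumentException("No tablet covers key: " + rowKey);
        }
        return owner.getValue();
    }

    public static void main(String[] args) {
        TabletLocator locator = new TabletLocator();
        locator.assign("", "tablet-server-0");  // keys below "h"
        locator.assign("h", "tablet-server-1"); // keys from "h" up to "p"
        locator.assign("p", "tablet-server-2"); // keys from "p" onward
        System.out.println(locator.locate("apple")); // tablet-server-0
        System.out.println(locator.locate("kiwi"));  // tablet-server-1
        System.out.println(locator.locate("zebra")); // tablet-server-2
    }
}
```

Adding capacity in this model means splitting a range and assigning the new start key to another server; callers keep using locate() unchanged.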
Areas of Application
- Document storage and management
- Object databases
- Graph databases
- Key/value stores
- Eventually consistent key/value stores
- Financial modeling
- Click stream analytics
- Simulations
- Protein folding
- Distributed sorting or grepping
Brewer's CAP Theorem
Pick any two!
- C: Consistency
- A: Availability
- P: Partition tolerance
Diagram legend (data models): Relational, Key-Value, Column-Oriented, Document-Oriented
- C + A: RDBMSs (Oracle, MySQL), Aster Data, Greenplum, Vertica
- C + P: MongoDB, Terrastore, Datastore, Hypertable, HBase, Redis, Berkeley DB, MemcacheDB, Scalaris
- A + P: Dynamo, Voldemort, Tokyo Cabinet, KAI, Cassandra, SimpleDB, CouchDB, Riak
Three NoSQL Systems
- MongoDB
  - Horizontally scalable
  - Document-oriented database
  - No JOIN operations, no row-level locking
- GigaSpaces XAP
  - Data grid for replacing application servers
  - Event processing model
  - Front-end to various data stores (SQL and NoSQL)
- Hadoop/Hive/HBase
  - MapReduce framework foundation
  - Optimized for fast search and retrieval
  - Batch model for indexing and processing
MongoDB
- Document-oriented storage
- Querying via JavaScript or custom APIs for all major programming languages
- In-place updates for atomicity
- Any attribute in a document can be indexed
- Built-in MapReduce
- Built-in caching
- BSON ("binary JSON") document format
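Two of the points above, indexing any attribute and updating a document in place, can be illustrated with a toy in-memory model. This is a sketch only and not the MongoDB driver API: the class ToyCollection, its methods, and the sample attribute names are all hypothetical, and a single lock stands in for whatever atomicity mechanism the real server uses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Toy in-memory document collection; NOT the MongoDB driver API. */
class ToyCollection {
    private final Map<String, Map<String, Object>> docsById = new HashMap<>();
    // attribute name -> (attribute value -> ids of documents holding that value)
    private final Map<String, Map<Object, List<String>>> indexes = new HashMap<>();

    /** Declares an index on any attribute; documents already stored are indexed immediately. */
    public synchronized void ensureIndex(String attribute) {
        Map<Object, List<String>> index = new HashMap<>();
        for (Map.Entry<String, Map<String, Object>> e : docsById.entrySet()) {
            Object value = e.getValue().get(attribute);
            if (value != null) {
                index.computeIfAbsent(value, v -> new ArrayList<>()).add(e.getKey());
            }
        }
        indexes.put(attribute, index);
    }

    /** Stores a copy of the document and adds it to every declared index. */
    public synchronized void insert(String id, Map<String, Object> doc) {
        if (docsById.containsKey(id)) {
            throw new IllegalStateException("Duplicate id: " + id);
        }
        docsById.put(id, new HashMap<>(doc));
        for (Map.Entry<String, Map<Object, List<String>>> idx : indexes.entrySet()) {
            Object value = doc.get(idx.getKey());
            if (value != null) {
                idx.getValue().computeIfAbsent(value, v -> new ArrayList<>()).add(id);
            }
        }
    }

    /** Changes one attribute of a stored document in place, keeping its index in step. */
    public synchronized void set(String id, String attribute, Object newValue) {
        Objects.requireNonNull(newValue, "newValue");
        Map<String, Object> doc = docsById.get(id);
        if (doc == null) {
            throw new IllegalArgumentException("No such document: " + id);
        }
        Object oldValue = doc.put(attribute, newValue);
        Map<Object, List<String>> index = indexes.get(attribute);
        if (index != null) {
            List<String> oldIds = (oldValue == null) ? null : index.get(oldValue);
            if (oldIds != null) {
                oldIds.remove(id);
            }
            index.computeIfAbsent(newValue, v -> new ArrayList<>()).add(id);
        }
    }

    /** Finds document ids by attribute value, using the index if one exists, else a full scan. */
    public synchronized List<String> findIds(String attribute, Object value) {
        Map<Object, List<String>> index = indexes.get(attribute);
        if (index != null) {
            return new ArrayList<>(index.getOrDefault(value, Collections.emptyList()));
        }
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> e : docsById.entrySet()) {
            if (value.equals(e.getValue().get(attribute))) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyCollection users = new ToyCollection();
        Map<String, Object> doc = new HashMap<>();
        doc.put("name", "Ana");
        doc.put("city", "Krakow");
        users.insert("u1", doc);
        users.ensureIndex("city");
        users.set("u1", "city", "London");
        System.out.println(users.findIds("city", "London")); // [u1]
    }
}
```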
MongoDB
Diagram:
- Consumer (with fail-over)
- MongoDB server (master): mongod (database daemon), mongos (sharding daemon), data storage
- MongoDB server (slave): mongod (database daemon), mongos (sharding daemon), data storage
GigaSpaces XAP
- Data persistence
- Distributed processing
- Caching
- Multi-language support
- NoSQL operations:
- SQLQuery - SQL-like syntax
- Persistency - RDBMS through wrapper
- memcached
- Task execution and marshalling
GigaSpaces XAP
Diagram layers, in the order shown:
- Application frameworks: Java, C++, .Net, Groovy, Mule, Spring, JEE, Jetty
- XAP Management and Monitoring
- XAP Deployment Virtualization
- XAP Middleware Virtualization (Virtualized Clustering Layer)
- Data stores: RDBMS, Memcache DB, MongoDB
Hadoop and HBase
- HDFS - distributed high-performance file system
  - Runs on top of ext3, HFS+, whatever
  - Alternatives: AWS S3, CloudStore, others
- MapReduce - framework for running jobs
  - Java or anything that works with stdin, stdout
- Chukwa - large log analysis framework (not very popular)
- Hive - data warehousing, ETL, and SQL-like language
- HBase - column-oriented NoSQL database
- Pig - flat file data analysis
Hadoop and HBase
Diagram components: Hive, Chukwa, Pig, ZooKeeper, MapReduce, HBase, HDFS, Sqoop, running over four disks
Case Study 1
Case Study 1: Large FI Stock Trades
- Stock trading system is based on a large commercial database
- It can store only up to 4 weeks of trades
  - Otherwise it's too expensive
- Inability to run long-term forecasting or trend analysis
- Robust, Java-based
- Mule-based - all messaging going through ESB
- Message playback log
Case Study 1: Large FI Stock Trades
- Siphon trades as they fly by through the ESB
- Copy every trade to HDFS
- Use MapReduce to break the data down for analysis
- Commit initial analysis to HBase
- Run queries and further mine data through HBase and MapReduce
- Data mining and presentation using WEKA
- Forecasting accuracy increased by 11.3% in the first 180 days of operation for commodity markets
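The "use MapReduce to break the data down" step can be sketched in plain Java as three phases: map, shuffle, and reduce. This is a single-process imitation for illustration and not the Hadoop API. The trade record format ("symbol,quantity"), the symbols, and the class name TradeVolumeSketch are assumptions; the talk does not describe the real record layout or the analysis that was run.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Single-process imitation of the map, shuffle, and reduce phases; not the Hadoop API. */
class TradeVolumeSketch {

    /** Map phase: turn one "symbol,quantity" line into a (symbol, quantity) pair. */
    static Map.Entry<String, Long> map(String line) {
        String[] fields = line.split(",");
        return new AbstractMap.SimpleEntry<>(fields[0].trim(), Long.valueOf(fields[1].trim()));
    }

    /** Shuffle phase: group every mapped quantity under its symbol. */
    static Map<String, List<Long>> shuffle(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce phase: sum the quantities collected for one symbol. */
    static long reduce(List<Long> quantities) {
        long total = 0;
        for (long quantity : quantities) {
            total += quantity;
        }
        return total;
    }

    /** Runs the three phases in sequence and returns total traded quantity per symbol. */
    static Map<String, Long> totalVolume(List<String> tradeLines) {
        List<Map.Entry<String, Long>> mapped = new ArrayList<>();
        for (String line : tradeLines) {
            mapped.add(map(line));
        }
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, List<Long>> group : shuffle(mapped).entrySet()) {
            totals.put(group.getKey(), reduce(group.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical trade lines, as they might be copied off the ESB into a file.
        List<String> trades = Arrays.asList("ACME,100", "INIT,250", "ACME,50");
        System.out.println(totalVolume(trades)); // {ACME=150, INIT=250}
    }
}
```

In a real cluster the map and reduce functions run on many machines and the framework performs the shuffle; only the two small functions are application code.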
Case Study 2
Large SaaS
Diagram components:
- Service consumers: end users (browser, RSS, Outlook, CWS, EWS), internal end users, reporting
- Service providers: various service providers throughout the Internet; some are public, some are partners
- Heavy web services: some XML, some custom
- Firewall
- Internal service providers: Search, Netezza, Lucene, Rich Docs (GridFS), Static Files (S3)
- Applications: Main App, CRM, Client Relationships App, Dispatcher
- Custom queuing system (queue, update, query, reply)
- Legend: HTTP, SOAP, custom RPC, ODBC/JDBC, direct/API
Large SaaS
Diagram components:
- Service consumers: end users (browser, RSS, Outlook, CWS, EWS)
- Service providers: various service providers throughout the Internet; some are public, some are partners
- Cloud firewall and corporate firewall
- New system acquisition (.Net, PHP, etc.)
- Tomcat app container: Main App (zone instance), Client Relations (Zone Manager), Dispatcher, New Apps
- Mule ESB container (services, message routing, and transformations): Main App Services, Client Relations Services, Dispatcher Services, Other New Services, cron Services
- Internal service providers: Static Files (S3), Rich Documents (GridFS), Reporting, Search
- Supporting infrastructure: OpenMQ, memcached, Local DBs, Other Resource, Enterprise Services, Databases
- Legend: HTTP; web services (SOAP, REST, JMS, other); JDBC; direct/API/any
Large SaaS
Diagram components:
- External service or consumer
- Tomcat app container: Main App (zone instance), Client Relations (Zone Manager), Dispatcher, New Apps
- Mule ESB container (services, message routing, and transformations): Main App Services, Client Relations Services, Dispatcher Services, Other New Services, cron Services
- Internal services: Static Files (S3), Rich Documents (GridFS), Reporting, Pig, Search, Hive, Databases
- Supporting infrastructure: OpenMQ, memcached
- Storage and compute: HDFS, GridFS, data warehouse; Hadoop, DB cluster, computational network
- Cloud-based MapReduce/NoSQL infrastructure - expand and contract capacity as needed
Case Study 3
SOBA Labs
Diagram components:
- Ubuntu Landscape
- REST SOBA interface - implementation is transparent to caller!
  - http://soba.myserver.com/manage/resource
- sobadb (192.168.0.42)
- sobaengine (localhost)
- Other consumer (192.168.0.42), also through the REST SOBA interface
- EC2 web services API to Amazon EC2: two end-user app instances (ami-322ec65b), each with a Python SOBA Agent
- Xen XML-RPC API to the Xen host: Oracle (vm_uuid: b220c8db)
- Firewall
SOBA Labs
- Mule-based SOBA Engine abstracts provisioning, configuration, and monitoring through web services
- Java and Python web services interface
- DRY interface: Don't Repeat Yourself!
- Provisioning, configuration, or monitoring via SOBA is the same regardless of target: same API call, same data payload, same data format, etc.
- Implementation is abstracted from the caller!
Diagram components:
- Data stores: SOBA Data (MongoDB), Config Data (Puppet?), EC2 Data
- Callers: Canonical Landscape (web services, JSON over REST); Other Application (JSON over REST, easy integration!); Python Native Application (Python API, dict, easy integration!)
- SOBA Engine back ends: EC2 web services API (XML, EC2 Query); Xen XML-RPC API (XML); SOBA Agent (JSON over REST, Python dict)
- Targets:
  - Amazon EC2 API: Ubuntu Server, puppet, facter, SOBA Agent
  - Xen Server API: Ubuntu Server, puppet, facter, Ensemble Agent
  - Rackspace Cloud Servers API: Ubuntu Server, puppet, facter, SOBA Agent
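The "same API call, same data payload" idea can be sketched in Java as one interface with one implementation per target. Everything in this sketch is hypothetical: the names ProvisioningTarget, Ec2Target, XenTarget, and SobaEngineSketch and the payload key "image" are not from the talk, and the back-end calls are placeholders that only return a description string instead of contacting EC2 or Xen.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** One provisioning contract for every back end; all names here are hypothetical. */
interface ProvisioningTarget {
    /** Takes the same payload shape for every target; returns a description of the back-end request. */
    String provision(Map<String, String> payload);
}

class Ec2Target implements ProvisioningTarget {
    @Override
    public String provision(Map<String, String> payload) {
        // Placeholder for a call to the EC2 web services API.
        return "EC2 request for image " + payload.get("image");
    }
}

class XenTarget implements ProvisioningTarget {
    @Override
    public String provision(Map<String, String> payload) {
        // Placeholder for a call to the Xen XML-RPC API.
        return "Xen XML-RPC request for image " + payload.get("image");
    }
}

/** Routes a caller's request to a registered target without exposing the implementation. */
class SobaEngineSketch {
    private final Map<String, ProvisioningTarget> targets = new LinkedHashMap<>();

    public void register(String name, ProvisioningTarget target) {
        targets.put(name, target);
    }

    public String provision(String targetName, Map<String, String> payload) {
        ProvisioningTarget target = targets.get(targetName);
        if (target == null) {
            throw new IllegalArgumentException("Unknown target: " + targetName);
        }
        return target.provision(payload);
    }

    public static void main(String[] args) {
        SobaEngineSketch engine = new SobaEngineSketch();
        engine.register("ec2", new Ec2Target());
        engine.register("xen", new XenTarget());

        // The caller builds one payload and reuses it for both targets.
        Map<String, String> payload = new HashMap<>();
        payload.put("image", "ami-322ec65b");
        System.out.println(engine.provision("ec2", payload));
        System.out.println(engine.provision("xen", payload));
    }
}
```

Supporting another target, such as Rackspace Cloud Servers, would mean adding one more implementation and one register() call; callers would not change.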
Plug - Know Any High Caliber Coders?
- badoo.com is hiring!
- Top talent - we're very demanding
  - PHP, MySQL developers and sr. developers
  - Java with a Business Intelligence twist for Pentaho and Hadoop
  - Mobile: Android, iOS, Blackberry, WAP, JME
  - QA sr. lead - highly technical, web, web services, and mobile
- 2,000 referral bonus for you if we hire your friend!
  - Paid 90 days after hiring (trial period ends)
- If your friend can legally work in Russia or the UK, but doesn't live in Moscow or London, we'll work out relocation
- Contact: geecon@ciurana.eu
- Contact: jobs@corp.badoo.com
Q&A
Comments? Anything else?
Eugene Ciurana
geecon@ciurana.eu - pr3d4t0r ##java, irc.freenode.net
http://ciurana.eu/scalablesystems
This presentation is available from: http://ciurana.eu/geecon-2011
Twitter: ciurana