How SourceForge is Using MongoDB

Similar documents
MongoDB Developer and Administrator Certification Course Agenda

In Memory Accelerator for MongoDB

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases

Zero Downtime Deployments with Database Migrations. Bob Feldbauer

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Open Source Technologies on Microsoft Azure

Scalable Architecture on Amazon AWS Cloud

NoSQL - What we ve learned with mongodb. Paul Pedersen, Deputy CTO paul@10gen.com DAMA SF December 15, 2011

Apache Hadoop. Alexandru Costan

[Hadoop, Storm and Couchbase: Faster Big Data]

Cloud Scale Distributed Data Storage. Jürmo Mehine

MongoDB in the NoSQL and SQL world. Horst Rechner Berlin,

Big data and urban mobility

NoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases

NOSQL INTRODUCTION WITH MONGODB AND RUBY GEOFF

Peers Techno log ies Pv t. L td. HADOOP

The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect

Applications for Big Data Analytics

INTRODUCTION TO CASSANDRA

Search and Real-Time Analytics on Big Data

Department of Software Systems. Presenter: Saira Shaheen, Dated:

MongoDB. The Definitive Guide to. The NoSQL Database for Cloud and Desktop Computing. Apress8. Eelco Plugge, Peter Membrey and Tim Hawkins

Cloud3DView: Gamifying Data Center Management

Application of NoSQL Database in Web Crawling

Evolution of Web Application Architecture International PHP Conference. Kore Nordmann / <kore@qafoo.com> June 9th, 2015

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

An Approach to Implement Map Reduce with NoSQL Databases

Lecture 21: NoSQL III. Monday, April 20, 2015

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

.NET User Group Bern

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

HO5604 Deploying MongoDB. A Scalable, Distributed Database with SUSE Cloud. Alejandro Bonilla. Sales Engineer abonilla@suse.com

3 Case Studies of NoSQL and Java Apps in the Real World

Cloud Computing mit mathematischen Anwendungen

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

NoSQL in der Cloud Why? Andreas Hartmann

Scaling with MongoDB. by Michael Schurter Scaling with MongoDB by Michael Schurter - OS Bridge,

Benchmarking Couchbase Server for Interactive Applications. By Alexey Diomin and Kirill Grigorchuk

The Synergy Between the Object Database, Graph Database, Cloud Computing and NoSQL Paradigms

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Introduction to Polyglot Persistence. Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace

Humongous MongoDB. Sean Corfield World Singles llc

A Performance Analysis of Distributed Indexing using Terrier

Increasing Business Productivity and Value in Financial Services with Secure Big Data Architecture

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc

Scalable Web Application

Comparing Scalable NOSQL Databases

EIDA WFCatalog Service!!! Luca Trani and the EIDA Team

Data Modeling for Big Data

nosql and Non Relational Databases

Scaling Pinterest. Yash Nelapati Ascii Artist. Pinterest Engineering. Saturday, August 31, 13

The NoSQL Ecosystem, Relaxed Consistency, and Snoop Dogg. Adam Marcus MIT CSAIL

Distributed Storage Systems

Tobby Hagler, Phase2 Technology

Big Data Management. Big Data Management. (BDM) Autumn Povl Koch September 16,

NoSQL Database Options

An Open Source NoSQL solution for Internet Access Logs Analysis

A Novel Approach to improve the business using Hadoop and MongoDB

Eclipse Exam Tutorial - Pros and Cons

Hadoop and Map-Reduce. Swati Gore

Certified Apache CouchDB Professional VS-1045

YouTube Vitess. Cloud-Native MySQL. Oracle OpenWorld Conference October 26, Anthony Yeh, Software Engineer, YouTube.

Preparing Your Data For Cloud

Katta & Hadoop. Katta - Distributed Lucene Index in Production. Stefan Groschupf Scale Unlimited, 101tec. sg{at}101tec.com

So What s the Big Deal?

Real-time Data Analytics mit Elasticsearch. Bernhard Pflugfelder inovex GmbH

Structured Data Storage

High Availability Using MySQL in the Cloud:

NoSQL: Going Beyond Structured Data and RDBMS

Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY MANNING ANN KELLY. Shelter Island

Big Data. Facebook Wall Data using Graph API. Presented by: Prashant Patel Jaykrushna Patel

Document Oriented Database

Hacettepe University Department Of Computer Engineering BBM 471 Database Management Systems Experiment

Introduction to Big Data Training

The MongoDB Tutorial Introduction for MySQL Users. Stephane Combaudon April 1st, 2014

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Big Data Solutions. Portal Development with MongoDB and Liferay. Solutions

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam

Yahoo! Cloud Serving Benchmark

MySQL és Hadoop mint Big Data platform (SQL + NoSQL = MySQL Cluster?!)

ABSTRACT. Migration of Legacy Web Application Using NoSQL Databases. Pouyan Ghasemi. April, 2013

Data Services Advisory

Big Data & Data Science Course Example using MapReduce. Presented by Juan C. Vega

Learning Management Redefined. Acadox Infrastructure & Architecture

Distributed Databases

Qsoft Inc

SwiftScale: Technical Approach Document

Reference Model for Cloud Applications CONSIDERATIONS FOR SW VENDORS BUILDING A SAAS SOLUTION

XDB. Shared MySQL hosting at Facebook scale. Evan Elias Percona Live MySQL Conference, April 2015

Comparison of the Frontier Distributed Database Caching System to NoSQL Databases

Big Data. White Paper. Big Data Executive Overview WP-BD Jafar Shunnar & Dan Raver. Page 1 Last Updated

Certified MongoDB Professional VS-1058

Big Data JAMES WARREN. Principles and best practices of NATHAN MARZ MANNING. scalable real-time data systems. Shelter Island

Scalable ecommerce with NoSQL. Dipali Trivedi

Cloud Platforms, Challenges & Hadoop. Aditee Rele Karpagam Venkataraman Janani Ravi

Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF)

How To Handle Big Data With A Data Scientist

PaaS - Platform as a Service Google App Engine

Wikimedia architecture. Mark Bergsma Wikimedia Foundation Inc.

Transcription:

How SourceForge is Using MongoDB Rick Copeland @rick446 rick@geek.net Geeknet, page 1

SF.net BlackOps : FossFor.us User Editable! Web 2.0! (ish) Not Ugly! Geeknet, page 2

Moving to NoSQL FossFor.us used CouchDB (NoSQL) Just adding new fields was trivial, and was happening all the time Mark Ramm Scaling up to the level of SF.net needs research CouchDB MongoDB Tokyo Cabinet/Tyrant Cassandra... and others Geeknet, page 3

Rewriting Consume Most traffic on SF.net hits 3 types of pages: Project Summary File Browser Download Pages are read-mostly, with infrequent updates from the Develop side of sf.net Original goal is 1 MongoDB document per project Later split release data because some projects have lots of releases Periodic updates via RSS and AMQP from Develop Geeknet, page 4

Deployment Architecture Load Balancer / Proxy Apache mod_wsgi / TG 2.0 Apache mod_wsgi / TG 2.0 Apache mod_wsgi / TG 2.0 Apache mod_wsgi / TG 2.0 Develop MongoDB Slave MongoDB Slave MongoDB Slave MongoDB Slave Master DB Server MongoDB Master Gobble Server Geeknet, page 5

Deployment Architecture (revised) Load Balancer / Proxy Apache mod_wsgi / TG 2.0 Apache Apache mod_wsgi / TG 2.0 Apache mod_wsgi / TG 2.0 mod_wsgi / TG 2.0 Develop Scalability is good Single-node performance is good, too Master DB Server MongoDB Master Gobble Server Geeknet, page 6

SF.net Downloads Allow non-sf.net projects to use SourceForge mirror network Stats calculated in Hadoop and stored/served from MongoDB Same deployment architecture as Consume (4 web, 1 db) Geeknet, page 7

Allura (SF.net beta devtools) Rewrite developer tools with new architecture Wiki, Tracker, Discussions, Git, Hg, SVN, with more to come Single MongoDB replica set manually sharded by project Release early & often Geeknet, page 8

What We Liked Performance, performance, performance Easily handle 90% of SF.net traffic from 1 DB server, 4 web servers Schemaless server allows fast schema evolution in development, making many migrations unnecessary Replication is easy, making scalability and backups easy Keep a backup slave running Kill backup slave, copy off database, bring back up the slave Automatic re-sync with master Query Language You mean I can have performance without map-reduce? GridFS Geeknet, page 9

Pitfalls Too-large documents Store less per document Return only a few fields Ignoring indexing Watch your server log; bad queries show up there Ignoring your data s schema Using many databases when one will do Using too many queries Geeknet, page 10

Ming an Object-Document Mapper? Your data has a schema Your database can define and enforce it It can live in your application (as with MongoDB) Nice to have the schema defined in one place in the code Sometimes you need a migration Changing the structure/meaning of fields Adding indexes Sometimes lazy, sometimes eager Queuing up all your updates can be handy Python dicts are nice; objects are nicer Geeknet, page 11

Ming Concepts Inspired by SQLAlchemy Group of classes to which you map your collections Each class defines its schema, including indexes Convenience methods for loading/saving objects and ensuring indexes are created Migrations Unit of Work great for web applications MIM Mongo in Memory nice for unit tests Geeknet, page 12

Ming Example from ming import schema from ming.orm import MappedClass from ming.orm import (FieldProperty, ForeignIdProperty, RelationProperty) class WikiPage(MappedClass): class mongometa : session = session name = 'wiki_page' _id = FieldProperty(schema.ObjectId) title = FieldProperty(str) text = FieldProperty(str) comments=relationproperty('wikicomment') MappedClass.compile_all() # Lets ming know about the mapping Geeknet, page 13

Open Source Ming http://sf.net/projects/merciless/ MIT License Allura http://sf.net/p/allura/ Apache License Geeknet, page 14

Future Work mongos New Allura Tools Migrating legacy SF.net projects to Allura Stats all in MongoDB rather than Hadoop? Better APIs to access your project data Geeknet, page 15

Questions? Confidential Geeknet, page 16

Contact Rick Copeland @rick446 rick@geek.net Geeknet, page 17