Why Zalando trusts in PostgreSQL

Similar documents
Enabling development teams to move fast. PostgreSQL at Zalando

Weaving Stored Procedures into Java at Zalando

MySQL és Hadoop mint Big Data platform (SQL + NoSQL = MySQL Cluster?!)

Database Management System Choices. Introduction To Database Systems CSE 373 Spring 2013

DATA INTEGRATION. in the world of microservices

Database Scalability and Oracle 12c

Sharding with postgres_fdw

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) /21/2013

Linas Virbalas Continuent, Inc.

Domain driven design, NoSQL and multi-model databases

plproxy, pgbouncer, pgbalancer Asko Oja

Scalable Architecture on Amazon AWS Cloud

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

MongoDB Developer and Administrator Certification Course Agenda

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Can the Elephants Handle the NoSQL Onslaught?

YouTube Vitess. Cloud-Native MySQL. Oracle OpenWorld Conference October 26, Anthony Yeh, Software Engineer, YouTube.

How, What, and Where of Data Warehouses for MySQL

Postgres Plus Advanced Server

Database Scalability {Patterns} / Robert Treat

Amazon Redshift & Amazon DynamoDB Michael Hanisch, Amazon Web Services Erez Hadas-Sonnenschein, clipkit GmbH Witali Stohler, clipkit GmbH

Social Networks and the Richness of Data

High Availability Solutions for the MariaDB and MySQL Database

NoSQL Databases. Nikos Parlavantzas

System x solution with open source based Postgres Plus Advanced Server database from EnterpriseDB


FIS GT.M Multi-purpose Universal NoSQL Database. K.S. Bhaskar Development Director, FIS +1 (610)

Integrating Big Data into the Computing Curricula

The elephant called PostgreSQL

How To Scale Out Of A Nosql Database

How graph databases started the multi-model revolution

Introduction to NoSQL and MongoDB. Kathleen Durant Lesson 20 CS 3200 Northeastern University

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

In Memory Accelerator for MongoDB

Real-time reporting at 10,000 inserts per second. Wesley Biggs CTO 25 October 2011 Percona Live

Comparing SQL and NOSQL databases

SAP HANA SAP s In-Memory Database. Dr. Martin Kittel, SAP HANA Development January 16, 2013

<Insert Picture Here> Oracle In-Memory Database Cache Overview

FIFTH EDITION. Oracle Essentials. Rick Greenwald, Robert Stackowiak, and. Jonathan Stern O'REILLY" Tokyo. Koln Sebastopol. Cambridge Farnham.

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

Performance Evaluation of NoSQL Systems Using YCSB in a resource Austere Environment

SQL Databases Course. by Applied Technology Research Center. This course provides training for MySQL, Oracle, SQL Server and PostgreSQL databases.

Customer Bank Account Management System Technical Specification Document

Cloud Based Application Architectures using Smart Computing

The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect

Parallel Replication for MySQL in 5 Minutes or Less

Big data and urban mobility

Search Big Data with MySQL and Sphinx. Mindaugas Žukas

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

Instant-On Enterprise

Programming Database lectures for mathema

SAP HANA PLATFORM Top Ten Questions for Choosing In-Memory Databases. Start Here

Database migration. from Sybase ASE to PostgreSQL. Achim Eisele and Jens Wilke. 1&1 Internet AG

Database Replication with MySQL and PostgreSQL

Logistics. Database Management Systems. Chapter 1. Project. Goals for This Course. Any Questions So Far? What This Course Cannot Do.

Welcome to Virtual Developer Day MySQL!

Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming. by Dibyendu Bhattacharya

Nick Ashley TOOLS. The following table lists some additional and possibly more unusual tools used in this paper.

Moving From Hadoop to Spark

Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.

Structured Data Storage

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

WHICH POSTGRES IS RIGHT FOR ME?

Open Source Technologies on Microsoft Azure

Getting ready for PostgreSQL 9.1

Big Data Solutions. Portal Development with MongoDB and Liferay. Solutions

Open Source DBMS CUBRID 2008 & Community Activities. Byung Joo Chung bjchung@cubrid.com

Microsoft Azure Data Technologies: An Overview

Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF)

High Availability and Scalability for Online Applications with MySQL

Hadoop IST 734 SS CHUNG

The Cloud to the rescue!

Scaling Database Performance in Azure

Real-time Data Replication

Agenda. ! Strengths of PostgreSQL. ! Strengths of Hadoop. ! Hadoop Community. ! Use Cases

MySQL. Leveraging. Features for Availability & Scalability ABSTRACT: By Srinivasa Krishna Mamillapalli

Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344

Cloud Powered Mobile Apps with Azure

An Approach to Implement Map Reduce with NoSQL Databases

Document Oriented Database

Open PostgreSQL Monitoring

Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone

Tips and Tricks for Using Oracle TimesTen In-Memory Database in the Application Tier

MongoDB. An introduction and performance analysis. Seminar Thesis

Database Extension 1.5 ez Publish Extension Manual

Tushar Joshi Turtle Networks Ltd

How To Use Big Data For Telco (For A Telco)

Transcription:

Why Zalando trusts in PostgreSQL A developer s view on using the most advanced open-source database Henning Jacobs - Technical Lead Platform/Software Zalando GmbH Valentine Gogichashvili - Technical Lead Platform/Database Zalando GmbH GOTO Berlin 2013, 2013-10-18

Who we are Henning Jacobs <henning.jacobs@zalando.de> with Zalando since 2010 NO DBA, NO PostgreSQL Expert Valentine Gogichashvili <valentine.gogichashvili@zalando.de> with Zalando since 2010 DBA, PostgreSQL Expert 3/37

About Zalando 14 countries > 1 billion revenue 2012 > 150,000 products 3 warehouses Europe's largest online fashion retailer 4/37

Our ZEOS Platform 5/37

Our Main Stack Java PostgreSQL Main/Production CXF-WS SProcWrapper JDBC Tomcat API Schema Data Schemas Postgres Java 7, Tomcat 7, PostgreSQL 9.0 9.3 6/37

Our Main Stacks Different Use Cases same Database Main/Production CRUD/JPA Scripting/Python CXF-WS CXF-WS HTTP/REST SProcWrapper JPA SQLAlchemy JDBC Tomcat JDBC Tomcat psycopg2 CherryPy API Schema Data Schemas Data Schemas Data Schemas Postgres Postgres Postgres Java 7, Tomcat 7, Python 2.7, PostgreSQL 9.0 9.3 7/37

Some Numbers > 90 deployment units (WARs) > 800 production Tomcat instances > 50 different databases > 90 database master instances > 5 TB of PostgreSQL data > 200 developers, 8 DBAs 8/37

Stored Procedures SET search_path... SELECT register_customer(...) API Schemas stored procedures, custom types,... z_api_v13_42 register_customer(..) find_orders(..)... Data Schema(s) tables, views,... register_customer(..) find_orders(..)... z_data customer order order_address... 9/37

Stored Procedures 1:1 Mapping using SProcWrapper @SProcService public interface CustomerSProcService { @SProcCall int registercustomer(@sprocparam String email, @SProcParam Gender gender); } JAVA CREATE FUNCTION register_customer(p_email text, p_gender z_data.gender) RETURNS int AS $$ INSERT INTO z_data.customer (c_email, c_gender) VALUES (p_email, p_gender) RETURNING c_id $$ LANGUAGE 'sql' SECURITY DEFINER; SQL 10/37

Stored Procedures Complex Types @SProcCall List<Order> findorders(@sprocparam String email); JAVA CREATE FUNCTION find_orders(p_email text, OUT order_id int, OUT order_date timestamp, OUT shipping_address order_address) RETURNS SETOF RECORD AS $$ SELECT o_id, o_date, ROW(oa_street, oa_city, oa_country)::order_address FROM z_data.order JOIN z_data.order_address ON oa_order_id = o_id JOIN z_data.customer ON c_id = o_customer_id WHERE c_email = p_email $$ LANGUAGE 'sql' SECURITY DEFINER; SQL 11/37

Stored Procedures Experience Performance benefits Easy to change live behavior Validation close to data Simple transaction scope Makes moving to new software version easy Cross language API layer CRUD JPA 12/37

Stored Procedures Rolling out database changes API versioning - Automatic roll-out during deployment - Java backend selects "right" API version Modularization - shared SQL gets own Maven artifacts - feature/library bundles of Java+SQL DB-Diffs - SQL scripts for database changes - Review process - Tooling support 13/37

Stored Procedures & Constraints Protect Your Most Valuable Asset: Your Data 14/37

Constraints Ensuring Data Quality Simple SQL expressions: CREATE TABLE z_data.host ( SQL h_hostname varchar(63) NOT NULL UNIQUE CHECK (h_hostname ~ '^[a-z][a-z0-9-]*$'), h_ip inet UNIQUE CHECK (masklen(h_ip) = 32), h_memory_bytes bigint CHECK (h_memory_bytes > 8*1024*1024) ); COMMENT ON COLUMN z_data.host.h_memory_bytes IS 'Main memory (RAM) of host in Bytes'; Combining check constraints and stored procedures: CREATE TABLE z_data.customer ( c_email text NOT NULL UNIQUE CHECK (is_valid_email(c_email)),.. ); SQL 15/37

Constraints What about MySQL? The MySQL manual says: The CHECK clause is parsed but ignored by all storage engines. Open Bug ticket since 2004: http://bugs.mysql.com/bug.php?id=3464 http://dev.mysql.com/doc/refman/5.7/en/create-table.html 16/37

Constraints Custom Type Validation using Triggers config_value cv_id : serial cv_key : text cv_value : json cv_type_id : int key: recipientemail value: "john.doe@example.org" config_type ct_id : serial ct_name : text ct_type : base_type ct_regex : text... type: EMAIL_ADDRESS 17/37

Custom Type Validation using Triggers CREATE FUNCTION validate_value_trigger() RETURNS trigger AS $$ BEGIN PERFORM validate_value(new.cv_value, NEW.cv_type_id); END; $$ LANGUAGE 'plpgsql'; SQL CREATE FUNCTION validate_value(p_value json, p_type_id int) RETURNS void AS $$ import json import re #... Python code, see next slide $$ LANGUAGE 'plpythonu'; SQL 18/37

Custom Type Validation with PL/Python class TypeValidator(object): PYTHON @staticmethod def load_by_id(id): tdef = plpy.execute('select * FROM config_type WHERE ct_id = %d' % int(id))[0] return _get_validator(tdef) def check(self, condition, msg): if not condition: raise ValueError('Value does not match the type "%s". Details: %s' % (self.type_name, msg)) class BooleanValidator(TypeValidator): def validate(self, value): self.check(type(value) is bool, 'Boolean expected') validator = TypeValidator.load_by_id(p_type_id) validator.validate(json.loads(p_value)) 19/37

Scaling? 20/37

Sharding Scaling Horizontally App SProcWrapper Shard 1 Cust. A-F Shard 2 Cust. G-K Shard N Cust. Y-Z 21/37

Sharding SProcWrapper Support Sharding customers by ID: @SProcCall int registercustomer(@sprocparam @ShardKey int customerid, @SProcParam String email, @SProcParam Gender gender); JAVA Sharding articles by SKU (uses MD5 hash): @SProcCall Article getarticle(@sprocparam @ShardKey Sku sku); JAVA Collecting information from all shards concurrently: @SProcCall(runOnAllShards = true, parallel = true) List<Order> findorders(@sprocparam String email); JAVA 22/37

Sharding Auto Partitioning Collections @SProcCall(parallel = true) void updatestockitems(@sprocparam @ShardKey List<StockItem> items); JAVA 23/37

Sharding Bitmap Data Source Providers 24/37

Sharding Experience Scalability without sacrificing any SQL features Start with a reasonable number of shards (8) Some tooling required Avoid caching if you can and scale horizontally 25/37

Fail Safety Replication All databases use streaming replication Every database has a (hot) standby and a delayed slave App Service IP WAL WAL Slave with 1 h delay Hot Standby for readonly queries 26/37

Fail Safety Failover Service IP for switching/failover Monitoring with Java and custom web frontend Failover is manual task 27/37

Fail Safety General Remarks Good hardware - G8 servers from HP - ECC RAM, RAID No downtimes allowed - Major PostgreSQL version upgrades? Two data centers Dedicated 24x7 team Maintenance - Concurrent index rebuild, table compactor 28/37

Monitoring You need it... Nagios/Icinga PGObserver pg_view 29/37

Monitoring PGObserver 30/37

Monitoring PGObserver Locate hot spots - - - Frequent stored procedure calls Long running stored procedures I/O patterns Helps tuning DB performance Creating the right indices often a silver bullet 31/37

Monitoring pg_view Top-like command line utility DBA's daily tool Analyze live problems Monitor data migrations 32/37

NoSQL Relational is dead? If your project is not particularly vital and/or your team is not particularly good, use relational databases! Martin Fowler, GOTO Aarhus 2012 People vs. NoSQL, GOTO Aarhus 2012 33/37

NoSQL Relational is dead? The key goals of F1's design are: 3. Consistency: The system must provide ACID transactions, [..] 4. Usability: The system must provide full SQL query support [..] Features like indexes and ad hoc query are not just nice to have, but absolute requirements for our business. F1: A Distributed SQL Database That Scales 34/37

NoSQL Comparison Aggregate oriented? - - Schemaless? - - Scaling? Auth*? SProcWrapper Changes? Implicit Schema HStore, JSON Use cases? "Polyglot Persistence" Schemaless Aggregate-Oriented Document Column-Family Key-Value Graph Introduction to NoSQL, GOTO Aarhus 2012 35/37

YeSQL PostgreSQL at Zalando Fast, stable and well documented code base Performs as well as (and outperforms some) NoSQL databases while retaining deeper exibility Scaling horizontally can be easy Know your tools whatever they are! PostgreSQL and NoSQL 36/37

Thank You! Please Visit also SProcWrapper Java library for stored procedure access github.com/zalando/java-sproc-wrapper PGObserver monitoring web tool for PostgreSQL github.com/zalando/pgobserver pg_view top-like command line activity monitor github.com/zalando/pg_view tech.zalando.com Please rate this session! 37/37