Lessons learned: How we use Erlang to analyze millions of messages per day




Erlang Factory Lite Paris

A few words about us: what we do with Erlang

Semiocast processes social media conversations to provide analytics and market research insights.

Semiocast's offer

Quantitative studies (ad hoc, barometers)
- Shares of social media conversations
- Sentiment analysis and clustering
- Topic identification
- Ad hoc quantitative indicators

Qualitative studies (ad hoc, monitoring)
- Consumer insights
- Verbatim research
- Clustering of conversations
- Mapping of communities and influence analysis
- Enumeration of social media conversation spaces
- Real-time alerts
- Daily/Weekly/Monthly reports
- Crisis monitoring

Tools
- Social media monitoring platform (Semioboard)
- Technology as a service (API)

Semioboard


Live analysis of comments on TV debates

How we ended up using Erlang

- Discovered Erlang when getting WiFi rabbits to talk to each other over XMPP (ejabberd) in 2007
- Taught OCaml in 2004

Three reasons why we chose Erlang:
- hot code change and inspection
- fault-tolerance
- happy to do functional programming (gave us a break from Java and C++)

How we use Erlang

- 1352 OTP releases
- 47 applications
- 100K lines of Erlang (without tests)
- 11K lines of C/C++ (mostly glue)
- 1K lines of Java (glue)

~50 ungraduated applications for
- prototyping
- short lived projects
- web-based/command line tools that run on dev machines

    {release, {"Semiocast OTP", "1352"}, {erts, "5.8.4"}, [
        % erts 5.8.4
        {kernel, "2.14.4"},
        {stdlib, "1.17.4"},
        {mnesia, "4.4.18"},
        {inets, "5.5.2"},
        {sasl, "2.1.9.3"},
        {crypto, "2.0.2.1"},
        {snmp, "4.19"},
        {otp_mibs, "1.0.6"},
        {ssl, "4.1.5"},
        {public_key, "0.12"},
        {xmerl, "1.2.8"},
        {compiler, "4.7.3"},
        {runtime_tools, "1.8.5"},
        {syntax_tools, "1.6.7"},
        % Other libs
        {erlsom, "1.2.1"},
        {mochiweb, "0.167.10"},
        {nitrogen, "2.0.20100531.14"},
        {nprocreg, "0.1"},
        {simple_bridge, "1.0.2"},
        % Semiocast
        {analyzer, "22", load},
        {alien_models, "50", load},
        {alien_uniform_streams, "7", load},
        {api_server, "109", load},
        {aspell, "4", load},
        {binlog, "26", load},
        {certificate_authority, "8", load},
        {chasen, "11", load},
        {chinese_segmenter, "1", load},
        {commonlib, "250", permanent}, % always start commonlib.
        {ctl, "29", permanent}, % always start ctl.
        {dashboard_engine, "55", load},
        {dashboard_storage, "56", load},
        {dashboard_website, "226", load},
        {developer_website, "36", load},
        {engine, "455", load},
        {gate, "79", load},
        {geodb, "5", load},
        {geoip, "2", load},
        {image_magick, "8", load},
        {kdtree, "3", load},
        {kqueue, "1", load},
        {link_grammar_parser, "12", load},
        {memcached, "5", load},
        {mysql, "8", load},
        {nagios, "6", permanent}, % always start nagios.
        {opennlp, "8", load},
        {pgsql, "2", load},
        {pubsubhubbub, "4", load},
        {qr_website, "1", load},
        {re2, "1", load},
        {s_http, "42", load},
        {setproctitle, "2", permanent}, % always start setproctitle.
        {sink, "15", load},
        {sqlite, "9", load},
        {storage, "226", load},
        {svg, "12", load},
        {svm, "1", load},
        {text_ident, "27", load},
        {text_proc, "62", load},
        {titema_website, "7", load},
        {url_server, "8", load},
        {uuid, "6", load},
        {web_common, "14", load},
        {web_gate, "8", load},
        {web_storage, "5", load},
        {wikimedia, "6", load}
    ]}.

A few things we wish we had known about Erlang

Mistake #1: Creating an Erlang process to do a lot of work

- processes should spend most of their time waiting for messages (gen_server), or do some intensive work and quickly exit when done (spawn_link)
- when required, benchmark, as message passing with the worker process can prove expensive

    Self = self(),
    spawn_link(fun() -> Self ! {language, analyze_language(Text, MD0, Mode)} end),
    spawn_link(fun() -> Self ! {location, analyze_location(MD0, Mode)} end),
    {NProcessedLang, Language} = receive {language, RespLa} -> RespLa end,
    {NProcessedLoc, Location} = receive {location, RespLoc} -> RespLoc end,

Faster:

    {NProcessedLang, Language} = analyze_language(Text, MD0, Mode),
    {NProcessedLoc, Location} = analyze_location(MD0, Mode),
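One way to follow the "benchmark" advice is to time both variants with timer:tc/1. The sketch below is only an illustration: the module name spawn_bench and the work/1 function are hypothetical stand-ins for the real analysis functions, and the numbers it reports will depend on the machine and on how heavy the real work is.

```erlang
-module(spawn_bench).
-export([sequential/1, spawned/1, compare/1]).

%% Hypothetical stand-in for a small unit of work such as a text analysis step.
work(N) -> lists:sum(lists:seq(1, N)).

%% Run both steps directly in the calling process.
sequential(N) ->
    {work(N), work(N + 1)}.

%% Run both steps in short-lived linked workers and collect the replies.
spawned(N) ->
    Self = self(),
    Ref = make_ref(),
    spawn_link(fun() -> Self ! {Ref, a, work(N)} end),
    spawn_link(fun() -> Self ! {Ref, b, work(N + 1)} end),
    A = receive {Ref, a, RA} -> RA end,
    B = receive {Ref, b, RB} -> RB end,
    {A, B}.

%% Time each variant over 1000 runs; results are in microseconds.
compare(N) ->
    Runs = 1000,
    {TSeq, ok} = timer:tc(fun() -> repeat(Runs, fun() -> sequential(N) end) end),
    {TSpawn, ok} = timer:tc(fun() -> repeat(Runs, fun() -> spawned(N) end) end),
    [{sequential_us, TSeq}, {spawned_us, TSpawn}].

repeat(0, _F) -> ok;
repeat(K, F) -> F(), repeat(K - 1, F).
```

Calling spawn_bench:compare(10) and then spawn_bench:compare(100000) should give a feel for when the cost of spawning and message passing stops dominating.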

Mistake #2: Creating a lot of processes for parallelized computing

- having more worker processes than schedulers does not make sense
- it can actually hurt, because processes waiting for a reply may not have it in time and will fail with a timeout
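A minimal sketch of capping the number of workers at the scheduler count, assuming the work is a pure function applied to a list. The module name bounded_pmap and its helpers are hypothetical, not code from the talk; erlang:system_info(schedulers) is the standard call for the scheduler count.

```erlang
-module(bounded_pmap).
-export([pmap/2]).

%% Apply F to every element of List using at most one worker per scheduler.
pmap(F, List) ->
    NWorkers = erlang:system_info(schedulers),
    Chunks = split(List, NWorkers),
    Self = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Self ! {Ref, lists:map(F, Chunk)} end),
                Ref
            end || Chunk <- Chunks],
    %% Collect replies in chunk order so the result order matches the input.
    lists:append([receive {Ref, Res} -> Res end || Ref <- Refs]).

%% Split List into at most N contiguous chunks.
split([], _N) -> [];
split(List, N) ->
    Size = (length(List) + N - 1) div N,   % ceiling division
    chunk(List, Size).

chunk([], _Size) -> [];
chunk(List, Size) when length(List) =< Size -> [List];
chunk(List, Size) ->
    {Head, Tail} = lists:split(Size, List),
    [Head | chunk(Tail, Size)].
```

Each worker handles a whole chunk, so the number of messages stays proportional to the number of schedulers rather than to the length of the list.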

Thinking OTP

Mistake #3: Starting processes outside the supervision tree

- gen_server & co. should be used everywhere, except for very short lived processes (that won't be upgraded)
- Every gen_server should be started from a supervisor
- A real-world supervision design requires process_flag(trap_exit, true), monitor/2 and link/1, as well as some thinking
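As an illustration of "every gen_server should be started from a supervisor", here is a minimal sketch of a worker and its supervisor. The modules counter and counter_sup are hypothetical examples, not part of the Semiocast code base, and the restart intensity (5 restarts in 10 seconds) is an arbitrary assumption.

```erlang
%% counter.erl: a trivial gen_server (hypothetical example).
-module(counter).
-behaviour(gen_server).
-export([start_link/0, increment/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

increment() ->
    gen_server:call(?MODULE, increment).

init([]) -> {ok, 0}.

handle_call(increment, _From, Count) ->
    {reply, Count + 1, Count + 1}.

handle_cast(_Msg, Count) -> {noreply, Count}.
handle_info(_Info, Count) -> {noreply, Count}.
terminate(_Reason, _Count) -> ok.
code_change(_OldVsn, Count, _Extra) -> {ok, Count}.
```

```erlang
%% counter_sup.erl: the supervisor that owns the counter process.
-module(counter_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    Child = {counter,                     % id
             {counter, start_link, []},   % start function
             permanent,                   % always restart
             5000,                        % shutdown timeout in ms
             worker,
             [counter]},                  % modules
    {ok, {{one_for_one, 5, 10}, [Child]}}.
```

After counter_sup:start_link(), the counter is restarted with a fresh state whenever it crashes, and it takes part in release upgrades because the supervisor knows its callback module.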

Mistake #4: Thinking obscure comments in the documentation do not really apply

- When in doubt, source code is handy, and helps figuring out when we really need to go off the rule

    %%----------------------------------------------------------------
    %% @doc supervisor init callback.
    %%
    -spec(init/1::(any()) -> sup_init()).
    init(_Args) ->
        UserServerSupSpec = {user_server_sup, % id
            {user_server_sup, start_link, []}, % init function
            permanent, % restart children that crash
            ?SHUTDOWN_DELAY, supervisor,
            [user_server_sup] % modules
        },
        AlienUserServerSupSpec = {alien_user_server_sup, % id
            {alien_user_server_sup, start_link, []}, % init function
            permanent, % restart children that crash
            ?SHUTDOWN_DELAY, supervisor,
            [alien_user_server_sup] % modules
        },
        MailboxServerSupSpec = {mailbox_server_sup, % id
            {mailbox_server_sup, start_link, []}, % init function
            permanent, % restart children that crash
            ?SHUTDOWN_DELAY, supervisor,
            [mailbox_server_sup] % modules
        },
        UserManagerStartChildSemaphore = {user_manager_start_child_sem
            {semaphore, start_link, [{local, user_manager_start_child_
            transient, % restart children that crash
            ?SHUTDOWN_DELAY, worker,
            [semaphore] % modules
        },
        UserManagerSpec = {user_manager, % id
            {user_manager, start_link, []}, % init function
            transient, % restart children that crash
            ?SHUTDOWN_DELAY, worker,
            [gen_manager, user_manager] % modules
        },
        RestartStrategy = {one_for_one, ?MAX_RESTARTS, ?MAX_RESTARTS_P
        % Setup snmp probe for semaphore.
        gen_snmp_agent:set_variable(?SNMP_USER_MANAGER_START_CHILD_SEM

gen_manager is a gen_server with a callback module handling code_change messages (here, user_manager).

From erl -man supervisor: "As a rule of thumb Modules should be a list with one element [Module], where Module is the callback module, if the child process is a supervisor, gen_server or gen_fsm."

Mistake #5: Putting everything in a single virtual machine (node) per server

- Virtual machines may crash
- Code changes can fail and take down the whole node
- It's better to separate critical code, even per server if a crashing node can take a huge amount of RAM and make other nodes swap

Interfacing with foreign code

Six ways to interface foreign code with Erlang:
- Linked-in drivers
- External drivers
- NIFs
- os:cmd/1
- C-based distributed node
- Java-based distributed node (jinterface)

We tried them all and are looking forward to future extensions to the native interface (R15?)

Methods: linked-in drivers, external drivers, NIFs, os:cmd/1

We use/used them for:
- aspell
- kqueue (FreeBSD/MacOS X kqueue binding)
- SQLite
- ImageMagick
- GeoIP
- WebKit
- uuid
- re2 (linear time bound replacement for re)
- bzip2
- OpenSSL
- Batik (svg rasterizer in Java)

Method: C-based distributed node
- ruby (we actually bound Rails websites with Erlang at some point)

Method: jinterface
- OpenNLP

Mistake #6: Using linked-in drivers for open source code that could crash/abort

E.g.: ImageMagick will abort on bad input

- External drivers are more suitable when the external library is large, crash-prone or could leak (sometimes, the leak is in the glue)
- Passing pointers is possible but requires some logic, typically binding a pointer to the port
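To illustrate why an external program is safer than code linked into the VM, the sketch below talks to an OS process through a port and observes its exit without the node going down. It assumes a Unix-like system where cat and sh are on the PATH; the module name ext_port and both functions are hypothetical, and a real external driver would speak its own protocol instead of echoing.

```erlang
-module(ext_port).
-export([echo/1, crash/0]).

%% Send Bin to an external `cat` process and read the echoed packet back.
%% {packet, 2} frames each message with a 2-byte length header in both
%% directions, so the echoed bytes come back as one complete message.
echo(Bin) when is_binary(Bin) ->
    Port = open_port({spawn, "cat"}, [binary, {packet, 2}, exit_status]),
    true = port_command(Port, Bin),
    Reply = receive
                {Port, {data, Data}} -> {ok, Data}
            after 5000 -> {error, timeout}
            end,
    true = port_close(Port),
    Reply.

%% Start an external program that fails; the VM only receives a message.
crash() ->
    Sh = os:find_executable("sh"),
    Port = open_port({spawn_executable, Sh},
                     [{args, ["-c", "exit 3"]}, exit_status]),
    receive
        {Port, {exit_status, Status}} -> {exited, Status}
    after 5000 -> {error, timeout}
    end.
```

The owning process can react to the exit status (restart the program, report an error) while the rest of the node keeps running, which is not the case when a linked-in driver aborts.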

Mistake #7: Using linked-in drivers for I/O intensive code

E.g.: sqlite

- Linked-in driver code is executed within a scheduler thread. Running for too long will starve other processes, which will time out waiting for messages
- Theoretically, we can use async threads (and we do with sqlite). However, enabling async threads (+A) has a huge impact on built-in I/O drivers
- Performance could be worse with external drivers

Erlang technologies we love/hate

dialyzer

- part of our compilation cycle
- found many bugs, typically inconsistencies between callers and callees
- starting with -spec when defining exported funs

We wish it would be fixed/improved:
- horribly slow
- sometimes blind
- hard to understand
- fails on code_change code
- useless warnings that cannot be disabled
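A small sketch of what "starting with -spec when defining exported funs" can look like. The module spec_example and its functions are hypothetical. With the spec in place, a caller that passed a string (a list) instead of a binary is the kind of caller/callee inconsistency dialyzer is designed to report.

```erlang
-module(spec_example).
-export([word_count/1, caller/0]).

%% The spec documents the contract and gives dialyzer something to check.
-spec word_count(binary()) -> non_neg_integer().
word_count(Text) ->
    length(binary:split(Text, <<" ">>, [global, trim_all])).

%% A caller that respects the contract.
-spec caller() -> non_neg_integer().
caller() ->
    word_count(<<"hello erlang world">>).
```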

snmp

- makes it really easy to integrate Erlang nodes within a monitoring solution (we use nagios and munin)

HiPE

OTP installed with --enable-native-libs and all our code compiled with +native
- helps with CPU-bound work (including dialyzer)
- most patches we submitted were HiPE-related

We are also grateful to the authors of: erlsom, mochiweb, nitrogen, zotonic

Thank you! paul@semiocast.com