Dedalus: Datalog in Time and Space




Dedalus: Datalog in Time and Space
  Peter Alvaro, Ras Bodík, Neil Conway, Joe Hellerstein, David Maier (PSU), Bill Marczak, and Rusty Sears (Yahoo! Research)
  2010 Berkeley OSQ Retreat

Dedalus
  Dedalus is a declarative language for distributed programming
  Grounded in our experiences using declarative languages to build distributed systems
    Declarative Networking (2003-2008)
    The BOOM Project (2008-Present)

Outline
  1. Context and Motivation
       Declarative Networking
       Declarative Systems: BOOM
       A taste of Overlog
  2. Dedalus: Datalog in Time and Space
  3. Future Directions and Open Problems

Declarative Networking
  Networking is about moving data from one location to another
  Can we view networking as a distributed data management problem?
    E.g., can we express a routing protocol as a distributed query in a declarative language?
  Yes: transport protocols, routing protocols, sensor networks, DHTs, replication policies, distributed snapshots, consensus protocols, ...
  Typically 10x reduction in code size
  B.T. Loo, T. Condie, M. Garofalakis, D.E. Gay, J.M. Hellerstein, P. Maniatis, R. Ramakrishnan, T. Roscoe, I. Stoica. Declarative Networking. CACM, 2009

Declarative Systems
  Focus has turned from protocols toward distributed systems
    Larger programs
    More complex algorithms
  BOOM: Berkeley Orders Of Magnitude
    OOM more scale, OOM less code
  Goal: A complete cloud computing stack built using declarative languages
    Could we build Google in 10 kloc?

Overlog: Distributed Datalog
  Datalog: a recursive query language from the deductive database community
    Defined over a static database
  Add state update, distributed queries (communication)

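Datalog Example
    path(X, Y, C) :- link(X, Y, C);
    path(X, Z, C1 + C2) :- link(X, Y, C1), path(Y, Z, C2);
    mincost(X, Z, min<C>) :- path(X, Z, C);
  Rule head: the term to the left of ":-"
  Rule body: the conjunction of terms to the right of ":-"
  The two path rules together compute the transitive closure of link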
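The three rules above can be read as a bottom-up fixpoint computation. The following is a minimal Python sketch of naive evaluation, not how any Datalog engine from the talk is implemented. The function names and the sample `links` data are assumptions made for illustration, and the loop only terminates on acyclic link graphs, because a cycle would keep producing paths of ever larger cost.

```python
def transitive_closure(links):
    """Naive bottom-up evaluation of the two path rules.

    links is a set of (src, dst, cost) facts; the graph is assumed acyclic.
    """
    path = set(links)                      # path(X, Y, C) :- link(X, Y, C)
    while True:
        derived = {
            (x, z, c1 + c2)                # path(X, Z, C1 + C2) :-
            for (x, y, c1) in links        #     link(X, Y, C1),
            for (y2, z, c2) in path        #     path(Y, Z, C2)
            if y == y2
        }
        if derived <= path:                # fixpoint reached: nothing new
            return path
        path |= derived


def mincost(path):
    """mincost(X, Z, min<C>) :- path(X, Z, C)"""
    best = {}
    for (x, z, c) in path:
        best[(x, z)] = min(c, best.get((x, z), c))
    return best


# Hypothetical sample data: a direct a->c link that is costlier than a->b->c.
links = {("a", "b", 1), ("b", "c", 2), ("a", "c", 5)}
paths = transitive_closure(links)
costs = mincost(paths)
```

Here `costs[("a", "c")]` is the cheaper two-hop cost rather than the direct link cost, which is what the `min<C>` aggregate expresses.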

Overlog Example
    path(@X, Y, C) :- link(@X, Y, C);
    path(@X, Z, C1 + C2) :- link(@X, Y, C1), path(@Y, Z, C2);
    mincost(@X, Z, min<C>) :- path(@X, Z, C);
  The second rule is a distributed join: link is stored at X, path is stored at Y
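One way to picture the distributed join is to partition every table by its location specifier and make each join local by shipping tuples between nodes. The sketch below assumes one particular strategy (copy each link tuple to its destination node, join there, send results back to the source). The slides do not say that Overlog evaluates the rule this way, and all names here are hypothetical. As with the centralized version, the link graph is assumed acyclic.

```python
from collections import defaultdict


def run_distributed_paths(links):
    """Evaluate the path rules with per-node tables; count tuples shipped."""
    link_d_at = defaultdict(set)   # copy of link(X, Y, C) shipped to node Y
    path_at = defaultdict(set)     # path(@X, Z, C) stored at node X
    messages = 0

    for (x, y, c) in links:
        path_at[x].add((x, y, c))      # base rule is purely local to X
        link_d_at[y].add((x, y, c))    # assumed rewrite: ship link from X to Y
        messages += 1

    changed = True
    while changed:
        changed = False
        for y in list(link_d_at):
            # Local join at node Y: shipped links against Y's own paths.
            for (x, _, c1) in list(link_d_at[y]):
                for (_, z, c2) in list(path_at[y]):
                    fact = (x, z, c1 + c2)
                    if fact not in path_at[x]:
                        path_at[x].add(fact)   # result shipped from Y to X
                        messages += 1
                        changed = True
    return path_at, messages


# Hypothetical sample data, same shape as the centralized example.
links = {("a", "b", 1), ("b", "c", 2), ("a", "c", 5)}
path_at, messages = run_distributed_paths(links)
```

The message counter makes the cost of the join visible: every derived path tuple crosses the network once, which is the kind of cost a network-oriented optimizer would want to reason about.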

Overlog Timestep Model

Outline
  1. Context and Motivation
       Declarative Networking
       Declarative Systems: BOOM
       A taste of Overlog
  2. Dedalus: Datalog in Time and Space
  3. Future Directions and Open Problems

Dedalus
  Datalog = The Good Stuff
    Precise semantics, established techniques for optimization and evaluation
  In Overlog, the Hard Stuff happens between time steps
    State update
    Asynchronous messaging
  Can we talk about the Hard Stuff with logic?

State Update
    sequence(A, Val + 1) :- sequence(A, Val), event(A);
  How do we interpret this?
    Datalog: infinite database
    Overlog: runtime deletes old version of tuple
  Overlog: ugly, outside of logic, ambiguous
    Semantics defined by the implementation
  Hence, difficult to express common patterns
    Queues, sequencing
  Order doesn't matter ... except when it does!

Asynchronous Messaging
  Logical interpretation unclear:
    p(@A, B) :- q(@B, A);
  Overlog @ notation describes space
  Upon reflection, time is more fundamental
    Model failure with arbitrary delay

Dedalus: Datalog in Time
  (1) Deductive rule: (Pure Datalog)
        p(A, B) :- q(A, B);
  (2) Inductive rule: (Constraint across next timestep)
        p(A, B)@next :- q(A, B);
  (3) Async rule: (Constraint across arbitrary timesteps)
        p(A, B)@async :- q(A, B);

Dedalus: Datalog in Time
  (1) Deductive rule: (Pure Datalog)
        All terms in body have same time
        p(A, B, S) :- q(A, B, T), T = S;
  (2) Inductive rule: (Constraint across next timestep)
        p(A, B, S) :- q(A, B, T), successor(T, S);
  (3) Async rule: (Constraint across arbitrary timesteps)
        p(A, B, S) :- q(A, B, T), time(S), choose((A, B, T), (S));
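With time made an explicit last attribute, each rule type becomes an ordinary function over timestamped facts. The Python sketch below is only an illustration of that reading. The function names, the bounded `horizon`, and the use of a seeded random generator to stand in for `choose` are assumptions, not part of Dedalus.

```python
import random


def deductive(q_facts):
    # p(A, B, S) :- q(A, B, T), T = S
    return {(a, b, t) for (a, b, t) in q_facts}


def inductive(q_facts):
    # p(A, B, S) :- q(A, B, T), successor(T, S)
    return {(a, b, t + 1) for (a, b, t) in q_facts}


def asynchronous(q_facts, horizon, rng):
    # p(A, B, S) :- q(A, B, T), time(S), choose((A, B, T), (S))
    # choose picks exactly one S for each (A, B, T). Here S is drawn from
    # range(horizon), an assumed finite stand-in for the time relation.
    return {(a, b, rng.randrange(horizon)) for (a, b, t) in sorted(q_facts)}


# Hypothetical facts q(1, 2) at time 10 and q(3, 4) at time 11.
q = {(1, 2, 10), (3, 4, 11)}
p_now = deductive(q)
p_next = inductive(q)
p_async = asynchronous(q, horizon=100, rng=random.Random(0))
```

The only difference between the three functions is how the head timestamp relates to the body timestamp: equal, successor, or an arbitrary single choice.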

State Update in Dedalus
    p(A, B)@next :- p(A, B), notin p_del(A, B);
  Example Trace:
    p(1, 2)@101;
    p(1, 3)@102;
    p_del(1, 3)@300;

    Time    p(1, 2)    p(1, 3)    p_del(1, 3)
    101     yes
    102     yes        yes
    ...     yes        yes
    300     yes        yes        yes
    301     yes
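The persistence rule can be stepped through by hand: a tuple is carried to the next timestep unless a matching p_del fact holds at the current one. A minimal Python sketch of that stepping, using the trace from the slide (the function name and the dictionary encoding of the trace are assumptions):

```python
def run_persistence(p_inserts, p_del_events, start, end):
    """Return {time: set of p tuples that hold at that time}.

    p_inserts and p_del_events map a timestep to the tuples asserted then.
    """
    state = {}
    carried = set()                       # tuples induced from the previous step
    for t in range(start, end + 1):
        now = carried | p_inserts.get(t, set())
        state[t] = now
        deleted = p_del_events.get(t, set())
        # p(A, B)@next :- p(A, B), notin p_del(A, B)
        carried = now - deleted
    return state


# p(1, 2)@101; p(1, 3)@102; p_del(1, 3)@300
trace = run_persistence(
    p_inserts={101: {(1, 2)}, 102: {(1, 3)}},
    p_del_events={300: {(1, 3)}},
    start=101,
    end=301,
)
```

Note that p(1, 3) still holds at time 300, the step where p_del(1, 3) appears, and is absent only from 301 onward, because the rule constrains the next timestep.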

Sequences in Dedalus
    sequence(A, Val + 1)@next :- sequence(A, Val), event(A);
    sequence(A, Val)@next :- sequence(A, Val), notin event(A);
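Together the two rules say that exactly one sequence value exists per timestep: incremented after a step with an event, carried unchanged otherwise. A small Python sketch for a single sequence key, with a hypothetical initial value and event times:

```python
def run_sequence(initial, event_times, start, end):
    """Return {time: sequence value} for one sequence key."""
    val = initial
    history = {}
    for t in range(start, end + 1):
        history[t] = val
        if t in event_times:
            # sequence(A, Val + 1)@next :- sequence(A, Val), event(A)
            val = val + 1
        # otherwise: sequence(A, Val)@next :- sequence(A, Val), notin event(A)
    return history


# Hypothetical run: events occur at timesteps 2 and 3.
history = run_sequence(initial=0, event_times={2, 3}, start=1, end=5)
```

Because the increment lands in the next timestep, an event at time 2 is first visible in the value at time 3.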

Asynchrony in Dedalus
  Unreliable Broadcast in Dedalus:
    sbcast(#Target, Sender, Message)@async :-
        new_message(#Sender, Message), members(#Sender, Target);
  More satisfactory logical interpretation
  Can build Lamport clocks, reliable broadcast, etc.
  What about space?
    Space is the unit of atomic deduction w/o partial failure

Asynchrony in Dedalus
  Unreliable Broadcast in Dedalus (include sender's local time):
    sbcast(#Target, Sender, T, Message)@async :-
        new_message(#Sender, Message)@T, members(#Sender, Target)@T;
  More satisfactory logical interpretation
  Can build Lamport clocks, reliable broadcast, etc.
  What about space?
    Space is the unit of atomic deduction w/o partial failure
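The async broadcast rule can be simulated by giving each recipient its own arbitrary delivery time and treating a lost message as one that never arrives. The sketch below is an illustration under assumptions: the `max_delay` and `loss` parameters, the rule that arrival is strictly later than the send time, and all names are hypothetical and not taken from Dedalus.

```python
import random


def sbcast(sender, message, send_time, members, rng, max_delay=5, loss=0.2):
    """Simulate one unreliable broadcast.

    Returns (target, sender, send_time, message, arrival_time) tuples.
    """
    deliveries = []
    for target in members:
        if rng.random() < loss:
            continue                      # modeled as an unbounded delay
        # Each recipient independently gets one arbitrary arrival time.
        arrival = send_time + 1 + rng.randrange(max_delay)
        deliveries.append((target, sender, send_time, message, arrival))
    return deliveries


# Hypothetical membership list; loss=0.0 makes every message arrive.
received = sbcast("n1", "hello", send_time=7,
                  members=["n2", "n3", "n4"],
                  rng=random.Random(1), loss=0.0)
```

Carrying the sender's local time T in the tuple is what lets a receiver compare it against its own clock, which is the starting point for building Lamport clocks on top of this rule.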

Dedalus Summary
  Logical, model-theoretic semantics for two key features of distributed systems
    1. Mutable state
    2. Asynchronous communication
  All facts are transient
    Persistence and state update are explicit
  Has been successful in clarifying the semantics of our programs

Outline
  1. Context and Motivation
       Declarative Networking
       Declarative Systems: BOOM
       A taste of Overlog
  2. Dedalus: Datalog in Time and Space
  3. Future Directions and Open Problems

Big Picture Agenda
  1. Language
       Overlog: concise code
       Dedalus: precise semantics
       C4: efficient execution (new language runtime)
       Bloom: friendly syntax, mainstream appeal
  2. BOOM Project
       Build more systems using logic (e.g., Cassandra)
       Move up the stack? (Business logic, GUIs, ...)

Verification of Dedalus programs
  Premises:
    Program expressed as a set of logical implications
    All asynchrony/non-determinism is explicit
    Close to the specification but still executable
  Conclusion: easier verification?
    Programmer does (some) of the abstraction for us
  Can we integrate formal verification into the development process?

Network-Oriented Optimization
  Traditional compiler optimization is node-oriented
  The big wins are in network-oriented optimizations
    Given program for n nodes, execute using m nodes
    Given $100, what is the best cluster configuration?
    Automatically colocate code and data
    Co-optimize application logic and network protocols
      E.g., if program transitions are commutative, consensus is cheaper
  As cloud computing environments become more complex and unpredictable, automatic optimization will be crucial

Questions?
  Thank you!
  http://boom.cs.berkeley.edu
  Initial Publications:
    BOOM Analytics: EuroSys '10, Alvaro et al.
    Paxos in Overlog: NetDB '09, Alvaro et al.
    Dedalus: UCB TR #2009-173, Alvaro et al.