Datatype-generic data migrations



Similar documents
Cloud Haskell. Distributed Programming with. Andres Löh. 14 June 2013, Big Tech Day 6 Copyright 2013 Well-Typed LLP. .Well-Typed

Integration of Hotel Property Management Systems (HPMS) with Global Internet Reservation Systems

This module explains fundamental aspects of Microsoft Dynamics NAV Development Environment.

Towards More Security in Data Exchange

POLYTYPIC PROGRAMMING OR: Programming Language Theory is Helpful

Distribution and Integration Technologies

Online signature API. Terms used in this document. The API in brief. Version 0.20,

>

CSc 372. Comparative Programming Languages. 21 : Haskell Accumulative Recursion. Department of Computer Science University of Arizona

Arithmetic Coding: Introduction

1. Give the 16 bit signed (twos complement) representation of the following decimal numbers, and convert to hexadecimal:

A GP Service for Enterprise Printing. Kevin Shows, Anadarko Petroleum Kirk Kuykendall, Idea Integration 04/20/2011

AURA: A language with authorization and audit

SQL. Short introduction

2013 Ruby on Rails Exploits. CS 558 Allan Wirth

Poor Man's Type Classes

Oct: 50 8 = 6 (r = 2) 6 8 = 0 (r = 6) Writing the remainders in reverse order we get: (50) 10 = (62) 8

Relational Database Basics Review

Type Classes with Functional Dependencies

Functional Programming in C++11

2874CD1EssentialSQL.qxd 6/25/01 3:06 PM Page 1 Essential SQL Copyright 2001 SYBEX, Inc., Alameda, CA

Digital Logic Design. Basics Combinational Circuits Sequential Circuits. Pu-Jen Cheng

Understanding Logic Design

Programming Autodesk PLM 360 Using REST. Doug Redmond Software Engineer, Autodesk

The Needle Programming Language

Stephen Drape. Oxford University Computing Laboratory. with thanks to Jeff Sanders

IoT-Ticket.com. Your Ticket to the Internet of Things and beyond. IoT API

Web Application Firewalls: What the vendors do NOT want you to know SHAKACON III

To Java SE 8, and Beyond (Plan B)

Class Notes CS Creating and Using a Huffman Code. Ref: Weiss, page 433

Overview. Elements of Programming Languages. Advanced constructs. Motivating inner class example

Simple Invoicing Desktop Database with MS Access c 2015 by David W. Gerbing School of Business Administration Portland State University

ATS Revit Family Creation Standards July 23 rd, 2012 Plumbing Fixtures: Faucets Version 1.0

17. November Übung 1 mit Swift. Architektur verteilter Anwendungen. Thorsten Kober Head Solutions ios/os X, SemVox GmbH

SAS Data Set Encryption Options

Practical Generic Programming with OCaml

Number Representation

Qualtrics Single Sign-On Specification

United States Naval Academy Electrical and Computer Engineering Department. EC262 Exam 1

earlier in the semester: The Full adder above adds two bits and the output is at the end. So if we do this eight times, we would have an 8-bit adder.

InvGen: An Efficient Invariant Generator

8 Tutorial: Using ASN.1

Brainshark/Salesforce.com Integration Installation Procedures

Ecma/TC39/2013/NN. 4 th Draft ECMA-XXX. 1 st Edition / July The JSON Data Interchange Format. Reference number ECMA-123:2009

Glossary of Object Oriented Terms

Caml Virtual Machine File & data formats Document version: 1.4

Satellite Control Software (SCS) Mission Data Client Extensibility User Guide

RSA SecurID Software Token 1.0 for Android Administrator s Guide

It has a parameter list Account(String n, double b) in the creation of an instance of this class.

Developing a Web Server Platform with SAPI Support for AJAX RPC using JSON

Topics. Introduction. Java History CS 146. Introduction to Programming and Algorithms Module 1. Module Objectives

Physical Design. Meeting the needs of the users is the gold standard against which we measure our success in creating a database.

Creating SOAP and REST Services and Web Clients with Ensemble

The Habit Programming Language: The Revised Preliminary Report

C++ Programming Language

Anatomy of Programming Languages. William R. Cook

International Journal of Advanced Research in Computer Science and Software Engineering

Programming Fundamentals. Lesson 20 Sections

Adding GADTs to OCaml the direct approach

Fun with Phantom Types

A Rational Software Whitepaper

A simple type system with two identity types

Section 1.4 Place Value Systems of Numeration in Other Bases

International Securities Identification Number (ISIN)

Database Migration : An In Depth look!!

The Advantages and Disadvantages of Network Computing Nodes

= = 3 4, Now assume that P (k) is true for some fixed k 2. This means that

Compiler Construction

S4 Classes in 15 pages, more or less

Web Services for Management Perl Library VMware ESX Server 3.5, VMware ESX Server 3i version 3.5, and VMware VirtualCenter 2.5

SQL SELECT Query: Intermediate

Lecture 21: NoSQL III. Monday, April 20, 2015

Semantic Analysis: Types and Type Checking

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. AVRO Tutorial

Verifying security protocols using theorem provers

Data Mining for Business Analytics

Tips on Encoding the Unique Item Identifier (UII) Mark and Building the Concatenated UII

Developing a generic deployable component for single sign-on. Stephen P Vickers The University of Edinburgh

Lab 4: 26 th March Exercise 1: Evolutionary algorithms

Numeral Systems. The number twenty-five can be represented in many ways: Decimal system (base 10): 25 Roman numerals:

Chapter 4: Computer Codes

MATHEMATICAL INDUCTION. Mathematical Induction. This is a powerful method to prove properties of positive integers.

The JSON approach is mainly intended for intranet integration, while the xml version is for machine to machine integration.

Integrating VoltDB with Hadoop

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Appointment Router Salesforce.com Web- to- Lead Integration Guide. Date: January 19, 2011

Colligo Briefcase Enterprise. Administrator s Guide

Proseminar on Semantic Theory Fall 2013 Ling 720. Problem Set on the Formal Preliminaries : Answers and Notes

Mathematical Induction

Xtext Documentation. September 26, 2014

Transcription:

Datatype-generic data migrations IFIP WG 21 meeting #73, Lökeberg, Sweden Andres Löh August 31, 2015 The Haskell Consultants

Motivation Datatypes evolve

Motivation Datatypes evolve Example: data Person = Person { name :: String, address :: String }

Motivation Datatypes evolve Example: data Person = Person { name :: String}

Motivation Datatypes evolve Example: data Person = Person { lastname :: String, firstname :: String }

Motivation Datatypes evolve Example: data Person = Person { lastname :: String, firstname :: String, years :: Int }

Why is it a problem? Within the program itself, it usually is not But programs communicate, and produce external representations of data: binary encodings, JSON, database entries,

Different versions External representations change First version: { "name" : "Aura Löh", "address" : "Regensburg" }

Different versions External representations change First version: { "name" : "Aura Löh", "address" : "Regensburg" } Current version: { "lastname" : "Löh", "firstname" : "Aura", "years" : 2 }

Different versions External representations change First version: { "name" : "Aura Löh", "address" : "Regensburg" } Current version: { "lastname" : "Löh", "firstname" : "Aura", "years" : 2 } Program should be able to cope with both inputs

Available Haskell options safecopy Define all versions as separate Haskell datatypes Define migration functions between the versions Instantiate a class to get a versioned binary decoding

Available Haskell options safecopy Define all versions as separate Haskell datatypes Define migration functions between the versions Instantiate a class to get a versioned binary decoding api-tools Use a DSL to describe the changes between versions Use Template Haskell to derive versioned decoders

Example: api-tools changes version "04" changed record Person field added years :: Int version "03" migration record Person SplitName version "02" changed record Person field removed address // initial version version "01"

Use datatype-genericity

Seems to make sense Migrations apply to different datatypes Serialization and deserialization to various formats are classic examples of datatype-generic programming Different versions of a datatype are usually closely related

Representing types data Person = Person { name :: String, address :: String }

Representing types data Person = Person { name :: String, address :: String } type instance Code Person = [ [String, String]] Rep (Code Person) = SOP I (Code Person) Person

Representing types data Person = Person { name :: String, address :: String } type instance Code Person = [ [String, String]] Rep (Code Person) = SOP I (Code Person) Person type family Code (a :: *) :: [[*]] class Generic a where from :: a -> Rep (Code a) to :: Rep (Code a) -> a

What is Rep? data Person = Person { name :: String, address :: String } type instance Code Person = [ [String, String]] Value of type Person : Person "Aura Löh" "Regensburg" Value of type Rep (Code Person) (modulo syntactic clutter): C 0 ["Aura Löh", "Regensburg"]

Sums of products SOP I xss NS (NP I) xss data NS (f :: k -> *) (xs :: [k]) where Z :: NS f (x : xs) S :: NS f xs -> NS f (x : xs) data NP (f :: k -> *) (xs :: [k]) where Nil :: NP f [] (:*) :: f x -> NP f xs -> NP f (x : xs)

Generic functions class Encode a where encode :: a -> [Bit] decoder :: Decoder a

Generic functions class Encode a where encode :: a -> [Bit] decoder :: Decoder a Defined via induction on the representation: gencode :: (Generic a, All2 Encode (Code a)) => a -> [Bit] gencode = gdecoder :: (Generic a, All2 Encode (Code a)) => Decoder a gdecoder = Yields defaults for the Encode class methods

History of a datatype Person 1 Person 2 Person 3 Person 4

History of a datatype Person 1 Code Person 1 Person 2 Code Person 2 Person 3 Code Person 3 Person 4 Code Person 4

History of a datatype Person 1 Code Person 1 Migration (Code (Person 1 )) (Code (Person 2 )) Person 2 Code Person 2 Migration (Code (Person 2 )) (Code (Person 3 )) Person 3 Code Person 3 Migration (Code (Person 3 )) (Code (Person 4 )) Person 4 Code Person 4 data Migration :: [[*]] -> [[*]] -> * where Migration :: (Rep a -> Rep b) -> Migration a b

History of a datatype Code Person 1 Migration (Code (Person 1 )) (Code (Person 2 )) Code Person 2 Migration (Code (Person 2 )) (Code (Person 3 )) Code Person 3 Migration (Code (Person 3 )) (Code (Person 4 )) Person Code Person data Migration :: [[*]] -> [[*]] -> * where Migration :: (Rep a -> Rep b) -> Migration a b

History of a datatype Code Person 1 Migration (Code (Person 1 )) (Code (Person 2 )) Code Person 2 Migration (Code (Person 2 )) (Code (Person 3 )) Code Person 3 Migration (Code (Person 3 )) (Code (Person 4 )) Person Code Person data Migration :: [[*]] -> [[*]] -> * where Migration :: (Rep a -> Rep b) -> Migration a b data History :: Version -> [[*]] -> * where Initial :: History v c Revision :: () => Migration c c -> History v c -> History v c

Simple migration addconstructor :: Migration c ( [] : c) addconstructor = Migration shift

Simple migration addconstructor :: Migration c ( [] : c) addconstructor = Migration shift Good, but not quite satisfactory: By position rather than name No way to actually give a name to a revision

Include names in codes data Person = Person {name :: String, address :: String} Plain code: type family Code (a :: *) :: [[*]] type instance Code Person = [ [String, String]]

Include names in codes data Person = Person {name :: String, address :: String} Plain code: type family Code (a :: *) :: [[*]] type instance Code Person = [ [String, String]] Code with metadata: type family Code (a :: *) :: [(Symbol, [(Symbol, *)])] type instance Code Person = [ ("Person", [ ("name", String), ("address", String)])] Stripping metadata: type family Simplify (c :: [(Symbol, [(Symbol, *)])]) :: [[*]]

Migrations based on codes with metadata data Migration :: [(Symbol, [(Symbol, *)])] -> [(Symbol, [(Symbol, *)])] -> * where Migration :: (Rep (Simplify a) -> Rep (Simplify b)) -> Migration a b

Migrations based on codes with metadata data Migration :: [(Symbol, [(Symbol, *)])] -> [(Symbol, [(Symbol, *)])] -> * where Migration :: (Rep (Simplify a) -> Rep (Simplify b)) -> Migration a b addfield :: () => Proxy (v :: Version) -> Proxy (d :: Symbol) -- name of constructor -> Proxy (f :: Symbol) -- name of field -> a -- default value -> History v c -> History v (AddField d f c)

Example personhistory :: History "04" (Code Person) personhistory = addfield [pr "04" ] [pr "Person" ] [pr "years" ] (2 :: Int) $ replacefield [pr "03" ] [pr "Person" ] [pr "name" ] [pr ["lastname", "firstname"] ] splitname $ removefield [pr "02" ] [pr "Person" ] [pr "address" ] $ initialrevision [pr "01" ]

Attaching histories to datatypes class (Generic a, ) => HasHistory a where type CurrentRevision a :: Symbol history :: Proxy a -> History (CurrentRevision a) (Code a)

Encoding and decoding based on histories hencode :: (HasHistory a, ) => a -> [Bit] choose latest version from history encode version encode data generically

Encoding and decoding based on histories hencode :: (HasHistory a, ) => a -> [Bit] choose latest version from history encode version encode data generically hdecode :: (HasHistory a, ) => Decoder a decode version choose the corresponding version from history decode data generically for that version apply the remaining migration functions

An annoying detail For hdecode, all types contained in all codes of all revisions must be in the Encode class

An annoying detail For hdecode, all types contained in all codes of all revisions must be in the Encode class This means: put class constraints in History type, index History over all intermediate versions, abstract History over class constraints

Conclusions Current code is proof of concept Implementing the migration steps (eg addfield ) is really ugly and a lot of work But it works and is more safe than other approaches Extends to nested versioning Not tied to a single encoding Efficiency? Future: writing older versions