Functorial Data Migration. Categorical Informatics, Inc. info@catinf.com. December 1, 2015. C i



Similar documents
Relational Foundations for Functorial Data Migration

FUNCTORIAL DATA MIGRATION

Using Temporary Tables to Improve Performance for SQL Data Services

Data Integration and Exchange. L. Libkin 1 Data Integration and Exchange

From Databases to Natural Language: The Unusual Direction

Database Design. Database Design I: The Entity-Relationship Model. Entity Type (con t) Chapter 4. Entity: an object that is involved in the enterprise

SPARQL: Un Lenguaje de Consulta para la Web

WORKSHOP ON TOPOLOGY AND ABSTRACT ALGEBRA FOR BIOMEDICINE

Number of hours in the semester L Ex. Lab. Projects SEMESTER I 1. Economy Philosophy Mathematical Analysis Exam

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Query optimization. DBMS Architecture. Query optimizer. Query optimizer.

ISU Department of Mathematics. Graduate Examination Policies and Procedures

Data exchange. L. Libkin 1 Data Integration and Exchange

Part A: Data Definition Language (DDL) Schema and Catalog CREAT TABLE. Referential Triggered Actions. CSC 742 Database Management Systems

Data Exchange: Semantics and Query Answering

Schema Mappings and Data Exchange

Left-Handed Completeness

Foundations of Query Languages

Relational Databases

Chapter 9 Joining Data from Multiple Tables. Oracle 10g: SQL

Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation

Parametric Domain-theoretic models of Linear Abadi & Plotkin Logic

University of Massachusetts Amherst Department of Computer Science Prof. Yanlei Diao

Data Modeling using XML. Murali Mani, WPI Antonio Badia, University of Louisville Oct 13, 2003

MOC 20461C: Querying Microsoft SQL Server. Course Overview

Chapter 5: Overview of Query Processing

Introducing Formal Methods. Software Engineering and Formal Methods

The composition of Mappings in a Nautural Interface

Creating an RDF Graph from a Relational Database Using SPARQL

MySQL for Beginners Ed 3

CHAPTER 4: BUSINESS ANALYTICS

Oracle Database 10g Express

SQL DDL. DBS Database Systems Designing Relational Databases. Inclusion Constraints. Key Constraints

3. Relational Model and Relational Algebra

Tackling The Challenges of Big Data. Tackling The Challenges of Big Data Big Data Systems. Security is a Negative Goal. Nickolai Zeldovich

Formalization of the CRM: Initial Thoughts

1 File Processing Systems

SUBQUERIES AND VIEWS. CS121: Introduction to Relational Database Systems Fall 2015 Lecture 6

G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs

Introduction to SQL C H A P T E R3. Exercises

NOTES ON CATEGORIES AND FUNCTORS

CHAPTER 5: BUSINESS ANALYTICS

Relational Algebra. Basic Operations Algebra of Bags

INTRODUCTORY COURSES IN CALCULUS, STATISTICS, AND COMPUTER SCIENCE

IBM DB2 XML support. How to Configure the IBM DB2 Support in oxygen

Using OpenMath Servers for Distributing Mathematical Computations

Database Systems. Lecture Handout 1. Dr Paolo Guagliardo. University of Edinburgh. 21 September 2015

CMU - SCS / Database Applications Spring 2013, C. Faloutsos Homework 1: E.R. + Formal Q.L. Deadline: 1:30pm on Tuesday, 2/5/2013

Graham Kemp (telephone , room 6475 EDIT) The examiner will visit the exam room at 15:00 and 17:00.

Explainable Security for Relational Databases

Lecture 6. SQL, Logical DB Design

5 Directed acyclic graphs

A CONSTRUCTION OF THE UNIVERSAL COVER AS A FIBER BUNDLE


= = 3 4, Now assume that P (k) is true for some fixed k 2. This means that

Answer Key. UNIVERSITY OF CALIFORNIA College of Engineering Department of EECS, Computer Science Division

XML Data Integration

LiTH, Tekniska högskolan vid Linköpings universitet 1(7) IDA, Institutionen för datavetenskap Juha Takkinen

Theory of Relational Database Design and Normalization

Database Design Overview. Conceptual Design ER Model. Entities and Entity Sets. Entity Set Representation. Keys

A Logic-Based Approach to Cloud Computing

A Workbench for Prototyping XML Data Exchange (extended abstract)

There are five fields or columns, with names and types as shown above.

OWL based XML Data Integration

Databases Model the Real World. The Entity- Relationship Model. Conceptual Design. Steps in Database Design. ER Model Basics. ER Model Basics (Contd.

The Relational Model. Ramakrishnan&Gehrke, Chapter 3 CS4320 1

Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification

Database Design Patterns. Winter Lecture 24

CS & Applied Mathematics Dual Degree Curriculum Content

SQLMutation: A tool to generate mutants of SQL database queries

A NOTE ON TRIVIAL FIBRATIONS

CSC 742 Database Management Systems

The Recovery of a Schema Mapping: Bringing Exchanged Data Back

Datavetenskapligt Program (kandidat) Computer Science Programme (master)

Translating between XML and Relational Databases using XML Schema and Automed

Applying Model Management to Classical Meta Data Problems

Database Management System

XML with Incomplete Information

Fibrations and universal view updatability

RDF y SPARQL: Dos componentes básicos para la Web de datos

XML and Data Integration

POSTECH SUMMER SCHOOL 2013 LECTURE 4 INTRODUCTION TO THE TRACE FORMULA

Dedekind s forgotten axiom and why we should teach it (and why we shouldn t teach mathematical induction in our calculus classes)

Object-Relational Query Processing

5.1 Database Schema Schema Generation in SQL

CURVES WHOSE SECANT DEGREE IS ONE IN POSITIVE CHARACTERISTIC. 1. Introduction

Leveraging the Power of SOLR with SPARK. Johannes Weigend QAware GmbH Germany pache Big Data Europe September 2015

Chapter 10. Functional Dependencies and Normalization for Relational Databases. Copyright 2007 Ramez Elmasri and Shamkant B.

Big Data Analytics. Rasoul Karimi

Data Models and Database Management Systems (DBMSs) Dr. Philip Cannata

Probability Concepts Probability Distributions. Margaret Priest Nokuthaba Sibanda

Theory of Relational Database Design and Normalization

ADVANCED SCHOOL OF SYSTEMS AND DATA STUDIES (ASSDAS) PROGRAM: CTech in Computer Science

Software Modeling and Verification

CS 377 Database Systems. Database Design Theory and Normalization. Li Xiong Department of Mathematics and Computer Science Emory University

ETL-EXTRACT, TRANSFORM & LOAD TESTING

Computer UMass Hanna M. Wallach

HTSQL is a comprehensive navigational query language for relational databases.

Relational Algebra. Query Languages Review. Operators. Select (σ), Project (π), Union ( ), Difference (-), Join: Natural (*) and Theta ( )

Relational Databases for Querying XML Documents: Limitations and Opportunities. Outline. Motivation and Problem Definition Querying XML using a RDBMS

Transcription:

Functorial Data Migration Categorical Informatics, Inc. info@catinf.com December 1, 2015 C i

Outline The Functorial Data Model (FDM) The FDM is based on category theory, which was designed to migrate theorems from one area of mathematics to another. Functorial Data Migration Now, researchers at MIT use category theory to migrate data from one computer system to another. The Functorial Query Language (FQL) tool The FDM research has culminated in a prototype ETL software tool, FQL, available at catinf.com. Categorical Informatics (C i ) Because the FQL tool is based on a principled theoretical foundation, it gives the best possible answer to many ETL problems, and is being commercialized by C i. 2 / 1

Category theory Knowledge of category theory is not required to understand this talk, functorial data migration, or use the FQL tool. But for completeness we include a brief description of category theory. A category is an algebraic structure similar to a group, ring, or field. A category is a multi-sorted monoid and an algebra of functions. A functor is a homomorphism of categories. In the FDM, schemas, instances, schema mappings, and even data migration operations are expressed as categories and functors. Category theory has informed the design of: Functional programming languages (Haskell, ML, Scala) Query languages (nested relational calculus, LINQ) Proof assistants (Coq, Agda, HoTT) Mathematics itself (algebraic topology, set theory, model theory) 3 / 1

Category theory A category C consists of a set of objects for all objects X, Y a set CpX, Y q of arrows for all objects X an arrow id P CpX, Xq for all objects X, Y, Z a function : CpY, Zq ˆ CpX, Y q Ñ CpX, Zq such that f id id and id f f and pf gq h f pg hq A functor F : C Ñ D is a function taking objects in C to objects in D and arrows f : X Ñ Y in C to arrows F pfq: F pxq Ñ F py q in D such that F pidq id and F pf gq F pfq F pgq. A category presentation C consists of a set of nodes for all nodes X, Y a set CpX, Y q of edges a set of path equations A functor presentation F : C Ñ D is a function taking nodes in C to nodes in D and edges f : X Ñ Y in C to paths F pfq: F pxq Ñ F py q in D such that C $ p q implies D $ F ppq F pqq. 4 / 1

The Functorial Data Model manager Emp works secretary Dept last first Dom name Emp ID mgr works first last 101 103 q10 Al Akin 102 102 x02 Bob Bo 103 103 q10 Carl Cork Emp.manager.works Emp.works Dept.secretary.works Dept Dept ID sec name q10 102 CS x02 101 Math Dom ID Al Akin Bob Bo Carl Cork CS Math 5 / 1

Attributes Omit Dom table, and draw edges Ñ f Dom as f : manager Emp works secretary Dept last first Dom name manager Emp works secretary Dept first last name 6 / 1

The Functorial Data Model manager Emp works secretary Dept first last name Emp.manager.works Emp.works Dept.secretary.works Dept Emp ID mgr works first last 101 103 q10 Al Akin 102 102 x02 Bob Bo 103 103 q10 Carl Cork Dept ID sec name q10 102 CS x02 101 Math 7 / 1

Functorial Data Migration A functor F : S Ñ T is a constraint-respecting mapping: nodespsq Ñ nodespt q edgespsq Ñ pathspt q and it induces three adjoint data migration functors: F : T -inst Ñ S-inst (like project) S F T I F piq : IF Set ΠF : S-inst Ñ T -inst (like join) F % Π F Σ F : S-inst Ñ T -inst (like outer disjoint union then quotient) Σ F % F 8 / 1

(Project) Name Name Salary N1 N2 F ÝÝÝÑ N Salary Age Age N1 ID Name Salary 1 Alice $100 2 Bob $250 3 Sue $300 N2 ID Age 4 20 5 20 6 30 F ÐÝÝ N ID Name Salary Age a Alice $100 20 b Bob $250 20 c Sue $300 30 9 / 1

Π (Join) Name Name Salary N1 N2 F ÝÝÝÑ N Salary Age Age N1 ID Name Salary 1 Alice $100 2 Bob $250 3 Sue $300 N2 ID Age 4 20 5 20 6 30 Π F ÝÝÑ N ID Name Salary Age a Alice $100 20 b Alice $100 20 c Alice $100 30 d Bob $250 20 e Bob $250 20 f Bob $250 30 g Sue $300 20 h Sue $300 20 i Sue $300 30 10 / 1

Σ (Union) Name Name Salary N1 N2 F ÝÝÝÑ N Salary Age Age N1 ID Name Salary 1 Alice $100 2 Bob $250 3 Sue $300 N2 ID Age 4 20 5 20 6 30 Σ F ÝÝÑ N ID Name Salary Age a Alice $100 null 1 b Bob $250 null 2 c Sue $300 null 3 d null 4 null 5 20 e null 6 null 7 20 f null 8 null 9 30 11 / 1

Foreign keys Name Name Salary N1 f N2 F ÝÝÝÑ N Salary Age Age N1 ID Name Salary f 1 Alice $100 4 2 Bob $250 5 3 Sue $300 6 N2 ID Age 4 20 5 20 6 30 F ÐÝÝ Π F,Σ F ÝÝÝÝÑ N ID Name Salary Age a Alice $100 20 b Bob $250 20 c Sue $300 30 12 / 1

Expressivity of Functorial Data Migration - Example mgr Emp F ÝÝÝÑ SelfMgr F will copy SelfMgr into Emp, and put the identity into mgr. Π F will migrate into SelfMgr those Emps who are their own mgr. Σ F will migrate into SelfMgr representatives of the management groups of Emp, i.e. equivalence classes of Emps modulo the equivalence relation generated by mgr. 13 / 1

Pivot (Instance ô Schema) CS q10 101 name first works last Al Akin Math name x02 works mgr works 102 103 first mgr first last Bob last Carl Bo Cork mgr Emp ID mgr works first last 101 103 q10 Al Akin 102 102 x02 Bob Bo 103 103 q10 Carl Cork Dept ID name q10 CS x02 Math 14 / 1

Positives of the functorial data model The category of categories is bi-cartesian closed (model of STLC). Schemas support 0, 1, `, ˆ,p. For each category C, the category C-inst is a topos (model of HOL). Instances support 0, 1, `, ˆ,p and @, D, ^, _,, Ñ, J, K. Data integrity constraints (path equations) are built-in to schemas. In progress: more expressive constraints ( EDs ) in schemas. Data migrations transform entire instances. Easy to pivot. Σ has better semantics than TGD-only systems (e.g., Clio). 15 / 1

FQL - A Functorial Query Language FQL is an open-source, graphical IDE available at catinf.com. It translates data migrations of the form Σ F Π G H into SQL and executes via JDBC whenever possible. Otherwise, FQL executes the migration directly. FQL also includes a (partial) translator SQL Ñ FQL, as well as RDF/JSON input/output. Some FQL queries can be written using built-in SELECT-FROM-WHERE syntax. Demo 16 / 1

Conclusion There are deep connections between the FDM and other data models, including relational, RDF, and XML. MIT and C i have had initial success using FQL on a data integration scenario identified by the National Institute of Standards and Technology (NIST) and are looking for academic collaborators, customers, test cases, and employees. Visit catinf.com for more information. See categoricaldata.net and appliedcategorytheory.org for other interesting categorical projects. 17 / 1