Write a Foreign Data Wrapper in 15 minutes



Similar documents
How To Use Postgresql With Foreign Data In A Foreign Server In A Multi Threaded Database (Postgres)

Sharding with postgres_fdw

SQL/MED and More. Management of External Data in PostgreSQL and Microsoft SQL Server. Seminar Database Systems

Installing The SysAidTM Server Locally

PostgreSQL as an integrated data analysis platform

IceWarp to IceWarp Server Migration

Database Management System Choices. Introduction To Database Systems CSE 373 Spring 2013

AWS Schema Conversion Tool. User Guide Version 1.0

The Advantages of PostgreSQL

Sisense. Product Highlights.

SyncThru Database Migration

AWS Schema Conversion Tool. User Guide Version 1.0

DBI-Link: 3.0. PgCon Ottawa 2008 Copyright David Fetter 2008 All Rights Reserved

PostGIS Integration Tips

2.3 - Installing the moveon management module - SQL version

Installation Guide for contineo

Intellicus Enterprise Reporting and BI Platform

Practical Cassandra. Vitalii

DBX. SQL database extension for Splunk. Siegfried Puchbauer

Accessing Your Database with JMP 10 JMP Discovery Conference 2012 Brian Corcoran SAS Institute

HP OO 10.X - SiteScope Monitoring Templates

ACCESSING IBM iseries (AS/400) DB2 IN SSIS

TRANSFORM BIG DATA INTO ACTIONABLE INFORMATION

Apache Sentry. Prasad Mujumdar

ibolt V3.2 Release Notes

Architecture and Mode of Operation

TG Web. Technical FAQ

Q&A for Zend Framework Database Access

SAP Business Objects Business Intelligence platform Document Version: 4.1 Support Package Data Federation Administration Tool Guide

Big Data & Cloud Computing. Faysal Shaarani

Architecture and Mode of Operation

Seamless Web Data Entry for SAS Applications D.J. Penix, Pinnacle Solutions, Indianapolis, IN

Smarter Balanced Reporting (RFP 15) Developer Guide

Security and Control Issues within Relational Databases

Data Access Guide. BusinessObjects 11. Windows and UNIX

Connect to MySQL or Microsoft SQL Server using R

Extracting META information from Interbase/Firebird SQL (INFORMATION_SCHEMA)

Data Migration between Document-Oriented and Relational Databases

MySQL Storage Engines

Using BIRT Analytics Loader

Informatica Data Replication FAQs

Database Extension 1.5 ez Publish Extension Manual

Basics on Geodatabases

CN=Monitor Installation and Configuration v2.0

Various Load Testing Tools

Real-time Data Replication

Technology Foundations. Conan C. Albrecht, Ph.D.

Oracle Application Express MS Access on Steroids

Jet Data Manager 2012 User Guide

Relational Databases for the Business Analyst

Consolidate by Migrating Your Databases to Oracle Database 11g. Fred Louis Enterprise Architect

Data Warehouse Center Administration Guide

The elephant called PostgreSQL

MatriXay Database Vulnerability Scanner V3.0

LISTSERV Maestro 6.0 Installation Manual for Solaris. June 8, 2015 L-Soft Sweden AB lsoft.com

Python. Data bases. Marcin Młotkowski. 8th May, 2013

GoldenGate and ODI - A Perfect Match for Real-Time Data Warehousing

Business Intelligence Tutorial: Introduction to the Data Warehouse Center

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam

MyOra 3.0. User Guide. SQL Tool for Oracle. Jayam Systems, LLC

Using Apache Derby in the real world

Deploying PostgreSQL in a Windows Enterprise

Synchronization Agent Configuration Guide

Agenda. ! Strengths of PostgreSQL. ! Strengths of Hadoop. ! Hadoop Community. ! Use Cases

Release Bulletin Sybase ETL Small Business Edition 4.2

How To Install Powerpoint 6 On A Windows Server With A Powerpoint 2.5 (Powerpoint) And Powerpoint On A Microsoft Powerpoint 4.5 Powerpoint (Powerpoints) And A Powerpoints 2

W16 Data Mining Workshop

Thank you for using AD Bulk Export 4!

Online Backup Guide for the Amazon Cloud: How to Setup your Online Backup Service using Vembu StoreGrid Backup Virtual Appliance on the Amazon Cloud

Intellicyber s Enterprise Integration and Application Tools

Suite. How to Use GrandMaster Suite. Exporting with ODBC

OpenText Actuate Big Data Analytics 5.2

How To Use The Listerv Maestro With A Different Database On A Different System (Oracle) On A New Computer (Orora)

Getting ready for PostgreSQL 9.1

PostgreSQL vs. MySQL vs. Commercial Databases: It's All About What You Need

MySQL Security for Security Audits

Apache Derby Security. Jean T. Anderson

Advanced Tornado TWENTYONE Advanced Tornado Accessing MySQL from Python LAB

NEMUG Feb Create Your Own Web Data Mart with MySQL

Automating System Administration with Perl

SOLUTION BRIEF. JUST THE FAQs: Moving Big Data with Bulk Load.

Migrating MS Access Database to MySQL database: A Process Documentation

Oracle Financial Services Data Integration Hub Foundation Pack Extension for Data Relationship Management Interface

Guide to the MySQL Workbench Migration Wizard: From Microsoft SQL Server to MySQL

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

Spectrum Technology Platform. Version 9.0. Enterprise Data Integration Guide

Virto Pivot View for Microsoft SharePoint Release User and Installation Guide

MarkLogic Server. Java Application Developer s Guide. MarkLogic 8 February, Copyright 2015 MarkLogic Corporation. All rights reserved.

CSE 544 Principles of Database Management Systems. Magdalena Balazinska (magda) Spring 2006 Lecture 1 - Class Introduction

CA DLP. Stored Data Integration Guide. Release rd Edition

Intro to Databases. ACM Webmonkeys 2011

WHITE PAPER. Domo Advanced Architecture

Two new DB2 Web Query options expand Microsoft integration As printed in the September 2009 edition of the IBM Systems Magazine

news from Tom Bacon about Monday's lecture

Oracle Data Integrator. Knowledge Modules Reference Guide 10g Release 3 (10.1.3)

Maksym Iaroshenko Co-Founder and Senior Software Engineer at Eltrino. Magento non-mysql implementations

Advanced Object Oriented Database access using PDO. Marcus Börger

Transcription:

Write a Foreign Data Wrapper in 15 minutes

Table des matières Write a Foreign Data Wrapper in 15 minutes...1 1 About me...4 2 Foreign Data Wrappers?...5 3 Goals...5 4 Agenda...5 5 Part 1 - SQL/MED...6 6 MED?...6 7 DBLink and DBI-Link...6 8 In other RDBMS...7 9 Overview...7 10 Part 2 - How To...8 11 Install...8 12 Create server...8 13 Create Foreign Table...8 14 Select...9 15 Errors...9 16 Beware...10 17 Uninstall...10 18 Part 3 : Current List (feb. 2012)...10 19 RDBMS fdw...11 20 NoSQL fdw...11 21 Text fdw...11 22 Web fdw...12 22.1 ldap_fdw...12 22.2 s3_fdw...12 22.3 www_fdw...13 23 Others...14 24 What about pgsql_fdw?...14 25 Full list...14 26 Ideas...15 27 Part 4 : Write a fdw...15 28 Write a FDW in C : the basics...15 29 Problems with that...16 30 Multicorn...16 31 Multicorn is your friend!...16 32 Install multicorn...17 33 ical_fdw.py...17 34 make install...18 35 Plug - 1...18 36 Plug - 2...18 37 Play - 1...18 38 Play - 2...19 39 There's more : Optimize...19 40 There's more : Log...20 2 / 23

41 Conclusions...20 42 In a nutshell...20 43 Limitations...21 44 ETL No More...21 45 Release on pgxn...21 46 Links...22 47 Contact...22 48 KTHXBYE...23 3 / 23

1 About me Damien Clochard COO of DALIBO, French PostgreSQL Company Admin of www.postgresql.fr 4 / 23

2 Foreign Data Wrappers? «One Database to Rule Them All» 3 Goals Know more about SQL/MED Use the existing data wrappers Write your own in 15 minutes 4 Agenda 1. SQL/MED? 2. How to use a FDW 3. Current list 4. How to write a FDW 5 / 23

5 Part 1 - SQL/MED Error: Reference source not found MED? DBLink & DBI-LINK In other RDBMS Overview Why SQL/MED is cool 6 MED? Management of External Data Appeared in SQL:2003 implemented in PostgreSQL 9.1 Also present in IBM DB2 http://www.ibm.com/developerworks/data/library/techarticle/0203haas/0203haas.html 7 DBLink and DBI-Link DBLink Connect to other PostgreSQL databases Contrib module since 8.3 Will be replaced by pgsql_fdw DBI-Link Partial implementation written in PL/PerlU Started in 2004 Now obsolete 6 / 23

More information : http://www.postgresql.org/docs/9.1/static/dblink.html https://github.com/davidfetter/dbi-link 8 In other RDBMS Informix: External Tables (flat files) Oracle: External Tables + DATABASE LINK (odbc) MySQL: FEDERATED Storage Engine (mysql only) MSSQL: Text File Driver (cvs only) Firebird: External Table (cvs only) DB2: Complete SQL/MED implementation 9 Overview 7 / 23

10 Part 2 - How To... Error: Reference source not found Install Create server Create foreign table Select Uninstall 11 Install CREATE EXTENSION file_fdw; 12 Create server CREATE SERVER fileserver FOREIGN DATA WRAPPER file_fdw; 13 Create Foreign Table CREATE FOREIGN TABLE meminfo ( stat text, value text ) SERVER fileserver OPTIONS ( filename '/proc/meminfo', format 'csv', delimiter ':' ); 8 / 23

14 Select SELECT value FROM meminfo WHERE stat = 'MemFree'; value 55024 kb (1 line) 15 Errors CREATE FOREIGN TABLE won't check if the data storage exists/ won't check if the data can be read won't check if the data format is ok But SELECT will return an error 9 / 23

16 Beware There's no cache SELECT will scan the whole file every time To avoid this : CREATE TABLE localtable AS SELECT * FROM foreigntable Which is more or less equal to : COPY localtable FROM '/your/file'; 17 Uninstall DROP FOREIGN TABLE meminfo; DROP SERVER fileserver CASCADE; DROP EXTENSION file_fdw CASCADE; 18 Part 3 : Current List (feb. 2012) SQL Databases Wrappers NoSQL Databases Wrappers File Wrappers Web Wrappers Others 10 / 23

In a nutshell, you can now use various Foreign Data Wrappers (FDW) to connect a PostgreSQL Server to remote data stores. This page is an incomplete list of the Wrappers available right now. Another fdw list can be found at the PGXN website. Please keep in mind that most of these wrappers are not officially supported by the PostgreSQL Global Development Group (PGDG) and that some of these projects are still in Beta version. Use carefully! 19 RDBMS fdw oracle_fdw mysql_fdw sql_lite odbc_fdw 20 NoSQL fdw couchdb_fdw redis_fdw 21 Text fdw file_fdw (delivered with PG 9.1) file_text_array_fdw 11 / 23

22 Web fdw www_fdw twitter_fdw google_fdw rss_fdw imap_fdw s3_fdw 22.1 ldap_fdw Allows PostgreSQL to query an LDAP server and retrieve data from some pre-configured Organizational Unit. CREATE EXTENSION ldap_fdw; CREATE SERVER ldap_my_server_service FOREIGN DATA WRAPPER ldap_fdw OPTIONS ( address 'my_ldap_server', port '389'); CREATE USER MAPPING FOR current_user SERVER ldap_my_server_service OPTIONS (user_dn 'cn=the_ldap_user,dc=example,dc=com', password 'the_ldap_user_password'); CREATE FOREIGN TABLE ldap_people ( uid text, displayname text, mail text ) SERVER ldap_my_server_service OPTIONS (base_dn 'OU=people,DC=example,DC=com'); SELECT * FROM ldap_people WHERE mail = 'user@mail.com'; 22.2 s3_fdw s3_fdw reads files located in Amazon S3 db1=# CREATE EXTENSION s3_fdw; CREATE EXTENSION db1=# CREATE SERVER amazon_s3 FOREIGN DATA WRAPPER s3_fdw; CREATE SERVER db1=# CREATE USER MAPPING FOR CURRENT_USER SERVER amazon_s3 OPTIONS ( accesskey 'your access key id', secretkey 'your secret access key' 12 / 23

); CREATE USER MAPPING db1=# CREATE FOREIGN TABLE log20110901( atime timestamp, method text, elapse int, session text ) SERVER amazon_s3 OPTIONS ( hostname 's3 ap northeast 1.amazonaws.com', bucketname 'umitanuki dbtest', filename 'log20110901.txt', delimiter E'\t' ); CREATE FOREIGN TABLE 22.3 www_fdw www_fdw allows to query different web services:here is example for google search usage: DROP EXTENSION IF EXISTS www_fdw CASCADE; CREATE EXTENSION www_fdw; CREATE SERVER www_fdw_server_google_search FOREIGN DATA WRAPPER www_fdw OPTIONS (uri 'https://ajax.googleapis.com/ajax/services/search/web?v=1.0'); CREATE USER MAPPING FOR current_user SERVER www_fdw_server_google_search; CREATE FOREIGN TABLE www_fdw_google_search ( /* parameters used in request */ q text, /* fields in response */ GsearchResultClass text, unescapedurl text, url text, visibleurl text, cacheurl text, title text, titlenoformatting text, content text ) SERVER www_fdw_server_google_search; postgres=# SELECT url,substring(title,1,25) '...',substring(content,1,25) '...' FROM www_fdw_google_search WHERE q='postgres'; url?column??column? + + http://www.postgresql.org/ <b>postgresql</b>: The wo... Sophisticated opensource... http://www.postgresql.org/download/ <b>postgresql</b>: Downlo... The core of the <b>postgr... http://www.postgresql.org/docs/ <b>postgresql</b>: Docume... There IS a wealth of <b>p... http://en.wikipedia.org/wiki/postgresql <b>postgresql</b> Wikip... <b>postgresql</b>, often... (4 rows) 13 / 23

23 Others git_fdw ldap_fdw process_fdw pgstorm 24 What about pgsql_fdw? Will be an official extension (like file_fdw) Currently under development Probably in 9.2 Until then keep using dblink 25 Full list http://wiki.postgresql.org/wiki/foreign_data_wrappers http://pgxn.org/tag/fdw/ 1 new wrapper every month 14 / 23

26 Ideas XML Excel / OpenDocument Spreadsheet GIS file formats (KML,WKT, ) Images open data websites RRD 27 Part 4 : Write a fdw How to implement in C Multicorn Let's write a wrapper for ICalendar files! 28 Write a FDW in C : the basics 2 functions : handler validator (optionnal) 6 callback routines (called by planner and executor) PlanForeignScan / ExplainForeignScan BeginForeignScan / EndForeignScan IterateForeignScan / ReScanForeignScan 15 / 23

29 Problems with that... This is complex This is not well documented The code is generic What if i don't know C? 30 Multicorn 31 Multicorn is your friend! A simple python interface to write FDW A wrapper over a wrapper Handles all the dirty C code for you You can focus on the extraction and use python's librairies 16 / 23

32 Install multicorn Error: Reference source not found You need postgresql-server-dev and python-dev pgxn install multicorn --testing Sources at github.com/kozea/multicorn.git apt get install python software properties add apt repository ppa:pitti/postgresql apt get update apt get install postgresql 9.1 postgresql server dev 9.1 libpq dev python python dev pythonsetuptools easy_install importlib easy_install pgxnclient then pgxn install multicorn testing or git clone git://github.com/kozea/multicorn.git cd Multicorn make && make install 33 ical_fdw.py from multicorn import ForeignDataWrapper import urllib from icalendar import Calendar, Event class ICalFdw(ForeignDataWrapper): def init (self, options, columns): super(icalfdw, self). init (options, columns) self.url = options.get('url', None) self.columns = columns def execute(self, quals, columns): ical_file = urllib.urlopen(self.url).read() cal = Calendar.from_string(ical_file) for v in cal.walk('vevent'): e = Event(v) line = {} for column_name in self.columns: 17 / 23

line[column_name] = e.decoded(column_name) yield line 34 make install cd Multicorn cp icalfdw.py python/multicorn make & make install 35 Plug - 1 CREATE EXTENSION multicorn; CREATE SERVER multicorn_ical FOREIGN DATA WRAPPER multicorn options (wrapper 'multicorn.icalfdw.icalfdw'); 36 Plug - 2 CREATE FOREIGN TABLE fosdem_events ( dtstart TIMESTAMP, dtend TIMESTAMP, summary TEXT ) server multicorn_ical options ( url 'http://fosdem.org/schedule/ical' ); 37 Play - 1 SELECT count(*) 18 / 23

FROM fosdem_events; count 437 (1 line) 38 Play - 2 SELECT to_char(e.dtend,'hh24:mi') AS end_of_this_talk FROM fosdem_events e WHERE e.summary ILIKE '%foreign data wrapper%'; end_of_this_talk 12:50 (1 line) 39 There's more : Optimize Reduce the amount of data fetched over the network : handle the quals argument a Qual object defines : a field_name an operator a value 19 / 23

40 There's more : Log Error: Reference source not found multicorn.utils has a log_to_postgres function 3 parameters message level hint (optional) log_to_postgres('you MUST set the url', ERROR) 41 Conclusions In a Nutshell Limitations ETL No More Release on pgxn The future Links / Contact 42 In a nutshell Useful for : Easier migrations Data Warehouse Reporting Avoid useless data replication 20 / 23

43 Limitations Read Only Most wrappers are beta or alpha Most are not supported by PGDG Forget about performance Forget about data integrity 44 ETL No More Instead of : EXTRACT - TRANSFORM - LOAD Just : CREATE FOREIGN TABLE - SELECT Use SQL or PL/SQL to manipulate data Transformation code stays in the DB 45 Release on pgxn If you write your own FDW: publish it on PGXN update the wiki page 21 / 23

46 Links This talk : http://dalibo.org/conferences Multicorn : http://multicorn.org Ical FDW : http://moourl.com/icalfdw 47 Contact me : damien@dalibo.info Multicorn creator : Ronan Dunklau rdunklau@gmail.com 22 / 23

48 KTHXBYE 23 / 23