PostgreSQL as an integrated data analysis platform

FOSS4G.be 2015, Oslandia Team. Licence: GNU FDL. SARL / 29 October 2015 / www.oslandia.com / infos@oslandia.com

Let's try to Think Different (about PostgreSQL)
A database is not only a place to store data (and use basic SQL to access it).
PostgreSQL is far more than an enhanced filesystem.
PostgreSQL is extensible by design.

#1 Data Integration
Data sources: external web services, data APIs, flat files, other databases. Target: PostgreSQL.
Common answer is «Use an ETL».
Alternate answer is «Use a PostgreSQL Foreign Data Wrapper» (FDW).

https://wiki.postgresql.org/wiki/foreign_data_wrappers
SQL Management of External Data (SQL/MED) was added to the SQL standard.
It handles access to remote objects from SQL databases.
Available in PostgreSQL: read-only foreign tables since 9.1, writable foreign tables since 9.3.
~50 native connectors already available (and more through the Multicorn extension).

#1 Data integration: OGR FDW
Shapefile -> OGR FDW -> PostgreSQL -> WFS Server (TinyOWS)

https://github.com/pramsey/pgsql-ogr-fdw
Install OGR FDW:
    git clone https://github.com/pramsey/pgsql-ogr-fdw.git
    cd pgsql-ogr-fdw
    make
    sudo make install

Define an FDW server:
    CREATE EXTENSION postgis;
    CREATE EXTENSION ogr_fdw;
    CREATE SERVER shapefile_france
      FOREIGN DATA WRAPPER ogr_fdw
      OPTIONS (datasource '/tmp/fdw_ogr/france.shp', format 'ESRI Shapefile');
Retrieve the shapefile attribute list (metadata):
    ogrinfo -al -so /tmp/fdw_ogr/france.shp

Create the foreign table:
    CREATE SCHEMA shp;
    CREATE FOREIGN TABLE shp.france (
      id_geofla integer,
      geom geometry,
      code_chf_l varchar,
      nom_chf_l varchar,
      x_chf_lieu varchar,
      y_chf_lieu varchar,
      x_centroid integer,
      y_centroid integer,
      nom_dept varchar,
      code_reg varchar,
      nom_region varchar,
      code_dept varchar
    ) SERVER shapefile_france OPTIONS (layer 'france');
Check it:
    SELECT id_geofla, ST_AsEWKT(ST_Centroid(geom)) AS geom
    FROM shp.france LIMIT 1;
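The column list of a foreign table mirrors the attribute list reported by ogrinfo, so the statement can be generated rather than typed by hand. The following is a minimal Python sketch of that idea; the helper name foreign_table_ddl and its parameters are hypothetical (they are not part of ogr_fdw), and it assumes plain lower-case identifiers that need no quoting.

```python
def foreign_table_ddl(schema, table, server, layer, columns):
    """Build a CREATE FOREIGN TABLE statement for an ogr_fdw server.

    columns is a list of (name, sql_type) pairs, for example copied
    from the attribute list printed by ogrinfo.
    Hypothetical helper: identifiers are inserted as-is, without quoting.
    """
    cols = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE FOREIGN TABLE {schema}.{table} (\n"
        f"{cols}\n"
        f") SERVER {server} OPTIONS (layer '{layer}');"
    )


# A shortened column list taken from the slide above.
ddl = foreign_table_ddl(
    "shp", "france", "shapefile_france", "france",
    [("id_geofla", "integer"), ("geom", "geometry"), ("nom_dept", "varchar")],
)
print(ddl)
```

The generated text can then be pasted into psql; for untrusted names, proper identifier quoting would be needed.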

Create a VIEW from the foreign table (see https://github.com/pramsey/pgsql-ogr-fdw/issues/11):
    CREATE OR REPLACE VIEW shp.france_wfs AS
      SELECT id_geofla,
             ST_Multi(ST_SetSRID(geom,27572))::geometry(MultiPolygon,27572) AS geom,
             code_dept,
             nom_dept
      FROM shp.france;

TinyOWS configuration:
    <tinyows online_resource="http://127.0.0.1/cgi-bin/tinyows"
             schema_dir="/usr/local/share/tinyows/schema/"
             estimated_extent="1"
             display_bbox="0">
      <pg host="127.0.0.1" user="pggis" password="***" dbname="db" />
      <metadata name="tinyows WFS Server" title="tinyows Server OGR FDW Service" />
      <layer retrievable="1" writable="0" ns_prefix="tows"
             ns_uri="http://www.tinyows.org/" schema="shp"
             name="france_wfs" title="france" />
    </tinyows>
Check it (the URL is quoted so that the shell does not interpret the & characters):
    wget -O out "http://127.0.0.1/cgi-bin/tinyows?SERVICE=WFS&REQUEST=GetFeature&Typename=tows:france_wfs"

#1 Data integration: Oracle FDW
Oracle Spatial -> Oracle FDW
http://pgxn.org/dist/oracle_fdw/

Define the Oracle server:
    CREATE EXTENSION postgres_fdw;  -- not required by oracle_fdw itself
    CREATE EXTENSION oracle_fdw;
    CREATE SERVER orcl
      FOREIGN DATA WRAPPER oracle_fdw
      OPTIONS (dbserver '${ORACLE_URI}');
Oracle user mapping:
    GRANT USAGE ON FOREIGN SERVER orcl TO ${PGUSER};
    CREATE USER MAPPING FOR ${PGUSER} SERVER orcl
      OPTIONS (user '${ORAUSER}', password '${ORAPWD}');

Create the foreign table:
    CREATE SCHEMA fdw;
    CREATE FOREIGN TABLE fdw.foo (
      id double precision,
      label varchar,
      last_update date,
      geom geometry(point, 2154)
    ) SERVER orcl OPTIONS (schema '${ORAUSER}', table 'FOO');

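Cache the remote data locally in a materialized view (REFRESH ... CONCURRENTLY needs the unique index):
    CREATE SCHEMA mat;
    CREATE MATERIALIZED VIEW mat.foo AS SELECT * FROM fdw.foo;
    CREATE UNIQUE INDEX ON mat.foo(id);
    CREATE INDEX ON mat.foo USING GIST(geom);
    REFRESH MATERIALIZED VIEW CONCURRENTLY mat.foo;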

#2 Cleaning Data: Validity
Count the invalid geometries:
    SELECT count(*)
    FROM my_schema.my_table
    WHERE NOT ST_IsValid(geom);

Repair the invalid polygons, keeping only the polygonal parts (type 3) of the result:
    UPDATE my_schema."my_table"
    SET geom = ST_CollectionExtract(ST_MakeValid(geom), 3)
    WHERE ST_IsValidReason(geom) != 'Valid Geometry'
      AND (GeometryType(geom) = 'POLYGON' OR GeometryType(geom) = 'MULTIPOLYGON');
Still to deal with:
    Null surface
    Empty
    Single point line
    Infinitesimal ending point translation
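To make concrete what an invalid polygon looks like, here is a small pure-Python sketch that detects the classic "bow-tie" case, a ring whose edges cross each other. It is only an illustration of one validity rule, not a reimplementation of ST_IsValid: the function names are hypothetical, and only proper crossings between edges are detected (no touching edges, holes, or duplicate points).

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left, -1 right, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def _properly_cross(a, b, c, d):
    """True when segments ab and cd cross at a single interior point."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def ring_self_intersects(ring):
    """ring is a closed list of (x, y) points (first point == last point)."""
    segments = list(zip(ring, ring[1:]))
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if _properly_cross(*segments[i], *segments[j]):
                return True
    return False


square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]   # simple ring
bowtie = [(0, 0), (1, 1), (1, 0), (0, 1), (0, 0)]   # edges cross at (0.5, 0.5)
print(ring_self_intersects(square), ring_self_intersects(bowtie))
```

A ring like bowtie is the kind of geometry that ST_MakeValid typically splits into two polygons, which is why the UPDATE above wraps it in ST_CollectionExtract.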

#2 Cleaning Data: Reclassify
    SELECT id, input,
           CASE WHEN input = 'yes' THEN 1::boolean
                WHEN input = 'no' THEN 0::boolean
                ELSE NULL
           END AS reclass
    FROM data;

     id | input | reclass
    ----+-------+---------
      1 | yes   | t
      2 | no    | f
      3 | NC    |

#3 Data Processing
PostgreSQL plus a treatment script. Common answer is: «Develop an external script».
PostgreSQL calling an extension. Alternate answer is: «Hey, it's already there!»

Since PostgreSQL 9.1: EXTENSION handling.
Using an existing extension is that easy. UUID generation example:
    foo=# CREATE EXTENSION "uuid-ossp";
    CREATE EXTENSION
    foo=# SELECT uuid_generate_v4();
    6953879c-3aae-4d42-a470-6d430305e173

Lots of PostgreSQL extensions are available (really).
To display those already available on your server:
    SELECT * FROM pg_available_extensions;

A PostgreSQL extension repository: http://pgxn.org/

Some useful PostgreSQL extensions (among others):
pg_trgm: trigram matching to evaluate string similarity (for natural-language text search)
fuzzystrmatch: other well-known string similarity functions (levenshtein, soundex...)
unaccent: deals with accented text
xml2: XPath functions (uses libxml2)
pgcrypto: cryptographic functions
hstore: storage and manipulation of key/value pairs inside a single PostgreSQL value
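As an illustration of what trigram matching means, the sketch below approximates the similarity measure documented for pg_trgm: each word is lower-cased, padded with two leading spaces and one trailing space, cut into three-character windows, and two strings are compared by the ratio of shared trigrams to all distinct trigrams. This is an approximation written for this text, not the extension's code; the function names are hypothetical and edge cases may differ from the real implementation.

```python
import re


def trigrams(text):
    """Return the set of trigrams of text, padded as pg_trgm documents it."""
    result = set()
    # Lower-case, then keep runs of letters and digits as words.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing space
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(sorted(trigrams("cat")))
print(round(similarity("word", "two words"), 6))
```

In the database itself the same comparison is a single call, similarity(text, text), once CREATE EXTENSION pg_trgm has been run.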

#3 Data Processing
PostgreSQL with the treatment script inside. Alternate answer is: «Put your scripts inside PostgreSQL».

#3 Data Processing: PL/Python
Use an existing Python library from PostgreSQL, called through an SQL function.
An example with GeoPy. Installation:
    sudo apt-get install postgresql-plpython3-9.4 python3-geopy
    createdb db
    createlang plpython3u db
    psql db -c "CREATE EXTENSION postgis"
Register on GeoNames and enable your account to use the free web service.

PL/Python basic geocoder function:
    CREATE OR REPLACE FUNCTION geoname(toponym text)
    RETURNS geometry(point,4326) AS $$
        from geopy import geocoders
        # GeoNames needs the username of a registered account
        g = geocoders.GeoNames(username="your_username")
        try:
            location = g.geocode(toponym)
            # WKT expects x (longitude) before y (latitude)
            result = plpy.execute(
                "SELECT 'SRID=4326;POINT(%s %s)'::geometry(point, 4326) AS geom"
                % (location.longitude, location.latitude), 1)
            return result[0]["geom"]
        except Exception:
            # Also reached when nothing is found (location is None)
            plpy.warning('Geocoding error')
            return None
    $$ LANGUAGE plpython3u;

Check it:
    psql db -c "SELECT ST_AsGeoJSON(geoname('New York, NY 10022'))"
    {"type":"Point","coordinates":[-74.00597,40.71427]}
http://www.openstreetmap.org/?mlon=-74.00597&mlat=40.71427&zoom=12
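One detail of the geocoder function is easy to get wrong: geocoders report (latitude, longitude), while WKT points are written x then y, that is longitude first. The following standalone sketch isolates that step; point_ewkt is a hypothetical helper name used only for this illustration.

```python
def point_ewkt(lat, lng, srid=4326):
    """Return an EWKT point string; WKT wants x (longitude) before y (latitude)."""
    return "SRID=%d;POINT(%s %s)" % (srid, lng, lat)


# Coordinates taken from the GeoJSON result shown above.
ewkt = point_ewkt(40.71427, -74.00597)
print(ewkt)  # SRID=4326;POINT(-74.00597 40.71427)
```

Swapping the two arguments still yields a syntactically valid point, just in the wrong place, so the check against a map link as done above is a useful habit.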

#3 Data Processing: geospatial statistic correlation
For each row of data.tc, keep its nearest row of data.velov (DISTINCT ON with the <-> ordering), then correlate that distance with the distance to the geocoded point for Lyon:
    WITH lyon AS (
      SELECT ST_Transform(geoname('Lyon'), 2154) geom
    ),
    tc AS (
      SELECT DISTINCT ON (tc.gid) tc.gid, v.gid, tc.geom,
             ST_Distance(tc.geom, v.geom) dist
      FROM data.tc tc, data.velov v
      ORDER BY tc.gid, tc.geom <-> v.geom
    )
    SELECT corr(dist, ST_Distance(l.geom, tc.geom))
    FROM lyon l, tc;

           corr
    -------------------
     0.951847587972223
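The corr() aggregate used in this query computes the Pearson correlation coefficient of its two arguments. As a reminder of what that number is, here is a minimal pure-Python sketch of the same formula; the function name pearson and the sample values are illustrative only and are not data from the talk.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences.

    Returns None when one of the sequences has zero variance, which is
    the case where the coefficient is undefined.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


# Made-up distances: the second series grows with the first one.
nearest = [10.0, 20.0, 30.0, 40.0]
to_center = [100.0, 210.0, 290.0, 405.0]
print(pearson(nearest, to_center))
```

A value close to 1, as in the result above, means the two distances grow together almost linearly.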


#3 Data Processing: same, but via PL/R
    createlang plr DATABASE

    CREATE OR REPLACE FUNCTION r_corr(a float[], b float[])
    RETURNS float AS $$
        return (cor(a, b))
    $$ LANGUAGE plr;

    WITH lyon AS (
      SELECT ST_Transform(geoname('Lyon'), 2154) geom
    ),
    tc AS (
      SELECT DISTINCT ON (tc.gid) tc.gid,
             ST_Distance(tc.geom, v.geom) dist, tc.geom
      FROM data.tc tc, data.velov v
      ORDER BY tc.gid, tc.geom <-> v.geom
    )
    SELECT r_corr(
      ARRAY(SELECT dist FROM lyon l, tc ORDER BY tc.gid),
      ARRAY(SELECT ST_Distance(l.geom, tc.geom) FROM lyon l, tc ORDER BY tc.gid)
    );

          r_corr
    -------------------
     0.951847587972213

#To Go Further
Write your own PostgreSQL module in C (if needed)
Write your own FDW (with Multicorn or in C)
Use PL/R for advanced R statistics
Machine learning

#Conclusion
PostgreSQL behaves like an extensible and integrated framework.
It allows you to keep all data in the same place.
(Modern) SQL acts as a glue language.

www.oslandia.com Thanks!