National critical geo infrastructure runs on open source database PostGIS TU Delft Geomatics Open Guest Lecture March 21st 2011 Thijs Brentjens
Who am I? TU Delft MSc Geodetic Engineering (2004) Thesis: Web Feature Services GIS Software engineer Freelancer 2007 Member OpenGeoGroep Open standards Open source
Contents Geo infrastructure; PostGIS in geo infrastructure: usage of PostGIS; Case: PDOK; BAG: PostGIS and authentic register; Extras
Geo infrastructure Spatial information is crucial in modern society, for example: Spatial planning Environmental issues Registrations / public authorities Agriculture Navigation --> To make modern information society work (and fun)
How could we efficiently and effectively provide spatial data for this?
Geo infrastructure Need to re-use, share up-to-date data Enabled by a geo information infrastructure http://www.flickr.com/photos/20013727@n02/
Geo infrastructure Some current driving programmes in NL: From a bit earlier (and still running): Authentic registers INSPIRE PDOK WION, Spatial planning (RO Online),... Web developments, mobile
Geo infrastructure There is a need for geodata for critical applications From mixed sources Data is more and more available An infrastructure to facilitate this is evolving
What is such a geo infrastructure in practice? (in terms of technical components)
Geo infrastructure Basically: Spatial datasets Offered by standards based webservices Described in (and searchable through) metadata Consumed by client applications Web Desktop Mobile
Geo infrastructure Simplified architecture (diagram): Clients, Webservices, Spatial datasets, Metadata
Geo infrastructure 100s of services already publicly available Services used in many applications Cross-organization: starting
PostGIS in geo infrastructure
PostGIS PostGIS adds support for geographic objects to the PostgreSQL object-relational database. http://postgis.refractions.net/ Spatial data types, spatial functions and spatial indexing for PostgreSQL. Able to store, query, manipulate and analyze geospatial data (and more)
Applications using PostGIS (architecture diagram: Clients, Webservices, Metadata, Spatial datasets)
Applications using PostGIS Data store for webservices: UMN MapServer, GeoServer, deegree, ESRI ArcGIS Server, Intergraph GeoMedia WebMap, ... others ...
Applications using PostGIS Desktop: QGIS, uDig, OpenJUMP, Intergraph GeoMedia, ... others ...
Applications using PostGIS PostGIS often used for: Storage Retrieval (querying) Some data processing Also: On-the-fly reprojection Specialist GIS / geometric analysis Geometric validation...
PDOK and PostGIS
Projects: PDOK Who, why and what: project backgrounds How PostGIS is used Next challenges
Projects: PDOK Publieke Dienstverlening op de Kaart (in English: Public Service on the Map) Organisations joining forces: the Dutch Cadastre; the Ministry of Economic Affairs, Agriculture and Innovation; the Ministry of Infrastructure and the Environment; TNO, an independent research organisation
Projects: PDOK Serve a core set of geographic data to other governments Because of: the need to reduce government expenditure; to make national geographic data services widely accessible to each other and (possibly) society in order to improve public services; to address requirements set forth in INSPIRE. A very important contribution to the national geo infrastructure
Projects: PDOK Central and decentral services
Projects: PDOK Example of datasets: Authentic register for topographic data (BasisRegistratie Topografie (BRT))
Projects: PDOK Top10NL
Projects: PDOK Administrative borders Natura2000 Geology Addresses: ACN, later Authentic register (BAG) Transport networks Currently ~10 new datasets being added Many more from PDOK partners planned, this year and beyond
Projects: PDOK Offer data-delivery services For many users From several data providers High level requirements: High performance High availability (7x24h) Emergency services
Projects: PDOK Guiding principles central services: components based proven solutions / best practices use Open Source software components account for future growth open standards existing software components Keep It Simple principle: only design for things that are needed in the phase of the implementation.
Projects: PDOK Central service components
Projects: PDOK Why PostGIS for central services? Amongst others: PostGIS is able to do what is needed Proven: many examples, also in big infrastructures (IGN, for example) Scalable, technically and license-wise Open source software (OSS) Other software supports PostGIS well Other DBMSs would have been possible too, but PostGIS was the best choice
PostGIS usage in PDOK
PostGIS usage in PDOK Storage & loading data Querying & optimizations Users (authorization) Scalability & deployment
PostGIS usage in PDOK Storage currently (March 2011): around 25 GB of vector data in PDOK central services Example: Top10NL = 6 GB Growing Schemas: one per dataset, public only for generic objects (table geometry_columns, functions) Chosen for one geom per table
PostGIS usage in PDOK Import: use a unique table for import, suffix _imp Load data: a check on valid geometry results in a "valid" column Use views to publish to the web (diagram: Data Source --> Table_imp with valid column --> View with valid geometries only and column aliases)
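The import pattern on this slide (an _imp table with a validity flag, published through a view) can be sketched with Python's built-in sqlite3 module. This is only an illustration of the pattern, not the PDOK implementation: the table, view and column names (roads_imp, roads, nm, geom_wkt) and the rows are hypothetical, and in PostGIS the flag would presumably be filled by a geometry check such as ST_IsValid rather than by hand.

```python
import sqlite3

# All names and rows below are hypothetical; they only illustrate the pattern.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE roads_imp ("
    "gid INTEGER PRIMARY KEY, nm TEXT, geom_wkt TEXT, valid INTEGER)"
)
conn.executemany(
    "INSERT INTO roads_imp (gid, nm, geom_wkt, valid) VALUES (?, ?, ?, ?)",
    [
        (1, "A12", "LINESTRING (0 0, 10 10)", 1),
        (2, "A13", "LINESTRING (5 5)", 0),  # flagged as invalid while loading
        (3, "A20", "LINESTRING (0 5, 9 5)", 1),
    ],
)

# The view is what the web services read: valid rows only, with column aliases.
conn.execute(
    "CREATE VIEW roads AS "
    "SELECT gid, nm AS name, geom_wkt AS geometry "
    "FROM roads_imp WHERE valid = 1"
)

published = conn.execute("SELECT gid, name FROM roads ORDER BY gid").fetchall()
print(published)
```

Because clients only ever see the view, invalid rows can stay in the import table for inspection without reaching the web services.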
PostGIS usage in PDOK Querying: what kind of queries?
PostGIS usage in PDOK Queries mostly: create a map Many spatial intersects (bbox / polygon). Sometimes more: administrative constraints (classification), scale dependency (e.g. roads classification) Reproject coordinates on-the-fly
PostGIS usage in PDOK
SELECT "gid", encode(st_asbinary(st_force_2d("the_geom")),'base64') as "the_geom"
FROM "grenzen"."cbs_wijken"
WHERE "the_geom" && ST_GeomFromText(
  'POLYGON ((-113908 197172, -113908 632427, 396528 632427, 396528 197172, -113908 197172))',
  28992)
PostGIS usage in PDOK Tuning for querying: Indexing: a geometry index is essential! primary key (indexed by default by PostgreSQL) foreign keys (PDOK: not many now) indices for known searches For now sufficient, because the webservices are using more resources than the database
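The && operator in the query above compares bounding boxes rather than exact geometries. A minimal Python sketch of that comparison follows; the BBox type and the two feature boxes are hypothetical, and only the map extent is taken from the slide.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def bbox_overlaps(a: BBox, b: BBox) -> bool:
    """True when the two boxes share at least one point (touching counts here)."""
    return (
        a.xmin <= b.xmax
        and b.xmin <= a.xmax
        and a.ymin <= b.ymax
        and b.ymin <= a.ymax
    )


# Map extent from the slide's POLYGON, in RD coordinates (SRID 28992).
map_extent = BBox(-113908, 197172, 396528, 632427)

# Hypothetical feature bounding boxes, for illustration only.
inside = BBox(120000, 480000, 125000, 490000)
outside = BBox(400000, 100000, 410000, 110000)

print(bbox_overlaps(map_extent, inside), bbox_overlaps(map_extent, outside))
```

Four comparisons per feature are cheap, which is why a bounding box test suits map requests that only need "everything roughly in view".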
PostGIS usage in PDOK PostGIS spatial indexing used: GiST-index, Generalized Search Tree to speed up searches on irregular data structures Indexes break up data into: "things to one side" "things which overlap" "things which are inside"
PostGIS usage in PDOK Two advantages of GiST over R-Tree indexes: they are "null safe", and they only store the "important" part in the index; for spatial objects that is the bounding box. Why? GIS objects larger than 8K cause R-Tree indexes to fail in the process of being built.
create index geoname_geom_idx on geoname using gist (the_geom);
http://postgis.refractions.net/docs/ch04.html#id2638705
PostGIS usage in PDOK Geometry relationship functions include implicit bounding box overlap operators (with the exceptions of ST_Disjoint and ST_Relate), which gives very fast searches / processing http://postgis.refractions.net/docs/ch04.html#id2638955
PostGIS usage in PDOK User roles Default: only postgres (admin) Added: an owner of all tables; a role with read access only Also for more specific usage, depending on requirements
PostGIS usage in PDOK Deployment & scalability Simplified production view (currently) (diagram: a webservices farm, Server 1 to Server n, in front of one database server holding all data)
PostGIS usage in PDOK Deployment using a staged pipeline; many servers involved (diagram: Data Sources --> Data import --> Development --> Test --> Acceptance --> Production, each stage with its own webservices, promoted when OK)
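The two-step idea behind these functions (a cheap bounding box test first, the exact geometric test only for the survivors) can be sketched in plain Python. This is an illustrative sketch, not PostGIS internals: the function names are hypothetical and the exact test here is a simple ray-casting point-in-polygon check.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def ring_bbox(ring: List[Point]) -> Tuple[float, float, float, float]:
    """Bounding box (xmin, ymin, xmax, ymax) of a polygon ring."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_ring(p: Point, ring: List[Point]) -> bool:
    """Exact test: ray casting, counting edge crossings to the right of p."""
    x, y = p
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge spans the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def contains(ring: List[Point], p: Point) -> bool:
    """Bounding box pre-filter first, exact test only when it passes."""
    xmin, ymin, xmax, ymax = ring_bbox(ring)
    if not (xmin <= p[0] <= xmax and ymin <= p[1] <= ymax):
        return False  # cheap reject, no exact test needed
    return point_in_ring(p, ring)


triangle = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
print(contains(triangle, (2.0, 2.0)))    # inside both the box and the triangle
print(contains(triangle, (8.0, 8.0)))    # inside the box, outside the triangle
print(contains(triangle, (20.0, 20.0)))  # rejected by the box alone
```

The point (8, 8) shows why the second step is needed: the bounding box alone would report a false hit.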
PostGIS usage in PDOK Upcoming challenges for the database in PDOK: More datasets --> consequences? More updates --> consequences? Temporal data & storing history? Solutions will depend on requirements
PostGIS usage in PDOK Some options: more database servers & replication? split up databases? --> more database instances More tuning for performance?
BAG: Example Authentic Register
BAG: Example Authentic Register What is the BAG? Data model BAG in PostGIS Model & database Temporal aspects
What is the BAG? Authentic Register All official buildings and addresses of NL Municipalities provide & maintain data Centralized services for access / delivery (XML) Currently being implemented Mandatory to use by this summer for all governments
Data model Example: 1 building Many residences With each an address In a street In a city
Data Model Data model for delivery / clients Places Streets Addresses Addressable objects: Residences Site (for e.g. caravans) Berths (for ships / boats) Buildings
BAG in PostGIS ~ 20 million objects with geom ~ 10 million address references Using BAG Extract tool: Each type a table Relations stored in database --> ~ 1-to-1 translation of model Views for retrieval
BAG in PostGIS Example, Verblijfsobject (residence): 1 main table, several tables for references Identification Geometry (polygon) Simple administrative data
BAG in PostGIS Temporal: start-date (begindatum), enddate (einddatum) to store history & reconstruct Relations / references to others: Main address (1) --> in table for references Secondary addresses (0-N) --> in table for references Building (1-N) --> table Buildings
BAG in PostGIS Views for data access Verblijfsobjectactueel (current residences): reconstructs the main address (using references) Filter on time: now Other views could be defined, as needed
BAG in PostGIS Filter on time: now
SELECT <verblijfsobject.attributes>
FROM verblijfsobject
WHERE verblijfsobject.begindatum <= 'now'::text::date
  AND verblijfsobject.einddatum >= 'now'::text::date
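The same "filter on time: now" logic can be sketched in Python. The record type, identifiers and dates below are hypothetical; using date.max as the end date of a still-current record is an assumption of this sketch, not something the slides state about the BAG data.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Residence:
    identificatie: str  # hypothetical identifier
    begindatum: date    # start of validity
    einddatum: date     # end of validity (date.max = still open, an assumption)


def current(records: List[Residence], today: date) -> List[Residence]:
    """Keep records whose validity interval contains 'today' (both ends inclusive)."""
    return [r for r in records if r.begindatum <= today <= r.einddatum]


history = [
    Residence("0001", date(2009, 1, 1), date(2010, 6, 30)),  # ended
    Residence("0002", date(2010, 7, 1), date.max),           # current
    Residence("0003", date(2012, 1, 1), date.max),           # not yet started
]

now = date(2011, 3, 21)
print([r.identificatie for r in current(history, now)])
```

Passing a different date instead of today reconstructs the situation at that moment, which is what storing begindatum and einddatum makes possible.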
BAG in PostGIS Demo: http://www.brentjensgeoict.nl/appinaday_rd.html
Extras
Extras PostGIS (extra) advanced features & functions Easy export: ST_AsKML(), ST_AsGML() Linear referencing Routing: pgRouting GIS analyses using the database Geography type (calculations over a sphere, e.g. distance)
Extras Calculate height profiles, using: Linear referencing Height contours Geography type (calculations over a sphere, e.g. distance)
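To make "calculations over a sphere" concrete, here is a minimal Python sketch of a great-circle distance using the haversine formula. It is a simplified stand-in, not what PostGIS runs: the function name is hypothetical, the mean Earth radius of 6371008.8 m is an assumption of this sketch, and PostGIS may calculate on a spheroid rather than a perfect sphere.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; an assumption of this sketch


def sphere_distance_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of longitude along the equator.
one_degree = sphere_distance_m(0.0, 0.0, 1.0, 0.0)
print(round(one_degree))
```

The point of such a calculation is that lat/lon coordinates can be used directly, without first reprojecting to a planar coordinate system.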
Extras Route:
Extras Contours:
Extras Calculate distances: Intersect of route and contours Distance calculation over line Using geography type (coordinates over sphere, lat/lon) Linear referencing: walk along a line and use distance from start (or something else) as reference
Extras
SELECT road_heights.gid,...,
  ST_line_locate_point(road_geom, road_heights_geom)*st_length(geography(road_geom)) as distancefromstart,
  road_heights.height,
  road_heights_geom
FROM (
  SELECT pb_etappe4.gid,...,
    contours.height as height,
    (ST_Dump(ST_Intersection(contours.the_geom, pb_etappe4.the_geom))).geom As road_heights_geom,
    pb_etappe4.the_geom as road_geom
  FROM contours INNER JOIN pb_etappe4
    ON ST_Intersects(contours.way, pb_etappe4.the_geom)
) As road_heights
order by distancefromstart;
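The linear referencing step in the query (fraction along the line, multiplied by the line length) can be sketched in plain Python. This is a planar simplification with hypothetical function names and made-up coordinates; the query above takes the length over the geography type instead.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def line_length(line: List[Point]) -> float:
    """Planar length of a polyline."""
    return sum(
        math.hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(line, line[1:])
    )


def locate_point(line: List[Point], p: Point) -> float:
    """Fraction (0..1) along the line of the location closest to p."""
    total = line_length(line)
    best_dist = math.inf
    best_along = 0.0
    walked = 0.0  # length of the segments already passed
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len = math.hypot(dx, dy)
        t = 0.0
        if seg_len > 0:
            # Project p onto the segment and clamp to its end points.
            t = ((p[0] - x1) * dx + (p[1] - y1) * dy) / (seg_len * seg_len)
            t = max(0.0, min(1.0, t))
        cx, cy = x1 + t * dx, y1 + t * dy
        d = math.hypot(p[0] - cx, p[1] - cy)
        if d < best_dist:
            best_dist = d
            best_along = walked + t * seg_len
        walked += seg_len
    return best_along / total if total > 0 else 0.0


# Hypothetical route and a point where a contour crosses it.
route = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
crossing = (10.0, 5.0)
distance_from_start = locate_point(route, crossing) * line_length(route)
print(distance_from_start)
```

Sorting the contour crossings by this distance and pairing each with its contour height yields the points of a height profile.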
Extras Result:
Extras Distance:
Extras Height:
Extras Distance and height graph:
Extras: performance Performance tips: CLUSTERing on geometry indices physically reorders the data rows in index order, which speeds up lookups of data on disk.
-- first, set the geom column to not null
alter table planet_osm_line alter the_geom set not null;
-- second, cluster on the index name (of the geom column)
cluster planet_osm_line_index on planet_osm_line;
Extras: performance Use and tune tablespaces for particular usage (dynamic datasets vs. static data). Tablespaces define locations in the file system where the files representing database objects can be stored, so different file systems can be used for storage. Thanks to Agustin Matilla Sanz (Geodan, PDOK)
Extras PDOK: Why open source? 1. The requirements mean many servers and much processing power are needed. Proprietary software is often licensed by the number of cores/CPUs used by the software simultaneously. A highly scalable system may therefore be confronted with high license costs.
Extras PDOK: Why open source? (continued) 2. Open source software components will not provide all functionality required by PDOK, but this will also be the case with proprietary software. With open source there are more means to exert influence on the direction of future developments.
Applications using PostGIS Dutch examples (not all infrastructure): Ruimtelijkeplannen.nl national database & web service for all spatial planning PDOK Many datasets Some authentic registers with geometry
Applications using PostGIS Waterschapshuis Central Geo facility Kadaster INSPIRE RVOB: real estate of government Police (vtspn) WION decentral systems: provinces Drenthe, Gelderland, Limburg
Applications using PostGIS Geo applications in municipalities: Several, e.g. 5 in Land van Cuijk, Maasdonk, Vlaardingen, Maarssen, Barneveld, Arnhem,... Groningen Seaports... and many more
Extras Infrastructure is the basic physical and organizational structures needed for the operation of a society or enterprise, or the services and facilities necessary for an economy to function. The term typically refers to the technical structures that support a society, such as roads, water supply, sewers, power grids, telecommunications, and so forth
References
http://postgis.refractions.net/docs/ch04.html#id2638705
http://bag.vrom.nl
http://www.postgresql.org/docs/8.4/static/manage-ag-tablespac