From documents to data
House of Representatives of the States General

Who am I
Background: Library & Information Science (LIS), specialized in Information Access & Knowledge Organization Systems
Department of Information Services, House of Representatives of the States General
Project manager, Project Linked Data

Agenda
Context: historical note, paper-based dissemination, e-Parliament
From documents to data: Project Linked Data
Goal: more powerful dissemination of parliamentary information
Automation, organization, procedure

Historical note
Examples

Documents
Parliaments function through the medium of documents (World e-Parliament Report, 2008).
Dissemination of parliamentary information is based on documents.
But now the form of documents has changed: from paper to digital, which opens up new possibilities.
More documents
Still lots of paper!

Registering the parliamentary process

Paper-based dissemination
Based on documents: no details, and important information stays hidden.
Done by manually adding metadata to documents as a whole, a time-consuming procedure.
Result of a search: an unordered pile of documents. Not what our users are looking for!

E-Parliament
The use of information technology to improve the availability, accessibility and usability of parliamentary information.
ICT & Parliament
Central principles of e-Parliament:
Outward oriented: orientation to the citizen
Transparency of the parliamentary process
Project Linked Data (1)
The House of Representatives is THE source of parliamentary information for all stakeholders.
That means that parliamentary information must be: available for all stakeholders, up to date, timely, complete, reliable, objective and safe.

Availability
From documents to data: deeper dissemination of parliamentary information and automation of manual input.
Parliamentary documents:
Large in quantity
Seemingly unstructured textual data, but in reality very structured and full of implicit knowledge
Metadata are sparse and assigned at document level
Metadata are hidden in the structure of the documents
And also meaningful in context

Dissemination
LOTS OF INFORMATION UNDER THE SURFACE!

Project Linked Data (2)
Parliamentary documents have intrinsic quality.
Cooperation with the University of Amsterdam, Political Mashup project: making large quantities of textual data available for large-scale automatic quantitative data and content analysis by scientists from the humanities and social sciences.
AND, IN PRACTICE: transparency of the parliamentary process. Of every word spoken in parliament we know:
When it was spoken
By whom
In what function
On behalf of which party
In which context
Who were present during the speech act

Project Linked Data: goal and means
1. MAKE THE IMPLICIT STRUCTURE OF PARLIAMENTARY DOCUMENTS EXPLICIT
2. TURN IMPLICIT REFERENCES INTO HYPERLINKS, MAKING USE OF PERMANENT URIs (LINKED DATA)
How? Text analytics and XML, database and information retrieval technology:
Named entity recognition
Normalisation
Data mining, machine learning
Natural language processing, language models

1. Make the implicit structure of parliamentary documents explicit
Element: content model
meeting: (topic)+
topic: (speech | stage-direction)+
speech: (p | stage-direction)+
p: (#PCDATA | stage-direction)*
stage-direction: (#PCDATA)
[Figure: levels of a Hansard: Meeting (1 day), Topic, Stage direction, Scene, Stage direction, Speech, Paragraph]
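As a rough illustration of what making the structure explicit buys, the content model for meetings, topics, speeches and paragraphs can be checked and queried with Python's standard `xml.etree.ElementTree`. This is a minimal sketch, not the project's actual tooling: the attribute names (`date`, `title`, `speaker`, `function`, `party`) and the sample meeting are invented, and the `|` (choice) separators are an assumption because the original slide lost them. Once every paragraph sits inside a speech, inside a topic, inside a dated meeting, questions such as "who said this, when, in what function, for which party, on which topic" become simple tree walks.

```python
from __future__ import annotations
import xml.etree.ElementTree as ET

# Content model from the slide; "|" separators are an assumption.
ALLOWED_CHILDREN = {
    "meeting": {"topic"},                     # (topic)+
    "topic": {"speech", "stage-direction"},   # (speech | stage-direction)+
    "speech": {"p", "stage-direction"},       # (p | stage-direction)+
    "p": {"stage-direction"},                 # (#PCDATA | stage-direction)*
    "stage-direction": set(),                 # (#PCDATA)
}
NON_EMPTY = {"meeting", "topic", "speech"}    # models ending in "+"
MIXED = {"p", "stage-direction"}              # models that allow #PCDATA


def validate(elem: ET.Element) -> list[str]:
    """Return content-model violations for elem and its descendants ([] = valid)."""
    allowed = ALLOWED_CHILDREN.get(elem.tag)
    if allowed is None:
        return [f"unknown element <{elem.tag}>"]
    errors = []
    children = list(elem)
    if elem.tag in NON_EMPTY and not children:
        errors.append(f"<{elem.tag}> needs at least one child")
    if elem.tag not in MIXED:
        # Element-only content: only whitespace may sit between child elements.
        loose_text = [elem.text] + [child.tail for child in children]
        if any(t and t.strip() for t in loose_text):
            errors.append(f"text not allowed directly in <{elem.tag}>")
    for child in children:
        if child.tag not in allowed:
            errors.append(f"<{child.tag}> not allowed in <{elem.tag}>")
        errors.extend(validate(child))
    return errors


def speech_records(meeting: ET.Element):
    """Yield one record per paragraph: when, by whom, in what function, party, topic."""
    for topic in meeting.findall("topic"):
        for speech in topic.findall("speech"):
            for p in speech.findall("p"):
                # Keep the spoken words; drop the text of embedded stage directions.
                words = (p.text or "") + "".join(child.tail or "" for child in p)
                yield {
                    "date": meeting.get("date"),
                    "topic": topic.get("title"),
                    "speaker": speech.get("speaker"),
                    "function": speech.get("function"),
                    "party": speech.get("party"),
                    "text": " ".join(words.split()),
                }


# Invented sample meeting; all attribute names and values are hypothetical.
SAMPLE = """<meeting date="2000-01-01">
  <topic title="Example topic">
    <stage-direction>The chair opens the debate.</stage-direction>
    <speech speaker="Jane Doe" function="member" party="Example Party">
      <p>First paragraph of the speech.</p>
      <p>Second paragraph <stage-direction>Interruption.</stage-direction> continues.</p>
    </speech>
  </topic>
</meeting>"""

root = ET.fromstring(SAMPLE)
print(validate(root))  # []
for record in speech_records(root):
    print(record["speaker"], record["party"], record["text"])
```

Python's standard library does not validate against a DTD, so the content model is encoded by hand here; a production pipeline would more likely use a validating parser.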
Structure of Hansards (1)
Structure of Hansards (2)

2. Turn implicit references into hyperlinks by making use of permanent identifiers
PIDs in the parliamentary context:
Published parliamentary documents
Subunits in parliamentary documents
Named entities
Needed:
1. A resolver
2. A namespace
3. An internal practice of giving unique names to entities

Procedure
Analysis of the XML data, identification of entities, linking of entities, meaning (URIs, links)

Detailed: deep dissemination
Dissemination based on the data in the document: persons, parties, organizations, dossiers, controlled vocabulary terms

From documents to data
Result of a search: specific answers to questions
Automatic: manual input is limited
Users are looking for specific information, answers
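The three requirements for permanent identifiers (a namespace, an internal practice of unique names, a resolver) and the procedure of identifying and then linking entities can be sketched in a few lines of Python. Everything here is hypothetical: the `https://example.org/id/` namespace, the slug-style naming rule in `normalise`, the path layout in `subunit_pid` and the dictionary lookup that stands in for real named entity recognition are assumptions for illustration, not the House's actual URI scheme.

```python
from __future__ import annotations
import re
import unicodedata

# Hypothetical namespace: the real URI scheme is not given in the slides.
NAMESPACE = "https://example.org/id/"


def normalise(name: str) -> str:
    """Assumed internal naming practice: ASCII, lowercase, hyphen-separated."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def mint_pid(kind: str, name: str) -> str:
    """PID for a document or named entity: namespace + entity type + unique name."""
    return f"{NAMESPACE}{kind}/{normalise(name)}"


def subunit_pid(document_pid: str, *path: str) -> str:
    """PID for a subunit (topic, speech, paragraph) inside a published document."""
    return "/".join([document_pid, *path])


class Resolver:
    """Maps each permanent identifier to wherever the resource currently lives."""

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def register(self, pid: str, location: str) -> None:
        self._locations[pid] = location

    def resolve(self, pid: str) -> str:
        return self._locations[pid]  # raises KeyError for unknown PIDs


def link_entities(text: str, gazetteer: dict[str, str]) -> list[tuple[int, str, str]]:
    """Find known entity names in text and attach their PIDs as (offset, name, pid).

    A plain dictionary lookup stands in for real named entity recognition.
    """
    links = []
    for name, kind in gazetteer.items():
        for match in re.finditer(re.escape(name), text):
            links.append((match.start(), name, mint_pid(kind, name)))
    return sorted(links)


doc = mint_pid("document", "Hansard 0001")  # https://example.org/id/document/hansard-0001
resolver = Resolver()
resolver.register(doc, "https://example.org/files/hansard-0001.xml")
print(resolver.resolve(doc))
```

Keeping the PID separate from the storage location is the point of the resolver: files can move, while every hyperlink built on the PID stays valid.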
Dynamic report of parliamentary activity

New possibilities
A picture of parliamentary debate
Attaquograms
Dynamic homepage for MPs

Social network analysis (1)
Social network analysis (2)
Normalized the names of the persons
Transformed all data into GraphML format
Computed basic social network statistics for the last six cabinet periods
Visualized the networks
Created an interactive page with all network data, summarizing the co-submission of motions, amendments and written questions during the last six cabinet periods
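The social network steps (normalize names, transform the data into GraphML, compute basic statistics) might look roughly like this in plain Python. A minimal sketch, assuming each motion, amendment or written question is given as a list of submitter names and that spelling variants are resolved through a hand-made alias table; the input format, the alias table, the names and the function names are hypothetical. Nodes are MPs, and the weight of an undirected edge counts how many documents two MPs submitted together.

```python
from __future__ import annotations
from collections import Counter
from itertools import combinations
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def normalise_name(raw: str, aliases: dict[str, str]) -> str:
    """Collapse whitespace, then map known spelling variants to one canonical name."""
    key = " ".join(raw.split())
    return aliases.get(key, key)


def cosubmission_weights(submissions, aliases):
    """Count, per pair of MPs, how many documents they co-submitted."""
    weights = Counter()
    for submitters in submissions:
        names = sorted({normalise_name(n, aliases) for n in submitters})
        for pair in combinations(names, 2):
            weights[pair] += 1
    return weights


def basic_stats(weights):
    """Node count, edge count, density and degree (number of distinct co-submitters)."""
    degree = Counter()
    for a, b in weights:
        degree[a] += 1
        degree[b] += 1
    n = len(degree)
    density = 2 * len(weights) / (n * (n - 1)) if n > 1 else 0.0
    return {"nodes": n, "edges": len(weights), "density": density, "degree": dict(degree)}


def to_graphml(weights) -> str:
    """Serialise the weighted, undirected co-submission graph as GraphML."""
    names = sorted({n for pair in weights for n in pair})
    ids = {name: f"n{i}" for i, name in enumerate(names)}  # GraphML ids may not contain spaces
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    ET.SubElement(root, "key", {"id": "name", "for": "node", "attr.name": "name", "attr.type": "string"})
    ET.SubElement(root, "key", {"id": "weight", "for": "edge", "attr.name": "weight", "attr.type": "int"})
    graph = ET.SubElement(root, "graph", {"id": "G", "edgedefault": "undirected"})
    for name in names:
        node = ET.SubElement(graph, "node", {"id": ids[name]})
        ET.SubElement(node, "data", {"key": "name"}).text = name
    for (a, b), w in sorted(weights.items()):
        edge = ET.SubElement(graph, "edge", {"source": ids[a], "target": ids[b]})
        ET.SubElement(edge, "data", {"key": "weight"}).text = str(w)
    return ET.tostring(root, encoding="unicode")


# Invented example: three documents, one name spelled two ways.
submissions = [["Jane  Doe", "John Roe"], ["J. Doe", "John Roe", "Ann Poe"], ["Ann Poe"]]
weights = cosubmission_weights(submissions, aliases={"J. Doe": "Jane Doe"})
print(basic_stats(weights)["density"])  # 1.0
print(to_graphml(weights))
```

Only MPs who co-submitted at least once appear as nodes; solo submissions add no edges. Running the sketch once per cabinet period gives one network per period, and the GraphML output can be opened in tools such as Gephi or loaded with NetworkX for visualization.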
Not just textual data
Search engine connecting heterogeneous data: missed a debate? Openkamer.tv

Wrap up
From documents to data:
Deeper dissemination
Automation of manual input
Unlimited possibilities for visualization and analysis
A document may take any form; data may be anywhere!

Thank you!
Specialist Information Access & Knowledge Organization Systems, House of Representatives of the States General, n.aders@tweedekamer.nl
Maarten Marx, Informatics Institute, University of Amsterdam, maartenmarx@uva.nl