BIG DATA: National data linkage infrastructure. James Boyd





What defines Big Data?
Data whose scale, diversity and complexity require new architectures, techniques, algorithms and analytics to manage it and to extract value and hidden knowledge from it.

Characteristics of Big Data
- Volume: data volumes increase exponentially over time
- Variety (complexity): various formats, types and structures
- Velocity: fast processing of data to ensure it is representative

Administrative Data Collections (Big-ish data?)
- Administrative data are usually collected by government for some administrative purpose, not primarily for research
- Life events can generate records across a number of different government areas (Health, Education, Criminal Justice, etc.)
- The databases are often population based, so important population subgroups are not missed

Data Sharing
The ability to share the same data resource with multiple applications or users:
- Data and information are used and reused to inform significant decisions
- Brings together key elements across Government
- Enhances the value of information gained from a single source

Data Sharing - Challenges
Governance arrangements for sharing:
- Custodian requirements
- Control of data release
- User agreements
Confidentiality, privacy and security:
- Protecting confidentiality and privacy
- Ensuring data security throughout

Data Linkage - Overview
Aim: to establish efficiently and accurately which records belong to the same individual.
Personal identifying information makes data linkage possible:
- Family name
- First name(s)
- Date of birth
- Postcode

Matching Techniques
Exact matching can lead to inexact results:
- e.g. requiring an exact match on a number of fields (surname, first initial, date of birth, sex)
- Expect at least 10-15% errors because of discrepancies
Probability matching is more accurate:
- Quantifies levels of agreement and disagreement
- 2% of true links missed
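The weakness of exact matching can be illustrated with a minimal sketch. The two records, their field names and their values below are invented for illustration; they are not taken from any real data collection. A single spelling variant in the surname is enough to make the exact comparison fail, even though three of the four fields agree.

```python
# Hypothetical field names and records, for illustration only.
FIELDS = ("surname", "first_initial", "dob", "sex")

def exact_match(a, b, fields=FIELDS):
    """True only if every listed field is identical in both records."""
    return all(a[f] == b[f] for f in fields)

def agreement_pattern(a, b, fields=FIELDS):
    """Field-by-field agreement, the raw input to probability matching."""
    return {f: a[f] == b[f] for f in fields}

rec_a = {"surname": "SMITH", "first_initial": "J", "dob": "1970-03-12", "sex": "M"}
rec_b = {"surname": "SMYTH", "first_initial": "J", "dob": "1970-03-12", "sex": "M"}

print(exact_match(rec_a, rec_b))        # the pair is rejected outright
print(agreement_pattern(rec_a, rec_b))  # only the surname disagrees
```

Probability matching works from the agreement pattern rather than the all-or-nothing result, which is why it can recover pairs like this one.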

How Does Linkage Work?
- Bring together the pairs of records to be compared
- Quantify the relative probability that the two records belong to the same person
- Make the linkage decision
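The three steps above can be sketched with a Fellegi-Sunter style weight calculation, a standard formulation of probability matching. This is an illustrative sketch, not the method used by any particular linkage unit: the m and u probabilities, the threshold, the choice of date of birth as the blocking field, and the sample records are all assumptions.

```python
import math
from collections import defaultdict
from itertools import combinations

# Assumed values: m = P(field agrees | same person),
#                 u = P(field agrees | different people).
M_U = {
    "surname": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "dob": (0.97, 0.001),
    "postcode": (0.85, 0.05),
}
THRESHOLD = 10.0  # assumed cut-off; real systems tune this and often review a grey zone

def candidate_pairs(records, block_key):
    """Step 1: bring together pairs, here only those sharing a blocking value."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[rec[block_key]].append(rec)
    for block in blocks.values():
        yield from combinations(block, 2)

def match_weight(a, b):
    """Step 2: sum log-likelihood ratios for agreement or disagreement per field."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if a[field] == b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total

def link(records, block_key="dob"):
    """Step 3: make the linkage decision by comparing the weight to a threshold."""
    return [
        (a["id"], b["id"])
        for a, b in candidate_pairs(records, block_key)
        if match_weight(a, b) >= THRESHOLD
    ]

records = [
    {"id": 1, "surname": "SMITH", "first_name": "JOHN", "dob": "1970-03-12", "postcode": "6000"},
    {"id": 2, "surname": "SMITH", "first_name": "JOHN", "dob": "1970-03-12", "postcode": "6102"},
    {"id": 3, "surname": "JONES", "first_name": "MARY", "dob": "1970-03-12", "postcode": "6000"},
]
print(link(records))
```

Records 1 and 2 link despite the postcode change because agreement on the other fields outweighs it; record 3 shares a birth date and postcode with record 1 but falls below the threshold.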

Development of linkage methodology
Matching challenges?

Population Health Research Network (PHRN)
A collaborative network developing data linkage capability within and between Australian jurisdictions:
- 6 state/territory linkage units: 2 existing (WA, NSW/ACT) and 4 new (Qld, Vic, Tas and SA/NT)
- Program Office in Perth providing coordination and national client services
- National linkage (Centre for Data Linkage at Curtin University and AIHW Commonwealth Data Integration)
- Secure Unified Research Environment (SURE) and secure data delivery (Sax Institute)

Centre for Data Linkage (CDL)
Building national data linkage infrastructure:
- Facilitate linkages that span state/territory borders
- Link these datasets with research datasets
- Secure linkage of datasets
- Research and development into data linkage methods

Why develop a new linkage system?
- Address weaknesses and gaps in existing systems (complexity, scale, performance, functionality, administration)
- Provide an enterprise-grade platform that is reliable, easy to maintain and operate, with auditing capabilities
- Automate functions that traditionally require manual intervention
- Tackle emerging problems, e.g. privacy-preserving linkage
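The slides do not say which privacy-preserving technique is meant. One widely discussed approach encodes the character bigrams of each identifier into a keyed Bloom filter, so that similarity can be compared without exchanging the names themselves. The sketch below assumes that approach; the filter size, number of hash functions, secret key and function names are all hypothetical.

```python
import hashlib
import hmac

def bigrams(value):
    """Character bigrams of a padded, lower-cased string."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}

def bloom_encode(value, secret, size=256, num_hashes=4):
    """Return the set of bit positions set for this value (assumed parameters)."""
    bits = set()
    for gram in bigrams(value):
        for k in range(num_hashes):
            msg = f"{k}|{gram}".encode()
            digest = hmac.new(secret, msg, hashlib.sha256).digest()
            bits.add(int.from_bytes(digest[:4], "big") % size)
    return bits

def dice(a, b):
    """Dice coefficient between two sets of bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

SECRET = b"shared-secret-agreed-by-data-providers"  # hypothetical key
smith = bloom_encode("Smith", SECRET)
smyth = bloom_encode("Smyth", SECRET)
jones = bloom_encode("Jones", SECRET)

print(round(dice(smith, smyth), 2), round(dice(smith, jones), 2))
```

Similar spellings share most bigrams and therefore most bit positions, so the linkage unit can score agreement on encoded values only. A plain Bloom filter of this kind is known to be open to frequency attacks, so it should be read as an illustration of the idea rather than a recommended design.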

What differentiates the NLS?
- Large data volume (linkage, data management, output, scalability)
- Manages multiple linkage and extraction projects
- Manages new, amended and deleted records (open file handling)
- Handles diverse linkage and DC needs, e.g. enduring vs project linkage; researchers wishing to link their own data; DCs imposing restrictions on linkage
- Secure and auditable

Managing change over time
How to handle change over time (new, amended and deleted records)?
- NLS handles new, amended and deleted records (open file handling)
- NLS handles deletion of data collections, data providers and linkage projects
- NLS differentiates between end-dated and deleted records
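The slides do not describe how the NLS represents these changes internally. The following is one possible reading, offered only as a sketch: an amended record gains a new version, an end-dated record is retained but marked as no longer current, and a deleted record is removed. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Version:
    data: dict
    valid_from: date
    valid_to: Optional[date] = None  # None means this version is still current

class RecordStore:
    """Hypothetical store distinguishing current, end-dated and deleted records."""

    def __init__(self):
        self._versions = {}    # record_id -> list of Version, oldest first
        self._deleted = set()  # ids removed at the data provider's request

    def add(self, record_id, data, when):
        self._versions[record_id] = [Version(dict(data), when)]

    def amend(self, record_id, data, when):
        versions = self._versions[record_id]
        versions[-1].valid_to = when  # close the previous version
        versions.append(Version(dict(data), when))

    def end_date(self, record_id, when):
        self._versions[record_id][-1].valid_to = when  # keep the record and its history

    def delete(self, record_id):
        self._versions.pop(record_id, None)  # remove the content entirely
        self._deleted.add(record_id)

    def history(self, record_id):
        return list(self._versions.get(record_id, []))

    def status(self, record_id):
        if record_id in self._deleted:
            return "deleted"
        versions = self._versions.get(record_id)
        if not versions:
            return "unknown"
        return "current" if versions[-1].valid_to is None else "end-dated"

store = RecordStore()
store.add("r1", {"surname": "SMITH"}, date(2010, 1, 1))
store.amend("r1", {"surname": "SMYTH"}, date(2011, 1, 1))
print(store.status("r1"), len(store.history("r1")))
```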

Any point in time referencing
- NLS stores the full history of records and groups
- Groups are dynamic entities
- Linkage structure can be recreated for any record at any (previous) point in time
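Point-in-time referencing can be sketched by keeping every group assignment with its timestamp and looking up the latest assignment at or before the requested time. This is an assumption about how such a lookup might work, not a description of the NLS; the class name, method names and ISO date strings are hypothetical.

```python
from bisect import bisect_right

class GroupHistory:
    """Hypothetical append-only log of which group each record belonged to."""

    def __init__(self):
        self._events = {}  # record_id -> list of (timestamp, group_id), in time order

    def assign(self, record_id, group_id, timestamp):
        events = self._events.setdefault(record_id, [])
        if events and timestamp < events[-1][0]:
            raise ValueError("assignments must be appended in time order")
        events.append((timestamp, group_id))

    def group_at(self, record_id, timestamp):
        """Group the record belonged to at the given time, or None if not yet linked."""
        events = self._events.get(record_id, [])
        times = [t for t, _ in events]
        i = bisect_right(times, timestamp)
        return events[i - 1][1] if i else None

history = GroupHistory()
history.assign("r1", "G1", "2012-01-01")
history.assign("r1", "G2", "2013-06-01")  # groups are dynamic: r1 later moved to G2

print(history.group_at("r1", "2012-12-31"))
print(history.group_at("r1", "2014-01-01"))
```

ISO-formatted date strings sort in chronological order, which is why plain string comparison is sufficient here.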

Graph of Matching Group
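The figure itself is not reproduced in this transcript. One common way to think about a matching group is as a connected component in a graph whose nodes are records and whose edges are accepted pair links. The sketch below assumes that interpretation, which the slides do not confirm, and uses a simple union-find structure with invented record ids.

```python
def build_groups(record_ids, links):
    """Group records into connected components of the pair-link graph."""
    parent = {r: r for r in record_ids}

    def find(x):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for r in record_ids:
        groups.setdefault(find(r), set()).add(r)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical records and accepted links.
print(build_groups(["A", "B", "C", "D", "E"], [("A", "B"), ("B", "C"), ("D", "E")]))
```

Records A and C end up in the same group even though they were never compared directly, because both link to B.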

PHRN: Proof of Concept Project
Hospital-related mortality:
- CDL created linkage keys using demographic data from WA, NSW, SA and QLD hospital morbidity and mortality data collections
- Linkage of around 45,000,000 event records
- Linkage processing completed within 10 days
- Over 2 billion pair relationships
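A little arithmetic puts these figures in context. The slides do not state how the 2 billion pair relationships were produced or whether they equal the number of pairs compared, so the comparison below is indicative only; it simply contrasts the reported figure with the number of pairs an exhaustive all-against-all comparison would involve.

```python
records = 45_000_000           # event records linked (from the slide)
reported_pairs = 2_000_000_000 # "over 2 billion" pair relationships (lower bound)
days = 10                      # processing completed within 10 days (upper bound)

# Every unordered pair of distinct records: n * (n - 1) / 2
all_pairs = records * (records - 1) // 2

# Records handled per day if the work took the full 10 days
records_per_day = records // days

print(f"{all_pairs:,}")                      # 1,012,499,977,500,000
print(f"{all_pairs // reported_pairs:,}")    # 506,249
print(f"{records_per_day:,}")                # 4,500,000
```

An exhaustive comparison would involve roughly a million billion pairs, several hundred thousand times the reported figure, which is why linkage systems restrict comparisons to plausible candidate pairs.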

Contact Details
James Boyd
Centre for Data Linkage
Curtin University
j.boyd@curtin.edu.au