Small Steps Towards Big Data Ric Clarke, Australian Bureau of Statistics



Similar documents
Big Data, Official Statistics and Some Initiatives by the. Australian Bureau of Statistics

Monitoring Overview with a Focus on Land Use Sustainability Metrics

SatelliteRemoteSensing for Precision Agriculture

What the Hell is Big Data?

Sensor Devices and Sensor Network Applications for the Smart Grid/Smart Cities. Dr. William Kao

Precision Farming Technology Systems and Federal Crop Insurance

The Challenges of Geospatial Analytics in the Era of Big Data

2. Issues using administrative data for statistical purposes

AGCO 4205 River Green Parkway Duluth, GA USA Matt Rushing VP, Advanced Technology Solutions Product Line

UN Global Working Group (GWG) on Big Data for Official Statistics. Presented by: Gemma Van Halderen

Precision Farming and the Future of Crop Production

Internet of Things World Forum Chicago

Land Use/Land Cover Map of the Central Facility of ARM in the Southern Great Plains Site Using DOE s Multi-Spectral Thermal Imager Satellite Images

Is big data the new oil fuelling development?

Big Data for Official Statistics The 2030 Agenda for Sustainable Development

COMP9321 Web Application Engineering

SAMPLE MIDTERM QUESTIONS

Cultivating Agricultural Information Management System Using GIS Technology

U.S. Geological Survey Earth Resources Operation Systems (EROS) Data Center

Geospatial intelligence and data fusion techniques for sustainable development problems

WHAT IS GIS - AN INRODUCTION

Applications of Deep Learning to the GEOINT mission. June 2015

Interactive Visual Big Data Analytics for Large Area Farm Biosecurity Monitoring: i-ekbase System

Horizontal IoT Application Development using Semantic Web Technologies

INTERACTIVE CONTENT AND DYNAMIC PUBLISHING A VITAL PART OF AN NSO S OUTPUT AND COMMUNICATION STRATEGY

Big Data and Analytics: Challenges and Opportunities

Resolutions of Remote Sensing

Big Data Governance Certification Self-Study Kit Bundle

Big Data: Challenges in Agriculture. Big Data Summit, November 2014 Moorea Brega: Agronomic Modeling Lead The Climate Corporation

Digital image processing

Introduction to Pattern Recognition

THE SCIENCE THE FUTURE OF CANADIAN CANOLA: APPLY THE SCIENCE OF AGRONOMICS TO MAXIMIZE GENETIC POTENTIAL.

PRECISION TECHNOLOGIES AND SERVICES FOR A COMPLETE SOLUTION

ENVIRONMENTAL MONITORING Vol. I - Remote Sensing (Satellite) System Technologies - Michael A. Okoye and Greg T. Koeln

NSF Workshop: High Priority Research Areas on Integrated Sensor, Control and Platform Modeling for Smart Manufacturing

Industrial Challenges for Content-Based Image Retrieval

Data Management for the North American Carbon Program

HOW TO DO A SMART DATA PROJECT

Predictive Maintenance for Effective Asset Management

Geospatial Software Solutions for the Environment and Natural Resources

ESA Earth Observation Big Data R&D Past, Present, & Future Activities

Databases for 3D Data Management: From Point Cloud to City Model

Health Informatics and Artificial Intelligence: the next big thing in health/aged care

Crop Drought Stress Monitoring by Remote Sensing (DROSMON) Overview. Werner Schneider

RESOLUTION MERGE OF 1: SCALE AERIAL PHOTOGRAPHS WITH LANDSAT 7 ETM IMAGERY

Making Integrated Campaigns Work: How a Search Marketing Mindset Can Drive the ROI of Display Advertising

Principles and Practices of Data Integration

The USGS Landsat Big Data Challenge

CLOUD. Enabling your location intelligence capability. Introducing PSMA Cloud

The Impact of Big Data on Social Research David Rhind Sharon Witherspoon

Satellite and ground-based remote sensing for rapid seismic vulnerability assessment M. Wieland, M. Pittore

Technology Implications of an Instrumented Planet presented at IFIP WG 10.4 Workshop on Challenges and Directions in Dependability

Big data and Earth observation New challenges in remote sensing images interpretation

WHAT DOES BIG DATA MEAN FOR OFFICIAL STATISTICS?

The Global Digital Revolution and Canadian Agriculture and Mining

.FOR. Forest inventory and monitoring quality

Digital Agriculture: Leveraging Technology and Information into Profitable Decisions

Michigan Tech Research Institute Wetland Mitigation Site Suitability Tool

Extraction of Satellite Image using Particle Swarm Optimization

High speed 3D capture for Configuration Management DOE SBIR Phase II Paul Banks

Agenda. Big Data & Hadoop ViPR HDFS Pivotal Big Data Suite & ViPR HDFS ViON Customer Feedback #EMCVIPR

Ambiata.com. Personalisation with Predictive Analytics Dr Rami Mukhtar National ICT Australia May 2013

Agricultural Data and Insurance

M2M Communications and Internet of Things for Smart Cities. Soumya Kanti Datta Mobile Communications Dept.

Big Data R&D Initiative

Open Source UAS Software Toolkits. Keith Fieldhouse Technical Lead, Kitware Inc.

The use of Big Data for statistics

Selecting the appropriate band combination for an RGB image using Landsat imagery

The premier software for extracting information from geospatial imagery.

Lake Monitoring in Wisconsin using Satellite Remote Sensing

2009 CAP Grant Kickoff USGS, Reston, VA May 21, 2009

Blueprints and feasibility studies for Enterprise IoT (Part Two of Three)

Oracle Big Data Spatial and Graph

The Australian Public Service Big Data Strategy

MODULE 5. MODULE 5: Spatial Data Discovery and Access.

Data Management Framework for the North American Carbon Program

Transcription:

Small Steps Towards Big Data Ric Clarke, Australian Bureau of Statistics ECB Workshop on Using Big Data, 7-8 April 2014 1

Agenda Review some basic Big Data concepts Describe the Big Data opportunity for official statistics Outline concerns about using Big Data for official statistics Introduce a framework for statistical inference from Big Data Provide a snapshot of current Big Data initiatives in the ABS ECB Workshop on Using Big Data, 7-8 April 2014 2

Fundamental shifts Innovation revolution The Internet of Everything A mobile population Ubiquitous connectivity The Millennial Generation Knowledge circulation Service orientation 3

4 The ABS view of Big Data Rich data sets of such size, complexity and volatility that it is not possible to leverage their business value with existing data capture, storage, processing, and analysis capabilities Administrative Records Commercial Transactions Sensor Data Behaviour Metrics Online Opinion PIT Records Credit Card Transactions Satellite Imagery Search Engine Queries Social Media Comments Medical Records Scanner Transactions Ground Sensor Data Web Pages Views and Navigation Bank Records Online Purchases Location Data Media Subscriptions Twitter Feeds ECB Workshop on Using Big Data, 7-8 April 2014 4

Key aspects of bigness Multisource Multiconnected Size Complexit y Volatility Multipurpos e Multistructured Multidimensional ECB Workshop on Using Big Data, 7-8 April 2014 5

Big Data, Big Opportunities Creating sample frames or registers Providing data for a subgroup of a population Providing data for some attributes of a population Enabling data imputation, editing and confrontation Enabling data linking and fusion Replacing traditional survey collection entirely Producing new statistical products Improving statistical operations ECB Workshop on Using Big Data, 7-8 April 2014 6

Big Data, Big Challenges Business benefit Privacy and public trust Methodological soundness Technological feasibility Data acquisition ECB Workshop on Using Big Data, 7-8 April 2014 7

Framework for statistical inference Target population U on which statistical inferences are to be made Example: agricultural land parcels Big Data population U B included in the Big Data source, and assume U B U Example: agricultural land parcels captured in satellite sensor imagery Measurement M U of the target population U that is of interest Example: crop type associated with an agricultural land parcel Proxy measurements Y B (or Y U ) available on the population U B (or U) Consider Y B to be a sample (random or otherwise) from Y U Example: ground cover reflectance in selected wavelengths Transformation process T that turns Y U into M U Example: classification model that assigns a crop type to the reflectance measurement of a land parcel Sampling process I that determines the selection of Y B from Y U Usually unknown and requires detailed contextual knowledge to model Censoring process R that renders part of Y U unavailable Example: missing imagery data due to bad weather ECB Workshop on Using Big Data, 7-8 April 2014 8

Framework for statistical inference For finite population inferences, we are interested in predicting g(m U ) given the observed Y B, where g is some function Assume that the probability density function f(y U ;θ) is known, where the parameter θ has known prior distribution f(θ) Assume also that the pdf f(m U Y U ;ϕ), as is the prior distribution f(ϕ) of the parameter ϕ Using a Bayesian approach, we want to predict the posterior distribution f(m U Y B, I, R) In the paper, we show that f(m U Y B, I, R) f(m U Y B ) provided that two ignorability conditions are satisfied: f(r M U, Y B, Y U \Y B, I, θ, ϕ) = f(r Y B ) f(i M U, Y B, Y U \Y B, θ, ϕ) = f(i Y B ) i.e. The sampling and censoring processes can be ignored in transforming Y B to M U ECB Workshop on Using Big Data, 7-8 April 2014 9

Example: sensor data for agricultural statistics In the case of the remote sensing example, M U represents crop type for land parcels in the Australian continent, and Y B the remote sensing data covering Australia from Landsat 7 As the full data Y U is available from Landsat 7, Y U = Y B and the first requirement of ignorability conditions is satisfied When there is missing data, the second requirement needs to be checked. If the missing data is due to random short-term bad weather, it is safe to assume that this is not associated with the measure of interest (crop type) and we treat the data set as a random sample In the case where missing data is due to systemic effects, then an assessment is required to determine whether the observed data set comprises a random sample ECB Workshop on Using Big Data, 7-8 April 2014 10

Big Data, Big Technologies Semantic Web Machine intelligence Data visualisation Distributed computing ECB Workshop on Using Big Data, 7-8 April 2014 11

ABS use of Big Data The ABS continually strives to Reduce the cost of statistical production and support Improve the relevance and timeliness of its products Create new statistics that better meet emerging needs As part of its business transformation program to achieve these aspirations, ABS is taking small steps to exploit particular Big Data opportunities ECB Workshop on Using Big Data, 7-8 April 2014 12

Big Data Flagship Project Build a strong foundation for the mainstream use of Big Data in statistical production Methods Skills Tools and infrastructure Through a coordinated set of targeted R&D initiatives Match Big Data opportunities to specific business problems Deliver fit for purpose solutions as working prototypes Enhance partnerships with academia, industry and other NSIs Contribute to a whole-ofgovernment capability ECB Workshop on Using Big Data, 7-8 April 2014 13

Research areas Satellite and ground sensor data for agricultural statistics Mobile positioning data for measuring population mobility Multiply-linked employer-employee data for productivity analysis Predictive modelling of survey non-response behaviour Predictive Modelling of Unemployment for small areas Data visualisation techniques for exploring large datasets ECB Workshop on Using Big Data, 7-8 April 2014 14

Sensor Data for Agricultural Statistics Use of satellite sensor data for producing agricultural statistics Landsat 7 imagery from US Geological Survey (multispectral data from 7 frequency bands, 30m grid size) Estimate land use and crop type Apply machine learning to automated feature recognition Promising results for wheat, barley, oats, canola and field peas Stage 2: extend to the use of ground sensor data Sense-T ground sensor data (temperature, moisture, etc) Estimate crop yield Apply domain-specific agronomic models of crop growth ECB Workshop on Using Big Data, 7-8 April 2014 15

Questions? ECB Workshop on Using Big Data, 7-8 April 2014 16