Data Analytics at NICTA. Stephen Hardy National ICT Australia (NICTA) shardy@nicta.com.au




NICTA Copyright 2013

Outline
Big data = science!
Data analytics at NICTA: discrete, finite, infinite
Machine learning for the natural sciences

Data, Data, Everywhere

Evolution vs. Revolution
Statistics, machine learning and computer science techniques are applied to personal, enterprise and government problems, and to societal and scientific challenges.
Analysis of data to prove or disprove hypotheses = science!

Not just the data
Data scale: volume, velocity, variety
Infrastructure: analytics engines, SQL / NoSQL, distributed computation, file systems
Algorithmic complexity: machine learning toolkits, graphical models, graph learning, deep learning, random forests, nonparametric statistics
Big Data, Data Analytics, Big Analytics

What is NICTA?
Australia's National Centre of Excellence in Information and Communication Technology: 700 staff, 5 labs, $100m/y revenue.
NICTA objectives: Research Excellence in ICT; Wealth Creation for Australia.
Transforming industry: $3bn/y direct impact on GDP from projects.
New industries: eleven spin-outs, working with ICT SMEs.
Skills and capacity: 17 university partners, 280 PhD students.
NICTA Copyright 2010

Data Analytics: A summary
Discrete ($\aleph$, $P(n_i)$): events, people
Finite ($\mathbb{R}^n$, $P(x_i)$): signals, location
Infinite ($I$, $P(f_i)$): spatial fields, temporal fields

NICTA Data Analytics (1): Discrete ($\aleph$, $P(n_i)$): events, people, text, gene sequences
Data mining / active learning: Scoobi, energy-constrained machine learning, edge-distributed learning, offer targeting, risk estimation, behaviour prediction
Machine learning for natural language processing: Opinion Watch, Event Watch, biomedical texts, patent analysis, sentiment analysis
Bioinformatics: Xenome, GWIS, efficient compressed storage and search for sequence data, biomedical informatics

Event Watch
Demo: http://pmo-eventwatch.research.nicta.com.au/demo/
Sentiment analysis: 40,000-word lexicon, part-of-speech sentiment
Key phrase extractor
Named entity recognition
LDA (Latent Dirichlet Allocation), differential topic modeling, supervised LDA
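A lexicon-based sentiment scorer looks up each token in a word-to-polarity table and aggregates the hits. The sketch below is minimal and makes several assumptions. `LEXICON` is a tiny hypothetical stand-in for a 40,000-word lexicon. The one-word negation rule is our own simplification. The scorer does no part-of-speech handling, and it is not Event Watch's actual scorer.

```python
import re

# Hypothetical toy lexicon standing in for a real 40,000-word lexicon:
# word -> polarity score in [-1, 1].
LEXICON = {
    "good": 1.0, "great": 1.0, "reliable": 0.5,
    "bad": -1.0, "outage": -1.0, "slow": -0.5,
}
# Assumed negation words; a negator flips the polarity of the next word only.
NEGATORS = {"not", "never", "no"}


def sentiment(text: str) -> float:
    """Average polarity of the lexicon words found in text (0.0 if none)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = []
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            score = LEXICON[tok]
            if i > 0 and tokens[i - 1] in NEGATORS:
                score = -score
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0


print(sentiment("Service was great but the network was slow"))  # 0.25
print(sentiment("The outage was not good"))  # -1.0
```

Averaging keeps scores comparable across texts of different lengths. A production system would also weight words by part of speech and context.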

Key technology: topic modeling
[Figure: documents 1 to 5, each a probability distribution over topics A to D; each topic is a probability distribution over the vocabulary.]
Documents consist of words.
Documents are modeled as a mixture of topics.
Words are associated with topics.
Latent Dirichlet Allocation learns these distributions and allocates every word in each document to a topic.
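One standard way to fit LDA is collapsed Gibbs sampling. Each step resamples the topic of one word given all other allocations, which yields exactly the allocation of every word to a topic that the slide describes. The sketch below is minimal. The hyperparameters `alpha` and `beta`, the iteration count and the toy corpus are illustrative assumptions. It is not necessarily the inference method NICTA used.

```python
import numpy as np


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for Latent Dirichlet Allocation.

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns (theta, phi, z): per-document topic mixtures (D x K), per-topic
    word distributions (K x V), and the topic allocated to every word.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))   # words in doc d allocated to topic k
    n_kw = np.zeros((n_topics, vocab_size))  # times word w is allocated to topic k
    n_k = np.zeros(n_topics)                 # total words allocated to topic k

    def update(d, w, k, delta):
        n_dk[d, k] += delta
        n_kw[k, w] += delta
        n_k[k] += delta

    # Random initial allocation of every word to a topic.
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            update(d, w, k, +1)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # remove the word's current allocation
                # p(z = k | all other allocations), up to a constant.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                z[d][i] = rng.choice(n_topics, p=p / p.sum())
                update(d, w, z[d][i], +1)

    theta = (n_dk + alpha) / (n_dk + alpha).sum(axis=1, keepdims=True)
    phi = (n_kw + beta) / (n_kw + beta).sum(axis=1, keepdims=True)
    return theta, phi, z


# Hypothetical toy corpus: words 0-3 and 4-7 form two separate "themes".
docs = [[0, 1, 2, 3, 0, 1], [4, 5, 6, 7, 4, 5], [0, 2, 1, 3], [5, 6, 7, 4]]
theta, phi, z = lda_gibbs(docs, n_topics=2, vocab_size=8)
print(np.round(theta, 2))
```

With disjoint vocabularies the sampler typically gives each pair of documents its own dominant topic. The exact numbers depend on the seed.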

NICTA Data Analytics (2): Finite ($\mathbb{R}^n$, $P(x_i)$): signals, location, genetics
SparSNP: efficient distributed sparse regression method; disease expression
Critical Water Mains: non-parametric Bayesian methods; preventative maintenance
Structural Health Monitoring: distributed, autonomous, real-time data with classification / clustering; fault prediction
SmartGrid: service optimisation
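Sparse regression selects a small number of predictors, for example a few SNPs out of many, by penalising the L1 norm of the weights. As a generic illustration, here is a minimal single-machine lasso (squared loss plus L1 penalty) solved by cyclic coordinate descent. It is not SparSNP's actual loss, solver or distribution scheme. The function names and synthetic data are hypothetical.

```python
import numpy as np


def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam (the proximal step for the L1 penalty)."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)


def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/2n)||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent.

    Assumes the columns of X and the vector y are centred (no intercept).
    """
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = y.astype(float).copy()  # y - X @ w, with w = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Add back feature j's contribution, then refit w[j] alone.
            residual += X[:, j] * w[j]
            rho = X[:, j] @ residual / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
            residual -= X[:, j] * w[j]
    return w


# Synthetic example: only features 0 and 3 matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
X -= X.mean(axis=0)
true_w = np.zeros(10)
true_w[[0, 3]] = [2.0, -1.5]
y = X @ true_w + 0.1 * rng.standard_normal(200)
y -= y.mean()
w = lasso_cd(X, y, lam=0.1)
print(np.round(w, 2))
```

The penalty `lam` drives the irrelevant weights to exactly zero and shrinks the true ones slightly, which is the trade-off sparse methods make for interpretability.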


Machine Learning Process
Existing data: condition assessment, age, type, material, size, length, failures, soil, pressure, location, weather and many more.
NICTA's analysis: a Hierarchical Beta Process models this complex data mix, giving risk by age, risk by type, risk by size and an age profile.
Result: accurate, improved, data-driven prediction from multiple existing data sources, with dynamic model update and aggregation.
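In a hierarchical beta model, each group of pipes (by type, size and so on) gets a failure probability with a Beta prior centred on a shared parent rate. Groups with few records therefore borrow strength from the rest. The minimal sketch below shows only that building block. The parent rate `prior_mean` and the `concentration` are fixed by hand as hypothetical values, and the group counts are invented. The full Hierarchical Beta Process used by NICTA also learns these quantities and is not reproduced here.

```python
def shrunk_failure_rates(groups, prior_mean, concentration):
    """Posterior mean failure probability per group under a Beta prior.

    groups: dict mapping group name -> (failures, exposures).
    Each group's rate has prior Beta(c * q, c * (1 - q)), where q = prior_mean
    is the shared parent rate and c = concentration sets the prior strength.
    The posterior mean (c*q + failures) / (c + exposures) pulls groups with
    little data towards q and leaves well-observed groups near their raw rate.
    """
    return {
        name: (concentration * prior_mean + failures) / (concentration + exposures)
        for name, (failures, exposures) in groups.items()
    }


# Hypothetical counts: many pipe-years for cast iron, very few for PVC.
groups = {"cast iron": (30, 1000), "PVC": (0, 20)}
rates = shrunk_failure_rates(groups, prior_mean=0.02, concentration=100)
print(rates)
```

The raw PVC rate of 0 is implausibly optimistic after only 20 pipe-years. The shrunk estimate of 2/120 reflects that uncertainty, while cast iron stays close to its raw 3%.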

Improvement on failure prediction
Break records from 1998-2008 are used for model building, and break records from 2009-2011 for testing.
Multiple factors: laid year, material, size, coating and soil.
[Figure: failures detected versus length of condition assessment, NICTA model compared with Weibull, Wollongong; inset zooms in on the first 2.5%.]
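The Weibull curve on this slide is the comparison baseline. In its simplest form, a Weibull model scores a pipe by age alone. Below is a minimal sketch of that age-only form. The shape and scale values are hypothetical; in practice they would be fitted to the 1998-2008 break records. Unlike the NICTA model, this baseline ignores material, size, coating and soil.

```python
import math


def weibull_survival(t, shape, scale):
    """S(t) = exp(-(t / scale) ** shape): probability a pipe survives to age t."""
    return math.exp(-((t / scale) ** shape))


def next_year_failure_prob(age, shape, scale):
    """P(fail during [age, age + 1) | survived to age) = 1 - S(age + 1) / S(age)."""
    return 1.0 - weibull_survival(age + 1, shape, scale) / weibull_survival(age, shape, scale)


# Hypothetical parameters: shape > 1 means risk grows with age (wear-out).
for age in (10, 50, 80):
    print(age, round(next_year_failure_prob(age, shape=2.0, scale=100.0), 4))
```

Ranking pipes by this probability gives an age-only priority list. A model with more factors can beat it by separating old pipes that are sound from young pipes in aggressive soils.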

Risk Map
Pipes are ranked by likelihood of failure and colour-coded from red (highest risk) to blue (lowest risk) in four bands: the top 10% of pipes, 10%-40%, 40%-60%, and the last 40%. Actual breaks in the following year are also shown.
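The banding on this map can be reproduced from any per-pipe risk score. Checking it against the following year's breaks takes one more step: count the fraction of breaks that land in the top band. The sketch below is minimal. The function names, pipe IDs and scores are hypothetical.

```python
def risk_bands(risk_by_pipe):
    """Rank pipes by risk (highest first) and assign the four map bands:
    top 10%, 10%-40%, 40%-60%, last 40%."""
    ranked = sorted(risk_by_pipe, key=risk_by_pipe.get, reverse=True)
    n = len(ranked)
    bands = {}
    for rank, pipe in enumerate(ranked):
        # Integer comparisons avoid floating-point edge cases at band limits.
        if rank * 10 < n:
            bands[pipe] = "top 10%"
        elif rank * 10 < 4 * n:
            bands[pipe] = "10%-40%"
        elif rank * 10 < 6 * n:
            bands[pipe] = "40%-60%"
        else:
            bands[pipe] = "last 40%"
    return bands


def breaks_captured(bands, broken_pipes, band="top 10%"):
    """Fraction of the following year's actual breaks that fall in one band."""
    hits = sum(1 for p in broken_pipes if bands.get(p) == band)
    return hits / len(broken_pipes) if broken_pipes else 0.0


# Hypothetical risk scores for ten pipes.
scores = [0.9, 0.1, 0.5, 0.3, 0.05, 0.7, 0.2, 0.6, 0.4, 0.8]
risks = {f"P{i}": r for i, r in enumerate(scores)}
bands = risk_bands(risks)
print(bands)
print(breaks_captured(bands, ["P0", "P9", "P4"]))
```

A good model concentrates the following year's breaks in the top bands, so inspecting a small fraction of the network catches a large share of failures.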

NICTA Data Analytics (3): Infinite ($I$, $P(f_i)$): spatial fields, temporal fields
Application areas: renewable energy, solar, geothermal, groundwater, soils, air quality, resource exploration, resource discovery, resource management, plant system diversity, non-linear laser physics.
Methods: data fusion with non-parametric Bayesian methods; transparent machine learning; Big Data Knowledge Discovery.

Solar Energy Forecast Software
Did you know that failing to predict solar energy production means we won't fully capture available solar resources?
The problem: Electricity grids around the world were not designed to manage large fluctuations of supply in power generation. Traditional forms of power supply such as coal-fired stations provide a stable, non-fluctuating form of power supply. However, the energy we receive from the sun is much more unpredictable, and grids are not designed to cope with the dynamic nature of renewable energy production. Current prediction methods are not accurate enough at the suburb level and not fine-grained enough (i.e. uncertainty estimation is currently a matter of days, not minutes). Current methods also require expensive (up to $75,000) and obtrusive equipment over a large area to collect the required data.
Impact: NICTA aims to lower the costs of solar monitoring systems so that fast, affordable forecast systems can be installed all over Australia. Specifically, we aim to:
Develop low-cost devices ($500) that measure current levels of rooftop solar power production, monitoring 150 households across the ACT.
Utilise low-cost sky cameras ($250) to detect cloud cover. From these images, NICTA's researchers will project the motion of the clouds and estimate the 'darkness' of their shadows, thereby predicting their inhibitive effect on power output.
Develop software that will predict solar energy production by suburb within minutes and hours rather than days.
Collaborators: the Solar Energy Forecast Software project is part of NICTA's Security and Environment Business Team, providing security for people, resources and critical systems.
Technical contact: Nicholas.Engerer@nicta.com.au
Business contact: Jodi.Steel@nicta.com.au
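Projecting cloud motion from sky-camera images can be illustrated with block matching. First, find the shift that best aligns the previous frame with the current one. Then advance the current frame by that shift, assuming the velocity stays constant. The minimal sketch below is a simplified stand-in for optical-flow methods and is not NICTA's method. Its assumptions are:

- frames are 2-D NumPy arrays of cloud intensity;
- one global motion vector applies to the whole frame;
- boundaries are periodic via `np.roll`.

All names are hypothetical.

```python
import numpy as np


def estimate_shift(prev, curr, max_shift=5):
    """Integer (dy, dx) that best maps prev onto curr, found by exhaustive
    search over np.roll shifts (periodic boundaries, a simplifying assumption)."""
    best, best_score = (0, 0), -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Overlap score: large when the shifted previous frame matches curr.
            score = np.sum(np.roll(prev, (dy, dx), axis=(0, 1)) * curr)
            if score > best_score:
                best, best_score = (dy, dx), score
    return best


def forecast(curr, shift, steps):
    """Advect the current frame `steps` frames ahead at constant velocity."""
    dy, dx = shift
    return np.roll(curr, (dy * steps, dx * steps), axis=(0, 1))


# Toy example: a 3x3 cloud moving 1 pixel down and 2 right per frame.
prev = np.zeros((20, 20))
prev[5:8, 5:8] = 1.0
curr = np.roll(prev, (1, 2), axis=(0, 1))
shift = estimate_shift(prev, curr)
future = forecast(curr, shift, steps=3)
print(shift)
```

Real cloud fields deform and move at different speeds at different heights, which is why practical systems estimate dense, per-pixel motion rather than a single shift.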

Engineered Geothermal Systems

Geophysical Data
Gravity, magnetics, core samples, temperature, reflection seismic, magnetotellurics, gravity gradiometry, down-hole geophysics, stress, porosity, passive seismic, micro seismic...

Distributions of geologies
[Figure: magnetotellurics, seismic, magnetism and gravity data feeding a probability distribution over geologies.]
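Several data types can be combined into one probability distribution over geologies with Bayes' rule, under the assumption that the sensors' observations are conditionally independent given the geology. The minimal sketch below shows this naive-Bayes-style fusion. The geology classes and likelihood values are hypothetical. NICTA's actual non-parametric Bayesian models are considerably richer than this.

```python
import numpy as np


def fuse(prior, likelihoods):
    """Posterior over geology classes from several sensors.

    prior: shape (K,), prior probability of each geology class.
    likelihoods: list of shape-(K,) arrays, p(observation_s | class k) per sensor s.
    Assumes sensor observations are conditionally independent given the class.
    """
    log_post = np.log(prior)
    for lik in likelihoods:
        log_post = log_post + np.log(lik)
    log_post -= log_post.max()  # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()


# Hypothetical classes: granite, sandstone, shale.
prior = np.full(3, 1 / 3)
gravity = np.array([0.6, 0.3, 0.1])
magnetics = np.array([0.5, 0.2, 0.3])
post = fuse(prior, [gravity, magnetics])
print(np.round(post, 3))
```

Working in log space keeps the product of many small likelihoods from underflowing as more data types are added.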

Results fusing gravity & boreholes
[Figure: predicted mean density and uncertainty.]
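A predicted mean with an uncertainty is what a Gaussian process, one common non-parametric Bayesian model, returns at every location. The minimal sketch below is plain 1-D GP regression on borehole readings alone. It assumes a zero-mean GP on density anomalies, a squared-exponential kernel, hand-picked hyperparameters and hypothetical data. It does not model the gravity forward operator that real gravity-borehole fusion requires.

```python
import numpy as np


def rbf(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_predict(x_train, y_train, x_test, noise=1e-2, **kernel_args):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    K = rbf(x_train, x_train, **kernel_args) + noise * np.eye(len(x_train))
    K_s = rbf(x_train, x_test, **kernel_args)
    L = np.linalg.cholesky(K)  # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^-1 y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf(x_test, x_test, **kernel_args)) - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


# Hypothetical borehole positions (km) and density anomalies (g/cm^3).
x_train = np.array([0.0, 1.0, 2.0, 5.0])
y_train = np.array([0.10, 0.15, 0.05, -0.20])
x_test = np.array([1.0, 3.5, 10.0])
mean, sd = gp_predict(x_train, y_train, x_test, noise=1e-2, length_scale=1.0)
print(np.round(mean, 3), np.round(sd, 3))
```

The uncertainty is small at a borehole, grows between boreholes and returns to the prior far from any data. That pattern is what makes such maps useful for deciding where to drill next.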

Reuse
Statistics, machine learning and computer science techniques are applied to personal, enterprise and government problems, and to societal and scientific challenges.
How can we apply new techniques of machine learning / analytics to science?

Machine Learning in the Natural Sciences
Big Data Knowledge Discovery: a Science and Industry Endowment Fund (www.sief.org) project.
Collaboration between NICTA (machine learning), SIRCA (big data), Sydney Uni (plate tectonics) and Macquarie Uni (forest ecosystems, non-linear laser physics).
How do we make machine learning easier to use in the natural sciences?

The End