Intro Inverse Modelling


Intro Inverse Modelling
Thomas Kaminski (http://fastopt.com)
Thanks to: Simon Blessing (FastOpt), Ralf Giering (FastOpt), Nadine Gobron (JRC), Wolfgang Knorr (QUEST), Thomas Lavergne (Met.No), Bernard Pinty (JRC), Peter Rayner (LSCE), Marko Scholze (QUEST), Michael Voßbeck (FastOpt)
4th Earth Observation Summer School, Frascati, August 2008

Overview
BP1: Basic concepts: from remote sensing measurements to surface albedo estimates (1)
BP2: Basic concepts: from remote sensing measurements to surface albedo estimates (2)
TK1: Introduction to inverse modelling
TK2: Intro tangent and adjoint code construction
TK3: Demo: BRF inverse package and Carbon Cycle Data Assimilation System
BP3: Monitoring land surfaces: applications of inverse packages of both surface BRF and albedo models

References and Definitions
A. Tarantola, Inverse Problem Theory and Methods for Model Parameter Estimation, SIAM (1987, 2005)
I. G. Enting, Inverse Problems in Atmospheric Constituent Transport, C.U.P. (2002)

Definitions

Name                  Symbol   Description
Parameters            p        Quantities not changed by the model, i.e. process parameters, boundary conditions
State variables       v(t)     Quantities altered by the model from time step to time step
Control variables     x        Quantities exposed to optimisation; a combination of subsets of p and v(t = 0)
Observables           o        Measurable quantities
Observation operator  H        Transforms v to o
Model                 M        Predicts o given p and v(t = 0); includes H
Data                  d        Measured values of o

Statement of Problem
Given a model M, a set of measurements d of some observables o, and prior information on some control variables x, produce an updated description of x. x may include parts of p and v(t = 0).

o = M(x)
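
To make the notation concrete, here is a minimal Python sketch (all names and values are illustrative assumptions, not from the lecture): a control vector x holding one process parameter and one initial state, a toy model M that integrates the state, and an observation operator H.

```python
import numpy as np

def H(v):
    """Hypothetical observation operator: observe the state directly."""
    return v

def M(x, n_steps=10):
    """Hypothetical toy model. x = (decay parameter p, initial state v0).
    Steps dv/dt = -p*v forward with unit time steps, then applies H."""
    p, v = x
    for _ in range(n_steps):
        v = v - p * v  # explicit Euler step
    return H(v)

x = np.array([0.1, 2.0])  # control variables: p and v(t = 0)
o = M(x)                  # simulated observable, to be compared with data d
```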

Information and PDFs
We seek the true value, but all input information is approximate. Treat the model, the measurements and the priors as PDFs describing the distribution of the true value. Data and prior errors are easy to characterise. Model error is the difference between the actual and a perfect simulation for a given x; it arises from errors in the equations and from errors in those parameters not included in x.

Combining PDFs
Operate in the joint parameter and data space. The estimate is the combination of prior, measurement and model (black triangle in the figure). The combined estimate is the multiplication of the PDFs, and only involves forward models. The parameter estimate is the projection of the PDF onto the x-axis.

Summary statistics from the PDF
Figure: combination of PDFs (upper) and posterior PDF for the parameters (lower). All summary statistics can be calculated from the posterior PDF. The Maximum Likelihood Estimate (MLE) is the maximum of the PDF.

Tightening the Model Constraint
Note (in the figure) that there is no viable solution: if we cannot treat the model as a soft constraint, we must inflate the data uncertainty. Model error is hard to characterise.

Gaussian Case

G(x) = 1/(√(2π) σ) · exp( −(x − µ)² / (2σ²) )

[Figure: prior, data and model PDFs plotted as Gaussians]

Data: P_d ∝ exp( −(d − d_obs)² / (2σ_d²) )
Parameters: P_x ∝ exp( −(x − x₀)² / (2σ_x²) )
Model: P_M ∝ exp( −(d − x)² / (2σ_M²) )
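
A minimal numerical sketch of the PDF combination for this Gaussian case (grid bounds, means and widths are illustrative assumptions): multiply prior, data and identity-model PDFs on a joint (x, d) grid, then project onto the x-axis.

```python
import numpy as np

def gauss(z, mu, sigma):
    """Gaussian density G(z) with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

x = np.linspace(-3.0, 3.0, 601)            # parameter axis
d = np.linspace(-3.0, 3.0, 601)            # data axis
X, D = np.meshgrid(x, d, indexing="ij")    # joint parameter and data space

prior = gauss(X, 0.0, 0.6)                 # P_x: prior on x
data  = gauss(D, 1.0, 0.6)                 # P_d: measurement of d
model = gauss(D - X, 0.0, 0.6)             # P_M: identity model, d ≈ x

joint = prior * data * model               # combined estimate: product of PDFs
posterior_x = joint.sum(axis=1)            # projection onto the x-axis
posterior_x /= posterior_x.sum() * (x[1] - x[0])   # normalise

print("MLE of x:", x[np.argmax(posterior_x)])
```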

[Figures: the Gaussian PDFs panel by panel: Prior; Data; Prior plus Data; Model; Prior plus Data plus Model (two panels)]

PDFs and cost functions
The MLE is the most common quantity; it is found by maximising the PDF numerically. Most commonly P(x) ∝ e^(−J), so maximising P ⇔ minimising J. J is usually called the cost (or misfit, or objective) function. Exponentials are convenient because multiplying exponentials ⇔ adding exponents.
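
In symbols (a one-line restatement of the slide, in LaTeX):

```latex
P(x) \propto e^{-J(x)}
\;\Longleftrightarrow\;
J(x) = -\ln P(x) + \mathrm{const},
\qquad
P_1 \cdot P_2 \propto e^{-(J_1 + J_2)} .
```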

Perfect and imperfect models

Imperfect model: include the observables o in the control variables:
J = 1/2 [ (x − x₀)²/σ_x² + (o − d)²/σ_obs² + (M(x) − o)²/σ_M² ]

Perfect model: observables not included in the control vector:
J = 1/2 [ (x − x₀)²/σ_x² + (M(x) − d)²/σ_d² ]

Here x₀ is the prior estimate for x and the σ are standard deviations.

The imperfect-model form allows explicit model error; it differs from weak-constraint 4D-Var, in which the model state is added to the control variables. If the model is imperfect, the model error can instead be carried in σ_d: σ_d² = σ_obs² + σ_M². This is the form usually used in data assimilation. For linear M there is a closed-form solution; otherwise J is minimised by an iterative algorithm, usually using gradients. The posterior σ_x can be calculated by rearranging the PDF.
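
One way to see the variance addition σ_d² = σ_obs² + σ_M² (a sketch; the slide states the result without derivation): marginalising the perfect observable o out of the product of the two Gaussian factors is a convolution of Gaussians:

```latex
\int \exp\!\Big(\!-\tfrac{(o-d)^2}{2\sigma_{\mathrm{obs}}^2}\Big)
     \exp\!\Big(\!-\tfrac{(M(x)-o)^2}{2\sigma_M^2}\Big)\, do
\;\propto\;
\exp\!\Big(\!-\tfrac{(M(x)-d)^2}{2(\sigma_{\mathrm{obs}}^2+\sigma_M^2)}\Big)
\quad\Rightarrow\quad
\sigma_d^2 = \sigma_{\mathrm{obs}}^2 + \sigma_M^2 .
```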

Covariances
J = 1/2 [ (x − x₀)²/σ_x² + (M(x) − d)²/σ_d² ]

takes the more general form

J = 1/2 [ (x − x₀)ᵀ C(x₀)⁻¹ (x − x₀) + (M(x) − d)ᵀ C(d)⁻¹ (M(x) − d) ]

Off-diagonal terms represent correlations in uncertainty: persistent instrumental error; model error, which is usually correlated; errors in initial conditions taken from a previous forecast.

Linear Gaussian case
J = 1/2 [ (x − x₀)ᵀ C(x₀)⁻¹ (x − x₀) + (Mx − d)ᵀ C(d)⁻¹ (Mx − d) ]

x_opt = x₀ + C(x₀) Mᵀ ( M C(x₀) Mᵀ + C(d) )⁻¹ ( d − M x₀ )

C(x)⁻¹ = C(x₀)⁻¹ + Mᵀ C(d)⁻¹ M = d²J/dx² (x_opt)

C(x) does not depend on the data values, only on the uncertainties and the model.
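
A minimal numpy sketch of this closed-form solution (matrix sizes and values are hypothetical):

```python
import numpy as np

M   = np.array([[1.0, 0.5],
                [0.0, 1.0],
                [1.0, 1.0]])       # linear model: 2 controls -> 3 observables
x0  = np.array([0.0, 0.0])         # prior estimate x_0
Cx0 = np.eye(2)                    # prior covariance C(x_0)
d   = np.array([1.0, 0.5, 1.4])    # data
Cd  = 0.1 * np.eye(3)              # data covariance C(d)

# x_opt = x0 + C(x0) M^T (M C(x0) M^T + C(d))^-1 (d - M x0)
K = Cx0 @ M.T @ np.linalg.inv(M @ Cx0 @ M.T + Cd)
x_opt = x0 + K @ (d - M @ x0)

# C(x)^-1 = C(x0)^-1 + M^T C(d)^-1 M
Cx = np.linalg.inv(np.linalg.inv(Cx0) + M.T @ np.linalg.inv(Cd) @ M)

print("x_opt            =", x_opt)
print("posterior sigmas =", np.sqrt(np.diag(Cx)))
```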

How to solve it? 1. Minimisation
Efficient minimisation algorithms use J(x) and the gradient of J(x) in an iterative procedure. Typically the prior value is used as the starting point of the iteration. The gradient is helpful as it always points uphill. The adjoint is used to provide the gradient efficiently.

Example: Newton algorithm for minimisation
Gradient: g(x) = dJ/dx (x)
Hessian: H(x) = dg/dx (x) = d²J/dx² (x)
At the minimum x_min we have g(x_min) = 0, hence
g(x) = g(x) − g(x_min) ≈ H(x) (x − x_min)
Rearranging yields: x_min − x ≈ −H⁻¹(x) g(x)
Smart gradient algorithms use an approximation of H(x).
Figures: Tarantola (1987); Fischer (1996)
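
A minimal sketch of the Newton step applied to the linear Gaussian cost above (reusing the same hypothetical M, x0, Cx0, d, Cd; for a quadratic J the Hessian is constant, so a single step already reaches the minimum):

```python
import numpy as np

M   = np.array([[1.0, 0.5], [0.0, 1.0], [1.0, 1.0]])
x0  = np.array([0.0, 0.0]); Cx0 = np.eye(2)
d   = np.array([1.0, 0.5, 1.4]); Cd = 0.1 * np.eye(3)
Cx0_inv, Cd_inv = np.linalg.inv(Cx0), np.linalg.inv(Cd)

def g(x):
    """Gradient g(x) = dJ/dx = C(x0)^-1 (x - x0) + M^T C(d)^-1 (M x - d)."""
    return Cx0_inv @ (x - x0) + M.T @ Cd_inv @ (M @ x - d)

H = Cx0_inv + M.T @ Cd_inv @ M     # Hessian d2J/dx2 (constant here)

x = x0.copy()                      # start the iteration from the prior
for _ in range(5):                 # Newton iteration: x <- x - H^-1 g(x)
    x = x - np.linalg.solve(H, g(x))

print("minimiser =", x)
```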

How to solve it? 2. Error bars
The Hessian quantifies the curvature of the cost function. Use the Hessian at the minimum to approximate C(x)⁻¹.
Figure taken from Tarantola (1987)
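
In symbols (restating the slide; exact, rather than approximate, in the linear Gaussian case above):

```latex
C(x)^{-1} \approx H(x_{\min}) = \frac{d^2 J}{dx^2}(x_{\min}),
\qquad
\text{linear Gaussian case: } H = C(x_0)^{-1} + M^{T} C(d)^{-1} M .
```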

Exercise
Do the algebra for the simple case:
Gaussian control variable and observation
Model: identity
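
For reference, a sketch of where that algebra leads under these assumptions (scalar x, identity model, so J = 1/2 [ (x − x₀)²/σ_x² + (x − d)²/σ_d² ]):

```latex
x_{\mathrm{opt}} = \frac{\sigma_d^2\, x_0 + \sigma_x^2\, d}{\sigma_x^2 + \sigma_d^2},
\qquad
\frac{1}{\sigma^2} = \frac{1}{\sigma_x^2} + \frac{1}{\sigma_d^2} .
```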