Hybrid Data Assimilation in the GSI
Rahul Mahajan, NOAA / NWS / NCEP / EMC, IMSG
GSI Hybrid DA Team: Daryl Kleist (UMD), Jeff Whitaker (NOAA/ESRL), John Derber (EMC), Dave Parrish (EMC), Xuguang Wang (OU)
14 August 2015, DTC GSI Workshop, Boulder
Incremental Variational DA

$$J(\delta x) = \frac{1}{2}\,\delta x^{T} B_s^{-1}\,\delta x + \frac{1}{2}\sum_{k=1}^{K}\left(H_k M_k\,\delta x - d_k\right)^{T} R_k^{-1}\left(H_k M_k\,\delta x - d_k\right)$$

- J: Penalty = Fit to background + Fit to observations
- δx: Analysis increment (x_a - x_b), where x_b is a background
- B_s: (Static) background error covariance, estimated a priori / offline
- H: Linearized observation (forward) operator
- M: Tangent-linear of the forecast model
- R: Observation error covariance (instrument + representativeness)
- d: Innovation, y - h(x_b), where y are the observations

The cost function (J) is minimized to find the solution δx; then x_a = x_b + δx.
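To make the minimization concrete, here is a minimal sketch for the simplest case: a single time level with M taken as the identity, so the minimizer has the closed form δx = B Hᵀ (H B Hᵀ + R)⁻¹ d. The matrices and the function names `analysis_increment` and `cost` are illustrative assumptions, not anything from GSI, which minimizes J iteratively rather than by direct solve.

```python
import numpy as np

def analysis_increment(B, H, R, d):
    """Closed-form minimizer of
    J(dx) = 0.5 dx^T B^-1 dx + 0.5 (H dx - d)^T R^-1 (H dx - d)."""
    # Solve in observation space, then map back to state space with B H^T.
    S = H @ B @ H.T + R
    return B @ H.T @ np.linalg.solve(S, d)

def cost(dx, B, H, R, d):
    """Evaluate the penalty J for a candidate increment dx."""
    misfit = H @ dx - d
    background_term = 0.5 * dx @ np.linalg.solve(B, dx)
    observation_term = 0.5 * misfit @ np.linalg.solve(R, misfit)
    return background_term + observation_term

# Toy 3-variable state with one observation of the first variable.
# All numbers are made up for illustration.
B = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.5],
              [0.2, 0.5, 1.0]])
H = np.array([[1.0, 0.0, 0.0]])
R = np.array([[0.5]])
d = np.array([1.5])            # innovation y - h(x_b)

x_b = np.zeros(3)
dx = analysis_increment(B, H, R, d)
x_a = x_b + dx                 # analysis = background + increment
```

With a single observation the increment is just the observed variable's column of B scaled by the innovation, which is why the structure of B matters so much in the slides that follow.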
Why is B_e necessary?
- Introduces flow-dependence / errors of the day to analysis increments
- Provides multivariate correlations from the dynamic model, which are quite difficult to incorporate into fixed error covariance models
- Sparse observations near coherent dynamical features are utilized more effectively
- Evolves with the system, and can capture changes in the observing network through evolving background error covariances
- More information extracted from the observations => better analysis => better forecasts
How the EnKF may achieve its improvement relative to 3D-Var: better background-error covariances
[Figure panels: B_s; B_e]
Courtesy: Jeff Whitaker
What does B_e gain us?
Surface pressure observation near an atmospheric river
[Figure: Background surface pressure (white contours) and precipitable water increment (red contours) after assimilating a single surface pressure observation (yellow dot) using B_e]
The 3DVar increment would be zero! (Cross-covariances are hard to model with a static B_s.)
Courtesy: Jeff Whitaker
More examples of flow-dependent background-error covariances Courtesy: Jeff Whitaker
Hybrid Data Assimilation

$$J(\delta x) = \frac{1}{2}\,\delta x^{T}\left(\beta_s B_s + \beta_e B_e\right)^{-1}\delta x + J_o$$

Simply put, a linear combination of static and ensemble-based B:
- B_s: Static background error covariance
- B_e: Ensemble-estimated background error covariance
- β_s: Weighting factor for static contribution
- β_e: Weighting factor for ensemble contribution (typically 1 - β_s)

This does not address whether a tangent-linear/adjoint (TL/Adj.) model is needed in 4D.
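The effect of blending the two covariances can be sketched with a two-variable toy state, loosely mirroring the surface pressure / precipitable water example earlier in the talk. Everything here is an illustrative assumption: the state layout, the four made-up ensemble members, the diagonal static covariance, and the helper names `ensemble_covariance` and `single_ob_increment`.

```python
import numpy as np

def ensemble_covariance(members):
    """Sample covariance from an (n_members, n_state) array of forecasts."""
    perts = members - members.mean(axis=0)
    return perts.T @ perts / (members.shape[0] - 1)

def single_ob_increment(B, h, r_var, d):
    """Increment from one observation with operator row h,
    error variance r_var, and innovation d."""
    Bh = B @ h
    return Bh * d / (h @ Bh + r_var)

# State = [surface pressure, precipitable water] (toy units).
B_s = np.diag([1.0, 1.0])            # static: no cross-variable covariance
members = np.array([[ 1.0,  0.8],    # made-up ensemble in which the two
                    [-1.0, -0.9],    # variables vary together
                    [ 0.5,  0.6],
                    [-0.5, -0.5]])
B_e = ensemble_covariance(members)

beta_s = 0.25
beta_e = 1.0 - beta_s
B_hyb = beta_s * B_s + beta_e * B_e  # linear combination from the slide

h = np.array([1.0, 0.0])             # observe surface pressure only
inc_static = single_ob_increment(B_s, h, r_var=1.0, d=1.0)
inc_hybrid = single_ob_increment(B_hyb, h, r_var=1.0, d=1.0)
```

In this toy setup `inc_static[1]` is exactly zero, while `inc_hybrid[1]` is nonzero because the ensemble supplies the cross-covariance that the diagonal static matrix lacks.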
Hybrid Methods Nomenclature
- Hybrid: Variational methods that combine static and ensemble covariances.
- EnVar: Variational methods using ensemble covariances.
- Hybrid 4DVar: Variational method using a combination of static and ensemble covariances at the beginning of the window, but using a tangent-linear and adjoint model.
- Hybrid 4DEnVar: Variational method using a combination of static and ensemble covariances at all times in the window, without the need for a tangent-linear or adjoint model.
GSI Hybrid 3DEnVar (ignoring preconditioning for simplicity)
- Incorporate ensemble perturbations directly into the variational cost function through the extended control variable method: Lorenc (2003), Buehner (2005), Wang et al. (2007), etc.
- Other methods exist, e.g. Buehner et al. (2011), Liu et al. (2008)

$$J(\delta x_s, \alpha) = \beta_s\,\frac{1}{2}\,\delta x_s^{T} B_s^{-1}\,\delta x_s + \beta_e\,\frac{1}{2}\sum_{m=1}^{M}\alpha_m^{T} L^{-1}\alpha_m + \frac{1}{2}\left(H\,\delta x_t - d\right)^{T} R^{-1}\left(H\,\delta x_t - d\right)$$

$$\delta x_t = \delta x_s + T\sum_{m=1}^{M}\left(\alpha_m \circ x_m^{e}\right)$$

- δx_t: (total increment) sum of the increments from the static part δx_s and the ensemble part
- α_m: extended control variable
- x_m^e: ensemble perturbations
- L: correlation matrix [effectively the localization of ensemble perturbations]
- T: operator mapping from ensemble grid to analysis grid (for dual-resolution)
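One way to see why L acts as a localization is to check numerically that an ensemble increment built as the sum of element-wise products of α_m and x_m^e, with each α_m having covariance L, carries the covariance L ∘ (Xᵀ X), i.e. the Schur (element-wise) product of L with the sample covariance. The sketch below verifies that matrix identity on random data. It assumes a single resolution (T is the identity), perturbations already normalized by sqrt(M-1), and a Gaussian-shaped L chosen purely for illustration; the function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 6, 4                      # state size and ensemble size (toy values)

# Ensemble perturbations about the mean, normalized by sqrt(M - 1).
raw = rng.standard_normal((M, n))
X = (raw - raw.mean(axis=0)) / np.sqrt(M - 1)    # row m holds x_m^e

# Illustrative localization correlation matrix on a 1-D grid.
idx = np.arange(n)
L = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / 2.0) ** 2)

def ensemble_increment(alpha, X):
    """Sum over members of the element-wise product alpha_m * x_m^e.
    alpha and X both have shape (M, n)."""
    return (alpha * X).sum(axis=0)

# Covariance implied by alpha_m ~ (0, L), independent across members:
# sum_m diag(x_m) L diag(x_m)
implied = sum(np.diag(x) @ L @ np.diag(x) for x in X)

# Localized sample covariance: Schur product of L with X^T X.
localized = L * (X.T @ X)

identity_holds = np.allclose(implied, localized)
```

Setting every α_m to a vector of ones recovers an unlocalized linear combination of the perturbations, which is a quick sanity check on `ensemble_increment`.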
Preconditioning Sidebar

$$\nu_m = \beta_e L^{-1}\alpha_m, \qquad z = \beta_s B_s^{-1}\,\delta x_s$$

$$J(z, \nu) = \frac{1}{2}\,\delta x_s^{T} z + \frac{1}{2}\sum_{m=1}^{M}\alpha_m^{T}\nu_m + J_o$$

$$\delta x_s = \beta_s^{-1} B_s z, \qquad \alpha = \beta_e^{-1} L\,\nu$$

In the preconditioned conjugate gradient minimization, the inverses of B and L are not needed, and the solution is preconditioned by the full B.

This formulation differs from that of the UKMO and CMC, who use a square root formulation and apply the weights directly to the increment:

$$\delta x_t = \delta x_s + T\sum_{m=1}^{M}\left(\alpha_m \circ x_m^{e}\right)$$
Single Temperature Observation
[Figure panels: 3DVAR; β_s = 0.0; β_s = 0.5]
So what's the catch?
- Need a (fairly large) ensemble to accurately represent the uncertainty in the background forecast
  - Fairly large implies O(50-100) members for NWP-type applications
  - Smaller ensemble sizes lead to larger sampling errors, so more weight will be given to the static covariance
  - Larger ensembles have an increased computational burden
- An update algorithm is needed for the ensemble perturbations
  - NCEP operations applies an Ensemble Kalman Filter (EnKF)
  - The EnKF is in itself a standalone DA system that updates the mean and perturbations with new observations
  - Google "Ensemble Data Assimilation" for an excellent review article by Tom Hamill
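The sampling-error point can be illustrated with a small Monte Carlo sketch: draw ensembles of different sizes from a known covariance and measure how far the sample covariance lands from the truth. The "true" covariance, the grid size, the trial count, and the function name `mean_sampling_error` are all assumptions made for this illustration.

```python
import numpy as np

def mean_sampling_error(true_cov, n_members, rng, n_trials=50):
    """Average Frobenius-norm error of the sample covariance
    estimated from n_members draws of a zero-mean Gaussian."""
    n = true_cov.shape[0]
    errors = []
    for _ in range(n_trials):
        members = rng.multivariate_normal(np.zeros(n), true_cov, size=n_members)
        sample_cov = np.cov(members, rowvar=False)
        errors.append(np.linalg.norm(sample_cov - true_cov))
    return float(np.mean(errors))

rng = np.random.default_rng(0)

# Assumed "true" covariance: correlations decay exponentially with distance.
n = 20
idx = np.arange(n)
true_cov = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 3.0)

err_10 = mean_sampling_error(true_cov, 10, rng)
err_80 = mean_sampling_error(true_cov, 80, rng)
```

In this setup `err_10` comes out larger than `err_80`, consistent with the slide's point that small ensembles need more help from the static covariance (and from localization).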
Dual-Res Coupled Hybrid Var/EnKF Cycling
[Flow diagram, from Previous Cycle to Current Update Cycle]
- EnKF branch (T254 L64): member 1, 2, 3 forecasts -> EnKF member update (generate new ensemble perturbations given the latest set of observations and first-guess ensemble) -> recenter analysis ensemble -> member 1, 2, 3 analyses
- Hybrid branch (T574 L64): high-res forecast -> GSI Hybrid Ens/Var -> high-res analysis
- Coupling, ensemble to GSI: ensemble contribution to background error covariance
- Coupling, GSI to ensemble: replace the EnKF ensemble mean analysis and inflate
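The recentering step in the diagram can be sketched as follows: keep each member's deviation from the EnKF mean and add it to the hybrid analysis. This is a simplified illustration that assumes both analyses are already on the same grid (the dual-resolution interpolation is ignored) and uses a plain multiplicative factor as a stand-in for inflation; the function name and the `inflation` parameter are hypothetical.

```python
import numpy as np

def recenter(enkf_members, hybrid_analysis, inflation=1.0):
    """Replace the ensemble mean with hybrid_analysis, keeping
    (optionally inflated) member perturbations.

    enkf_members:    (n_members, n_state) EnKF analysis ensemble
    hybrid_analysis: (n_state,) analysis to recenter around
    inflation:       hypothetical multiplicative factor on perturbations
    """
    perts = enkf_members - enkf_members.mean(axis=0)
    return hybrid_analysis + inflation * perts

# Made-up three-member ensemble with a two-variable state.
enkf_members = np.array([[1.0, 2.0],
                         [3.0, 2.5],
                         [2.0, 1.5]])
hybrid_analysis = np.array([2.5, 2.2])

recentered = recenter(enkf_members, hybrid_analysis)
```

After recentering, the ensemble mean equals the hybrid analysis while the spread is whatever the EnKF (plus inflation) produced.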
What if I am not running an EnKF?
- In principle any ensemble can be used, so long as the ensemble represents the forecast errors well.
- GSI can ingest the GFS global ensemble to update regional models:
  - WRF ARW/NMM
  - NAM
  - RR
  - HWRF
- 80-member GFS/EnKF 6h ensemble forecasts have been archived at NCEP since May 2012.
- The real-time ensemble is also publicly available.
Ensembles of EnsVar: no EnKF! Much too expensive right now!
GSI Hybrid Configuration
Hybrid-related parameters are controlled via the GSI namelist &hybrid_ensemble:
- Logical to turn on/off the hybrid ensemble option (l_hyb_ens)
- Ensemble size (n_ens) and resolution (jcap_ens, nlat_ens, nlon_ens)
- Source of ensemble: GFS spectral, native model, etc. (regional_ensemble_option)
- Weighting factor for static contribution to increment (beta1_inv)
- Horizontal and vertical distances for localization, via L on the augmented control variable (s_ens_h, s_ens_v)
  - Localization distances are the same for all variables since they operate on α, i.e. no variable localization
- Option to specify different localization distances as a function of vertical level (readin_localization)
  - Instead of single parameters, read in an ASCII text file containing a value for each layer
  - Example for global in the fix directory (global_hybens_locinfo.l64.txt)
- Other regional options related to resolution, pseudo ensemble, etc.
- Other 4D options related to binning and thinning of observations in the assimilation window
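As a rough illustration of how these settings sit together, the sketch below renders a &hybrid_ensemble group from a Python dictionary. The parameter names are the ones listed on this slide; every value is a placeholder chosen for illustration (not a recommended or operational setting), the units of s_ens_h and s_ens_v are not specified here, and the helper `render_namelist` is hypothetical rather than part of GSI. Check the GSI documentation for the full list of options and their meanings.

```python
def render_namelist(group, params):
    """Render a dict as a Fortran namelist group (simple scalar values only)."""
    def fmt(value):
        # bool must be tested before numbers because bool is a subclass of int
        if isinstance(value, bool):
            return ".true." if value else ".false."
        return str(value)

    body = "\n".join(f"  {name}={fmt(value)}," for name, value in params.items())
    return f"&{group}\n{body}\n/"

# Placeholder values for illustration only.
hybrid_settings = {
    "l_hyb_ens": True,             # turn the hybrid ensemble option on
    "n_ens": 80,                   # ensemble size
    "jcap_ens": 254,               # ensemble spectral truncation
    "nlat_ens": 258,               # ensemble grid latitudes (placeholder)
    "nlon_ens": 512,               # ensemble grid longitudes (placeholder)
    "beta1_inv": 0.25,             # weight on the static contribution
    "s_ens_h": 800.0,              # horizontal localization distance (placeholder)
    "s_ens_v": 5.0,                # vertical localization distance (placeholder)
    "readin_localization": False,  # use single values, not a per-level file
}

namelist_text = render_namelist("hybrid_ensemble", hybrid_settings)
print(namelist_text)
```

Generating the group from a dictionary keeps experiment settings in one place, which can help when tuning weights and localization scales across several runs.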
Localization Simple example Courtesy: Jeff Whitaker
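A simple numerical version of localization, offered as a sketch rather than a reproduction of the figure: estimate a covariance from a small ensemble, then damp its long-distance entries by taking the element-wise (Schur) product with a distance-dependent correlation matrix. The 1-D grid, the assumed true covariance, the Gaussian localization function, and its length scale are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, M = 40, 10                        # grid points and ensemble size (toy values)

idx = np.arange(n)
dist = np.abs(idx[:, None] - idx[None, :])

# Assumed true covariance: correlation decays exponentially with distance.
true_cov = np.exp(-dist / 3.0)

# Draw a small ensemble consistent with true_cov via its Cholesky factor.
chol = np.linalg.cholesky(true_cov)
members = rng.standard_normal((M, n)) @ chol.T

# Raw sample covariance: noisy, with spurious long-distance entries.
raw_cov = np.cov(members, rowvar=False)

# Illustrative Gaussian localization with a length scale of 5 grid points.
loc = np.exp(-0.5 * (dist / 5.0) ** 2)
localized_cov = loc * raw_cov        # Schur (element-wise) product

err_raw = np.linalg.norm(raw_cov - true_cov)
err_localized = np.linalg.norm(localized_cov - true_cov)
```

With only 10 members, `err_localized` is smaller than `err_raw` in this setup: the localization removes sampling noise at long distances at the cost of a small bias at intermediate ones.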
Localization Real world example
Hybrid 4DEnVar [No Adjoint]
The cost function can be easily expanded to include a static contribution:

$$J(\delta x_s, \alpha) = \beta_s\,\frac{1}{2}\,\delta x_s^{T} B_s^{-1}\,\delta x_s + \beta_e\,\frac{1}{2}\sum_{m=1}^{M}\alpha_m^{T} L^{-1}\alpha_m + \frac{1}{2}\sum_{k=1}^{K}\left(H_k\,\delta x_{t,k} - d_k\right)^{T} R_k^{-1}\left(H_k\,\delta x_{t,k} - d_k\right)$$

where the 4D increment is prescribed exclusively through linear combinations of the 4D ensemble perturbations and a static contribution:

$$\delta x_{t,k} = \delta x_s + T\sum_{m=1}^{M}\left(\alpha_m \circ x_{m,k}^{e}\right)$$

- Here, the static contribution is considered time-invariant (i.e. from 3DVAR-FGAT).
- Hybrid weighting parameters (β) exist just as in the other hybrid variants.
- Also, the ensemble perturbation weights (α) are assumed to be the same at all times k in the window.
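The construction of the 4D increment can be sketched in a few lines: one set of weights α and one static increment are reused at every time level, and only the ensemble perturbations change with k. The sketch assumes a single resolution (T is the identity); the array shapes and the function name `increment_4d` are illustrative assumptions.

```python
import numpy as np

def increment_4d(dx_s, alpha, X4d):
    """Total increment at each time level k.

    dx_s:  (n,)      time-invariant static increment
    alpha: (M, n)    ensemble weights, shared by all time levels
    X4d:   (K, M, n) ensemble perturbations x^e_{m,k}
    Returns an array of shape (K, n).
    """
    # Element-wise product alpha_m * x^e_{m,k}, summed over members m.
    ens_part = np.einsum("mn,kmn->kn", alpha, X4d)
    return dx_s[None, :] + ens_part

rng = np.random.default_rng(2)
K, M, n = 3, 4, 5                    # time levels, members, state size (toy)
dx_s = rng.standard_normal(n)
alpha = rng.standard_normal((M, n))
X4d = rng.standard_normal((K, M, n))

dx_t = increment_4d(dx_s, alpha, X4d)
```

No tangent-linear or adjoint model appears anywhere: the time evolution of the increment comes entirely from the time dimension of the ensemble perturbations.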
Single Observation (-3h) Example for 4D Variants
[Figure panels: 4DVar; 4DEnVar; H-4DVar (β_s = 0.25); H-4DEnVar (β_s = 0.25)]
Time Evolution of Increment
[Figure panels: H-4DVar and H-4DEnVar increments at t = -3h, t = 0h, t = +3h]
- The solution at the beginning of the window (t = -3h) is the same to within round-off (because the observation is taken at that time, and the same weighting parameters are used)
- The evolution of the increment is qualitatively similar between the dynamic and ensemble specifications
** Current linear and adjoint models in GSI are computationally impractical for use in 4DVAR other than simple single-observation testing at low resolution
3DVar / H3DEnVar / H4DEnVar
[Figure: comparison of 3DVar, H3DEnVar, and H4DEnVar]
Moving from 3D Hybrid (current operations) to Hybrid 4D-EnVar yields an improvement that is about 75% in amplitude of the improvement gained in going from 3DVAR to 3D Hybrid.
Summary
- The hybrid ensemble GSI system uses an ensemble of first-guess forecasts to better estimate the background-error covariance term in the cost function.
  - More information can be extracted from observations
- Added expense: mostly IO
- Added complexity:
  - running (and updating) an ensemble
  - maintaining an additional DA component (EnKF); alternative: EVIL
- More tuning is necessary, and it depends on the model, resolution, and observing network:
  - localization length scales
  - ensemble and static weights
- Ensemble (co)variances must be representative of control forecast error.
- Current operations is H3DEnVar, with an upgrade to H4DEnVar coming in Q2FY16.
Some Relevant References
- Buehner, M., P. L. Houtekamer, C. Charette, H. L. Mitchell, B. He, 2010a: Intercomparison of variational data assimilation and the ensemble Kalman filter for global deterministic NWP. Part I: Description and single-observation experiments. Mon. Wea. Rev., 138, 1550-1566.
- Buehner, M., P. L. Houtekamer, C. Charette, H. L. Mitchell, B. He, 2010b: Intercomparison of variational data assimilation and the ensemble Kalman filter for global deterministic NWP. Part II: One-month experiments with real observations. Mon. Wea. Rev., 138, 1567-1586.
- Kleist, D. T., and K. Ide, 2013: An OSSE-based Evaluation of Hybrid Variational-Ensemble Data Assimilation for the NCEP GFS, Part I: System description and 3D hybrid results. Mon. Wea. Rev.
- Kleist, D. T., and K. Ide, 2013: An OSSE-based Evaluation of Hybrid Variational-Ensemble Data Assimilation for the NCEP GFS, Part II: 4D EnVar and Hybrid Variants. Mon. Wea. Rev.
- Liu, C., Q. Xiao, and B. Wang, 2008: An ensemble-based four dimensional variational data assimilation scheme. Part I: Technique formulation and preliminary test. Mon. Wea. Rev., 136, 3363-3373.
- Wang, X., C. Snyder, and T. M. Hamill, 2007a: On the theoretical equivalence of differently proposed ensemble/3D-Var hybrid analysis schemes. Mon. Wea. Rev., 135, 222-227.
- Wang, X., D. Parrish, D. Kleist, J. Whitaker, 2013: GSI 3DVar-Based Ensemble Variational Hybrid Data Assimilation for NCEP Global Forecast System: Single-Resolution Experiments. Mon. Wea. Rev., 141, 4098-4117.