A shared component hierarchical model to represent how fish assemblages vary as a function of river temperatures and flow regimes



Similar documents
STATHAB/FSTRESS SOFTWARES

Columbia River Project Water Use Plan. Monitoring Program Terms of Reference

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Columbia River Project Water Use Plan. Monitoring Program Terms of Reference LOWER COLUMBIA RIVER FISH MANAGEMENT PLAN

Statistics Graduate Courses

11. Time series and dynamic linear models

A Statistician s View of Big Data

Removal fishing to estimate catch probability: preliminary data analysis

Supplementary PROCESS Documentation

STAT3016 Introduction to Bayesian Data Analysis

New Work Item for ISO Predictive Analytics (Initial Notes and Thoughts) Introduction

Exploratory Data Analysis -- Introduction

PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY

Vulnerability Assessment of New England Streams: Developing a Monitoring Network to Detect Climate Change Effects

Learning outcomes. Knowledge and understanding. Competence and skills

COURSE PLAN BDA: Biomedical Data Analysis Master in Bioinformatics for Health Sciences Academic Year Qualification.

COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared.

13. Steps 3.1 & 3.2. Develop objectives, indicators and benchmarks. Essential EAFM Date Place. Version 1

Analysing Ecological Data

240ST014 - Data Analysis of Transport and Logistics

Contributions to extreme-value analysis

Simple Predictive Analytics Curtis Seare

Regression Modeling Strategies

The Economics of Ecosystem Based Management: The Fishery Case

BayesX - Software for Bayesian Inference in Structured Additive Regression

Machine Learning with MATLAB David Willingham Application Engineer

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Machine Learning.

Introduction to Principal Components and FactorAnalysis

Combining GLM and datamining techniques for modelling accident compensation data. Peter Mulquiney

Logistic Regression (a type of Generalized Linear Model)

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Multivariate Analysis of Ecological Data

Functional Principal Components Analysis with Survey Data

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

MSCA Introduction to Statistical Concepts

Some Essential Statistics The Lure of Statistics

Reproductive biology of Capoeta tinca in Gelingüllü Reservoir, Turkey

Factor Analysis. Principal components factor analysis. Use of extracted factors in multivariate dependency models

Spatial distribution of 0+ juvenile fish in differently modified lowland rivers

Thank you to all of our 2015 sponsors: Media Partner

MSCA Introduction to Statistical Concepts

Additional sources Compilation of sources:

SAS Software to Fit the Generalized Linear Model

Example G Cost of construction of nuclear power plants

List of publications and communications

Multivariate Statistical Inference and Applications

How to Get More Value from Your Survey Data

Moral Hazard in the French Workers Compensation System

Lecture 9: Introduction to Pattern Analysis

Statistics in Applications III. Distribution Theory and Inference

Data mining and official statistics

Principles of Data Mining by Hand&Mannila&Smyth

Survival analysis methods in Insurance Applications in car insurance contracts

Coenological and habitat affinities of Cobitis elongatoides, Sabanejewia balcanica and Misgurnus fossilis in Slovakia

Observing the Changing Relationship Between Natural Gas Prices and Power Prices

Non Linear Dependence Structures: a Copula Opinion Approach in Portfolio Optimization

The application of root cause analysis for definition of scenarios in automotive domain

> plot(exp.btgpllm, main = "treed GP LLM,", proj = c(1)) > plot(exp.btgpllm, main = "treed GP LLM,", proj = c(2)) quantile diff (error)

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES

Teaching Business Statistics through Problem Solving

Guidance for Industry

SUGI 29 Statistics and Data Analysis

The Filtered-x LMS Algorithm

Faculty of Science School of Mathematics and Statistics

SPOT 4 TAKE 5 Program

Teaching Multivariate Analysis to Business-Major Students

AP Biology Unit I: Ecological Interactions

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone:

What is discrete longitudinal data analysis?

CIESIN Columbia University

Segmentation models and applications with R

Course/Seminar Gilbert Ritschard Wednesday 10h15-14h M-5383 Anne-Laure Bertrand (Ass)

GLACier-fed rivers, HYDRoECOlogy and climate change; NETwork of monitoring sites (GLAC-HYDRECO-NET).

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

Is a Data Scientist the New Quant? Stuart Kozola MathWorks

Water Quality Modeling in Delaware s Inland Bays: Where Have We Been and Where Should We Go?

Introduction to Predictive Modeling Using GLMs

Pérégrination d un modèle alose au rythme d une méthode ABC

Visualization of textual data: unfolding the Kohonen maps.

A CONTENT STANDARD IS NOT MET UNLESS APPLICABLE CHARACTERISTICS OF SCIENCE ARE ALSO ADDRESSED AT THE SAME TIME.

Appendix B Checklist for the Empirical Cycle

VALIDATION OF ANALYTICAL PROCEDURES: TEXT AND METHODOLOGY Q2(R1)

Module 223 Major A: Concepts, methods and design in Epidemiology

Module 3: Correlation and Covariance

Spreadsheet software for linear regression analysis

Teaching model: C1 a. General background: 50% b. Theory-into-practice/developmental 50% knowledge-building: c. Guided academic activities:

Elements of statistics (MATH0487-1)

Operation Structure (OS)

PROBABILITY AND STATISTICS. Ma To teach a knowledge of combinatorial reasoning.

Functional Data Analysis of MALDI TOF Protein Spectra

Transcription:

A shared component hierarchical model to represent how fish assemblages vary as a function of river temperatures and flow regimes Jeremy PIFFADY (1,2), Éric PARENT (1) & Yves SOUCHON (2) (1) Équipe Modélisation, Risques, Statistique, Environnement de l UMR 518 INRA/AgroParisTech, 19, Avenue du Maine, 75732 Paris Cedex 15, France, (2) Institut de recherche en sciences et technologies pour l environnement, Pôle d Hydrobiologie des cours d eau,3 bis Quai Chauveau, 69336 Lyon 14 Janvier 2011 Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 1 / 34

Summary 1 How interannual variations of fish assemblages are linked to temperature and flow regimes? The Bugey case study location The response variables The explanatory variables 2 Challenges to the statistical analyst Challenging features Why not a GLM? 3 A shared component hierarchical model A cocktail model structure Inference Results Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 2 / 34

Jeremy s story of a modelling challenge Y = f(x, ε) Y : ecosystem behavior X : environmental variations of interest ε : unknown perturbations, noise f :...functional form of the answer...to be defined as well Application to three groupings of juveniles in the upper River Rhone during the 1980-2005 period. Let s find a statistician!? But we do have data, let s have a look... Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 3 / 34

The Bugey site Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 4 / 34

The Bugey site cont d Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 5 / 34

Many fish species can be found Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 6 / 34

But only 8 species are permanently caught Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 7 / 34

These eight species are common fish Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 8 / 34

They can be clustered in three groups Only the 8 species representing more than 5% each of total abundance were analysed : bleak (Alburnus alburnus), barbel (Barbus barbus), chub (Leuciscus cephalus), roach (Rutilus rutilus), gudgeon (Gobio gobio), nase (Chondrostoma nasus), stream bleak (Alburnoides bipunctatus) and dace (Leuciscus leuciscus). Gp1={bleak and dace} (Cool water group) Gp2={gudgeon, barbel and nase} (Benthic group) Gp3={ stream bleak, roach and chub} (Thermophilic group) Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 9 / 34

These three groups exhibits different time patterns Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 10 / 34

Temperature rules fish activities(reproduction, etc.) Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 11 / 34

Flow regimes mainly govern habitat features Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 12 / 34

Biological knowledge is require to extract yearly significant quantities from the daily temperature signal Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 13 / 34

Yearly patterns are extracted from flow regimes Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 14 / 34

Nine possibly explanatory covariates are extracted and standardized indices are designed Covariate mean sd C12 115.4 11.4 C18 162.5 13.9 Cmx 214.6 16.1 S1 12.5 88.0 S2 26.4 71.5 Qm1 565.0 176.2 Qmx1 920.3 275.6 Qm2 580.9 152.7 Qmx2 880.2 221.1 Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 15 / 34

Statistical challenges 1 The Bugey protocol versus traditionnal ecological hypotheses Only one pass : no difference can be made between capturability and population size Dynamic non linear models such as prey-predator with interactions cannot The system is not closed. Emigration/immigration The system is influenced by the nuclear plant warming the waters 2 The Bugey sampling protocol versus common statistical hypotheses A poorly controlled experiment Variables with different natures and different scales 3 Ambitious observational data study with much lack of contrast Is there anything to see? Abrupt changes? Are flows and temperatures the main drivers? Do they vary enough? Are not the remaining fish the most adapted (less significant of a change) species? Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 16 / 34

Why not a GLM? Write for each group s of species Y s,t dp ois(λ s,t ) log(λ s,t ) = X t β s + σε t with Y s counts of species s in experiment t X t values of the design matrix for experiment t β s coefficient characterizing answer to species s to environmental variations ε t N(0, 1) overdispersion due to uncontrolled conditions of experiment t Pb : 1 An additional model selection in search of influential variables 2 Poisson assumptions (same capturability,same fishing protocol) 3 Model selection to point out relevant explanatory variables may be tricky Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 17 / 34

A cocktail model structure Objectives Get rid of the main unstationnarities in the sampling protocol i-e work conditionnaly to the total number of captures Explain the variations of the specific ratios of species Avoid model selection traps Principles Join a multivariate analysis and a logistic regression model within a bayesian hierarchical structure A latent variable as a shared component : let s call it the hypersignal! Similar to a Partial Least Squareregression in the frequentist world Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 18 / 34

A cocktail model structure Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 19 / 34

A cocktail model structure (Y 1t, Y 2t, Y 3t ) dmult(p 1t, p 2t, p 3t, N t ) p 1t, p 2t, p 3t = f(x t ) which f? Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 20 / 34

A cocktail model structure Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 21 / 34

A cocktail model structure Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 22 / 34

A principal component-like analysis X 1 t = α 1 Z t + σ 1 ε 1t X 2 t = α 2 Z t + σ 2 ε 2t... X 9 t = α 9 Z t + σ 9 ε 9t Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 23 / 34

A cocktail model structure Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 24 / 34

A cocktail model structure (Y 1t, Y 2t, Y 3t ) dmult(p 1t, p 2t, p 3t, N t ) logit(p 1,t ) = a 1 Z t + b 1 + c 1,site + d 1,season logit(p 2,t ) = a 3 Z t + b 3 + c 3,site + d 3,season p 1,t + p 2,t + p 3,t = 1 X 1 t = α 1 Z t + σ 1 ε 1,t X 2 t = α 2 Z t + σ 2 ε 2,t... X 9 t = α 9 Z t + σ 9 ε 9,t Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 25 / 34

Inference : the hypersignal versus time Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 26 / 34

Inference : trying to explain the hypersignal as a function of environmental covariates Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 27 / 34

Inference : trying to understand the hypersignal as an explanation for species relative abundance Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 28 / 34

Back to the data Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 29 / 34

The four last years were taken as a validation period in predictive mode [Y new X new, y old ] = [Y new θ, Z, X new ][θ, Z y old, X new ]dθdz θ,z with θ = (α, σ, a, b, c, d) and p a,b,c,d (Z) s.t.logit(p) = az + b + c site + d season [Y new X new, y old ] = [Y new p a,b,c,d (Z), N new ][Z, X new, α, σ][θ y old ]dθdz θ,z Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 30 / 34

Discussion The protocol variations are somehow stabilized An a priori structure might be assumed for the hypersignal : Hidden Markov Model, Shifting level Model, Spline... Is the numbering the groups of any relevance? What would be the second principal component? Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 31 / 34

Discussion con d Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 32 / 34

Take home messages Y = f(x, ε) easy mathematical formulation but hard to specify Observational data with poor control and few constrast Much interplay between data exploratory analysis and model design Take into account overdispersion, different natures between inputs & outputs, model choice None readymade toolbox solution, design the model of your own! Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 33 / 34

Bibliographie Parent, E. et Bernier, J. (2007). Le Raisonnement Bayésien : Modélisation et inférence. Springer France, Paris. Boreux, JJ., Parent, E. et Bernier, J. (2010). Pratique du calcul Bayésien. Springer France, Paris. Hoff, P. (2009). A first course in Bayesian Statistical Methods. Springer. Marin, J.M. et Robert, C.P. (2007).(chapitre 3) The Bayesian Core. Springer. Éric PARENT et al. (Équipe Morse) Bayesian Hypersignal & Fish Assemblage 14 Janvier 2011 34 / 34