Applying linear mixed model to farm panel data set

Alina Sinisalo, MTT Agrifood Research Finland, e-mail: alina.sinisalo@mtt.fi

August 2014

Abstract

The development of production costs on Finnish farms in 2000-2011 was studied with a linear mixed model that takes into account farm-level information (production type, economic size, arable land) and the effect of time. Interindividual differences in intraindividual changes over time were analyzed. Production costs increased over time and as arable land and standard output increased.

Keywords: linear mixed model, panel data, farm, cost, Finland

1 Introduction

The structure of agriculture has changed. In 2000-2011 the overall number of farms dropped by 24% (pig farms 59%, dairy farms 52%, cattle farms 45%, horticulture outdoor 43%, poultry farms 39%, horticulture indoor 38%, mixed production 17%, cereal farms 9%). In contrast, the number of sheep and goat farms increased by 9% and the number of other crop farms by 13% (Tike, 2014). During the same period the average farm size has grown, i.e. farms have more animals and more arable land.

This study focuses on the development of production costs in 2000-2011 on Finnish farms by production type. The prices of goods and services currently consumed in agriculture (+47%) and of goods and services contributing to agricultural investments (+42%) have increased faster than the Consumer Price Index, which is used to measure the general inflation rate (+21%) (OSF, 2014a,b). Production costs have increased in almost all production types: sheep and goat husbandry 220%, pig farms 111%, cattle farms 107%, dairy farms 92%, poultry farms 69%, cereal farms 64%, mixed production 55%, horticulture outdoor 43%, and horticulture indoor 28%. In contrast, production costs on other crop farms have decreased by 26% (MTT, 2014).

Production cost studies are quite often based on average results (MTT, 2014), single-year observations (Riepponen, 2003; Rantala, 1997) or different kinds of models (Ala-Mantila, 1998). Few studies exploit the panel structure of the data. The goal of this study was to examine how production costs on farms have developed in the 2000s, taking into account farm-level information (production type, economic size, arable land) and the effect of time by observing the same farms for several years, and to explore possible differences in this development between farms.

2 Materials

Farms participating in MTT profitability bookkeeping were studied for the years 2000-2011. The data set was formed as a panel: each farm was measured repeatedly at one-year intervals. There were 11115 observations from 1564 different farms, with on average 926 different farms in the data each year. Hence, the data set was unbalanced. This is because participation in MTT bookkeeping is voluntary and because some farms had exited the business.

The total production costs (a continuous dependent variable) were studied. The total is the sum of the following components: material, livestock, machinery, building, wage and interest costs. Only costs targeted to agriculture were included; costs targeted to forestry and private operations were not studied. The production costs were deflated to year 2011 prices using the Consumer Price Index (2000 = 100) (OSF, 2014a).

The farm-level data were weighted with weight factors calculated individually for each farm, taking into account the type of operations, economic size and location by support area. Weights were calculated for each farm by stratum indicators separately for every year. Furthermore, the weights were calibrated to the total arable land in Finland. Thus, after weighting, the data can be used to describe the results of all Finnish farms.

Ten production types were included: cereal farms, other crop farms, horticulture indoor, horticulture outdoor, dairy farms, cattle farms, sheep and goats, pig farms, poultry farms and mixed production. Cereal farms were used as the reference against which the other production types were compared. Farm size is often described by arable land, which was included in the model as a continuous explanatory variable describing the average amount of arable land for each year.

The standard output (SO) is the average monetary value of the agricultural output at farm-gate price per unit. There is a regional standard output coefficient for each product (average value over a 5-year period). The sum of all standard outputs per unit multiplied by hectares or headcount forms the overall economic size, expressed in euro (Eurostat, 2014). We classified the standard output variable into three classes. The smallest class (SO less than 50000 €) was chosen as the reference, so that the medium class (SO 50000-100000 €) and the largest class (SO more than 100000 €) were compared to the smallest class.
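The two data-preparation steps described above, deflating costs to 2011 prices and assigning the standard output class, can be sketched as follows. This is a minimal illustration and not the procedure used in MTT profitability bookkeeping: the function names are hypothetical, and the 2011 index value of 121 is an assumption rounded from the +21% change in the Consumer Price Index cited in the Introduction.

```python
def deflate_to_base(cost, cpi_year, cpi_base):
    """Express a nominal cost in base-year prices using a price index."""
    return cost * cpi_base / cpi_year


def so_class(standard_output):
    """Assign the standard output (SO) class, in euro, used in the model."""
    if standard_output < 50000:
        return "small"   # reference class
    if standard_output <= 100000:
        return "medium"
    return "large"


# Hypothetical example: a cost of 1000 euro recorded in 2000 (index 100),
# expressed in 2011 prices with an ASSUMED 2011 index of 121.
cost_2011_prices = deflate_to_base(1000.0, cpi_year=100.0, cpi_base=121.0)
print(round(cost_2011_prices, 2))
print(so_class(75000))
```

The class boundaries follow the text: 50000 € and 100000 € both fall in the medium class.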

3 Methods

A linear mixed model includes both fixed and random effects. A fixed effect model or a random effect model can be considered a special case of a mixed effects model. The linear mixed model for an individual farm $i$ was defined as follows:

\[
y_i = \underbrace{X_i \beta}_{\text{fixed}} + \underbrace{Z_i b_i}_{\text{random}} + \underbrace{\varepsilon_i}_{\text{random}}, \qquad
b_i \sim N(0, D), \quad \varepsilon_i \sim N(0, R_i), \quad
b_1, \dots, b_N, \varepsilon_1, \dots, \varepsilon_N \ \text{independent}, \tag{1}
\]

where $y_i$ is the $n_i \times 1$ response vector for farm $i$, $\beta$ denotes a $p \times 1$ vector of unknown population parameters and $X_i$ denotes a known $n_i \times p$ design matrix linking $\beta$ to $y_i$. The $b_i$ denotes a $k \times 1$ vector of unknown individual effects and $Z_i$ a known $n_i \times k$ design matrix linking $b_i$ to $y_i$. The $b_i$ are distributed as $N(0, D)$ (normal with mean 0 and covariance matrix $D$), independently of each other and of the $\varepsilon_i$, which are distributed as $N(0, R_i)$. $D$ is a $k \times k$ and $R_i$ an $n_i \times n_i$ positive-definite covariance matrix. The parameters $\beta$ are treated as fixed effects and $b_i$ and $\varepsilon_i$ as random effects. In our unbalanced panel data set $n_i$ varied from 1 to 12.

An unstructured (UN) covariance structure was chosen for the random effects because it is suitable for longitudinal data. Random effects were defined over the farm register number (observation unit $i$). For the residuals a first-order autoregressive (AR1) covariance structure was chosen because it is suitable for data containing sequential observations whose correlations decline exponentially with time.

Six fixed effects, $p$, and two random effects, $k$, were included in the model. Intercept $\beta_0$, time $\beta_1$, land $\beta_2$, production type $\beta_3$-$\beta_{12}$, standard output $\beta_{13}$-$\beta_{15}$ and weight $\beta_{16}$ were treated as fixed effects. Intercept $b_0$ and time $b_1$ were also included in the model as random effects. The unstructured $2 \times 2$ covariance matrix for the random effects is denoted as follows:

\[
D = \operatorname{Var}(b_i) =
\begin{pmatrix}
\sigma_{b_0}^2 & \sigma_{b_0,b_1} \\
\sigma_{b_0,b_1} & \sigma_{b_1}^2
\end{pmatrix}, \tag{2}
\]

where the three parameters, the variance of $b_0$, the variance of $b_1$ and the covariance of $b_0$ and $b_1$, are denoted UN(1,1), UN(2,2) and UN(2,1), respectively. The first-order autoregressive covariance matrix for the residuals is denoted

\[
R_i = \operatorname{Var}(\varepsilon_i) =
\begin{pmatrix}
\sigma^2 & \sigma^2\rho & \cdots & \sigma^2\rho^{n_i-1} \\
\sigma^2\rho & \sigma^2 & \cdots & \sigma^2\rho^{n_i-2} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma^2\rho^{n_i-1} & \sigma^2\rho^{n_i-2} & \cdots & \sigma^2
\end{pmatrix}. \tag{3}
\]
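To make the covariance structures (2) and (3) concrete, the sketch below builds $D$ and $R_i$ for one farm and combines them into the marginal covariance of $y_i$ implied by model (1), $V_i = Z_i D Z_i^T + R_i$. The numerical values are the covariance estimates reported in Table 1. The time coding 0, 1, 2 and all function names are assumptions made for illustration only; the paper does not state how time was coded.

```python
def ar1_cov(n, sigma2, rho):
    """First-order autoregressive covariance matrix R_i of size n x n."""
    return [[sigma2 * rho ** abs(r - c) for c in range(n)] for r in range(n)]


def random_effects_cov(times, d):
    """Z_i D Z_i^T for a random intercept and a random time slope.

    Row r of Z_i is (1, times[r]); d is the 2 x 2 matrix D.
    """
    z = [(1.0, float(t)) for t in times]
    return [
        [sum(zr[a] * d[a][b] * zc[b] for a in range(2) for b in range(2))
         for zc in z]
        for zr in z
    ]


def marginal_cov(times, d, sigma2, rho):
    """Marginal covariance V_i = Z_i D Z_i^T + R_i of the responses."""
    n = len(times)
    zdz = random_effects_cov(times, d)
    r_i = ar1_cov(n, sigma2, rho)
    return [[zdz[r][c] + r_i[r][c] for c in range(n)] for r in range(n)]


# Covariance estimates as reported in Table 1.
D = [[4.253e9, 0.829e9],
     [0.829e9, 0.162e9]]
SIGMA2 = 6.846e9
RHO = 0.900

# ASSUMED time coding: three consecutive years coded 0, 1, 2.
V = marginal_cov([0, 1, 2], D, SIGMA2, RHO)
for row in V:
    print(["%.3e" % v for v in row])
```

The off-diagonal elements of the result show how the random intercept and slope add a farm-specific dependence on top of the exponentially decaying AR(1) residual correlation.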

The total number of different farms in our data set was $N = 1564$. The $N$ vectors of form (1) can be stacked,

\[
y = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}, \quad
X = \begin{pmatrix} X_1 \\ \vdots \\ X_N \end{pmatrix}, \quad
Z = \begin{pmatrix} Z_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & Z_N \end{pmatrix}, \quad
b = \begin{pmatrix} b_1 \\ \vdots \\ b_N \end{pmatrix}, \quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_N \end{pmatrix},
\]

and presented as follows:

\[
y = X\beta + Zb + \varepsilon, \qquad b \sim N(0, G), \quad \varepsilon \sim N(0, R), \quad \operatorname{Cov}[b, \varepsilon] = 0, \tag{4}
\]

with observations from different farms $i$ being independent, so that $G$ and $R$ are block diagonal with blocks $D$ and $R_i$, respectively. The parameters of model (4) are the fixed-effect vector $\beta$ and the unknown elements of the covariance matrices $G$ and $R$. The explanatory variables are included in $X$ and $Z$. The vector $b$ contains the random effects.

4 Results

The results of the linear mixed model explaining the production costs are presented in Table 1. Livestock farms have higher production costs than cereal farms. Costs increase from year to year (time = 6073). As arable land increases, costs also increase (land = 825). When farms are categorized by their standard output, small farms (standard output < 50000 €) have lower production costs than medium-sized farms (50000-100000 €) and large farms (> 100000 €), as the positive estimates for the two larger classes in Table 1 show. The year-to-year correlation is strong (0.900, p < 0.001). Production costs change at a different pace from farm to farm (p < 0.001).

5 Discussion

The linear mixed model presented in this study fits the data set of Finnish farms well. Values predicted with the model slightly overestimate the average values, except for poultry farms and indoor and outdoor horticulture.
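As a rough illustration of how the fixed part of the model, $X_i\beta$, turns the estimates reported in Table 1 into an expected production cost, consider the sketch below. It rests on several assumptions that the paper does not confirm: time is coded as years since 2000, land is measured in hectares, the non-significant weight term is left out, and the random effects are set to zero. The example farm and the function name are hypothetical.

```python
# Fixed-effect estimates as printed in Table 1. Cereal farms and the
# smallest standard output class are the reference levels (estimate 0).
INTERCEPT = 78275
TIME = 6073
LAND = 825
PRODUCTION_TYPE = {
    "cereal": 0,
    "other crop": 1766,
    "sheep and goat": 8316,
    "horticulture outdoor": 12641,
    "mixed": 20974,
    "cattle": 26952,
    "dairy": 31561,
    "pig": 39122,
    "poultry": 48889,
    "horticulture indoor": 218666,
}
SO_CLASS = {"small": 0, "medium": 10068, "large": 25727}


def fixed_part(production_type, so_class, land, time):
    """Fixed part of the prediction; the weight term is omitted (assumption)."""
    return (INTERCEPT
            + TIME * time
            + LAND * land
            + PRODUCTION_TYPE[production_type]
            + SO_CLASS[so_class])


# Hypothetical dairy farm: medium SO class, 40 ha, ASSUMED time code 5.
print(fixed_part("dairy", "medium", land=40, time=5))
```

Because the reference levels have estimate 0, the same call for a small cereal farm with no land at time 0 returns the intercept alone.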

Table 1: The estimates for model parameters

Effect                        Parameter   Estimate        SE              p-value
Intercept                     β_0         78275           4595            <0.001
Time                          β_1         6073            469             <0.001
Land                          β_2         825             46              <0.001
Production type
  Other crop farms            β_3         1766            2324            0.447
  Sheep and goat husbandry    β_4         8316            12979           0.641
  Horticulture outdoor        β_5         12641           6015            0.036
  Mixed production            β_6         20974           2891            <0.001
  Cattle farms                β_7         26952           4011            <0.001
  Dairy farms                 β_8         31561           3896            <0.001
  Pig farms                   β_9         39122           4676            <0.001
  Poultry farms               β_10        48889           6798            <0.001
  Horticulture indoor         β_11        218666          10245           <0.001
  Cereal farms                β_12        0               0
Standard output
  50000-100000 €              β_13        10068           1866            <0.001
  > 100000 €                  β_14        25727           2377            <0.001
  0-50000 €                   β_15        0               0
Weight factor
  Weight                      β_16        -2599           3759            0.489
Covariance parameters
  UN(1,1)                     σ_0^2       4.253 × 10^9    1.012 × 10^9    <0.001
  UN(2,1)                     σ_0σ_1      0.829 × 10^9    0.059 × 10^9    <0.001
  UN(2,2)                     σ_1^2       0.162 × 10^9    0.012 × 10^9    <0.001
Residual
  AR1 diagonal                σ^2         6.846 × 10^9    0.815 × 10^9    <0.001
  AR1 rho                     ρ           0.900           0.010           <0.001

Observations                  11115
-2 REML log-likelihood        269583
AIC                           269593
BIC                           269630
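The information criteria at the foot of Table 1 can be cross-checked against the -2 REML log-likelihood. The sketch below assumes the common definitions AIC = -2LL + 2q and BIC = -2LL + q ln(n), with q = 5 covariance parameters and n = 11115 observations. Whether the software used in the study applies exactly these definitions is an assumption, and the function names are illustrative.

```python
import math


def aic(minus2ll, q):
    """Akaike information criterion from -2 log-likelihood and q parameters."""
    return minus2ll + 2 * q


def bic(minus2ll, q, n):
    """Schwarz Bayesian criterion from -2 log-likelihood, q parameters, n obs."""
    return minus2ll + q * math.log(n)


MINUS2LL = 269583   # -2 REML log-likelihood from Table 1
Q = 5               # UN(1,1), UN(2,1), UN(2,2), AR1 diagonal, AR1 rho
N_OBS = 11115       # observations from Table 1

print(aic(MINUS2LL, Q))
print(round(bic(MINUS2LL, Q, N_OBS)))
```

Under these assumptions both values agree with the AIC and BIC printed in Table 1 after rounding to whole numbers.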

Weight factors were calculated for each observation, and the weights were included in the model as a continuous explanatory variable, which, however, was not significant. The inclusion of weights should be studied further, as should the possibilities of using the model for forecasting.

References

Ala-Mantila, O. (1998). The production costs of agricultural products according to farms models (in Finnish with abstract in English). Research reports 222. Helsinki: Agricultural Economics Research Institute, pp. 6-93.

Eurostat (2014). Glossary: Standard output. Available at http://epp.eurostat.ec.europa.eu/statistics_explained/index.php/glossary:standard_output_(so)

MTT (2014). Agriculture and horticulture. Data: MTT profitability bookkeeping results. Available at http://www.mtt.fi/economydoctor

OSF (2014a). Official Statistics of Finland. Consumer price index. Available at http://www.stat.fi/til/khi/index_en.html

OSF (2014b). Official Statistics of Finland. Index of purchase prices of the means of agricultural production. Available at http://www.stat.fi/til/ttohi/index_en.html

Rantala, J. (1997). Costs of dairy production. Reports and discussion papers 151. Helsinki: Pellervo Economic Research Institute.

Riepponen, L. (2003). Production costs of milk and cereals on Finnish bookkeeping farms in 1998-2000 (in Finnish with English abstract). Maa- ja elintarviketalous 19. Helsinki: MTT Agrifood Research Finland. Available at http://urn.fi/urn:ISBN:951-729-736-x

Tike (2014). Structural data of agriculture. Available at http://www.maataloustilastot.fi