Intro to Data Analysis, Economic Statistics and Econometrics



Similar documents
Descriptive statistics parameters: Measures of centrality

Data Mining in Economic Science

The Real Business Cycle model

The Life-Cycle Motive and Money Demand: Further Evidence. Abstract

Income Distribution Database (

II. DISTRIBUTIONS distribution normal distribution. standard scores

These two errors are particularly damaging to the perception by students of our program and hurt our recruiting efforts.

Exercise 1.12 (Pg )

Econ 303: Intermediate Macroeconomics I Dr. Sauer Sample Questions for Exam #3

Some Essential Statistics The Lure of Statistics

Practical Time Series Analysis Using SAS

Northumberland Knowledge

Introduction to Statistics and Quantitative Research Methods

Probability Using Dice

Inflation. Chapter Money Supply and Demand

Nonparametric tests these test hypotheses that are not statements about population parameters (e.g.,

Final Exam Practice Problem Answers

ADD-INS: ENHANCING EXCEL

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Chapter 12. Aggregate Expenditure and Output in the Short Run

Week 1. Exploratory Data Analysis

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

Center: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.)

Empirical Project, part 2, ECO 672, Spring 2014

HELP Interest Rate Options: Equity and Costs

Management Accounting Theory of Cost Behavior

Marketing Mix Modelling and Big Data P. M Cain

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

Chapter 9 Aggregate Demand and Economic Fluctuations Macroeconomics In Context (Goodwin, et al.)

Topic 1 - Introduction to Labour Economics. Professor H.J. Schuetze Economics 370. What is Labour Economics?

CALCULATIONS & STATISTICS

A Primer on Forecasting Business Performance

Tutorial 5: Hypothesis Testing

Chicago Booth BUSINESS STATISTICS Final Exam Fall 2011

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

14.02 Principles of Macroeconomics Problem Set 1 Fall 2005 ***Solution***

Additional sources Compilation of sources:

Sahra Wagenknecht. Hie Limits of Choice. Saving Decisions and Basic Needs in Developed Countries. Campus Verlag Frankfurt/New York

14.02 Principles of Macroeconomics Problem Set 1 *Solution* Fall 2004

How To Write A Data Analysis

The Friedman Test with MS Excel. In 3 Simple Steps. Kilem L. Gwet, Ph.D.

How To Check For Differences In The One Way Anova

Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures

Calculating the Probability of Returning a Loan with Binary Probability Models

Chapter 11: Activity

2007 Thomson South-Western

Optimal Consumption with Stochastic Income: Deviations from Certainty Equivalence

MSc Economics Programme Specification. Course Title MSc Economics.

430 Statistics and Financial Mathematics for Business

In this chapter, you will learn improvement curve concepts and their application to cost and price analysis.

A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study

Professor Christina Romer. LECTURE 17 MACROECONOMIC VARIABLES AND ISSUES March 17, 2016

Econometrics Simple Linear Regression

DOES A STRONG DOLLAR INCREASE DEMAND FOR BOTH DOMESTIC AND IMPORTED GOODS? John J. Heim, Rensselaer Polytechnic Institute, Troy, New York, USA

Testing Research and Statistical Hypotheses

Preparation course MSc Business & Econonomics- Macroeconomics: Introduction & Concepts

Practical. I conometrics. data collection, analysis, and application. Christiana E. Hilmer. Michael J. Hilmer San Diego State University

Household Debt in the U.S.: 2000 to 2011

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Why High-Order Polynomials Should Not be Used in Regression Discontinuity Designs

Fairfield Public Schools

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Detection of changes in variance using binary segmentation and optimal partitioning

5.5. Solving linear systems by the elimination method

Study Guide for the Final Exam

Chapter 5 Estimating Demand Functions

Edmonds Community College Macroeconomic Principles ECON 202C - Winter 2011 Online Course Instructor: Andy Williams

The Loss in Efficiency from Using Grouped Data to Estimate Coefficients of Group Level Variables. Kathleen M. Lang* Boston College.

STATISTICAL DATA ANALYSIS

A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

NEW MEXICO Grade 6 MATHEMATICS STANDARDS

THE IMPACT OF MACROECONOMIC FACTORS ON NON-PERFORMING LOANS IN THE REPUBLIC OF MOLDOVA

Introduction to Macroeconomics TOPIC 2: The Goods Market

Introduction to General and Generalized Linear Models

Introduction to Economics, ECON 100:11 & 13 Multiplier Model

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

THE KEYNESIAN MULTIPLIER EFFECT RECONSIDERED

SAMPLING FOR EFFECTIVE INTERNAL AUDITING. By Mwenya P. Chitalu CIA

ECON 3312 Macroeconomics Exam 3 Fall Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Inequality, Mobility and Income Distribution Comparisons

Example: Boats and Manatees

Beads Under the Cloud

REVIEW ONE. Name: Class: Date: Matching

PRINCIPLES OF ECONOMICS. Tomáš Hanák

PROFESSIONAL F O R E C A S T E R S

Chapter 7. One-way ANOVA

Course Syllabus/Content Diploma in Business Management

The Framework paper presented the following summary of projected program and real resource costs for the Job Corps:

5.4 Solving Percent Problems Using the Percent Equation

A Correlation of. to the. South Carolina Data Analysis and Probability Standards

GMAT.cz GMAT (Graduate Management Admission Test) Preparation Course Syllabus

DATA ANALYSIS FOR MANAGERS. Harry V. Roberts Graduate School of Business University of Chicago

Simple Regression Theory II 2010 Samuel L. Baker

Overview Accounting for Business (MCD1010) Introductory Mathematics for Business (MCD1550) Introductory Economics (MCD1690)...

Spatial panel models

Chapter 9. The Valuation of Common Stock. 1.The Expected Return (Copied from Unit02, slide 39)

What explains modes of engagement in international trade? Conference paper for Kiel International Economics Papers Current Version: June 2011

GUIDANCE FOR ASSESSING THE LIKELIHOOD THAT A SYSTEM WILL DEMONSTRATE ITS RELIABILITY REQUIREMENT DURING INITIAL OPERATIONAL TEST.

MACROECONOMICS II INFLATION, UNEMPLOYMENT AND THE PHILLIPS CURVE

Transcription:

Intro to Data Analysis, Economic Statistics and Econometrics Statistics deals with the techniques for collecting and analyzing data that arise in many different contexts. Econometrics involves the development and use of special statistical methods within a framework that is consistent with the ways of economic analysis. Econometric analysis starts from a statement about some behavioral relationship, which might come from economic theory, or simple observation of reality. Then it is developed into an equation that specifies how one variable is determined by the values of other variables. Let s start with a simple example It is coherent to believe that there is some relationship between consumption and income, in economic theory is widely believed that this relationship is fairly stable at the aggregate level. We can write this relationship as follows: C β 0 β 1 Y u (1) 1

C represents consumption, Y represents income, and they are the variables in this equation. β 0 and β 1 are the coefficients, and they are unknown constants or parameters. The disturbance term, u represents all the factors on top of income, that could help determine consumption that we choose not to include or think they are not relevant enough. This term can also include pure chance and error in the measurement of C. This behavioral equation is a model of the economic process that determines the level of consumption in the economy. It is obviously abstract and simplified given the complexity of reality. In most cases in econometrics we are going to assume our models are correctly specified, in some cases we will be able to test them. If the model is incorrect in its essence all our conclusions would be wrong. Econometricians face the challenging task of blending knowledge of economic theory and statistics to produce models that reflect, as correctly as possible, the complex economic reality. 2

The main task of the econometrician when faced with a model like (1) is to try to estimate the unknown parameters of interest, β 0 and β 1. For example we could use data on these variables and statistical techniques we will describe later on to get that β 0 0 568 and β 1 0 907. Then we can write the estimated model as C 0 568 0 907 Y (2) The interesting thing here is to tie this back to the economic model. So we interpret the estimated value of β 1 as the marginal propensity to consume, which theoretically is believed to be less than 1. Can we conclude from the result in (2) that the theory is correct? We have to remember that we have only computed an estimate of β 1. In order to confirm or reject the hypothesis implied by the theory we need to study questions regarding errors associated to the estimation process, we need to study more to be able to answer this question. 3

Some useful terminology Data are the quantitative facts or pieces of information that we deal with in any statistical analysis. When taken together with all the other data we use in a particular analysis they constitute a data set. When we study an economic process, the data present information about a set of cases of the occurrence of that process, these cases are called observations, and the nature of these cases define what we call unit of observation. For example in our initial example, if we understand the relationship at the macroeconomic level, the observations might be different years of a particular economy, or they might be information about different countries. If we understand the relationship between consumption and income as occurring at the microeconomic level, then the unit of observation is going to be the family, or the individual. The observations will consist of data for the same family over time, or for different families. 4

A variable is a measurable characteristic about which we collect data. For example when studying consumption at the micro level this might be: consumption, income, family size, the number of children, county of residence, whether all adults in the household are working, etc. Notice that some of them are discrete (take on a limited number of values) and others are continuous (can assume any value in a given range). All these concepts come together to form a rectangular data set, were we have a relatively large number of observations, n, which are arranged in the rows, and a number of variables measuring some characteristic of each of our units of observations, arranged in the columns. A given set of observations is often thought of as being a sample drawn from some larger population. In many applications it is important for the researcher to have access to a representative sample, so the conclusions can be considered valid for everyone in that population. 5

It is important to distinguish between Cross-section data and Time-Series data. Cross-section data arise when the observations are data for different entities (persons, firms, countries) for which a common set of variables is measured at about the same point in time. For example, the data collected in the U.S. decennial Census are a cross-section, with information on income, age, family size, etc. The data on unemployment rates, inflation rates and labor force participation rates of individuals 15 to 64, in all the U.S. States in a given year are also a Cross-section. Time-series data arise when the observations are data for the same agent in different periods of time. For example, records of a person s employment and earnings in each year of his life. Data on unemployment rates in a particular state since the 1960s. 6

Before moving into specific measures used to describe data a couple of additional issues should be discussed. Change and Growth: When trying to describe how a variable changes over time we can use two different concepts. The absolute change or growth, which is simply the difference between a particular period s value and the previous one. For a time-series variable Y the absolute growth in the i th period is given by the difference Y i Y i Y i 1 (3) notice that the result might be negative. Then we can compute the relative change, which relates the absolute change in a variable to its previous level. Relative change or growth in the i th period is measured by the rate of change or the rate of growth r i Y i Y i 1 Yi Y i 1 Y i 1 Y i Y i 1 1 (4) Notice that rate here has the meaning of a measure of relative change. Although the absolute growth concept is easier we usually rely more on rate of change or rates of growth to describe economic data. Constructed Variables: It is extremely common in economics to construct variables as a combination of other variables. We construct indexes to reflect the average of an underlying set of prices or quantity variables in each year. For example the Dow Jones Index, the Consumer Price Index (CPI), the Index of Industrial Output, the Index of Consumer Confidence, etc. 7

Data Analysis: Measures of Central Tendency The Mean or Average : by far the most common measure of central tendency. It is defined as the sum of the values of the variable divided by the number of observations: N i 1 x i N µ (5) when we have the whole population N. In a given sample: n i 1 where usually n N. x i n X sometimes denoted as ˆµ (6) In a simple example, imagine we have the incomes of five individuals on campus: $15,000; $35,000; $45,000; $79,000; $120,000. The mean of these incomes is $58,800. It is easy to compute and easy to understand. However, it has disadvantages: for example is very sensitive to extreme values or outliers. In our example if the highest salary is replaced with say (do you know whose salary this is?) $225,000, then the average is $79,800, which is much larger although only one observation has changed. Notice that the mean is in general a value that is not in the data. However, one advantage is that it has a clear mathematical formulation, and it is very stable across different samples of a given population. 8

The Median: it is less common but in many cases a better way to describe a variable. It is the value of the middle observation on X, after the observations have been ordered from smallest to largest. In other words, 50% of the observations are above this number, and 50% are below, that is why it is also called the 50 th percentile. In our example above the median is $45,000 (notice that this is regardless of whether the highest number is $120,000, or $ 225,000. If we have an even number of observations, then we have to compute the average of the two values in the middle. In the example above if we add the $225,000 to the original 5 values, the median would be $62,000. Notice that the median is not affected by extreme values, but it is less stable when comparing different samples of the same population. Notice also the fact that in this example the median is considerably smaller than the mean, given by the fact that the set of higher salaries are comparatively further away from the median than the lower salaries. The distribution is not symmetric. The Mode: It is the most frequently occurring value. The mode is sometimes a useful measure for identifying the typical value of a discrete variable, such as family size, but it is not too useful for continuous variables because every observation s value may be unique. For our example, all values happen equally often. It has the advantage of actually occurring in the data. 9