CS&SS / STAT 566 CAUSAL MODELING



Similar documents
Module 223 Major A: Concepts, methods and design in Epidemiology

Econometrics and Data Analysis I

Fairfield Public Schools

University of Alaska Anchorage Department of Economics Spring 2015 Course Syllabus

Stats 202 Data Analysis Project Winter 2016

RR765 Applied Multivariate Analysis

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE

BIOM611 Biological Data Analysis

SPPH 501 Analysis of Longitudinal & Correlated Data September, 2012

ECON 523 Applied Econometrics I /Masters Level American University, Spring Description of the course

Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini

Data management and SAS Programming Language EPID576D

1 Goals & Prerequisites

Business Analytics (AQM5201)

2013 MBA Jump Start Program. Statistics Module Part 3

EC 331: Research in Applied Economics

UNIVERSITY OF MICHIGAN SCHOOL OF INFORMATION SI301: Models of Social Information Processing Syllabus

Brown University Department of Economics Spring 2015 ECON 1620-S01 Introduction to Econometrics Course Syllabus

NORTHWESTERN UNIVERSITY Department of Statistics. Fall 2012 Statistics 210 Professor Savage INTRODUCTORY STATISTICS FOR THE SOCIAL SCIENCES

STAT 2080/MATH 2080/ECON 2280 Statistical Methods for Data Analysis and Inference Fall 2015

QMB Business Analytics CRN Fall 2015 T & R 9.30 to AM -- Lutgert Hall 2209

PH 7525 Introduction to Data & Statistical Packages Course Reference #: Spring 2011

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

STAT 1403 College Algebra Dr. Myron Rigsby Fall 2013 Section 0V2 crn 457 MWF 9:00 am

EViews, in Economics Department Computer Lab and computer labs on campus. TEXT CHAPTER DATE (Week) TOPIC (End of Chapter Problems)

QMB 3302 Business Analytics CRN Spring 2015 T R -- 11:00am - 12:15pm -- Lutgert Hall 2209

PSY 311: Research Methods in Psychology I (FALL 2011) Course Syllabus

QMB Business Analytics CRN Fall 2015 W 6:30-9:15 PM -- Lutgert Hall 2209

STAT 360 Probability and Statistics. Fall 2012

MATH 1900, ANALYTIC GEOMETRY AND CALCULUS II SYLLABUS

Student Writing Guide. Fall Lab Reports

Department/Academic Unit: Public Health Sciences Degree Program: Biostatistics Collaborative Program

COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK DEPARTMENT OF INDUSTRIAL ENGINEERING AND OPERATIONS RESEARCH

Advanced Statistics & Data Analysis

A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic

Executive Master of Public Administration. QUANTITATIVE TECHNIQUES I For Policy Making and Administration U6310, Sec. 03

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Hypothesis testing. c 2014, Jeffrey S. Simonoff 1

Organizing Your Approach to a Data Analysis

Simple Linear Regression

Writing a Scientific Research Paper

POL 204b: Research and Methodology

Research Methods in Advertising and Public Relations COMM 420 Spring Earth & Eng. Sci. W/F 12:20 PM to 2:15 PM

BIO 226: APPLIED LONGITUDINAL ANALYSIS COURSE SYLLABUS. Spring 2015

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

RUTHERFORD HIGH SCHOOL Rutherford, New Jersey COURSE OUTLINE STATISTICS AND PROBABILITY

Social Work Statistics Spring 2000

Introduction to Data Science: CptS Syllabus First Offering: Fall 2015

School of Public Health and Health Services Department of Epidemiology and Biostatistics

The learning system used in ECE 1270 has been designed on the basis of the principles of

Statistics 3202 Introduction to Statistical Inference for Data Analytics 4-semester-hour course

UNIVERSITY OF SOUTHERN CALIFORNIA Marshall School of Business BUAD 425 Data Analysis for Decision Making (Fall 2013) Syllabus

Multiple Regression: What Is It?

Descriptive Statistics

NIKE Case Study Solutions

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

BUAD 310 Applied Business Statistics. Syllabus Fall 2013

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

Economic Statistics (ECON2006), Statistics and Research Design in Psychology (PSYC2010), Survey Design and Analysis (SOCI2007)

Social Marketing. MGT 3250Y Fall 2013 Fridays 6:00 8:50 p.m. Room: S4037.

PA 750: Financial Management in Public Service Tuesday, 6:00-8:45 pm DTC Lab 617

Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study)

Techniques of Statistical Analysis II second group

Content Sheet 7-1: Overview of Quality Control for Quantitative Tests

Faculty of Science School of Mathematics and Statistics

De Anza College Introduction to Microeconomics Winter 2016 ONLINE Don Uy-Barreta

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics

Syllabus MAC1105 College Algebra

IN THE CITY OF NEW YORK Decision Risk and Operations. Advanced Business Analytics Fall 2015

Module 3: Correlation and Covariance

Chapter 1 Introduction. 1.1 Introduction

Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models.

COURSE SYLLABUS. Office Hours: MWF 08:30am-09:55am or by appointment, DAV 238

Faculty of Science Course Syllabus Department of Mathematics and Statistics Life Contingencies II ACSC/STAT 4720 Fall 2015

Fall Lecture: MWRF: 12 noon - 12:50 p.m. at Moulton 214. Lab at Moulton 217: Sec. 2: M 9 a.m. - 11:50 a.m.; Sec. 3: M 2 p.m. - 4:50 p.m.

Analyzing data. Thomas LaToza D: Human Aspects of Software Development (HASD) Spring, (C) Copyright Thomas D. LaToza

COURSE PLAN BDA: Biomedical Data Analysis Master in Bioinformatics for Health Sciences Academic Year Qualification.

MATH 4470/5470 EXPLORATORY DATA ANALYSIS ONLINE COURSE SYLLABUS

Master of Management BAHR580D: Business Communications Course Outline

Fall, Alternating: Friday, 11 am - 2 pm Tuesday, 11 am - 1 pm. Location, NB. Advance confirmation always preferred;

WHAT IS A JOURNAL CLUB?

MTH 420 Re-examining Mathematical Foundations for Teachers. Fall 2015

ISMT527 - SPRING 2003 DATA MINING TOOLS AND APPLICATIONS

CS 425 Software Engineering

MIS 424 COURSE OUTLINE

Chapter Four. Data Analyses and Presentation of the Findings

Geostatistics Exploratory Analysis

PELLISSIPPI STATE COMMUNITY COLLEGE MASTER SYLLABUS PROJECT SCHEDULING W/LAB ENGT 2021

DSCI 3710 Syllabus: Spring 2015

The University of Texas at Austin School of Social Work SOCIAL WORK STATISTICS

Anglo americká vysoká škola, o.p.s.

NRRT 765 Applied Multivariate Analysis

Decision Sciences Data Analysis for Managers

CORRELATION ANALYSIS

2015 TUHH Online Summer School: Overview of Statistical and Path Modeling Analyses

MKTG 330 FLORENCE: MARKET RESEARCH Syllabus Spring 2011 (Tentative)

Online Courses: During the Course

To: GMATScore GMAT Course Registrants From: GMATScore Course Administrator Date: January 2006 Ref: SKU ; GMAT Test Format & Subject Areas

Syllabus for MATH 191 MATH 191 Topics in Data Science: Algorithms and Mathematical Foundations Department of Mathematics, UCLA Fall Quarter 2015

Introduction to Hypothesis Testing. Hypothesis Testing. Step 1: State the Hypotheses

Transcription:

CS&SS / STAT 566 CAUSAL MODELING Instructor: Thomas Richardson E-mail: thomasr@u.washington.edu Lecture: MWF 2:30-3:20 SIG 227; WINTER 2016 Office hour: W 3.30-4.20 in Padelford (PDL) B-313 (or by appointment) Teaching Assistant: Wen Wei Loh E-mail: wloh@u.washington.edu Lab Time:F 3:30-4:20 SAV 117 Web-site: www.stat.washington.edu/tsr/s566 Overview This course provides an introduction to formal approaches to causality, based on potential outcomes (aka counterfactuals) and graphs. These provide precise answers to questions such as: In what way does information coming from a randomized experiment differ from that resulting from a purely observational study? If subjects in a randomized experiment don t comply with the treatment to which they were assigned, how (if at all) should this influence the conclusions that we draw? What is endogeneity and how should it be adjusted for? What is mediation? How do we define a direct effect? Can the measurement of background variables or covariates help to strengthen inferences drawn from observational data? If so, which extra variables should be measured, and how should they be taken into account? What is confounding? When should variables be adjusted for? How can matching be used to adjust for confounding? What is the relationship between causal models or hypotheses and statistical models or hypotheses? A central theme of this course will be that in order to properly answer these and other questions it is necessary to use a formal theory of causation; intuition alone is neither a safe nor a reliable guide. A causal theory aims to provide answers to causal questions in the same way that probability theory provides a formal method of answering questions about chance. Methods and software will be presented that are based on these theoretical approaches. 1

Causal methods in the Social Sciences Among other models, we will look at the method of instrumental variables, originating in Economics, and structural equation models (SEM) which are widely used in the Social Sciences. We will consider some of the following questions: What is an instrumental variable? What makes a SEM different from a regression model? What is being assumed when a structural equation model is constructed? Can all of these assumptions be tested? Given enough data will we ever/always be able to reject a SEM model which is false? Are there principled ways of searching for SEM models? Texts: Causality: Models, Reasoning and Inference. J. Pearl, Cambridge University Press, Cambridge, 2 nd Edition, 2009. Mostly Harmless Econometrics. J.D. Angrist, J.-S. Pischke Princeton University Press, 2008. Counterfactuals and Causal Inference. S.L. Morgan and C. Winship, Cambridge University Press, 2007. Supplementary Texts: Single World Intervention Graphs (SWIGs): A Unification of the Counterfactual and Graphical Approaches to Causality: A primer. Thomas Richardson and James Robins. Second UAI Workshop on Causal Structure Learning, Bellevue, Washington, 07/15/2013. http://www.statslab. cam.ac.uk/~rje42/uai13/richardson.pdf Causal Inference for Statistics, Social, and Biomedical Sciences. An Introduction. Guido W. Imbens and D. B. Rubin, Cambridge University Press, 2015. Credit Credit will be assigned on the basis of Homework (80%); Project (20%). Students taking the course for credit will be required to do homework exercises, complete a project and make a poster presentation. (For more details on the project see below.) Students will be asked to come up with topics and datasets for projects. 2

If your dataset is used for a project I will add 0.1 to your grade (assuming it is < 4.0!). Poster presentations will be scheduled during the last week of classes and/or exam week. If insufficient datasets are available, I will give a final exam. Pre-requisites The pre-requisite for this class is one prior class in Statistics. I will try to briefly review most concepts before using them. The course will presume some familiarity with basic mathematical concepts, such as what constitutes a proof. If you are having trouble following material that is being presented please let me know as soon as possible (in class, by coming to office hours, or via e-mail). I am always happy to provide additional examples, or clarifications, if requested. (I am not a mind-reader, so please let me know if it would be helpful!) Registration Owing to limited spaces in the class, if you do not show up during the first week you may be dropped in order to allow others to register. Outline coverage of material (by week) 1. Motivation; Counterfactuals & Causal Effects 2. Graphs and independence: d-separation; 3. Graphs and counterfactuals; 4. Presentation of data-sets Measures of dependence: Odds Ratios, Risk Ratios, Risk differences 5. Imperfect Experiments and Instrumental Variables Handling Non-Compliance 6. Direct Effects and Mediation 7. Structural Equation Models (SEMs) Understanding the constraints implied by SEMs 8. When does a regression coefficient give a causal effect? 9. Handling Confounding when structure is known: non-parametric identification and the do-calculus 10. When little is known about causal structure: exploratory search. 3

Assigned Readings Week 1: Introduction to Counterfactuals Morgan and Winship: Section 1.1; Sections 2.1-2.5; Angrist and Pischke: Chapter 1, and Chapter 2.1-2.2; Projects The group project counts for 20% of the grade. Groups will be assigned by the instructor according to criteria including the presence of someone with a proposed topic in each group and diversity of disciplines, among other criteria. Typical projects include: Applying methods of causal inference to a data-set; Investigating methods of causal inference, either analytically, or via a simulation study; Critique/summary of a recent research paper dealing with causality. Please start thinking now about possible datasets for projects. Projects from your own research, lab or department are highly appropriate. Any data should be available for analysis, i.e. the data should now be accessible, and analysis by a project group of two students from the class (including yourself) should be allowed by the owners of the data. If your dataset is used for a project I will add 0.1 to your final grade (assuming it is < 4.0!). Please contact me to talk about possible projects. All proposed projects will be presented in two-minute talks in class on Wednesday, January 20. After that, I will ask you for your preferences and assign project groups. I will be happy to meet with you to discuss possible approaches and software that is available (both before and after projects are proposed.) There are examples of projects from the past few years on the class web-site. The components of the project are as follows: 1. A one-page topic statement. Due Friday, January 29 in class. (5%). 2. A five minute presentation (no more!) of your chosen topic on February 8 and/or 10 in class. (10%). 3. Presentation of final results (40%). I will arrange one and possibly two poster sessions at a time TBD. I request that both team members participate in at least one presentation. 4

4. A written report of no more than 15 pages (including all materials, and in 12 point font size with one-inch margins on all sides). Due Thursday, 17 March at 2.30pm by email. Counts for 45%. Additional details for each component: One-page statement: This should include your names, the title of the project, a brief description of the problem and a few key references. It should state clearly what the data are and what the causal questions of interest are. It should not exceed one page. Please send me this by email (pdf format). Initial presentation (five minutes): Based on the one-page statement. I recommend that you prepare two to four slides, and/or provide a handout. You should also rehearse the presentation, with an emphasis on timing. Final poster presentation: This should present and summarize the problem or subject that you analyzed. your main results, and some idea of what you did. Written report: This should summarize the problem, the causal questions that you investigated, and your analysis. Do not include large amounts of raw computer output in this report (the 15-page limit includes all output, graphs and appendices). If computer results or plots are relevant to your argument, include them in the main body of the report, labeling them as Table 1, Figure 1, etc. A suggested outline for the report is: (a) state the problem and the questions being asked, include enough detail to be self-contained; (b) state your methods of analysis; (c) present your results; (d) describe/discuss your conclusions. 5