Course Title: Advanced Topics in Quantitative Methods: Educational Data Science Practicum




COURSE NUMBER: APSTA-GE.2017
Course Title: Advanced Topics in Quantitative Methods: Educational Data Science Practicum
Number of Credits: 2
Meeting Pattern: 3 hours per week, 7 weeks; first class meets Week of March 23, 2015.
Course time: TBD
Instructor: Y. Bergner

Course Description (~250 words or less): This intensive laboratory course will focus on doing data analysis projects with real data selected by the students. The core skills start with framing good research questions, which then guide the work with data of all types and varying quality (e.g., web-scraped or clickstream-based data rather than large national surveys). That work proceeds through visualization; principled modeling and model evaluation using statistical learning techniques such as regression, classification, and clustering; and presentation of results using reproducible research tools (e.g., knitr, Sweave) in the R programming language.

Course Notes: Class sessions will be hands-on, involving pair programming in R and short student presentations, leading up to a final project. It will be a flipped class, in the sense that most information download (readings, watching videos, and interactive tutorials) will take place outside of class, lecturing will be minimal, and meetings will be dedicated to discussion and hands-on work. Students are required to be fully prepared to engage interactively in lab every week. This is not a survey course on applied machine learning algorithms, but rather a deep dive into the holistic process of knowledge extraction.

Students deciding between APSTA-GE 2110 (Applied Statistics: Using Large Databases in Education) and this course should note the following.

o APSTA-GE 2110 uses large national surveys (e.g., ECLS-K, NELS), teaches Stata scripting skills (e.g., merging databases), and emphasizes best practices in applied statistical analysis of survey data (e.g., weighting and adjusting standard errors in complex surveys).
The course pairs nicely with RESCH-GE 2139 (Survey Research) and PADM.GP.2902 (Multiple Regression and Introduction to Econometrics).

o This course, APSTA-GE 20xx, may use messy (non-tabular) data of the kind more common in web-derived sources, such as clickstreams and web scraping (processing the text in HTML files), but the data may be tabular as well. The tools used in this course are different, and using them calls for computer science skills in addition to statistical modeling skills, which also differ (APSTA-GE 2011, Classification and Clustering, is a prerequisite). Thus, students are expected to have some prior programming experience and to adapt it quickly to learning the R programming language and tools that integrate computer code, statistical analysis, and technical writing (e.g., the knitr package).

We recommend this course to advanced PhD, MS-A3SR and MS Data Science students, provided they have prior programming experience and the appropriate prerequisite knowledge.

Course Prerequisites/Expectations: Some prior experience with statistics (e.g., APSTA-GE 2003) and some exposure to programming in any language (e.g., C, Java, Python, R), including experience with loops, functions, and data structures, e.g., at the level of NYU's undergraduate course Introduction to Computer Science (CSCI-UA-0101). Students are strongly encouraged to take APSTA-GE 2011, Classification & Clustering, offered Winter 2015, before taking this course (the classification and clustering topics are central to this course). Students are also encouraged to use the first half of the Spring 2015 term leading up to this course to gather data sources and consider the research questions that the data are intended to address.

Learning Objectives: By the end of the course, students will be able to:
1. Formulate, contextualize, and defend research questions.
2. Explore and transform educational data sets using R.
3. Understand how regression, classification and clustering methods are applied to educational data and how to evaluate these methods.
4. Generate high quality visualizations.
5. Present methods and results according to professional standards of reproducible research.
6. Learn from and collaborate with others on educational data science projects.

Course Format: (Lecture, lab, seminar, recitation or combination) One combined lecture and lab session each week; Spring offering

Course Outline (list of lectures/topics each session)

Week 1
Topics: Overview; R, knitr, and swirl
Lab Activity: Data manipulation; visualization; finding and reading data files
Advance Preparation: Bring a laptop to class!

Week 2
Topics: Descriptive statistics; cleaning data; transforming variables; missing data
Lab Activity: First 5 minute presentation; descriptive statistics lab
Advance Preparation: Swirl R Prog Alt, parts 1-5. External data mini-project.

Week 3
Topics: Regression-based methods
Lab Activity: Regression trees; Second chance presentations (SCPs)
Advance Preparation: Strobl et al.

Week 4
Topics: Classification-based methods; association measures for categorical variables
Lab Activity: Midterm project presentation; classification lab
Advance Preparation: Read vcd-Tutorial (on CRAN)

Week 5
Topics: Clustering methods; internal and external quality indices
Lab Activity: Clustering lab; SCPs; project time
Advance Preparation: Steinley

Week 6
Topics: Visualization
Lab Activity: Viz lab; SCPs; project time
Advance Preparation: Tufte; Gelman

Week 7
Topics: Presentation and peer review
Lab Activity: Final presentations
Advance Preparation: Peer review protocol (Peer reviews due within 1 week)

Course Requirements

There will be 3 projects, presented and written up individually. Students are encouraged to work together; in addition, for each project you will be assigned a partner as an internal reviewer. Furthermore, students will write external peer reviews for one midterm and one final project. Projects will be scored on a four-point scale: {strong accept, weak accept, weak reject, strong reject}, and you will not be considered eligible to present the next project until you and your assigned partner have both received strong accepts on your prior presentations (see grading notes below). Second chances are of course permitted.

Evaluation for this course will be weighted as follows:
Projects
o Mini-project 10%
o Midterm project 20%
o Final project 40%
Peer reviews 10%
Class participation 20%

ASSIGNMENT AND GRADING DETAILS

Projects: The three projects will be assessed for excellence in: quality of the code (well-commented; functional); organization of the code/writing (Is the research question clearly stated? Are the data well described? Are the analytic techniques clear and well executed? Is the flow of the presentation well organized?); and reproducibility/flexibility/extendibility of the code (How modular is the design? Could the structure be reused for a slightly different problem?). To receive maximum credit for each project, all three requirements must be satisfied.

Reviews:

Final and midterm projects will be both peer reviewed (rubric will be provided in class) and rated by the instructor on the same four-point scale of {strong accept, weak accept, weak reject, strong reject}. To emphasize the importance of writing detailed peer reviews, student-written reviews will be scored on the scale: 1=inadequate, 2=useful, 3=exemplary.

Class Participation: This course is highly interactive, both in team work and learning and in the classroom as a whole. However, interaction takes a variety of forms, ranging from one-on-one discussions to group presentations, so that different skills are emphasized at different times. The evaluation of class participation uses a flexible scale so that everyone can achieve the highest measure. For each class meeting, 1=present, 2=responsive, 3=active, and the overall participation grade is obtained by summing over the class sessions.

The following system can be used to convert evaluation scales used in this course to letter grades:

Letter grade   Projects        Peer review grade   Class participation grade
A              Strong accept   Exemplary           Active
B              Weak accept     Useful              Responsive
C              Weak reject     Inadequate          Present
D              Strong reject

Required Readings and/or Text (a partial reading list is acceptable)

Gelman, A. (2013). Choices in statistical graphics: My stories. Presentation to New York Data Visualization Meetup. URL: http://www.stat.columbia.edu/~gelman/presentations/vistalk_meetup_new_handout.pdf

Steinley, D. (2006). K-means clustering: A half-century synthesis. British Journal of Mathematical and Statistical Psychology, 59, 1-34.

Strobl, C., Malley, J., & Tutz, G. (2009). An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychological Methods, 14(4), 323.

Tufte, E. R. (2001). Graphical Excellence. In The Visual Display of Quantitative Information (pp. 13-51).

There will be a number of readings, videos, and tutorials available from the web.

Academic Integrity: All students are responsible for understanding and complying with the NYU Steinhardt Statement on Academic Integrity. A copy is available at: http://steinhardt.nyu.edu/policies/academic_integrity.

Students with Disabilities: Students with physical or learning disabilities are required to register with the Moses Center for Students with Disabilities, 726 Broadway, 2nd Floor (212-998-4980; online at http://www.nyu.edu/csd), and are required to present a letter from the Center to the instructor at the start of the semester in order to be considered for appropriate accommodation.