Big Data in Education



Big Data in Education Alex J. Bowers, Ph.D. Associate Professor of Education Leadership Teachers College, Columbia University April 24, 2014

"You're dealing with Big Data when you're working with data that doesn't fit into your computer unit. Note that makes it an evolving definition: Big Data has been around for a long time.... Today, Big Data means working with data that doesn't fit in one computer." p.24

Schutt, R., & O'Neil, C. (2013). Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly.

Big Analysis in Education Leadership

"A data scientist is someone who knows how to extract meaning from and interpret data, which requires both tools and methods from statistics and machine learning, as well as being human... Once she gets the data into shape, a crucial part is exploratory data analysis, which combines visualization and data sense... She'll communicate with team members, engineers, and leadership in clear language and with data visualizations so that even if her colleagues are not immersed in the data themselves, they will understand the implications." p.16 Schutt, R., & O'Neil, C. (2013). Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly.

Data scientist surpasses statistician on Google Trends, December 18, 2013. http://flowingdata.com/2013/12/18/data-scientist-surpasses-statistician-on-google-trends/

Mike Bostock & D3: github.com/mbostock github.com/mbostock/d3/wiki/gallery

http://ericksondata.com/wp/2012/venn-diagram-of-data-science/ Kris Erickson

But what do school practitioners (the users) actually say that they need?

Brocato, K., Willis, C., & Dechert, K. (2014). Longitudinal Data Use: Ideas for District, Building and Classroom Leaders. In A.J. Bowers, A.R. Shoho, & B.G. Barnett (Eds.), Using Data in Schools to Inform Leadership and Decision Making (forthcoming). Charlotte, NC: Information Age Publishing Inc.

What components of a statewide longitudinal data system are needed for that system to best meet the respective needs of superintendent, principal, and teacher leaders? Brocato, Willis & Dechert (2014)

What components of a longitudinal data system are needed for that system to best meet the respective needs of superintendent, principal, and teacher leaders? (Diagram: superintendents, principals, and teachers each connected to a shared longitudinal data system.)

Superintendents (n=55)

Individual Student/Teacher
- Teacher data: performance, attendance, past experience
- Who rides the bus
- Record of parent conferences
- Test scores for students, as well as whether or not they have met growth

Comparative
- All districts' comparative test data, district demographics
- Access test data, budget information, comparing per-pupil expenditures, demographic data for the local area as compared to the region and state
- Student growth from year to year, demographic group growth

Principals (n=45)

Teacher Data
- Teachers' past performance and years of experience
- Background of teachers and records showing student test scores broken down by teacher
- Student and teacher achievement and growth data, teacher evaluation information, student attendance and dropout data, staff attendance, accountability information

Student Data
- Achievement information on teachers based on student performance
- Background of teachers being hired; teacher performance data
- Are we able to keep high-performing teachers in our district?

Teachers (n=193) Brocato, Willis & Dechert (2014) p.5

Teachers, continued. Brocato, Willis & Dechert (2014) p.6

What components of a longitudinal data system are needed for that system to best meet the respective needs of superintendent, principal, and teacher leaders? Figure 1. Summary of eight themes and corresponding frequency percentage in the more frequent level of category

Which stages of the Data Science Process do we do well in education? Where are the challenges? Schutt, R., & O'Neil, C. (2013). Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly.

K-12 Schooling Outcomes
- Dropout risk: accuracy, precision, sensitivity & specificity
- Longitudinal patterns in teacher-assigned grades
- Data-driven decision making (3DM)
- Targeted and data-informed resource allocation to improve schooling outcomes

Bowers, A.J., Sprott, R., & Taff, S.A. (2013). Do we Know Who Will Drop Out? A Review of the Predictors of Dropping out of High School: Precision, Sensitivity and Specificity. The High School Journal, 96(2), 77-100. http://hdl.handle.net/10022/ac:p:21258

Issues of Accuracy in Predicting High School Dropout

Dropping out of high school in the U.S. is associated with a multitude of negative outcomes. However, other than a select number of demographic and background variables, we know little about the accuracy of current school-related dropout predictors. Current predictors accurately identify only about 50-60% of students who actually drop out. According to Gleason & Dynarski (2002), accurate prediction of who will drop out is a resource and efficiency issue:
- A large percentage of students are misidentified as at-risk.
- A large percentage of students who are at risk are never identified.

Current reporting of dropout flags across the literature is haphazard:
- Almost none report accuracy.
- Many report specificity or sensitivity, but rarely both.

A dropout flag may be highly precise, in that almost all of the students with the flag drop out, but may not be accurate, since the flag may identify only a small proportion of the dropouts.

Balfanz et al. (2007); Bowers (2010); Gleason & Dynarski (2002); Pallas (2003); Rumberger (2004)

Re-analyzing Past Dropout Flags for Accuracy, Precision, Sensitivity and Specificity

Literature search: queried multiple databases (JSTOR, Google Scholar, EBSCO, Educational Full Text Wilson Web). Studies were included that:
- Were published since 1979
- Examined high school dropout
- Examined school-wide characteristics and included all students
- Focused on the student level
- Reported frequencies for re-analysis

The search initially yielded 6,434 overlapping studies; 301 studies were read in full; 140 provided school-wide samples and quantifiable data; 36 articles provided enough data for accuracy re-calculations, yielding 110 separate dropout flags.

Relative Operating Characteristic (ROC): hits versus false alarms.

Predictor/Event table for calculating dropout contingency proportions:

                         Event: Dropout               Event: Graduate              Total
Flagged as dropout       a  True-positive (TP),       b  False-positive (FP),     a+b
                            correct                      Type I error
Flagged as graduate      c  False-negative (FN),      d  True-negative (TN),      c+d
                            Type II error                correct
Total                    a+c                          b+d                         a+b+c+d = n

Precision (positive predictive value) = a/(a+b)
True-positive proportion (sensitivity, "hits") = a/(a+c)
True-negative proportion (specificity) = d/(b+d)
False-positive proportion (1 - specificity, "false alarms") = b/(b+d)

Fawcett (2004); Hanley & McNeil (1982); Swets (1988); Zweig & Campbell (1993)
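The contingency proportions above can be computed directly from the four cell counts. A minimal sketch follows; the counts are hypothetical, chosen to show how a flag can be highly precise while still missing most dropouts:

```python
def dropout_flag_metrics(tp, fp, fn, tn):
    """Contingency proportions for a dropout flag (cells a, b, c, d)."""
    return {
        "precision":   tp / (tp + fp),          # a/(a+b), positive predictive value
        "sensitivity": tp / (tp + fn),          # a/(a+c), true-positive proportion ("hits")
        "specificity": tn / (fp + tn),          # d/(b+d), true-negative proportion
        "false_alarm": fp / (fp + tn),          # b/(b+d), 1 - specificity
        "accuracy":    (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical flag: 40 correctly flagged dropouts, 10 false alarms,
# 60 missed dropouts, 890 correctly unflagged graduates.
m = dropout_flag_metrics(tp=40, fp=10, fn=60, tn=890)
# Precision is high (0.80) but sensitivity is low (0.40):
# most flagged students do drop out, yet most dropouts are never flagged.
```

This is exactly the precise-but-inaccurate pattern the slide describes: judged by precision alone the flag looks strong, while the sensitivity shows it identifies only a minority of the actual dropouts.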

(Figure: an example of the true-positive proportion (sensitivity) plotted against the false-positive proportion (1 - specificity) for Balfanz et al. (2007), comparing the relative operating characteristics (ROC) of each dropout flag. The top-left corner represents perfect prediction, with better prediction above the diagonal and worse prediction toward it; labeled flags include low attendance.) Bowers, A.J., Sprott, R., Taff, S.A. (2013)

(Figure: relative operating characteristics (ROC) of all dropout flags reviewed, plotted as the true-positive proportion against the false-positive proportion; numbers refer to dropout indicator IDs.) Bowers, A.J., Sprott, R., Taff, S.A. (2013)
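Placing flags in ROC space, as in the figures above, reduces to computing one (false-positive proportion, true-positive proportion) point per flag; flags farther above the chance diagonal (TPR = FPR) discriminate better. A minimal sketch, with hypothetical flag names and contingency counts:

```python
def roc_point(tp, fp, fn, tn):
    """ROC-space coordinates for one dropout flag."""
    tpr = tp / (tp + fn)   # sensitivity, "hits"
    fpr = fp / (fp + tn)   # 1 - specificity, "false alarms"
    return fpr, tpr

# Hypothetical contingency counts per flag (not from the review).
flags = {
    "low_attendance": dict(tp=30, fp=50, fn=70, tn=850),
    "course_failure": dict(tp=65, fp=80, fn=35, tn=820),
}

for name, cells in flags.items():
    fpr, tpr = roc_point(**cells)
    above_chance = tpr - fpr  # vertical distance above the diagonal
    print(f"{name}: FPR={fpr:.3f}, TPR={tpr:.3f}, above chance by {above_chance:.3f}")
```

Ranking flags by their height above the diagonal is one simple way to compare indicators that report sensitivity and specificity on different samples, which is essentially what the re-analysis plots let the reader do by eye.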

Knowles, J. (2014). http://itp.wceruw.org/documents/knowlesdropoutearlywarningsystemforitpjan2013.pdf