Codewebs: Scalable Homework Search for Massive Open Online Programming Courses
Leonidas Guibas, Andy Nguyen, Chris Piech, Jonathan Huang
Educause, 11/6/2013
Massive Scale Education: user counts shown for three MOOC platforms: 4.5 million, 750,000, and 1.2 million users.
Complex and informative feedback in MOOCs
Short response (multiple choice): easy to automate, but limited ability to ask expressive questions or require creativity.
Long response (coding assignments, proofs, essay questions): hard to grade automatically, but lets us assign complex assignments and provide complex feedback.
Binary feedback for coding questions
Linear regression submission (Homework 1) for Coursera's ML class. The autograder feeds test inputs to the submission, compares the test outputs, and reports only correct / incorrect. What would a human grader (TA) do?

function [theta, J_history] = gradientdescent(x, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    theta = theta-alpha*1/m*(x'*(x*theta-y));
    J_history(iter) = computecost(x, y, theta);
end
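As a rough sketch (not the actual Coursera autograder; all names and the tolerance are invented for illustration), the binary check above amounts to running the submission on the test inputs and comparing its outputs to a reference solution:

# Hedged Python illustration: binary grading by output comparison.
def binary_grade(submitted_fn, reference_fn, test_inputs, tol=1e-8):
    """Return True iff the submission matches the reference on every test input."""
    return all(abs(submitted_fn(x) - reference_fn(x)) <= tol for x in test_inputs)

# Toy usage: grading a scalar gradient step for f(x) = x^2 with alpha = 0.1.
reference  = lambda x: x - 0.1 * 2 * x
submission = lambda x: x - 0.1 * 2 * x
print(binary_grade(submission, reference, test_inputs=[0.0, 1.0, -3.5]))  # True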
Complex, informative feedback
A correct but long-winded submission:

function [theta, J_history] = gradientdescent(x, y, theta, alpha, num_iters)
m = length(y);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    hypo = X*theta;
    newmat = hypo - y;
    trans1 = (X(:,1));
    trans1 = trans1';
    newmat1 = trans1 * newmat;
    temp1 = sum(newmat1);
    temp1 = (temp1*alpha)/m;
    A = [temp1];
    theta(1) = theta(1) - A;
    trans2 = (X(:,2))';
    newmat2 = trans2*newmat;
    temp2 = sum(newmat2);
    temp2 = (temp2*alpha)/m;
    B = [temp2];
    theta(2) = theta(2) - B;
    J_history(iter) = computecost(x, y, theta);
end

Better: theta = theta - (alpha/m)*X'*(X*theta-y). Why?
Correctness: Good. Efficiency: Good. Style: Poor. Elegance: Poor.
Technical Challenges
Source code similarity: these three lines compute the same thing:
sum(x'*(x*theta-y));
sum(((theta'*X')'-y)'*X);
sum(transpose(x*theta-y)*x);
Scalability
The Data
Coursera Machine Learning > 1 million submissions
Why Care?
The feedback paradox
Fast feedback is key for student success [Dihoff 04].
Human feedback is an overwhelming time commitment for teachers [Sadler 06].
Cost disease
Grand Challenge: automatically provide feedback to students (for programming assignments)
Basic Idea
[Sad Teacher] Input: many ungraded submissions for a programming assignment
[Happy Teacher] Output: annotated submissions
[Effective, Happy Teacher] Force-multiply teacher effort.
Build A Homework Search Engine
Abstract syntax tree representations

function A = warmupexercise()
A = [];
A = eye(5);
endfunction

AST for A = eye(5):
ASSIGN
  IDENT (A)
  INDEX_EXP
    IDENT (eye)
    ARGUMENT_LIST
      CONST (5)

ASTs ignore: whitespace, comments
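As a minimal sketch (not the Codewebs implementation), here is how the statement A = eye(5) might be held as a small tree in Python; whitespace and comments from the source simply never appear in it:

from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                      # e.g. ASSIGN, IDENT, CONST
    value: str = ""                 # identifier name or literal text
    children: list = field(default_factory=list)

    def size(self):
        """Number of nodes in the subtree rooted here."""
        return 1 + sum(c.size() for c in self.children)

# AST for:  A = eye(5);
ast = Node("ASSIGN", children=[
    Node("IDENT", "A"),
    Node("INDEX_EXP", children=[
        Node("IDENT", "eye"),
        Node("ARGUMENT_LIST", children=[Node("CONST", "5")]),
    ]),
])
print(ast.size())  # 6 nodes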
Indexing documents by phrases
Example phrases: blue sky; yellow submarine; The bright and blue butterfly hangs on the breeze; We all something something yellow submarine

Inverted index (term/phrase -> document list):
best {1,3}
blue {2,4,6}
bright {7,8,10,11,12}
heat {1,5,13}
kernel {2,5,6,9,56}
sky {1,2}
submarine {2,3,4}
woes {10,19,38}
yellow {2,4}

What basic queries should a code search engine support?
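A hedged Python sketch of the same idea applied to code (not the Codewebs code; the phrases here are plain strings standing in for serialized subtrees):

from collections import defaultdict

def build_index(submissions):
    """Map each code phrase to the set of submission ids that contain it."""
    index = defaultdict(set)
    for doc_id, phrases in enumerate(submissions):
        for phrase in phrases:
            index[phrase].add(doc_id)
    return index

# Toy usage, in the spirit of the term -> document-list table above.
submissions = [
    ["x'*(x*theta-y)", "length(y)"],               # submission 0
    ["sum(transpose(x*theta-y)*x)", "length(y)"],  # submission 1
    ["x'*(x*theta-y)", "size(X,1)"],               # submission 2
]
index = build_index(submissions)
print(index["x'*(x*theta-y)"])   # {0, 2}
print(index["length(y)"])        # {0, 1}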
Find the Feet That Fit: query for ASTs that match on context
[Figure: Running time for indexing, plotted against the number of ASTs indexed and against the average AST size (# nodes); runtime in seconds.]
[Figure: Zipf's law for code phrases. Log-log plot of subtree frequency vs. subtree rank, with a visible "starter code elbow".]
Great!
So what?
Data driven equivalence classes of code phrases
There is Structure to Code
Exponential Solution Space
def dosomething() { Part A; Part B; Part C }
Part A: 500 ways. Part B: 200 ways. Part C: 200 ways.
Without factoring: 500 × 200 × 200 = 20 × 10^6 parameter combinations.
With factoring: 500 + 200 + 200 = 9 × 10^2.
Say you have two sub-forests, one from Solution A and one from Solution B.
Each sub-forest has a complement: the rest of its program's AST.
If those complements are equal...
...and the programs have the same output...
...then the sub-forests were analogous in that context.
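A hedged Python sketch of that test (not the actual Codewebs algorithm; the toy "programs" are Python expressions with a {block} hole, invented for illustration): two blocks are treated as analogous in a context when the surrounding complement is identical and the completed programs agree on every test input.

def analogous(block_a, block_b, complement_a, complement_b, test_inputs):
    if complement_a != complement_b:
        return False                 # different contexts: no evidence either way
    def run(block, x):
        # plug the block into the shared complement and evaluate on input x
        return eval(complement_a.format(block=block), {"x": x})
    return all(run(block_a, x) == run(block_b, x) for x in test_inputs)

# Toy usage: "x + x" and "2 * x" behave identically inside this context.
ctx = "({block}) - 1"
print(analogous("x + x", "2 * x", ctx, ctx, test_inputs=[0, 1, 5]))   # True
print(analogous("x + x", "x * x", ctx, ctx, test_inputs=[0, 1, 5]))   # False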
Equivalence classes discovered from the data:

{m}: m; rows(X); rows(y); size(X, 1); length(y); size(y, 1); length(x(:, 1)); length(X); size(X)(1)

{alphaoverm}: alpha / {m}; 1 / {m} * alpha; alpha .* (1 / {m}); alpha ./ {m}; alpha * inv({m}); alpha * (1 ./ {m}); 1 * alpha / {m}; alpha * pinv({m}); alpha * 1 ./ {m}; alpha .* 1 / {m}; 1 .* alpha ./ {m}; alpha * (1 / {m}); .01 / {m}; alpha .* (1 ./ {m}); alpha * {m} ^ -1

{hypothesis}: (X * theta); (theta' * X')'; [X] * theta; (X * theta(:)); theta(1) + theta(2) * X(:, 2); sum(x .* repmat(theta', {m}, 1), 2)

{residual}: (X * theta - y); (theta' * X' - y')'; ({hypothesis} - y); ({hypothesis}' - y')'; [{hypothesis} - y]; sum({hypothesis} - y, 2)
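A hedged sketch of how such classes can be used for canonicalization (toy Python on strings, using a hand-picked subset of the classes above; the real system rewrites ASTs, not text): every member of a class is replaced by one canonical name, so superficially different submissions collapse together.

EQUIVALENCE_CLASSES = {
    "{m}":          ["length(y)", "size(X, 1)", "rows(X)", "length(X)"],
    "{hypothesis}": ["(X * theta)", "(theta' * X')'"],
    "{residual}":   ["(X * theta - y)", "({hypothesis} - y)"],
}

def canonicalize(expr):
    """Repeatedly replace known class members by the class's canonical name."""
    changed = True
    while changed:
        changed = False
        for canonical, members in EQUIVALENCE_CLASSES.items():
            for member in members:
                if member in expr:
                    expr = expr.replace(member, canonical)
                    changed = True
    return expr

print(canonicalize("(X * theta - y)"))        # {residual}
print(canonicalize("((theta' * X')' - y)"))   # {residual}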
[Figure: Reduction in # ASTs via canonicalization. y-axis: % remaining ASTs after canonicalization; x-axis: # unique ASTs considered, ordered by AST frequency. Curves range from 1 equivalence class to many: more equivalence classes means more reduction (fewer ASTs), and the reduction is strongest for the more common ASTs.]
[Figure: How many submissions can we give feedback to with fixed effort? y-axis: # submissions covered (out of 40,000); x-axis: # equivalence classes; curves for 25 ASTs marked and 200 ASTs marked. With equivalence classes, marking only 25 ASTs eventually covers more submissions than marking 200 ASTs with no equivalences.]
Syntactic bug isolation using Codewebs
[Figure: Bug detection evaluation, recall vs. precision. Each point represents a single coding problem in Coursera's ML class; one labeled point is regularized backpropagation for neural nets. (Localization itself is difficult to evaluate.)]
[Figure: Bug detection evaluation, F-score (higher is better) vs. # unique ASTs considered, with canonicalization and without canonicalization.]
Concrete Example
Case study: the extraneous sum

Correct submission:
function [theta, J_history] = gradientdescent(x, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    theta = theta-alpha*1/m*(x'*(x*theta-y));
    J_history(iter) = computecost(x, y, theta);
end

Incorrect submission (identical except for the update line):
    theta = theta-alpha*1/m*sum(x'*(x*theta-y));

Syntax based approach: attach this message to every submission containing that exact expression (covers 99 submissions):
"Dear Lisa Simpson, consider the dimension of the expression X'*(X*theta-y) and what happens after you call sum on it."
The extraneous sum bug takes many forms:
theta = theta-alpha*1/m*sum(x'*(x*theta-y));
theta = theta-alpha*1/m*sum(((theta'*X')'-y)'*X);
theta = theta-alpha*1/m*sum(transpose(x*theta-y)*x);

(Easier) Output based approach: attach this message to every submission that produces the same incorrect output (covers 1091 submissions).
Codewebs approach to feedback
theta = theta-alpha*1/m*sum(x'*(x*theta-y));
Step 1: Find equivalent ways of writing the buggy expression using the Codewebs engine.
Step 2: Write a thoughtful/meaningful hint or explanation.
Step 3: Propagate that feedback to any submission containing an equivalent expression.

Submissions covered by a single message:
Output based: 1091
Codewebs: 1208
Combined: 1604
That is a ~47% improvement over just using output based feedback.
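A hedged Python sketch of Step 3 (submission ids, hint text, and the substring matching are all invented for illustration; the real engine matches equivalent expressions at the AST level):

def propagate_feedback(submissions, equivalent_bugs, hint):
    """Return {submission_id: hint} for every submission containing the bug in any form."""
    return {sub_id: hint
            for sub_id, code in submissions.items()
            if any(bug in code for bug in equivalent_bugs)}

equivalent_bugs = [
    "sum(x'*(x*theta-y))",
    "sum(((theta'*X')'-y)'*X)",
    "sum(transpose(x*theta-y)*x)",
]
hint = ("Consider the dimensions of X'*(X*theta-y) "
        "and what happens after you call sum on it.")
submissions = {
    17: "theta = theta-alpha*1/m*sum(x'*(x*theta-y));",
    42: "theta = theta-(alpha/m)*X'*(X*theta-y);",
}
print(propagate_feedback(submissions, equivalent_bugs, hint))  # only submission 17 gets the hint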
Big data as a problem: can we give human quality feedback to a million code submissions?
Big data as a solution: the structure (clustering, subtree equivalence) of the solution space is invisible without dense sampling!
Where we're going next:
Extensions to other languages
Applications to new problem domains
Data association problem for variables
Dynamic analysis
Temporal analysis
How do solutions evolve? Modeling the human creative process.
Feedback on progress, not just on the final solution.
Think beyond computer science, beyond education.