Codewebs: Scalable Homework Search for Massive Open Online Programming Courses
Leonidas Guibas, Andy Nguyen, Chris Piech, Jonathan Huang
Educause, 11/6/2013
Massive Scale Education: user counts shown for three MOOC platforms: 4.5 million, 750,000, and 1.2 million users.
Complex and informative feedback in MOOCs
Short response (multiple choice): easy to automate, but limited ability to ask expressive questions or require creativity.
Long response (coding assignments, proofs, essay questions): hard to grade automatically, but lets us assign complex assignments and provide complex feedback.
Binary feedback for coding questions
Linear regression submission (Homework 1) for Coursera's ML class. The autograder feeds test inputs to the submission, compares the test outputs, and reports only correct / incorrect. What would a human grader (TA) do?

function [theta, J_history] = gradientdescent(x, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    theta = theta-alpha*1/m*(x'*(x*theta-y));
    J_history(iter) = computecost(x, y, theta);
end
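As a rough sketch (not the actual Coursera autograder; all names and the tolerance are invented for illustration), the binary check above amounts to running the submission on the test inputs and comparing its outputs to a reference solution:

# Hedged Python illustration: binary grading by output comparison.
def binary_grade(submitted_fn, reference_fn, test_inputs, tol=1e-8):
    """Return True iff the submission matches the reference on every test input."""
    return all(abs(submitted_fn(x) - reference_fn(x)) <= tol for x in test_inputs)

# Toy usage: grading a scalar gradient step for f(x) = x^2 with alpha = 0.1.
reference  = lambda x: x - 0.1 * 2 * x
submission = lambda x: x - 0.1 * 2 * x
print(binary_grade(submission, reference, test_inputs=[0.0, 1.0, -3.5]))  # True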
Complex, informative feedback
A correct but long-winded submission:

function [theta, J_history] = gradientdescent(x, y, theta, alpha, num_iters)
m = length(y);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    hypo = X*theta;
    newmat = hypo - y;
    trans1 = (X(:,1));
    trans1 = trans1';
    newmat1 = trans1 * newmat;
    temp1 = sum(newmat1);
    temp1 = (temp1*alpha)/m;
    A = [temp1];
    theta(1) = theta(1) - A;
    trans2 = (X(:,2))';
    newmat2 = trans2*newmat;
    temp2 = sum(newmat2);
    temp2 = (temp2*alpha)/m;
    B = [temp2];
    theta(2) = theta(2) - B;
    J_history(iter) = computecost(x, y, theta);
end

Better: theta = theta - (alpha/m)*X'*(X*theta-y). Why?
Correctness: Good. Efficiency: Good. Style: Poor. Elegance: Poor.
Technical Challenges
Source code similarity: these three lines compute the same thing:
sum(x'*(x*theta-y));
sum(((theta'*X')'-y)'*X);
sum(transpose(x*theta-y)*x);
Scalability
The Data
Coursera Machine Learning > 1 million submissions
Why Care?
The feedback paradox
Fast feedback is key for student success [Dihoff 04].
Human feedback is an overwhelming time commitment for teachers [Sadler 06].
Cost disease
Grand Challenge: automatically provide feedback to students (for programming assignments)
Basic Idea
[Sad Teacher] Input: many ungraded submissions for a programming assignment
[Happy Teacher] Output: annotated submissions
[Effective, Happy Teacher] Force-multiply teacher effort.
Build A Homework Search Engine
Abstract syntax tree representations

function A = warmupexercise()
A = [];
A = eye(5);
endfunction

AST for A = eye(5):
ASSIGN
  IDENT (A)
  INDEX_EXP
    IDENT (eye)
    ARGUMENT_LIST
      CONST (5)

ASTs ignore: whitespace, comments
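As a minimal sketch (not the Codewebs implementation), here is how the statement A = eye(5) might be held as a small tree in Python; whitespace and comments from the source simply never appear in it:

from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                      # e.g. ASSIGN, IDENT, CONST
    value: str = ""                 # identifier name or literal text
    children: list = field(default_factory=list)

    def size(self):
        """Number of nodes in the subtree rooted here."""
        return 1 + sum(c.size() for c in self.children)

# AST for:  A = eye(5);
ast = Node("ASSIGN", children=[
    Node("IDENT", "A"),
    Node("INDEX_EXP", children=[
        Node("IDENT", "eye"),
        Node("ARGUMENT_LIST", children=[Node("CONST", "5")]),
    ]),
])
print(ast.size())  # 6 nodes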
Indexing documents by phrases
Example phrases: blue sky; yellow submarine; The bright and blue butterfly hangs on the breeze; We all something something yellow submarine

Inverted index (term/phrase -> document list):
best {1,3}
blue {2,4,6}
bright {7,8,10,11,12}
heat {1,5,13}
kernel {2,5,6,9,56}
sky {1,2}
submarine {2,3,4}
woes {10,19,38}
yellow {2,4}

What basic queries should a code search engine support?
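A hedged Python sketch of the same idea applied to code (not the Codewebs code; the phrases here are plain strings standing in for serialized subtrees):

from collections import defaultdict

def build_index(submissions):
    """Map each code phrase to the set of submission ids that contain it."""
    index = defaultdict(set)
    for doc_id, phrases in enumerate(submissions):
        for phrase in phrases:
            index[phrase].add(doc_id)
    return index

# Toy usage, in the spirit of the term -> document-list table above.
submissions = [
    ["x'*(x*theta-y)", "length(y)"],               # submission 0
    ["sum(transpose(x*theta-y)*x)", "length(y)"],  # submission 1
    ["x'*(x*theta-y)", "size(X,1)"],               # submission 2
]
index = build_index(submissions)
print(index["x'*(x*theta-y)"])   # {0, 2}
print(index["length(y)"])        # {0, 1}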
Find the Feet That Fit: query for ASTs that match on context
[Figure: Running time for indexing, plotted against the number of ASTs indexed and against the average AST size (# nodes); runtime in seconds.]
[Figure: Zipf's law for code phrases. Log-log plot of subtree frequency vs. subtree rank, with a visible "starter code elbow".]
Great!
So what?
Data driven equivalence classes of code phrases
There is Structure to Code
Exponential Solution Space
def dosomething() { Part A; Part B; Part C }
Part A: 500 ways. Part B: 200 ways. Part C: 200 ways.
Without factoring: 500 × 200 × 200 = 20 × 10^6 parameter combinations.
With factoring: 500 + 200 + 200 = 9 × 10^2.
Say you have two sub-forests, one from Solution A and one from Solution B.
Each sub-forest has a complement: the rest of its program's AST.
If those complements are equal...
...and the programs have the same output...
...then the sub-forests were analogous in that context.
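A hedged Python sketch of that test (not the actual Codewebs algorithm; the toy "programs" are Python expressions with a {block} hole, invented for illustration): two blocks are treated as analogous in a context when the surrounding complement is identical and the completed programs agree on every test input.

def analogous(block_a, block_b, complement_a, complement_b, test_inputs):
    if complement_a != complement_b:
        return False                 # different contexts: no evidence either way
    def run(block, x):
        # plug the block into the shared complement and evaluate on input x
        return eval(complement_a.format(block=block), {"x": x})
    return all(run(block_a, x) == run(block_b, x) for x in test_inputs)

# Toy usage: "x + x" and "2 * x" behave identically inside this context.
ctx = "({block}) - 1"
print(analogous("x + x", "2 * x", ctx, ctx, test_inputs=[0, 1, 5]))   # True
print(analogous("x + x", "x * x", ctx, ctx, test_inputs=[0, 1, 5]))   # False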
Equivalence classes discovered from the data:

{m}: m; rows(X); rows(y); size(X, 1); length(y); size(y, 1); length(x(:, 1)); length(X); size(X)(1)

{alphaoverm}: alpha / {m}; 1 / {m} * alpha; alpha .* (1 / {m}); alpha ./ {m}; alpha * inv({m}); alpha * (1 ./ {m}); 1 * alpha / {m}; alpha * pinv({m}); alpha * 1 ./ {m}; alpha .* 1 / {m}; 1 .* alpha ./ {m}; alpha * (1 / {m}); .01 / {m}; alpha .* (1 ./ {m}); alpha * {m} ^ -1

{hypothesis}: (X * theta); (theta' * X')'; [X] * theta; (X * theta(:)); theta(1) + theta(2) * X(:, 2); sum(x .* repmat(theta', {m}, 1), 2)

{residual}: (X * theta - y); (theta' * X' - y')'; ({hypothesis} - y); ({hypothesis}' - y')'; [{hypothesis} - y]; sum({hypothesis} - y, 2)
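A hedged sketch of how such classes can be used for canonicalization (toy Python on strings, using a hand-picked subset of the classes above; the real system rewrites ASTs, not text): every member of a class is replaced by one canonical name, so superficially different submissions collapse together.

EQUIVALENCE_CLASSES = {
    "{m}":          ["length(y)", "size(X, 1)", "rows(X)", "length(X)"],
    "{hypothesis}": ["(X * theta)", "(theta' * X')'"],
    "{residual}":   ["(X * theta - y)", "({hypothesis} - y)"],
}

def canonicalize(expr):
    """Repeatedly replace known class members by the class's canonical name."""
    changed = True
    while changed:
        changed = False
        for canonical, members in EQUIVALENCE_CLASSES.items():
            for member in members:
                if member in expr:
                    expr = expr.replace(member, canonical)
                    changed = True
    return expr

print(canonicalize("(X * theta - y)"))        # {residual}
print(canonicalize("((theta' * X')' - y)"))   # {residual}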
[Figure: Reduction in # ASTs via canonicalization. y-axis: % remaining ASTs after canonicalization; x-axis: # unique ASTs considered, ordered by AST frequency. Curves range from 1 equivalence class to many: more equivalence classes means more reduction (fewer ASTs), and the reduction is strongest for the more common ASTs.]
[Figure: How many submissions can we give feedback to with fixed effort? y-axis: # submissions covered (out of 40,000); x-axis: # equivalence classes; curves for 25 ASTs marked and 200 ASTs marked. With equivalence classes, marking only 25 ASTs eventually covers more submissions than marking 200 ASTs with no equivalences.]
Syntactic bug isolation using Codewebs
[Figure: Bug detection evaluation, recall vs. precision. Each point represents a single coding problem in Coursera's ML class; one labeled point is regularized backpropagation for neural nets. (Localization itself is difficult to evaluate.)]
[Figure: Bug detection evaluation, F-score (higher is better) vs. # unique ASTs considered, with canonicalization and without canonicalization.]
Concrete Example
Case study: the extraneous sum

Correct submission:
function [theta, J_history] = gradientdescent(x, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    theta = theta-alpha*1/m*(x'*(x*theta-y));
    J_history(iter) = computecost(x, y, theta);
end

Incorrect submission (identical except for the update line):
    theta = theta-alpha*1/m*sum(x'*(x*theta-y));

Syntax based approach: attach this message to every submission containing that exact expression (covers 99 submissions):
"Dear Lisa Simpson, consider the dimension of the expression X'*(X*theta-y) and what happens after you call sum on it."
The extraneous sum bug takes many forms:
theta = theta-alpha*1/m*sum(x'*(x*theta-y));
theta = theta-alpha*1/m*sum(((theta'*X')'-y)'*X);
theta = theta-alpha*1/m*sum(transpose(x*theta-y)*x);

(Easier) Output based approach: attach this message to every submission that produces the same incorrect output (covers 1091 submissions).
Codewebs approach to feedback
theta = theta-alpha*1/m*sum(x'*(x*theta-y));
Step 1: Find equivalent ways of writing the buggy expression using the Codewebs engine.
Step 2: Write a thoughtful/meaningful hint or explanation.
Step 3: Propagate that feedback to any submission containing an equivalent expression.

Submissions covered by a single message:
Output based: 1091
Codewebs: 1208
Combined: 1604
That is a ~47% improvement over just using output based feedback.
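A hedged Python sketch of Step 3 (submission ids, hint text, and the substring matching are all invented for illustration; the real engine matches equivalent expressions at the AST level):

def propagate_feedback(submissions, equivalent_bugs, hint):
    """Return {submission_id: hint} for every submission containing the bug in any form."""
    return {sub_id: hint
            for sub_id, code in submissions.items()
            if any(bug in code for bug in equivalent_bugs)}

equivalent_bugs = [
    "sum(x'*(x*theta-y))",
    "sum(((theta'*X')'-y)'*X)",
    "sum(transpose(x*theta-y)*x)",
]
hint = ("Consider the dimensions of X'*(X*theta-y) "
        "and what happens after you call sum on it.")
submissions = {
    17: "theta = theta-alpha*1/m*sum(x'*(x*theta-y));",
    42: "theta = theta-(alpha/m)*X'*(X*theta-y);",
}
print(propagate_feedback(submissions, equivalent_bugs, hint))  # only submission 17 gets the hint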
Big data as a problem: can we give human quality feedback to a million code submissions?
Big data as a solution: the structure (clustering, subtree equivalence) of the solution space is invisible without dense sampling!
Where we're going next:
Extensions to other languages
Applications to new problem domains
Data association problem for variables
Dynamic analysis
Temporal analysis
How do solutions evolve? Modeling the human creative process.
Feedback on progress, not just on the final solution.
Think beyond computer science, beyond education.