Learning to Teach: Machine Learning to Improve Instruction
Beverly Park Woolf, School of Computer Science, University of Massachusetts, bev@cs.umass.edu
NIPS 2015 Workshop on Human Propelled Machine Learning, Dec 13, 2014

Long-Term Goal
"Millions of schoolchildren will have access to what Alexander the Great enjoyed as a royal prerogative: the personal services of a tutor as well informed as Aristotle. Students will have instant access to vast stores of knowledge through their computerized tutors."
Pat Suppes, Stanford University, 1966 (died Nov 2014)
Alexander the Great valued learning so highly that he said he was more indebted to Aristotle for giving him knowledge than to his father for giving him life.
We are on track. Key components: Artificial Intelligence, Machine Learning, Learning Sciences.
We are able to achieve the personal services of a tutor for every student and instant access to vast stores of knowledge.
Then: ~400 BC. Now: 2014.
Model the Student; Model the Domain; Intelligent Tutoring Systems; Personalize Tutoring; Assess Learning; Learning @ Scale
Research Questions
How do we retrieve substance from educational data? What do teachers and students need to know? What do researchers in the Learning Sciences want to know?
Explore large educational datasets and how they are analyzed to create models and find patterns.
How are researchers in the field of educational technology using a variety of techniques to use data to improve teaching and learning?
Reasoning about the Learner with Machine Learning
What Kind of ML Techniques? Visualization and modeling; decision trees; Bayesian networks; logistic regression; temporal models; Markov models; classification: Naïve Bayes, neural networks, decision trees.
Techniques for reasoning about the learner, applied to: open learner models; models for teachers and parents; models of the domain; models of student knowledge and learning; models of student affect and motivation; engagement (use and misuse, on/off task); and pedagogical moves and tutorial actions. Representative researchers by technique:
- Pre-processing: discretizing variables; normalizing and transforming variables
- Visualizations: single variables and relationships
- Models: correlations/cross-tabulations; causal modeling; linear regression: Heffernan; Koedinger; Ritter (EDM 2011 best paper); Arroyo (log files); Baker; Beck & Rai; Shanabrook; Arroyo (EDM 2010); Arroyo (AnimalWatch)
- Open learner models: Bull & Mitrovic
- Models of the domain: Arroyo (EDM 2010)
- Models: feature selection; splitting models vs. accounting for: Martin & Koedinger; Arroyo
- Classification: logistic regression: Pavlik (PFA); Gong & Beck; Cooper, David; Beck
- Classification: clustering: Desmarais (non-negative matrix factorization); Yue Gong (UMAP 2012, clustering without features)
- Classification: Naive Bayes
- Classification: neural networks: Burns (handwriting); D'Mello (predicting affective states); Baker; Stern (MANIC)
- Classification: decision trees (a random forest approach was widely used in the KDD Cup 2011): de Vicente & Pain
- Models: association rule learning: Romero; Merceron
- Temporal models: temporal patterns and trails over observable variables, and Markov chains: Romero (Educational Trails); Shanabrook
- Models: Bayesian networks: Zapata-Rivera; Heffernan; lots of classic ITS work (HYDRIVE; William Murray); Conati; Arroyo; Rai; Chaz Murray (RTDT)
- Temporal models: hidden Markov models (latent variables): Mayo & Mitrovic; Beck; Pardos; Johns & Woolf
(Ivon Arroyo, Worcester Polytechnic Institute)

Data Sets Used
Datasets come from log files of educational tutoring and assessment software.
Large Data Sets
Event log table of a math tutoring system: 571,776 rows in just a year's time.

Agenda: Introduction; Model the Student; Model the Domain; Personalize Tutoring; Assess Learning (Intelligent Tutoring Systems; Learning @ Scale)
Student Model
A data-driven approach toward automatic prediction of students' emotional states without sensors and while students are still actively engaged in their learning. Models are built from students' ongoing behavior. A cross-validation revealed small gains in accuracy for the more sophisticated state-based models, and better predictions of the remaining unpredicted cases, compared to the baseline models. By modifying the context of the tutoring system, including students' perceived emotion around mathematics, a tutor can now optimize and improve a student's attitudes toward mathematics.
David H. Shanabrook, David G. Cooper, Beverly Park Woolf, and Ivon Arroyo

Student States: Describing Student/Tutor Interaction
Problem State Patterns
The analysis used IBM's Many Eyes Word Tree algorithm over a total of 1,280 ATT (attempted and solved) events. Most frequently, ATT was followed by a SOF event (see top tree). The second level of the tree shows that after the sequence ATT ATT the highest-frequency next event changes to ATT, i.e., the shift in behavior occurs after two ATT states (see second tree and top branch). This indicates that the ATT state is more often a solitary event, and that once the ATT ATT pattern occurs it tends to continue in the ATT state. From this analysis, the most frequent three-problem state patterns (e.g., NOTR NOTR NOTR) are determined (see third tree and second branch).
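The word-tree reading above boils down to counting which state follows a given prefix and which short state runs are most frequent. The sketch below does that for one student's sequence; the state labels and the sequence are hypothetical, and this is plain n-gram counting, not the Many Eyes implementation. Counts for a whole population can be pooled by summing the per-student `Counter` objects.

```python
from collections import Counter

def ngram_counts(states, n):
    """Count every contiguous run of n problem states in one student's sequence."""
    return Counter(tuple(states[i:i + n]) for i in range(len(states) - n + 1))

def most_frequent_follower(states, prefix):
    """Return the state that most often follows `prefix` (one level of a word tree)."""
    k = len(prefix)
    followers = Counter(
        states[i + k]
        for i in range(len(states) - k)
        if tuple(states[i:i + k]) == tuple(prefix)
    )
    return followers.most_common(1)[0][0] if followers else None

# Hypothetical per-problem state labels for one student (not data from the study).
seq = ["ATT", "SOF", "NOTR", "NOTR", "NOTR", "ATT", "SOF", "ATT", "ATT", "SOF"]
print(ngram_counts(seq, 3).most_common(3))
print(most_frequent_follower(seq, ["ATT"]))
print(most_frequent_follower(seq, ["ATT", "ATT"]))
```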
A Dynamic Mixture Model to Detect Student Motivation and Proficiency
Jeff Johns, Autonomous Learning Laboratory; Beverly Woolf, Center for Knowledge Communication. AAAI, 7/20/2006

Problem Statement / Background
Develop a machine learning component for a math tutoring system used by high school students (SAT, MCAS). Focus on estimating the "state" of a student, which is then used for selecting an appropriate pedagogical action.
Problem: A model is used to estimate student ability, but students appear unmotivated in ~30% of problems.
Solution: Explicitly model motivation (as a dynamic variable) and student proficiency in a single model.
Detection of Motivation
Unmotivated students do not reap the full rewards of using a computer-based intelligent tutoring system. Detection of improper behavior is thus an important component of an online student model.
The dynamic mixture model is based on Item Response Theory. It simultaneously estimates a student's proficiency and changing motivation level. By accounting for student motivation, the dynamic mixture model lets researchers more accurately estimate proficiency and the probability of a correct response.

Created Item Response Theory (IRT) models for modeling the student's knowledge.
Data consists of responses (correct/incorrect) for 400 students across 70 problems, where a student performs ~33 problems on average.
- Implemented an EM algorithm to learn the parameters of the IRT model.
- Cross-validated results indicate the model can predict with 72% accuracy how the student will perform on each problem.
- The algorithms can be used online to estimate a student's ability while the student interacts with the tutor.
- Currently working on an extension of the IRT model to include information relevant to a student's motivation (time spent on problem, number of hints requested).
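To make the IRT step concrete, here is a minimal sketch of online ability estimation, assuming a two-parameter logistic (2PL) model with item parameters already learned (the slides do not say which IRT variant was used). Ability is estimated by maximum likelihood over a bounded grid, and a correct answer is predicted when the model's probability is at least 0.5. The item parameters are hypothetical.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: P(correct | ability theta, discrimination a, difficulty b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses, items):
    """Log-likelihood of 0/1 responses given theta and one (a, b) pair per item."""
    total = 0.0
    for u, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total

def estimate_theta(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Maximum-likelihood ability estimate by grid search over [lo, hi]."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))

def predict_correct(theta, item):
    """Predict a correct answer when the model gives it probability >= 0.5."""
    a, b = item
    return p_correct(theta, a, b) >= 0.5

# Hypothetical item parameters (a, b); in practice they would be learned, e.g. with EM.
items = [(1.0, -0.5), (1.2, 0.0), (0.8, 0.5), (1.0, 1.0)]
theta_hat = estimate_theta([1, 1, 0, 0], items)
print(round(theta_hat, 2), predict_correct(theta_hat, (1.0, 0.2)))
```

The grid bounds keep the estimate finite when a student answers every item correctly (or incorrectly), a case where the unbounded maximum-likelihood estimate diverges.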
Low Student Motivation
Example: Actual data from a student performing 12 problems (green = correct, red = incorrect). Problems are of roughly equal difficulty. The student appears to perform well in the beginning and worse toward the end. Conclusion: The student's proficiency is average.
[Figure: correctness on problems 1 through 12]

Low Student Motivation
Conclusion: Poor performance on the last five problems is due to low motivation (not proficiency).
[Figure: time to first response (s) on problems 1 through 12, annotated "Student is unmotivated" and "Use observed data to infer motivation!"]
Low Student Motivation
There is an opportunity for intelligent tutoring systems to improve student learning by addressing motivation. This issue is being dealt with on a larger scale by the educational assessment community (Wise & DeMars, 2005. Low Examinee Effort in Low-Stakes Assessment: Potential Problems and Solutions. Educational Assessment).

Hidden Markov Model (HMM)
An HMM is used to capture a student's changing behavior (level of motivation). The hidden motivation states $M_1, M_2, \ldots, M_n$ form a chain, and each $M_i$ emits an observed behavior $H_i$.
- $M_i$ (hidden): Unmotivated-Hint, Unmotivated-Guess, or Motivated
- $H_i$ (observed):
  - Unmotivated-Hint: time to first response $< t_{min}$ AND number of hints before correct response $> h_{max}$
  - Unmotivated-Guess: time to first response $< t_{min}$ AND number of hints before correct response $< h_{min}$
  - Motivated: if the other two cases don't apply
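The observation rules above translate directly into a small labeling function. This is a sketch: the threshold values `t_min`, `h_min`, and `h_max` below are hypothetical defaults, since the slides do not give the values used.

```python
def label_observation(time_to_first_response, hints_before_correct,
                      t_min=5.0, h_min=1, h_max=3):
    """Map one problem's observed behavior to H_i using the slide's threshold rules.

    The thresholds t_min (seconds), h_min and h_max (hints) are hypothetical defaults.
    """
    if time_to_first_response < t_min and hints_before_correct > h_max:
        return "unmotivated-hint"    # fast and hint-heavy
    if time_to_first_response < t_min and hints_before_correct < h_min:
        return "unmotivated-guess"   # fast with (almost) no hints
    return "motivated"               # neither unmotivated pattern applies

# Hypothetical (seconds to first response, hints before correct) for three problems.
for obs in [(2.0, 5), (1.5, 0), (30.0, 2)]:
    print(obs, label_observation(*obs))
```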
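New edges (in red) change the conditional probability of a student's response, $p(u_i \mid \theta, M_i)$: motivation ($M_i$) affects the student's response ($u_i$).
[Figure: the HMM chain $M_1, \ldots, M_n$ with observations $H_1, \ldots, H_n$, plus responses $U_1, \ldots, U_n$ that depend on both $M_i$ and the ability $\theta$]

Parameter Estimation
- Uses an Expectation-Maximization (EM) algorithm to estimate parameters.
- The M-step is iterative, similar to the Iteratively Reweighted Least Squares (IRLS) algorithm.
- The model consists of discrete and continuous variables; the integral for the continuous variable is approximated using a quadrature technique.
- The only parameters not estimated:
  $P(U_i \mid \theta, M_i = \text{unmotivated-guess}) = 0.2$
  $P(U_i \mid \theta, M_i = \text{unmotivated-hint}) = 0.02$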
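The quadrature step can be illustrated with a short sketch. It assumes that the two fixed values above are probabilities of a correct response, that a motivated response follows a 2PL curve, and that ability has a standard normal prior; it also conditions on a known motivation sequence, whereas the full model additionally sums over the hidden $M_i$. Under those assumptions, the marginal likelihood of a response sequence, $\int \prod_i p(u_i \mid \theta, M_i)\, \mathcal{N}(\theta; 0, 1)\, d\theta$, is approximated with Gauss-Hermite quadrature from NumPy.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Fixed (not estimated) values from the slide, read here as P(correct | theta, M_i).
FIXED_P = {"unmotivated-guess": 0.2, "unmotivated-hint": 0.02}

def p_correct(theta, motivation, a=1.0, b=0.0):
    """P(U_i = correct | theta, M_i): 2PL curve when motivated, a constant otherwise."""
    if motivation in FIXED_P:
        return FIXED_P[motivation]
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def marginal_likelihood(responses, motivations, items, n_nodes=40):
    """Approximate the integral over theta ~ N(0, 1) with Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(n_nodes)            # weight function exp(-x^2 / 2)
    weights = weights / math.sqrt(2.0 * math.pi)    # rescale so the weights sum to 1
    like = np.ones_like(nodes)
    for u, m, (a, b) in zip(responses, motivations, items):
        p = p_correct(nodes, m, a, b)
        like = like * (p if u == 1 else 1.0 - p)
    return float(np.sum(weights * like))

# Hypothetical responses, motivation labels, and item parameters.
print(marginal_likelihood([1, 1, 0],
                          ["motivated", "unmotivated-guess", "motivated"],
                          [(1.0, 0.0), (1.0, 0.0), (1.0, 0.5)]))
```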
Modeling Ability and Motivation
The combined model does not decrease the ability estimate when the student is unmotivated! The combined model separates ability from motivation (the IRT model lumps them together).

Experiments
Data: 400 high school students, 70 problems; a student finished 32 problems on average.
- Train the model: estimate parameters.
- Test the model: for each student, for each problem:
  - Estimate $\theta$ and $P(M_i)$ via maximum likelihood
  - Predict $P(M_{i+1})$ given the HMM dynamics
  - Predict $U_{i+1}$. Does it match the actual $U_{i+1}$?
- Compare the combined model vs. just an IRT model.
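The two prediction steps in the test loop can be sketched as a single function: propagate the current motivation distribution through the HMM transition matrix, then mix the per-state probabilities of a correct answer. The transition matrix is hypothetical, a 2PL curve is assumed for the motivated state, and 0.2 and 0.02 are read as probabilities of a correct response when unmotivated.

```python
import numpy as np

# Order of motivation states used in every vector and matrix below.
STATES = ("motivated", "unmotivated-guess", "unmotivated-hint")

def predict_next(p_m, transition, theta, a=1.0, b=0.0):
    """One-step prediction: P(M_{i+1}) from HMM dynamics, then P(U_{i+1} = correct)."""
    p_m = np.asarray(p_m, dtype=float)
    p_next = p_m @ np.asarray(transition, dtype=float)  # rows: P(M_{i+1} | M_i)
    p_correct_given_m = np.array([1.0 / (1.0 + np.exp(-a * (theta - b))), 0.2, 0.02])
    p_u = float(p_next @ p_correct_given_m)
    return p_next, p_u, p_u >= 0.5   # predicted correct if probability >= 0.5

# Hypothetical row-stochastic transition matrix over STATES.
T = [[0.90, 0.07, 0.03],
     [0.30, 0.60, 0.10],
     [0.30, 0.10, 0.60]]
p_next, p_u, predicted = predict_next([0.8, 0.15, 0.05], T, theta=0.7)
print(np.round(p_next, 3), round(p_u, 3), predicted)
```

Comparing `predicted` with the student's actual next response, over every student and problem, gives the accuracy figure used to compare the combined model with a plain IRT model.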
Results
- The combined model achieved 72.5% cross-validation accuracy versus 72.0% for the IRT model.
- The gap is not statistically significant.
- Opportunities for improving the accuracy of the combined model: longer sequences (per student); a better model of the dynamics, $P(M_{i+1} \mid M_i)$.

Conclusions
- Proposed a new, flexible model to jointly estimate student motivation and ability.
- Not separating ability from motivation conflates the two concepts.
- Easily adjusted for other tutoring systems.
- The combined model achieved similar accuracy to the IRT model.
- Online inference in real time.
- Implemented in Java; ran it in one high school in May 2006.
Agenda: Introduction; Model Student Emotion; Model the Domain; Personalize Tutoring; Assess Learning
Sensors used in the classroom; Bayesian networks and linear regression models
Linear Models to Predict Emotions
Variables that help predict self-reports of emotions. The results suggest that emotion depends on the context in which the emotion occurs (e.g., the math problem just solved) and can also be predicted from physiological activity captured by the sensors (bottom row).
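A linear model of this kind regresses a self-reported emotion score on context and sensor variables. The sketch below fits ordinary least squares with NumPy; the feature names and the noise-free synthetic data are hypothetical and only show that the fit recovers the coefficients it was given.

```python
import numpy as np

def fit_linear_model(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]

# Hypothetical predictors: [just_solved_problem (0/1), skin_conductance, posture_lean]
rng = np.random.default_rng(0)
X = np.column_stack([rng.integers(0, 2, 50), rng.normal(size=50), rng.normal(size=50)])
y = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + 0.3 * X[:, 2]   # synthetic, noise-free self-reports
intercept, coefs = fit_linear_model(X, y)
print(round(intercept, 3), np.round(coefs, 3))
```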
Domain Model
The Andes Bayesian network before (left) and after (right) the observation A-is-a-body (Kurt VanLehn).
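The "before and after" contrast is an instance of Bayesian updating: observing a student action raises the probability of the knowledge that would produce it. The sketch below shows this for a deliberately tiny two-node network (one hidden rule node, one observed fact node). It is a simplification, not the structure of the Andes network, and the conditional probabilities are hypothetical.

```python
def posterior_after_observation(prior, p_obs_if_known, p_obs_if_unknown):
    """Bayes' rule for a two-node network: hidden rule/skill node -> observed fact node."""
    evidence = prior * p_obs_if_known + (1.0 - prior) * p_obs_if_unknown
    return prior * p_obs_if_known / evidence

before = 0.5   # hypothetical P(student knows the rule) before the observation
after = posterior_after_observation(before, p_obs_if_known=0.95, p_obs_if_unknown=0.2)
print(before, round(after, 3))
```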
Student actions (left) and the self-explanation model (right). The physics problem asks the student to find the tension force exerted on a person hanging by a rope tied to his waist. Assume the midshipman was named Jake.

Domain Model
(Stephens, 2006)
Predicting Student Time to Complete
Two agents were built to predict the time students need to solve problems and to act on those predictions (Beck et al., 2000):
1) Population student model (PSM): responsible for modeling how students interacted with the tutor, based on data from the entire population of users. Its input was characteristics of the student as well as information about the problem to be solved; its output was the expected time (in seconds) the student would need to solve that problem.
2) Pedagogical agent (PA): responsible for constructing a teaching policy. It was a reinforcement learning agent that reasoned about a student's knowledge and provided customized examples and hints tailored to each student (Beck and Woolf, 2001; Beck et al., 1999a, 2000).
[Figure: overview of the ADVISOR machine learning component in AnimalWatch]
The tutor predicted a current student's reaction to a variety of teaching actions, such as presentation of a specific problem type (Beck et al., 2000).
ADVISOR predicted student response time using its population student model. The model accounted for roughly 50% of the variance: the squared correlation between the time the system predicted a student would spend on a problem and the actual time spent solving it was about 0.5 (Beck et al., 2000).
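"Variance accounted for" between predicted and actual times is commonly computed as the squared Pearson correlation. The sketch below does this with NumPy; the times are hypothetical, not ADVISOR data.

```python
import numpy as np

def variance_explained(predicted, actual):
    """Squared Pearson correlation between predicted and actual solution times."""
    r = np.corrcoef(predicted, actual)[0, 1]
    return float(r * r)

# Hypothetical times in seconds, not data from ADVISOR.
predicted = [40, 55, 90, 30, 120, 75]
actual = [35, 70, 80, 45, 100, 95]
print(round(variance_explained(predicted, actual), 2))
```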
Cycle Network
Cycle network in DT Tutor. The network is rolled out to three time periods representing current, possible, and projected student actions (from Murray et al., 2004).

Models Being Evaluated
A few issues to solve (Sarah Schultz, WPI): Which model, learned over data, best predicts future performance?
Problem Selection Within a Topic
Arroyo et al., EDM journal effort.

Pedagogical Moves: Dynamically Adjusted
Empirically based estimates of effort lead to adjusted problem difficulty and other affective and metacognitive feedback.
What Is "Normal" Behavior?
In EACH problem $p_i$, $i = 1, \ldots, n$ ($n$ = total problems in the system), look across the whole population of students who used the problem and record the expected number of incorrect attempts $E(I_i)$, the expected number of hints $E(H_i)$, and the expected time $E(T_i)$, together with lower and upper tolerances $\delta_{IL}, \delta_{IH}$ (incorrect attempts), $\delta_{HL}, \delta_{HH}$ (hints), and $\delta_{TL}, \delta_{TH}$ (time).
[Figure: population distributions of incorrect attempts, hints, and time (each bar = 5 seconds) for one problem, with the range of expected behavior marked]
A new student encounters this problem. Is their behavior within expectation, or atypical?

What Is "Odd" Behavior?
In any problem $p_i$, $i = 1, \ldots, n$, odd behavior looks like:
- Few incorrect attempts: $\text{Attempts} < E(I_i) - \delta_{IL}$
- Lots of hints: $\text{Hints} > E(H_i) + \delta_{HH}$
- Little time: $\text{Time} < E(T_i) - \delta_{TL}$
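The comparison against population expectations can be sketched as follows. Two choices here are assumptions, not taken from the slides: each tolerance $\delta$ is set to `k` standard deviations of the population values, and "within expectation" means a measure lies between $E - \delta_L$ and $E + \delta_H$. The population logs are hypothetical.

```python
from statistics import mean, stdev

def population_profile(values, k=1.0):
    """Return (E, delta_low, delta_high) for one measure on one problem.

    Hypothetical choice: both tolerances are k sample standard deviations.
    """
    spread = k * stdev(values)
    return mean(values), spread, spread

def within_expected(x, profile):
    """True when x lies in [E - delta_low, E + delta_high]."""
    e, d_low, d_high = profile
    return e - d_low <= x <= e + d_high

def is_odd(attempts, hints, time_s, prof_i, prof_h, prof_t):
    """Slide's 'odd' pattern: few incorrect attempts, lots of hints, little time."""
    e_i, d_il, _ = prof_i
    e_h, _, d_hh = prof_h
    e_t, d_tl, _ = prof_t
    return attempts < e_i - d_il and hints > e_h + d_hh and time_s < e_t - d_tl

# Hypothetical population logs for one problem p_i.
prof_i = population_profile([1, 2, 2, 3, 2, 1, 3])          # incorrect attempts
prof_h = population_profile([0, 1, 1, 2, 0, 1, 2])          # hints
prof_t = population_profile([45, 60, 50, 70, 65, 55, 40])   # seconds
print(is_odd(0, 4, 10, prof_i, prof_h, prof_t), within_expected(2, prof_i))
```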
Increasing Problem Difficulty
At the next time step (assume we know the difficulty of each item): $H$ = sorted list of math problems harder than the last problem seen, $X$, ordered from easiest to hardest of all.

$$\mathrm{Harder}(H[1..m], \gamma) = H\left[\left\lceil \frac{m}{\gamma} \right\rceil\right]$$

Parameter $\gamma = 3$ sets the challenge rate.

Decreasing Problem Difficulty
At the next time step (assume we know the difficulty of each item): $E$ = sorted list of math problems easier than the last problem seen, $X$, ordered from easiest of all to hardest.

$$\mathrm{Easier}(E[1..n], \gamma) = E\left[\left\lceil n - \frac{n}{\gamma} \right\rceil\right]$$

Parameter $\gamma = 3$.
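The two selection rules can be written as a pair of functions. This is a sketch that follows the formulas directly, converting their 1-based indices to Python's 0-based ones; the problem identifiers are hypothetical.

```python
import math

def harder(H, gamma=3):
    """Harder(H[1..m], gamma) = H[ceil(m / gamma)].

    H holds the problems harder than the last one seen, sorted easiest to hardest.
    """
    if not H:
        raise ValueError("no harder problem available")
    m = len(H)
    return H[math.ceil(m / gamma) - 1]   # convert the 1-based index to 0-based

def easier(E, gamma=3):
    """Easier(E[1..n], gamma) = E[ceil(n - n / gamma)].

    E holds the problems easier than the last one seen, sorted easiest to hardest.
    """
    if not E:
        raise ValueError("no easier problem available")
    n = len(E)
    return E[math.ceil(n - n / gamma) - 1]

H = [f"h{k}" for k in range(1, 10)]   # hypothetical problem ids; h1 is the easiest harder problem
E = [f"e{k}" for k in range(1, 10)]   # e9 is the easier problem closest to the last one seen
print(harder(H), easier(E))           # prints: h3 e6
```

With either rule, a larger $\gamma$ moves the next problem closer in difficulty to the last one seen, so $\gamma$ controls how aggressively difficulty changes.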
Learning @ Scale
Stanford's Computer Science Course
Machine learning techniques were used to autonomously create a graphical model of how students in an introductory programming course progress through a homework assignment. Machine learning algorithms found patterns in how students solved the Checkerboard Karel problem. These patterns were more informative at predicting how well students would perform on the class midterm than the grades students received on the assignment. The algorithm captured a meaningful general trend about how students were solving programming problems.
Piech, C., Sahami, M., Koller, D., Cooper, S., & Blikstein, P. (2012, February). Modeling how students learn to program. In Proceedings of the 43rd ACM Technical Symposium on Computer Science Education (pp. 153-160). ACM.
Student Modeling in Computer Programming
Bag-of-Words Difference: The researchers first built histograms of the different keywords used in a computer program and used the Euclidean distance between two histograms as a naïve measure of their dissimilarity. This is akin to distance measures for text commonly used in information retrieval systems.
Application Program Interface (API) Call Dissimilarity: They ran each program with standard inputs and recorded the resulting sequence of API calls. They used Needleman-Wunsch global DNA alignment to measure the difference between the lists of API calls generated by the two programs (Piech et al., 2012).

Hidden Markov Model
The first step in their student modeling process was to learn a high-level representation of how each student progressed through the Checkerboard Karel assignment. To learn this representation they modeled a student's progress as a hidden Markov model (HMM) [17].
Learning an HMM: Each state from the HMM becomes a node in the FSM, and the weight of a directed edge from one node to another provides the probability of transitioning from one state to the next.
[Figure: the program's hidden Markov model of state transitions for a given student. The node "Code_t" denotes the code snapshot of the student at time t, and the node "State_t" denotes the high-level milestone that the student is in at time t. N is the number of snapshots for the student.] (Piech et al., 2012)
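Both dissimilarity measures are small enough to sketch. Below, keyword histograms are compared with Euclidean distance, and two API-call sequences are compared with a textbook Needleman-Wunsch global alignment score. The keyword list, the sample programs, and the scoring values (match +1, mismatch -1, gap -1) are hypothetical choices, not the settings used by Piech et al.

```python
import math
import re
from collections import Counter

def keyword_histogram(source, keywords):
    """Count how often each keyword appears as an identifier token in a program."""
    counts = Counter(re.findall(r"[A-Za-z_]\w*", source))
    return [counts[k] for k in keywords]

def bag_of_words_distance(src_a, src_b, keywords):
    """Euclidean distance between two keyword histograms (a naive dissimilarity)."""
    return math.dist(keyword_histogram(src_a, keywords), keyword_histogram(src_b, keywords))

def needleman_wunsch(calls_a, calls_b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of two API-call sequences (higher means more similar)."""
    rows, cols = len(calls_a) + 1, len(calls_b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            same = calls_a[i - 1] == calls_b[j - 1]
            diag = score[i - 1][j - 1] + (match if same else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]

# Hypothetical Karel keywords and programs; negate the alignment score to use it as a dissimilarity.
KEYWORDS = ["move", "turnLeft", "putBeeper", "while", "if", "frontIsClear"]
a = "while (frontIsClear()) { putBeeper(); move(); }"
b = "move(); putBeeper(); move(); turnLeft();"
print(bag_of_words_distance(a, b, KEYWORDS))
print(-needleman_wunsch(["putBeeper", "move", "move"], ["putBeeper", "move", "turnLeft"]))
```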
Dissimilarity Matrix
[Figure: dissimilarity matrix for clustering of 2000 snapshots. Each row and column in the matrix represents a snapshot, and the entry at row i, column j represents how similar snapshots i and j are (dark means more similar).]
Clustering on a sample of 2000 random snapshots from the training set returned a group of well-defined snapshot clusters (see Figure 2). The value of k that maximized the silhouette score (a measure of how natural the clustering was) was k = 26 clusters. A visual inspection of these clusters confirmed that snapshots which clustered together were functionally similar pieces of code (Piech et al., 2012).

The HMM is specified by:
- States: the finite set of high-level states, or milestones, that a student could be in. A state is defined by a set of snapshots, all of which came from the same milestone.
- Transition probability: the probability of being in a state given the state the student was in during the previous unit of time.
- Emission probability: the probability of seeing a specific snapshot given that the student is in a particular state. To calculate it, each state was interpreted as emitting snapshots with normally distributed dissimilarities. In other words, given the dissimilarity between a particular snapshot of student code and a state's "representative" snapshot, the probability that the snapshot came from that state can be calculated using a normal distribution over the dissimilarity.
(Piech et al., 2012)
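The two probability tables can be sketched under simplifying assumptions. Transition probabilities are estimated by counting, which assumes each snapshot already carries a milestone label (the authors learned the HMM from data, so this is a simplification). Emissions use a normal density over the dissimilarity to a state's representative snapshot with a hypothetical mean of 0 and standard deviation `sd`, and densities are normalized across states under a uniform prior.

```python
import math
from collections import Counter, defaultdict

def transition_probabilities(state_sequences):
    """Count-based estimate of P(state_t | state_{t-1}) from labeled milestone sequences."""
    counts = defaultdict(Counter)
    for seq in state_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {s: {t: c / sum(ctr.values()) for t, c in ctr.items()}
            for s, ctr in counts.items()}

def emission_density(dissimilarity, mean=0.0, sd=1.0):
    """Normal density of a snapshot's dissimilarity to a state's representative snapshot."""
    z = (dissimilarity - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def state_posterior(dissimilarities, sd=1.0):
    """Normalize densities across states (uniform prior) to get P(state | snapshot)."""
    dens = {s: emission_density(d, 0.0, sd) for s, d in dissimilarities.items()}
    total = sum(dens.values())
    if total == 0.0:
        raise ValueError("all densities underflowed; try a larger sd")
    return {s: v / total for s, v in dens.items()}

# Hypothetical milestone sequences and dissimilarities.
print(transition_probabilities([["start", "loop", "done"], ["start", "start", "loop", "done"]]))
print(state_posterior({"start": 2.5, "loop": 0.4, "done": 3.1}))
```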
Stanford's MOOC: Teaching Machine Learning Topics
Huang, J., Piech, C., Nguyen, A., & Guibas, L. (2013, June). Syntactic and functional variability of a million code submissions in a machine learning MOOC. In AIED 2013 Workshops Proceedings Volume (p. 25).
The landscape of solutions for gradient descent for linear regression, representing over 40,000 student code submissions, with edges drawn between syntactically similar submissions and colors corresponding to performance on a battery of unit tests (red submissions passed all unit tests).

Hour of Code Challenge: Modeling How Young Students Learn to Program
Correct answer: node 0. Node: unique partial solution. Arc: next solution an expert would recommend.
Code.org problem-solving graph of a learned policy for how to solve a single open-ended programming assignment, from over 1M users. Each node is a unique partial solution; node 0 is the correct answer. (Chris Piech, Stanford Ph.D. student)

Improved Retention
Code.org gathered over 137 million partial solutions. Not all students made it through the entire Hour of Code, but retention was quite high relative to other contemporary open-access courses.
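Because each arc points to the next solution an expert would recommend, a policy graph like this can be stored as a simple mapping from node to next node, and a hint path is found by following arcs until reaching node 0. The graph below is hypothetical, and the step limit is an added guard against cycles.

```python
def path_to_solution(policy, start, goal=0, max_steps=1000):
    """Follow the recommended next partial solution until reaching the goal node."""
    path = [start]
    node = start
    while node != goal:
        if node not in policy:
            raise KeyError(f"no recommended next step from node {node}")
        if len(path) > max_steps:
            raise RuntimeError("policy did not reach the goal (possible cycle)")
        node = policy[node]
        path.append(node)
    return path

# Hypothetical policy graph: node -> next partial solution an expert would recommend.
policy = {7: 4, 4: 2, 5: 2, 2: 1, 1: 0}
print(path_to_solution(policy, 7))   # prints [7, 4, 2, 1, 0]
```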
63K Peer Grading for 7K Students
Blue blob: Student A. Red squares: students who graded Student A. Red circles: students who were graded by Student A.
A Coursera course to teach HCI: peer grading network of 63K peer grades for 7K students. A single student is highlighted; red squares graded the student, and red circles were graded by the student. (Chris Piech, Stanford Ph.D. student)

Squares: questions. Circles: concepts. Edges: strong question-concept relationship.
Lan, A. S., Studer, C., Waters, A. E., & Baraniuk, R. G. (2013). Joint topic modeling and factor analysis of textual information and graded response data. arXiv preprint arXiv:1305.1956.
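A peer-grading network of this kind can be assembled from raw (grader, gradee) pairs by indexing both directions: who graded a student (the red squares) and whom that student graded (the red circles). The pairs below are hypothetical.

```python
from collections import defaultdict

def peer_grading_network(grades):
    """Build both directions of a peer-grading network from (grader, gradee) pairs."""
    graders_of = defaultdict(set)   # student -> students who graded them (red squares)
    graded_by = defaultdict(set)    # student -> students they graded (red circles)
    for grader, gradee in grades:
        graders_of[gradee].add(grader)
        graded_by[grader].add(gradee)
    return graders_of, graded_by

# Hypothetical peer grades; the course network had about 63K grades for 7K students.
grades = [("B", "A"), ("C", "A"), ("D", "A"), ("A", "E"), ("A", "F"), ("B", "E")]
graders_of, graded_by = peer_grading_network(grades)
print(sorted(graders_of["A"]), sorted(graded_by["A"]))   # prints ['B', 'C', 'D'] ['E', 'F']
```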
Learning to Teach: Machine Learning Techniques to Improve Instruction
Thank You! Any Questions?