Learning to Teach: Machine Learning to Improve Instruction

1 Learning to Teach: Machine Learning to Improve Instruction. Beverly Park Woolf, School of Computer Science, University of Massachusetts. NIPS 2015 Workshop on Human Propelled Machine Learning, Dec 13, 2014. Long Term Goal: "Millions of school children will have access to what Alexander the Great enjoyed as a royal prerogative: the personal services of a tutor as well informed as Aristotle. Students will have instant access to vast stores of knowledge through their computerized tutors." Pat Suppes, Stanford University, 1966 (died Nov 2014)

2 Alexander the Great valued learning so highly that he said he was more indebted to Aristotle for giving him knowledge than to his father for giving him life. We are on track. Key components: Artificial Intelligence, Machine Learning, Learning Sciences. We are able to achieve personal services of a tutor for every student and instant access to vast stores of knowledge.

3 Then: ~400 BC. Now: 2014. Intelligent Tutoring Systems: Model the Student, Model the Domain, Personalize Tutoring, Assess Learning.

4 Research Questions. How to retrieve substance from educational data? What do teachers and students need to know? What do researchers in Learning Sciences want to know? Explore large educational data sets and how they are analyzed; create models and pattern finding. How are researchers in the field of educational technology using a variety of techniques to use data to improve teaching and learning?

5 What Kind of ML Techniques? Visualization and modeling. Decision trees. Bayesian networks. Logistic Regression. Temporal Models. Markov Models. Classification: Naïve Bayes, Neural Networks, Decision trees. Reasoning about the Learner with Machine Learning.

6 Techniques Pre-processing: Discretizing Variables, Normalizing and Transforming Variables Visualizations: Single Variables and Relationships Models: Correlations/Crosstabulations Models: Causal Modeling Open Learner Models Bull & Mitrovic Models for Teachers, Parents Models of the Domain Arroyo, EDM 2010 Models: Linear Regression Heffernan Koedinger Models of Student Knowledge, Learning Ritter EDM 2011 best paper Arroyo, Log Files Models of Student Affect/Motivation; Engagement/Use and Misuse/On-off task; Pedagogical Moves and Tutorial Actions Baker Arroyo, Log Files Arroyo, Log Files Beck & Rai Arroyo, Shanabrook; Baker Arroyo, EDM 2010 Arroyo -- AnimalWatch Models: Feature Selection. Splitting Models vs. Accounting for. Martin & Koedinger Arroyo Classification: Logistic Regression Pavlik (PFA); Gong & Beck, Cooper, David Beck Classification: Clustering Desmarais -- non-negative matrix factorization Yue Gong UMAP 2012, clustering without features :-) Classification: Naive Bayes Classification: Neural Networks Burns, Handwriting D'Mello: Predicting affective states Baker Stern, MANIC Classification: Decision Trees (random forest approach was widely used in KDD 2011 cup) de Vicente & Pain Models: Association Rule Learning Romero Merceron Temporal Models: Temporal Patterns and Trails over observable variables, and Markov Chains Romero (Educational Trails) Shanabrook Shanabrook Shanabrook Models: Bayesian Networks Zapata-Rivera Heffernan Lots of classic ITS work (HYDRIVE; William Murray) Conati; Arroyo; Rai Chaz Murray RTDT Temporal Models: Hidden Markov Models (latent variables) Mayo & Mitrovic Beck; Pardos Johns & Woolf. Ivon Arroyo, Worcester Polytechnic Institute. Data Sets Used: Data sets come from log files. Educational tutoring and assessment software,

7 Large Data Sets. Event log table of a math tutoring system: 571,776 rows in just a year's time. Agenda: Introduction, Model the Student, Model the Domain, Personalize Tutoring, Assess Learning. Intelligent Tutoring Systems.

8 Student Model

9 Student Model

10 A data-driven approach toward automatic prediction of students' emotional states without sensors and while students are still actively engaged in their learning. Models from students' ongoing behavior. A cross-validation revealed small gains in accuracy for the more sophisticated state-based models and better predictions of the remaining unpredicted cases, compared to the baseline models. By modifying the context of the tutoring system, including students' perceived emotion around mathematics, a tutor can now optimize and improve a student's mathematics attitudes. David H. Shanabrook, David G. Cooper, Beverly Park Woolf, and Ivon Arroyo. Student States: Describing student/tutor interaction.

11

12

13

14 Problem state patterns. IBM's Many Eyes Word Tree algorithm. The total: 1280 ATT (attempted and solved) events. Most frequently, ATT was followed by a SOF event (see top tree). The second level of the tree shows that after the sequence ATT ATT the most frequent next event changes to the ATT event, i.e. the shift in behavior occurs after two ATT states (see second tree and top branch). This indicates the ATT state is more often a solitary event, whereas the ATT ATT pattern will continue in the ATT state. Thus, from the analysis the most frequent 3-problem state patterns (e.g., NOTR-NOTR-NOTR) are determined (see third tree and second branch).

15 A Dynamic Mixture Model to Detect Student Motivation and Proficiency. Jeff Johns, Autonomous Learning Laboratory; Beverly Woolf, Center for Knowledge Communication. AAAI 7/20/2006. Problem Statement. Background: Develop a machine learning component for a math tutoring system used by high school students (SAT, MCAS). Focus on estimating the "state" of a student, which is then used for selecting an appropriate pedagogical action. Problem: Using a model to estimate student ability, but students appear unmotivated in ~30% of problems. Solution: Explicitly model motivation (as a dynamic variable) and student proficiency in a single model.

16 Detection of Motivation. Unmotivated students do not reap the full rewards of using a computer-based intelligent tutoring system. Detection of improper behavior is thus an important component of an online student model. Dynamic mixture model based on Item Response Theory. This model simultaneously estimates a student's proficiency and changing motivation level. By accounting for student motivation with the dynamic mixture model, researchers can more accurately estimate proficiency and the probability of a correct response.
Created Item Response Theory (IRT) models for modeling the students' knowledge. Data consists of responses (correct/incorrect) for 400 students across 70 problems, where a student performs ~33 problems on average.
- Implemented an EM algorithm to learn the parameters of the IRT model.
- Cross-validated results indicate the model can predict with 72% accuracy how the student will perform on each problem.
- Algorithms can be used online to estimate a student's ability while interacting with the tutor.
- Currently working on an extension of the IRT model to include information relevant to a student's motivation (time spent on problem, number of hints requested).

17 Low Student Motivation. Example: Actual data from a student performing 12 problems (green = correct, red = incorrect). Problems are of roughly equal difficulty. Student appears to perform well in the beginning and worse toward the end. Conclusion: The student's proficiency is average. Low Student Motivation. Conclusion: Poor performance on the last five problems is due to low motivation (not proficiency). [Figure: time (s) to first response for each problem.] Student is unmotivated. Use observed data to infer motivation!

18 Low Student Motivation. Opportunity for intelligent tutoring systems to improve student learning by addressing motivation. This issue is being dealt with on a larger scale by the educational assessment community: Wise & DeMars 2005. Low Examinee Effort in Low-Stakes Assessment: Potential Problems and Solutions. Educational Assessment.
Hidden Markov Model (HMM). An HMM is used to capture a student's changing behavior (level of motivation). [Diagram: hidden chain M_1, M_2, ..., M_n; each M_i emits an observed H_i.]
M_i (hidden) takes one of three values, each paired with an observed value of H_i:
Unmotivated-Hint: time to first response < t_min AND number of hints before correct response > h_max
Unmotivated-Guess: time to first response < t_min AND number of hints before correct response < h_min
Motivated: if the other two cases don't apply
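The observation rule on this slide can be sketched as a small function. This is only an illustration: the function name, the label strings, and the threshold values used in the example call are hypothetical, since the slide gives the thresholds only symbolically (t_min, h_min, h_max).

```python
def observe_behavior(time_to_first_response, hints_before_correct,
                     t_min, h_min, h_max):
    """Map one logged problem to the observed variable H_i.

    Thresholds t_min, h_min, h_max are symbolic on the slide; callers
    must supply values (the ones in the example below are made up).
    """
    quick = time_to_first_response < t_min
    if quick and hints_before_correct > h_max:
        return "unmotivated-hint"   # fast response, many hints
    if quick and hints_before_correct < h_min:
        return "unmotivated-guess"  # fast response, almost no hints
    return "motivated"              # neither of the two cases applies


# Example with hypothetical thresholds: 5 s, 1 hint, 3 hints.
label = observe_behavior(2.0, 5, t_min=5.0, h_min=1, h_max=3)
```

Note that H_i is only evidence about the hidden state M_i; the HMM, not this rule, decides how much to trust a single observation.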

19 New edges (in red) change the conditional probability of a student's response: $P(U_i \mid \theta, M_i)$. [Diagram: hidden chain M_1, M_2, ..., M_n with observations H_1, H_2, ..., H_n; responses U_1, U_2, ..., U_n depend on $\theta$ and on M_i.] Motivation (M_i) affects student response (U_i).
Parameter Estimation. Uses an Expectation-Maximization algorithm to estimate parameters. M-step is iterative, similar to the Iterative Reweighted Least Squares (IRLS) algorithm. Model consists of discrete and continuous variables. Integral for the continuous variable is approximated using a quadrature technique. Only parameters not estimated: $P(U_i \mid \theta, M_i = \text{unmotivated-guess}) = 0.2$ and $P(U_i \mid \theta, M_i = \text{unmotivated-hint}) = 0.02$.
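A minimal sketch of how motivation could switch the response model. Two things here are assumptions rather than statements from the slides: the fixed values 0.2 and 0.02 are read as probabilities of a correct response, and the motivated case uses a two-parameter logistic IRT curve with hypothetical item parameters `a` (discrimination) and `b` (difficulty), because the slides do not say which IRT form was used.

```python
import math

# Fixed (not estimated) values from the slide, read here as P(correct).
P_CORRECT_GUESS = 0.2
P_CORRECT_HINT = 0.02


def p_correct(theta, motivation, a=1.0, b=0.0):
    """P(U_i = correct | theta, M_i) under the assumptions stated above."""
    if motivation == "unmotivated-guess":
        return P_CORRECT_GUESS
    if motivation == "unmotivated-hint":
        return P_CORRECT_HINT
    # Motivated: assumed 2PL item response function.
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

The point of the design is visible in the signature: when the student is unmotivated, `theta` has no influence on the response, so a string of wrong answers produced while guessing does not pull the ability estimate down.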

20 Modeling Ability and Motivation. Combined model does not decrease the ability estimate when the student is unmotivated! Combined model separates ability from motivation (IRT model lumps them together).
Experiments. Data: 400 high school students, 70 problems; a student finished 32 problems on average. Train the model: estimate parameters. Test the model: for each student, for each problem: estimate $\theta$ and $P(M_i)$ via maximum likelihood; predict $P(M_{i+1})$ given HMM dynamics; predict $U_{i+1}$. Does it match the actual $u_{i+1}$? Compare combined model vs. just an IRT model.
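The "predict P(M_{i+1}) given HMM dynamics" step is the standard one-step HMM prediction: multiply the current belief over motivation states by the transition probabilities. The sketch below uses made-up transition values purely for illustration; the slides do not report the learned dynamics.

```python
STATES = ("motivated", "unmotivated-guess", "unmotivated-hint")

# Hypothetical P(M_{i+1} | M_i); each row sums to 1.
TRANSITIONS = {
    "motivated":         {"motivated": 0.8, "unmotivated-guess": 0.1, "unmotivated-hint": 0.1},
    "unmotivated-guess": {"motivated": 0.3, "unmotivated-guess": 0.6, "unmotivated-hint": 0.1},
    "unmotivated-hint":  {"motivated": 0.3, "unmotivated-guess": 0.1, "unmotivated-hint": 0.6},
}


def predict_next_motivation(belief):
    """Return P(M_{i+1}) given the current belief P(M_i) as a dict."""
    return {
        nxt: sum(belief[cur] * TRANSITIONS[cur][nxt] for cur in STATES)
        for nxt in STATES
    }


current = {"motivated": 0.5, "unmotivated-guess": 0.5, "unmotivated-hint": 0.0}
predicted = predict_next_motivation(current)
```

The predicted response then follows by weighting each state's probability of a correct answer by `predicted[state]` and summing.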

21 Results. Combined model achieved 72.5% cross-validation accuracy versus 72.0% for the IRT model. Gap is not statistically significant. Opportunities for improving the accuracy of the combined model: longer sequences (per student); better model of the dynamics, $P(M_{i+1} \mid M_i)$.
Conclusions. Proposed a new, flexible model to jointly estimate student motivation and ability. Not separating ability from motivation conflates the two concepts. Easily adjusted for other tutoring systems. Combined model achieved similar accuracy to IRT model. Online inference in real time. Implemented in Java; ran it in one high school in May '06.

22 Agenda: Introduction, Model Student Emotion, Model the Domain, Personalize Tutoring, Assess Learning. Sensors used in the classroom. Bayesian networks and linear regression models.

23 Linear Models to Predict Emotions. Variables that help predict self-report of emotions. The result suggests that emotion depends on the context in which the emotion occurs (math problem just solved) and also can be predicted from physiological activity captured by the sensors (bottom row). Agenda: Introduction, Model the Student, Model the Domain, Personalize Tutoring, Assess Learning. Intelligent Tutoring Systems.

24 Domain Model. The Andes Bayesian network before (left) and after (right) the observation A-is-a-body. Kurt VanLehn.

25 Domain Model. Student actions (left) and the self-explanation model (right). The physics problem asks the student to find the tension force exerted on a person hanging by a rope tied to his waist. Assume the midshipman was named Jake.

26 Stephens, 2006

27 Stephens, 2006. Agenda: Introduction, Model the Student, Model the Domain, Personalize Tutoring, Assess Learning.

28 Predicting Student Time To Complete. Two agents were built to predict student time to solve problems (Beck et al., 2000).
1) Population student model (PSM): responsible for modeling how students interacted with the tutor, based on data from the entire population of users. It took as input characteristics of the student and information about the problem to be solved, and output the expected time (in seconds) the student would need to solve that problem.
2) Pedagogical agent (PA): responsible for constructing a teaching policy. It was a reinforcement learning agent that reasoned about a student's knowledge and provided customized examples and hints tailored for each student (Beck and Woolf, 2001; Beck et al., 1999a, 2000).
Overview of the ADVISOR machine learning component in AnimalWatch. The tutor predicted a current student's reaction to a variety of teaching actions, such as presentation of a specific problem type. (Beck et al., 2000)

29 The tutor predicted a current student's reaction to a variety of teaching actions, such as presentation of a specific problem type. The time the system predicted a student would spend on a problem accounted for roughly 50% of the variance in the actual time spent solving it. (Beck et al., 2000) ADVISOR predicted student response time using its population student model.

30 Cycle Network. Cycle network in DT Tutor. The network is rolled out to three time periods representing current, possible, and projected student actions. (From Murray et al., 2004.) Models being Evaluated. Few issues to solve. Sarah Schultz, WPI. Which model, learned over data, helps predict future performance best?

31 Problem Selection Within a Topic. Arroyo et al., EDM Journal effort. Pedagogical Moves: dynamically adjusted. Empirically based estimates of effort lead to adjusted problem difficulty and other affective and meta-cognitive feedback.

32 What is "normal" behavior? In EACH problem $p_i$, $i = 1, \ldots, n$ ($n$ = total problems in system), looking across the whole population of students who used a problem. [Figure: three histograms, Incorrect Attempts, Hints, and Time (each bar = 5 seconds), with expected values $E(I_i)$, $E(H_i)$, $E(T_i)$ and lower/upper margins $\delta_{IL}, \delta_{IH}, \delta_{HL}, \delta_{HH}, \delta_{TL}, \delta_{TH}$; the band between the margins is "within expected behavior".] A new student encounters this problem. Is their behavior within expectation, or atypical?
What is odd behavior? In any problem $p_i$, $i = 1, \ldots, n$ ($n$ = total problems in system). [Figure: the same three histograms with the regions outside the margins marked as odd behavior.] Odd behavior example: few incorrect attempts, lots of hints, little time:
Attempts $< E(I_i) - \delta_{IL}$
Hints $> E(H_i) + \delta_{HH}$
Time $< E(T_i) - \delta_{TL}$
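One way to read this slide is as a per-problem threshold test against population statistics. The sketch below assumes the three conditions are combined with AND and that the lower margins are subtracted from the expected values; both are interpretations of the figure, and the dictionary keys and example numbers are hypothetical.

```python
def is_odd_behavior(attempts, hints, seconds, expected, delta):
    """Flag the 'few incorrect attempts, lots of hints, little time' pattern.

    expected: population means for this problem, keys "I", "H", "T".
    delta: margins, keys "IL" (attempts, low), "HH" (hints, high),
           "TL" (time, low). All names are illustrative.
    """
    few_attempts = attempts < expected["I"] - delta["IL"]
    many_hints = hints > expected["H"] + delta["HH"]
    little_time = seconds < expected["T"] - delta["TL"]
    return few_attempts and many_hints and little_time


# Hypothetical population statistics for one problem.
expected = {"I": 2.0, "H": 1.5, "T": 40.0}
delta = {"IL": 1.0, "HH": 1.5, "TL": 20.0}

flagged = is_odd_behavior(attempts=0, hints=4, seconds=10, expected=expected, delta=delta)
typical = is_odd_behavior(attempts=2, hints=1, seconds=45, expected=expected, delta=delta)
```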

33 Increasing Problem Difficulty. At the next time step. Assume we know the problem difficulty of items. H = sorted list of harder math problems. [Diagram: list H running from Easiest to Hardest of all; X marks the last problem seen.]
$\mathrm{Harder}(H[1..m], \gamma) = H\left[\left\lceil m / \gamma \right\rceil\right]$
Parameter $\gamma = 3$ --> challenge rate.
Decreasing Problem Difficulty. At the next time step. Assume we know the problem difficulty of items. E = sorted list of easier math problems. [Diagram: list E running from Easiest of all to Hardest; X marks the last problem seen.]
$\mathrm{Easier}(E[1..n], \gamma) = E\left[\left\lceil n - n / \gamma \right\rceil\right]$
Parameter $\gamma = 3$.
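A small sketch of the two selection rules, assuming they read as Harder(H[1..m], γ) = H[⌈m/γ⌉] and Easier(E[1..n], γ) = E[⌈n − n/γ⌉], with both lists sorted from easiest to hardest and indexed from 1 as on the slide. The function names and the example problem labels are hypothetical.

```python
import math


def harder(H, gamma=3):
    """Pick the next problem from H, the harder problems sorted easiest first."""
    m = len(H)
    index = math.ceil(m / gamma)        # 1-based position on the slide
    return H[index - 1]                 # convert to Python's 0-based indexing


def easier(E, gamma=3):
    """Pick the next problem from E, the easier problems sorted easiest first."""
    n = len(E)
    index = max(math.ceil(n - n / gamma), 1)  # guard against index 0 when gamma == 1
    return E[index - 1]


harder_pool = ["h1", "h2", "h3", "h4", "h5", "h6"]
easier_pool = ["e1", "e2", "e3", "e4", "e5", "e6"]
next_harder = harder(harder_pool)   # ceil(6 / 3) = 2
next_easier = easier(easier_pool)   # ceil(6 - 6 / 3) = 4
```

With γ = 3 both rules move one third of the way into the available pool, counted from the last problem seen; a larger γ gives smaller steps in difficulty.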

34 Agenda: Introduction, Model the Student, Model the Domain, Personalize Tutoring, Assess Learning. Stanford's computer science course. Machine learning techniques were used to autonomously create a graphical model of how students in an introductory programming course progress through the homework assignment. Machine learning algorithms found patterns in how students solved the Checkerboard Karel problem. These patterns were more informative at predicting how well students would perform on the class midterm than the grades students received on the assignment. The algorithm captured a meaningful general trend about how students were solving programming problems. Piech, C., Sahami, M., Koller, D., Cooper, S., & Blikstein, P. (2012, February). Modeling how students learn to program. In Proceedings of the 43rd ACM Technical Symposium on Computer Science Education (pp. 153-160). ACM.

35 Student Modeling in Computer Programming.
Bag of Words Difference: Researchers first built histograms of the different keywords used in a computer program and used the Euclidean distance between two histograms as a naïve measure of the dissimilarity. This is akin to distance measures of text commonly used in information retrieval systems.
Application Program Interface (API) Call Dissimilarity: They ran each program with standard inputs and recorded the resulting sequence of API calls. They used Needleman-Wunsch global DNA alignment to measure the difference between the lists of API calls generated by the two programs. (Piech et al., 2012)
Hidden Markov Model. The first step in their student modeling process was to learn a high-level representation of how each student progressed through the Checkerboard Karel assignment. To learn this representation they modeled a student's progress as a Hidden Markov Model (HMM) [17]. Learning an HMM: each state from the HMM becomes a node in the FSM, and the weight of a directed edge from one node to another provides the probability of transitioning from one state to the next. [Figure caption: The program's Hidden Markov Model of state transitions for a given student. The node "code_t" denotes the code snapshot of the student at time t, and the node "state_t" denotes the high-level milestone that the student is in at time t. N is the number of snapshots for the student.] (Piech et al., 2012)
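The two dissimilarity measures can be illustrated with a short sketch. It is not the authors' implementation: the keyword list, the tokenizer, and the alignment scores (match +1, mismatch −1, gap −1) are assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Hypothetical keyword vocabulary for a Karel-style language.
KEYWORDS = ("if", "else", "while", "for", "move", "turnLeft", "putBeeper")


def keyword_histogram(source):
    """Count occurrences of each keyword in a program's source text."""
    tokens = re.findall(r"[A-Za-z_]\w*", source)
    return Counter(t for t in tokens if t in KEYWORDS)


def bag_of_words_distance(src_a, src_b):
    """Euclidean distance between the two keyword histograms."""
    ha, hb = keyword_histogram(src_a), keyword_histogram(src_b)
    return math.sqrt(sum((ha[k] - hb[k]) ** 2 for k in KEYWORDS))


def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of two API-call sequences (higher = more similar)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


d = bag_of_words_distance("move(); move();", "move();")
s = needleman_wunsch_score(["move", "turnLeft"], ["move"])
```

The bag-of-words measure ignores order entirely, while the alignment score is sensitive to the order of API calls, which is why the two can disagree about which programs are alike.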

36 Dissimilarity Matrix. [Figure caption: Dissimilarity matrix for clustering of 2000 snapshots. Each row and column in the matrix represents a snapshot, and the entry at row i, column j represents how similar snapshots i and j are (dark means more similar).] Clustering on a sample of 2000 random snapshots from the training set returned a group of well-defined snapshot clusters (see Figure 2). The value of k that maximized the silhouette score (a measure of how natural the clustering was) was 26 clusters. A visual inspection of these clusters confirmed that snapshots which clustered together were functionally similar pieces of code. (Piech et al., 2012)
The HMM is specified by:
The finite set of high-level states, or milestones, that a student could be in. A state is defined by a set of snapshots where all the snapshots in the set came from the same milestone.
The transition probability of being in a state given the state the student was in during the previous unit of time.
The emission probability of seeing a specific snapshot given that you are in a particular state. To calculate the emission probability we interpreted each of the states as emitting snapshots with normally distributed dissimilarities. In other words, given the dissimilarity between a particular snapshot of student code and a state's "representative" snapshot, we can calculate the probability that the student snapshot came from a given state using a normal distribution based on the dissimilarity. (Piech et al., 2012)
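The emission model described here can be sketched as a Gaussian likelihood over the dissimilarity to each state's representative snapshot. The zero mean, the shared standard deviation `sigma`, the uniform prior over states, and the state names are all assumptions made for this illustration; the slide does not give these details.

```python
import math


def emission_likelihood(dissimilarity, sigma=1.0):
    """Normal density (mean 0, std sigma) evaluated at the dissimilarity."""
    z = dissimilarity / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def state_posteriors(dissimilarities, sigma=1.0):
    """P(state | snapshot) under a uniform prior.

    dissimilarities maps each state name to the dissimilarity between the
    snapshot and that state's representative snapshot.
    """
    likelihoods = {s: emission_likelihood(d, sigma) for s, d in dissimilarities.items()}
    total = sum(likelihoods.values())
    return {s: l / total for s, l in likelihoods.items()}


# Hypothetical dissimilarities of one snapshot to three milestone states.
posteriors = state_posteriors({"start": 2.5, "row_done": 0.4, "finished": 3.0})
best_state = max(posteriors, key=posteriors.get)
```

A snapshot close to a state's representative gets a high likelihood for that state, so the smallest dissimilarity always wins under a uniform prior; inside the HMM, the transition probabilities can override that choice.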

37 Stanford's MOOC: teaching machine learning topics. Huang, J., Piech, C., Nguyen, A., & Guibas, L. (2013, June). Syntactic and functional variability of a million code submissions in a machine learning MOOC. In AIED 2013 Workshops Proceedings Volume (p. 25). The landscape of solutions for "gradient descent for linear regression", representing over 40,000 student code submissions, with edges drawn between syntactically similar submissions and colors corresponding to performance on a battery of unit tests (red submissions passed all unit tests). Hour of Code Challenge: Modeling How Young Students Learn to Program.

38 Correct Answer. Node: unique partial solution. Arc: next solution an expert would recommend. Code.org problem-solving graph of the learned policy for how to solve a single open-ended programming assignment from over 1M users. Each node is a unique partial solution. Node 0 is the correct answer. Chris Piech, Stanford Ph.D. student. Improved Retention. Code.org gathered over 137 million partial solutions. Not all students made it through the entire Hour of Code, but retention was quite high relative to other contemporary open-access courses.

39 63K Peer Grading for 7K students. Blue blob: Student A. Red squares: students who graded Student A. Red circles: students who were graded by Student A. A Coursera course to teach HCI. Peer grading network of 63K peer grades for 7K students. A single student is highlighted; red squares graded the student, red circles were graded by the student. Chris Piech, Stanford Ph.D. student. Squares: questions. Circles: concepts. Edges: strong question-concept relationship. Lan, A. S., Studer, C., Waters, A. E., & Baraniuk, R. G. (2013). Joint topic modeling and factor analysis of textual information and graded response data. arXiv preprint arXiv:

40 Agenda: Introduction, Model the Student, Model the Domain, Personalize Tutoring, Assess Learning. Intelligent Tutoring Systems. Long term goal: "Millions of school children will have access to what Alexander the Great enjoyed as a royal prerogative: the personal services of a tutor as well informed as Aristotle. Students will have instant access to vast stores of knowledge through their computerized tutors." Pat Suppes, Stanford University, 1966 (died Nov 2014)

41 Learning to Teach: Machine Learning Techniques to Improve Instruction. Thank You! Any Questions? NIPS 2015 Workshop on Human Propelled Machine Learning, Dec 13, 2014


WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise

More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing Introduction to Data Mining and Machine Learning Techniques Iza Moise, Evangelos Pournaras, Dirk Helbing Iza Moise, Evangelos Pournaras, Dirk Helbing 1 Overview Main principles of data mining Definition

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup

Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup Network Anomaly Detection A Machine Learning Perspective Dhruba Kumar Bhattacharyya Jugal Kumar KaKta»C) CRC Press J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor

More information

Studying Auto Insurance Data

Studying Auto Insurance Data Studying Auto Insurance Data Ashutosh Nandeshwar February 23, 2010 1 Introduction To study auto insurance data using traditional and non-traditional tools, I downloaded a well-studied data from http://www.statsci.org/data/general/motorins.

More information

Academic and Business Research Institute Conference, Las Vegas, 2010 1

Academic and Business Research Institute Conference, Las Vegas, 2010 1 Academic and Business Research Institute Conference, Las Vegas, 2010 1 Assisting Higher Education in Assessing, Predicting, and Managing Issues Related to Student Success: A Web-based Software using Data

More information

Current state of learning analytics and educational data mining

Current state of learning analytics and educational data mining Current state of learning analytics and educational data mining George Siemens Ryan S.J.d. Baker August 2013 Poll #1 How far along is your institution in using LA/ EDM at institutional level? We re thinking

More information

E-Learning Using Data Mining. Shimaa Abd Elkader Abd Elaal

E-Learning Using Data Mining. Shimaa Abd Elkader Abd Elaal E-Learning Using Data Mining Shimaa Abd Elkader Abd Elaal -10- E-learning using data mining Shimaa Abd Elkader Abd Elaal Abstract Educational Data Mining (EDM) is the process of converting raw data from

More information

MA2823: Foundations of Machine Learning

MA2823: Foundations of Machine Learning MA2823: Foundations of Machine Learning École Centrale Paris Fall 2015 Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe agathe.azencott@mines paristech.fr TAs: Jiaqian Yu jiaqian.yu@centralesupelec.fr

More information

Data Mining for Model Creation. Presentation by Paul Below, EDS 2500 NE Plunkett Lane Poulsbo, WA USA 98370 paul.below@eds.

Data Mining for Model Creation. Presentation by Paul Below, EDS 2500 NE Plunkett Lane Poulsbo, WA USA 98370 paul.below@eds. Sept 03-23-05 22 2005 Data Mining for Model Creation Presentation by Paul Below, EDS 2500 NE Plunkett Lane Poulsbo, WA USA 98370 paul.below@eds.com page 1 Agenda Data Mining and Estimating Model Creation

More information

Data Mining. Nonlinear Classification

Data Mining. Nonlinear Classification Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15

More information

A Basic Guide to Modeling Techniques for All Direct Marketing Challenges

A Basic Guide to Modeling Techniques for All Direct Marketing Challenges A Basic Guide to Modeling Techniques for All Direct Marketing Challenges Allison Cornia Database Marketing Manager Microsoft Corporation C. Olivia Rud Executive Vice President Data Square, LLC Overview

More information

Big Data Big Knowledge?

Big Data Big Knowledge? EBPI Epidemiology, Biostatistics and Prevention Institute Big Data Big Knowledge? Torsten Hothorn 2015-03-06 The end of theory The End of Theory: The Data Deluge Makes the Scientific Method Obsolete (Chris

More information

Classification Problems

Classification Problems Classification Read Chapter 4 in the text by Bishop, except omit Sections 4.1.6, 4.1.7, 4.2.4, 4.3.3, 4.3.5, 4.3.6, 4.4, and 4.5. Also, review sections 1.5.1, 1.5.2, 1.5.3, and 1.5.4. Classification Problems

More information

Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

More information

Factorization Models for Forecasting Student Performance

Factorization Models for Forecasting Student Performance Factorization Models for Forecasting Student Performance Nguyen Thai-Nghe, Tomáš Horváth and Lars Schmidt-Thieme, University of Hildesheim, Germany Predicting student performance (PSP) is one of the educational

More information

Learning outcomes. Knowledge and understanding. Competence and skills

Learning outcomes. Knowledge and understanding. Competence and skills Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges

More information

Assessing Data Mining: The State of the Practice

Assessing Data Mining: The State of the Practice Assessing Data Mining: The State of the Practice 2003 Herbert A. Edelstein Two Crows Corporation 10500 Falls Road Potomac, Maryland 20854 www.twocrows.com (301) 983-3555 Objectives Separate myth from reality

More information

BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts

BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an

More information

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction Data Mining and Exploration Data Mining and Exploration: Introduction Amos Storkey, School of Informatics January 10, 2006 http://www.inf.ed.ac.uk/teaching/courses/dme/ Course Introduction Welcome Administration

More information

Question 2 Naïve Bayes (16 points)

Question 2 Naïve Bayes (16 points) Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the

More information
