Codewebs: Scalable Homework Search for Massive Open Online Programming Courses
1 Codewebs: Scalable Homework Search for Massive Open Online Programming Courses. Leonidas Guibas, Andy Nguyen, Chris Piech, Jonathan Huang. Educause, 11/6/2013
2 Massive Scale Education: 4.5 million users; 750,000 users; 1.2 million users
3 Complex and informative feedback in MOOCs
Question types, from short response to long response: multiple choice, coding assignments, proofs, essay questions.
Short response: easy to automate, but limited ability to ask expressive questions or require creativity.
Long response: can assign complex assignments and provide complex feedback, but hard to grade automatically.
4 Binary feedback for coding questions
Linear Regression submission (Homework 1) for Coursera's ML class:

    function [theta, J_history] = gradientdescent(X, y, theta, alpha, num_iters)
    %GRADIENTDESCENT Performs gradient descent to learn theta
    %   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
    %   taking num_iters gradient steps with learning rate alpha
    m = length(y); % number of training examples
    J_history = zeros(num_iters, 1);
    for iter = 1:num_iters
        theta = theta-alpha*1/m*(X'*(X*theta-y));
        J_history(iter) = computecost(X, y, theta);
    end

Test Inputs go in, Test Outputs come out: Correct / Incorrect?
What would a human grader (TA) do?
5 Complex, informative feedback

    function [theta, J_history] = gradientdescent(X, y, theta, alpha, num_iters)
    m = length(y);
    J_history = zeros(num_iters, 1);
    for iter = 1:num_iters
        hypo = X*theta;
        newmat = hypo - y;
        trans1 = (X(:,1));
        trans1 = trans1';
        newmat1 = trans1 * newmat;
        temp1 = sum(newmat1);
        temp1 = (temp1 *alpha)/m;
        A = [temp1];
        theta(1) = theta(1) - A;
        trans2 = (X(:,2))';
        newmat2 = trans2*newmat;
        temp2 = sum(newmat2);
        temp2 = (temp2 *alpha)/m;
        B = [temp2];
        theta(2)= theta(2) - B;
        J_history(iter) = computecost(X, y, theta);
    end
    theta(1) = theta(1);
    theta(2)= theta(2);

Better: theta = theta-(alpha/m)*X'*(X*theta-y)   Why??

Correctness: Good
Efficiency: Good
Style: Poor
Elegance: Poor
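One way to see why the one-line update is "better" is to check that it computes the same step as the verbose, column-by-column version. The following is a minimal Python sketch using plain lists rather than Octave; the function names and the tiny data set are illustrative assumptions, not part of the talk.

```python
def matvec(X, v):
    """Multiply matrix X (list of rows) by vector v."""
    return [sum(x_ij * v_j for x_ij, v_j in zip(row, v)) for row in X]


def gradient_step_vectorized(X, y, theta, alpha):
    """theta - (alpha/m) * X' * (X*theta - y), written with plain lists."""
    m = len(y)
    residual = [h - t for h, t in zip(matvec(X, theta), y)]
    # X' * residual: one dot product per column of X.
    grad = [sum(row[j] * r for row, r in zip(X, residual))
            for j in range(len(theta))]
    return [t - alpha / m * g for t, g in zip(theta, grad)]


def gradient_step_per_component(X, y, theta, alpha):
    """Mirrors the verbose submission: column 1 and column 2 handled separately."""
    m = len(y)
    newmat = [h - t for h, t in zip(matvec(X, theta), y)]
    temp1 = sum(row[0] * r for row, r in zip(X, newmat)) * alpha / m
    temp2 = sum(row[1] * r for row, r in zip(X, newmat)) * alpha / m
    return [theta[0] - temp1, theta[1] - temp2]


# Hypothetical toy data: an intercept column plus one feature.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 3.0]
theta = [0.0, 0.0]
a = gradient_step_vectorized(X, y, theta, 0.1)
b = gradient_step_per_component(X, y, theta, 0.1)
```

Both functions return the same step on this data (up to floating-point rounding), but the per-component version only works for exactly two parameters, which is one plausible reading of the "Style: Poor" and "Elegance: Poor" ratings.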
6 Technical Challenges

    sum(X'*(X*theta-y));
    sum(((theta *X ) -y) *X);
    sum(transpose(X*theta-y)*X);

Source code similarity
Scalability
7 The Data. Code Web: Improving Life for Future TAs (John, Andy, Chris, Leo)
8 Coursera Machine Learning > 1 million submissions
9 Why Care?
10 The feedback paradox: Fast feedback is key for student success [Dihoff 04]. Human feedback is an overwhelming time commitment for teachers [Sadler 06].
11 Cost disease
12 Grand Challenge Automatically provide feedback to students (for programming assignments)
13 Basic Idea
14 [Sad Teacher] Input: many ungraded submissions for a programming assignment
15 [Happy Teacher] Output: annotated submissions
16 [Effective, Happy Teacher] Force multiply teacher effort.
17 Build A Homework Search Engine
18 Abstract syntax tree representations

    function A = warmupexercise()
      A = [];
      A = eye(5);
    endfunction

AST for the statement A = eye(5):

    ASSIGN
      IDENT (A)
      INDEX_EXP
        IDENT (eye)
        ARGUMENT_LIST
          CONST (5)

ASTs ignore: whitespace, comments
19 Abstract syntax tree representations

    function A = warmupexercise()
      A = [];
      A = eye(5);
    endfunction

ASTs ignore: whitespace, comments
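The claim that ASTs ignore whitespace and comments can be illustrated with any parser. The sketch below uses Python's standard `ast` module as a stand-in; the talk's system parses Octave, so the Python sources and the `ast_fingerprint` helper here are assumptions made purely for illustration.

```python
import ast

# Two spellings of the same two statements; the second adds comments,
# blank lines and extra spaces.
source_a = "A = []\nA = eye(5)\n"
source_b = "# warm-up exercise\nA   =   [ ]\n\nA = eye( 5 )   # identity matrix\n"


def ast_fingerprint(source):
    """Dump the parsed tree; ast.dump omits line/column attributes by default."""
    return ast.dump(ast.parse(source))


same_tree = ast_fingerprint(source_a) == ast_fingerprint(source_b)
different_tree = ast_fingerprint(source_a) != ast_fingerprint("A = eye(6)\n")
```

Here `same_tree` is true because layout and comments never reach the tree, while changing the constant produces a different tree.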
20 Indexing documents by phrases
Example documents: "blue sky"; "yellow submarine"; "The bright and blue butterfly hangs on the breeze"; "We all something something yellow submarine".

term/phrase    document list
best           {1,3}
blue           {2,4,6}
bright         {7,8,10,11,12}
heat           {1,5,13}
kernel         {2,5,6,9,56}
sky            {1,2}
submarine      {2,3,4}
woes           {10,19,38}
yellow         {2,4}

What basic queries should a code search engine support?
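The term-to-document-list table is an inverted index. A minimal sketch of building one over single words and two-word phrases might look like the following; the document ids are hypothetical and do not correspond to the posting lists on the slide.

```python
from collections import defaultdict


def build_index(documents):
    """Map every word and two-word phrase to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        words = text.lower().split()
        for n in (1, 2):  # index single terms and two-word phrases
            for i in range(len(words) - n + 1):
                index[" ".join(words[i:i + n])].add(doc_id)
    return index


# Hypothetical document ids for the example texts.
documents = {
    1: "blue sky",
    2: "yellow submarine",
    3: "The bright and blue butterfly hangs on the breeze",
    4: "We all something something yellow submarine",
}
index = build_index(documents)
blue_docs = index["blue"]
phrase_docs = index["yellow submarine"]
```

A code search engine would presumably apply the same idea with AST subtrees in place of words and phrases, so that a query for a code phrase returns the list of submissions containing it.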
21 Find the Feet That Fit Query ASTs that match on context
22 Find the Feet That Fit Query ASTs that match on context
23 Running time for indexing [Figure: two plots of runtime (seconds), one against # ASTs indexed and one against average AST size (# nodes)]
24 Zipf's Law for code phrases [Figure: subtree frequency against subtree rank, with the starter code and an elbow marked]
25 Great!
26 So what?
27 Data driven equivalence classes of code phrases
28 There is Structure to Code
29 Exponential Solution Space
def dosomething() { Part A: 500 ways; Part B: 200 ways; Part C: 200 ways }
Without factoring: params =
With factoring: params =
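The two "params =" values did not survive transcription. Under the assumption that the three parts vary independently, the counts can be worked out as below; this is a reader's reconstruction of the arithmetic, not figures taken from the slide.

```python
import math

# Number of distinct ways students wrote each part (from the slide).
ways = {"Part A": 500, "Part B": 200, "Part C": 200}

# Treating every whole program as its own case: one entry per combination.
without_factoring = math.prod(ways.values())

# Treating each part separately: one entry per variant of each part.
with_factoring = sum(ways.values())
```

Under that assumption the unfactored space has 20,000,000 combinations while the factored one needs only 900 entries, which is the sense in which the solution space is exponential in the number of parts.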
30 Say you have two sub-forests Sub-forest Solution A Solution B
31 Each sub-forest has a complement Solution A Solution B
32 If those complements are equal Solution A Solution B
33 And the programs have the same output Solution A Solution B
34 The sub-forests were analogous in that context Code Block A Code Block B
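The argument on the preceding slides (equal complements plus equal output implies analogous sub-forests) can be sketched in a few lines. This simplification handles a single subtree rather than a sub-forest, represents ASTs as nested tuples, and takes the "same output" fact as an input; all names and the example trees are hypothetical.

```python
HOLE = ("HOLE",)


def complement(tree, subtree):
    """Return the tree with every occurrence of subtree replaced by a hole."""
    if not isinstance(tree, tuple):
        return tree  # a bare label or constant
    if tree == subtree:
        return HOLE
    label, *children = tree
    return (label, *(complement(child, subtree) for child in children))


def analogous_in_context(prog_a, sub_a, prog_b, sub_b, same_output):
    """Subtrees are analogous if the rest of each program matches and outputs agree."""
    return same_output and complement(prog_a, sub_a) == complement(prog_b, sub_b)


# m = rows(X)  versus  m = size(X, 1)
sub_a = ("CALL", "rows", ("IDENT", "X"))
sub_b = ("CALL", "size", ("IDENT", "X"), ("CONST", 1))
prog_a = ("ASSIGN", ("IDENT", "m"), sub_a)
prog_b = ("ASSIGN", ("IDENT", "m"), sub_b)
prog_c = ("ASSIGN", ("IDENT", "n"), sub_b)  # different surrounding context

match = analogous_in_context(prog_a, sub_a, prog_b, sub_b, same_output=True)
no_match = analogous_in_context(prog_a, sub_a, prog_c, sub_b, same_output=True)
```

In practice `same_output` would come from running both submissions against the unit tests; here it is passed in directly to keep the sketch self-contained.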
35 Example equivalence classes

{m}: m; rows (X); rows (y); size (X, 1); length (y); size (y, 1); length (X (:, 1)); length (X); size (X) (1)

{alphaoverm}: alpha / {m}; 1 / {m} * alpha; alpha.* (1 / {m}); alpha./ {m}; alpha * inv ({m}); alpha * (1./ {m}); 1 * alpha / {m}; alpha * pinv ({m}); alpha * 1./ {m}; alpha.* 1 / {m}; 1.* alpha./ {m}; alpha * (1 / {m}); .01 / {m}; alpha.* (1./ {m}); alpha * {m} ^ -1

{hypothesis}: (X * theta); (theta' * X')'; [X] * theta; (X * theta (:)); theta(1) + theta (2) * X (:, 2); sum(X.*repmat(theta',{m},1), 2)

{residual}: (X * theta - y); (theta' * X' - y')'; ({hypothesis} - y); ({hypothesis}' - y )'; [{hypothesis} - y]; sum({hypothesis} - y, 2)
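Once equivalence classes like these are known, submissions can be canonicalized by rewriting each member phrase to its class name. The sketch below does this with plain string replacement on whitespace-stripped text and a small subset of the phrases above; a real system would rewrite AST subtrees instead, and the `canonicalize` function is an illustrative assumption.

```python
# Classes are applied in order, so later classes may refer to earlier ones.
EQUIVALENCE_CLASSES = [
    ("{m}", ["rows(X)", "rows(y)", "size(X,1)", "length(y)", "size(y,1)"]),
    ("{alphaoverm}", ["alpha/{m}", "1/{m}*alpha", "alpha*(1/{m})"]),
    ("{hypothesis}", ["(X*theta)", "(theta'*X')'"]),
    ("{residual}", ["({hypothesis}-y)"]),
]


def canonicalize(expression):
    """Rewrite known phrases to their equivalence-class names."""
    text = "".join(expression.split())  # drop all whitespace
    for class_name, phrases in EQUIVALENCE_CLASSES:
        # Longest phrases first so a short phrase never splits a longer one.
        for phrase in sorted(phrases, key=len, reverse=True):
            text = text.replace(phrase, class_name)
    return text


c1 = canonicalize("alpha / size(X, 1)")
c2 = canonicalize("alpha * (1 / length(y))")
c3 = canonicalize("((theta' * X')' - y)")
c4 = canonicalize("((X * theta) - y)")
```

Here `c1` and `c2` both reduce to `{alphaoverm}` and `c3` and `c4` both reduce to `{residual}`, so syntactically different submissions collapse to the same canonical form. String replacement can match inside longer identifiers, which is one reason to work on trees in any serious version.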
36 Reduction in # ASTs via canonicalization [Figure: % remaining ASTs after canonicalization against # unique ASTs considered (ordered by frequency of AST), with curves for different numbers of equivalence classes] More equivalence classes means more reduction, which means fewer ASTs. We get better results on the more common ASTs.
37 How many submissions can we give feedback to with fixed effort? [Figure: # submissions covered (out of 40,000) against # equivalence classes, with 25 ASTs marked and with 200 ASTs marked]
38 How many submissions can we give feedback to with fixed effort? [Figure: # submissions covered (out of 40,000) against # equivalence classes, with 25 ASTs marked and with 200 ASTs marked; annotations: "Equivalences + 25 marked", "No Equivalences marked"]
39 Syntactic bug isolation using Codewebs
40 Bug detection evaluation (difficult to evaluate localization!!) [Figure: recall against precision; each point represents a single coding problem in Coursera's ML class; one labeled point: regularized backpropagation for neural nets]
41 Bug detection evaluation [Figure: F-score (higher is better) against # unique ASTs considered, with canonicalization and without canonicalization]
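For readers unfamiliar with the metrics on these two slides, the sketch below computes precision, recall and the balanced F-score (F1) for a bug detector. The slides do not say which F-score variant was used, so F1 is an assumption, and the submission ids are made up.

```python
def precision_recall_f1(flagged, truly_buggy):
    """Precision, recall and F1 for a set of flagged items against ground truth."""
    flagged, truly_buggy = set(flagged), set(truly_buggy)
    true_positives = len(flagged & truly_buggy)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(truly_buggy) if truly_buggy else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical ids: four submissions flagged, five actually buggy, three in common.
p, r, f = precision_recall_f1(flagged={1, 2, 3, 4}, truly_buggy={2, 3, 4, 5, 6})
```

Precision is the fraction of flagged submissions that are really buggy, recall is the fraction of buggy submissions that were flagged, and F1 is their harmonic mean.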
42 Concrete Example
43 Case study: The extraneous sum

Correct:

    function [theta, J_history] = gradientdescent(X, y, theta, alpha, num_iters)
    %GRADIENTDESCENT Performs gradient descent to learn theta
    %   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
    %   taking num_iters gradient steps with learning rate alpha
    m = length(y); % number of training examples
    J_history = zeros(num_iters, 1);
    for iter = 1:num_iters
        theta = theta-alpha*1/m*(X'*(X*theta-y));
        J_history(iter) = computecost(X, y, theta);
    end

Incorrect:

    function [theta, J_history] = gradientdescent(X, y, theta, alpha, num_iters)
    %GRADIENTDESCENT Performs gradient descent to learn theta
    %   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
    %   taking num_iters gradient steps with learning rate alpha
    m = length(y); % number of training examples
    J_history = zeros(num_iters, 1);
    for iter = 1:num_iters
        theta = theta-alpha*1/m*sum(X'*(X*theta-y));
        J_history(iter) = computecost(X, y, theta);
    end

Feedback message: Dear Lisa Simpson, consider the dimension of the expression: X'*(X*theta-y) and what happens after you call sum on it.

Syntax based approach: Attach this message to everyone containing that exact expression (covers 99 submissions)
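The hint about dimensions can be made concrete: the gradient has one entry per parameter, and summing it collapses those entries into a single number that is then subtracted from every parameter. The Python sketch below mirrors both versions with plain lists; the function names and toy data are assumptions for illustration only.

```python
def gradient(X, y, theta):
    """X' * (X*theta - y) as a list with one entry per parameter."""
    residual = [sum(a * t for a, t in zip(row, theta)) - target
                for row, target in zip(X, y)]
    return [sum(row[j] * r for row, r in zip(X, residual))
            for j in range(len(theta))]


def step_correct(X, y, theta, alpha):
    """Each parameter is moved by its own gradient entry."""
    m = len(y)
    return [t - alpha / m * g for t, g in zip(theta, gradient(X, y, theta))]


def step_with_extraneous_sum(X, y, theta, alpha):
    """The buggy variant: the gradient vector is collapsed to one scalar first."""
    m = len(y)
    collapsed = sum(gradient(X, y, theta))
    return [t - alpha / m * collapsed for t in theta]


# Hypothetical toy data: an intercept column plus one feature.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 3.0]
good = step_correct(X, y, [0.0, 0.0], 0.1)
bad = step_with_extraneous_sum(X, y, [0.0, 0.0], 0.1)
```

Starting from equal parameters, the buggy step leaves them equal (both entries of `bad` are the same), whereas the correct step updates each one differently.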
44 The extraneous sum bug takes many forms

    theta = theta-alpha*1/m*sum(X'*(X*theta-y));
    theta = theta-alpha*1/m*sum(((theta *X ) -y) *X);
    theta = theta-alpha*1/m*sum(transpose(X*theta-y)*X);

(Easier) Output based approach: Attach this message to everyone containing that exact expression (covers 1091 submissions)
45 Codewebs approach to feedback

    theta = theta-alpha*1/m*sum(X'*(X*theta-y));

Step 1: Find equivalent ways of writing buggy expression using Codewebs engine
Step 2: Write a thoughtful/meaningful hint or explanation
Step 3: Propagate feedback message to any submission containing equivalent expression

# submissions covered by single message: Output based 1091; Codewebs 1208; Combined 1604
~47% improvement over just using an output based feedback system!!
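The propagation step and the quoted improvement can be sketched as follows. Matching here is naive substring search on whitespace-stripped code, which stands in for the Codewebs equivalence lookup; the submissions, student ids and helper names are hypothetical, while the counts 1091 and 1604 are the ones reported on the slide.

```python
def propagate_feedback(submissions, equivalent_expressions, message):
    """Attach the message to each submission containing any equivalent expression."""
    def squash(text):
        return "".join(text.split())

    targets = [squash(expr) for expr in equivalent_expressions]
    return {
        student: message
        for student, code in submissions.items()
        if any(target in squash(code) for target in targets)
    }


def relative_improvement(baseline, improved):
    """Fractional gain of improved over baseline."""
    return (improved - baseline) / baseline


# Hypothetical submissions keyed by student id.
submissions = {
    "s1": "theta = theta - alpha*1/m*sum(X'*(X*theta-y));",
    "s2": "theta = theta - alpha/m * sum(transpose(X*theta-y)*X);",
    "s3": "theta = theta - alpha/m * (X'*(X*theta-y));",
}
equivalents = ["sum(X'*(X*theta-y))", "sum(transpose(X*theta-y)*X)"]
feedback = propagate_feedback(
    submissions, equivalents, "Consider the dimension of X'*(X*theta-y).")

# Counts from the slide: output based alone versus combined.
gain = relative_improvement(1091, 1604)
```

With the slide's numbers, `gain` is about 0.47, which matches the "~47% improvement" claim when the combined system is compared against the output based one.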
46 Big data as a problem: can we give human quality feedback to a million code submissions?
Big data as a solution: structure (clustering, subtree equivalence) of the solution space is invisible without dense sampling!
Where we're going next: extensions to other languages; applications to new problem domains; data association problem for variables; dynamic analysis; temporal analysis.
47 How do solutions evolve: modeling the human creative process. Feedback on progress, not just on final solution. Think beyond computer science, beyond education.
