Two big challenges in machine learning LÉON BOTTOU
|
|
|
- Opal Chandler
- 10 years ago
- Views:
Transcription
1 Two big challenges in machine learning LÉON BOTTOU FACEBOOK AI RESEARCH ICML LILLE
2 Machine Learning in 2015 An established academic discipline o that attracts bright students. Pervasive applications o all over the Internet o and sometimes in the real world. Massive investments. Massive returns. more logos in your ICML booklet
3 Machine Learning in 2015 An established academic discipline o that attracts bright students. Pervasive applications o all over the Internet o and sometimes in the real world. Massive investments. Massive returns. more logos in your ICML booklet
4 Machine Learning in 2015 An established academic discipline o that attracts bright students. Pervasive applications o all over the Internet o and sometimes in the real world. Massive investments. Massive returns. more logos in your ICML booklet
5 Challenges of a new kind 1. Machine learning disrupts software engineering. à Challenging our impact on real world applications. 2. Our experimental paradigm is reaching its limits. à Challenging the speed of our scientific progress. 3. Elements of a solution
6 Machine Learning disrupts software engineering
7 Engineering complex artifacts Counting parts down to the smallest screw Automobile: ~ 30,000 parts Airplane: ~ 3,000,000 parts Specifications and replication The engineer does not design the alloy and the threads of each screw. Standard screws come with known specifications (size, strength, weight, ) Many parts and many assemblies are identical.
8 Abstraction in engineering Thinking with the specifications instead of the details. part specifications instead of part details. assembly specifications instead of assembly details. Abstractions leak! Sometimes one needs to look more closely. Example: engine cooling assembly can have complicated heat inertia. Example: global resource allocation (cost, weight, etc.) Abstraction leaks limits the complexity of what we can build.
9 Abstraction in mathematics and logic Pure abstraction belongs to mathematics and logic When we use a theorem, we do not need to revisit its proof. Mathematical abstractions do not leak. Computer programs are similar to mathematical proofs (in theory) Design with contracts knowing the specification of a module is enough. In practice, there are abstraction leaks ( but less. )
10 Computer programs The closer to logic, the more complex the artifacts Automobile: ~ 30,000 parts Airplane: ~ 3,000,000 parts Processor: ~ 3,000,000,000 transistors (note : counts include many identical parts and assemblies.) MS Office: ~ 40,000,000 LOCS Facebook: ~ 60,000,000 LOCS Debian : ~ 400,000,000 LOCS (note: different metric: lines of codes are rarely identical.)
11 Programming versus learning Two conceptions of computing Frank Rosenblatt Digital computers (that one programs) are everywhere. Programming makes it easy to leverage the work of others.
12 Programming versus learning Programming has real advantages Minsky and Papert, Perceptrons, Remark This book is not just about Rosenblatt s perceptron! What Minsky and Papert call an order- k perceptron covers most machine learning models, including the convolutional neural networks that are changing computer vision.
13 Connectedness Is a shape made of a single connected component? M&P prove that a small- order perceptron cannot say, but a simple algorithm can. Such an algorithm fulfils a clear contract: verifying connectedness.
14 Is connectedness easy for us?
15 What is easy for us? This shape represents a mouse This shape represents a piece of cheese
16 Why is learning gaining momentum? Since connectedness has a clear mathematical specification, we can envision provable algorithms. Since mousiness and cheesiness do not have such a specification, we cannot envision provable algorithms. We must rely on - heuristic specifications - or learning techniques. Connectedness? Mousiness? Cheesiness?
17 Why is learning gaining momentum? We must rely on - heuristic specifications (e.g. rules) - or learning techniques. Big data and big computing power When the volume of data increases - defining heuristic specifications becomes harder. - training learning systems becomes more effective.
18 Programming versus learning Two conceptions of computing Frank Rosenblatt
19 Integrating machine learning in large software systems Machine learning systems must* work with digital software because digital computers are everywhere. Use trained models as software modules. Use learning algorithms as software modules. * at least until we solve AI
20 Trained models as software modules DeepVisotron,, detects 1000 object categories with less than 1% errors. What is the nature of the contract? This does not mean that one rolls a dice for each picture. This statement refers to a specific testing set. The error guarantee is lost if the image distribution changes.
21 Weak contracts A smart programmer makes an inventive use of a sorting routine. The sorting routine fulfils the contract by sorting the data (however strange.) The code of the smart programmer does what was intended. A smart programmer makes an inventive use of a trained object recognizer. The object recognizer receives data that does not resemble the testing data and outputs nonsense. The code of the smart programmer does not work.
22 Weak contracts A smart programmer makes an inventive use of a sorting routine. The sorting routine fulfils the contract by sorting the data (however strange.) The code of the smart programmer does what was intended. A smart programmer makes an inventive use of a trained object recognizer. The object recognizer receives data that does not resemble the testing data and outputs nonsense. The code of the smart programmer does not work. Collecting new data and retraining may help or may not!
23 Learning algorithms as software modules
24 Example 1 Click curves Many web sites offer lists of items ( search results, recommendations, ads ) and select them by modeling click probabilities.
25 A common click model P click context, item, pos = F pos G(context, item) Position effect click curve Item effect clickability Why separating position effect and item effect? Click probabilities of this form allow a greedy placement algorithm: 1. Sort items by clickability G(context, item). 2. Allocate them greedily to the best positions.
26 A common click model Why separating position effect and item effect? Click probabilities of this form work well in auction theory results. There are easy ways to estimate F(position) and G(context, item). How incorrect is this model? All ML models are incorrect. Model misspecification should only degrade the performance gracefully. How bad can things go?
27 How bad things can go? Assume there are only two contexts C1 and C2 and two positions P1,P2 Assume the true click probability is P click context, item, pos = F context, pos G (context, item) F P1 P2 C C G R1 R2 R3 R4 C C Our model estimates some intermediate click curve F pos
28 How bad can things go in context C1? True position effect steeper than estimated. F P1 < F C1, P1 overestimate clickability of items shown in P1 F P2 > F C1, P2 underestimate clickability of items shown in P2 Combined with the greedy placement algorithm. After retraining, whatever we placed in position P1 looks better than it really is. Therefore we keep placing it in position P1. After retraining, whatever we placed in position P2 looks worse than it really is. Therefore we keep placing it in position P2 (or stop showing it.)
29 How bad can things go in context C1? True position effect steeper than estimated. F P1 < F C1, P1 overestimate clickability of items shown in P1 F P2 > F C1, P2 underestimate clickability of items shown in P2 Combined with the greedy placement algorithm. After retraining, whatever we placed in position P1 looks better than it really is. Therefore we keep placing it in position P1. After retraining, whatever we placed in position P2 looks worse than it really is. Therefore we keep placing it in position P2 (or stop showing it.)
30 How bad can things go in context C2? True position effect less steep than estimated. F P1 > F C1, P1 underestimate clickability of items shown in P1 F P2 < F C1, P2 overestimate clickability of items shown in P2 Combined with the greedy placement algorithm. After retraining, whatever we placed in position P1 looks worse than it really is. Therefore we might move it down to position P2. After retraining, whatever we placed in position P2 looks better than it really is. Therefore we might move it up position P1.
31 How bad can things go in context C2? True position effect less steep than estimated. F P1 > F C1, P1 underestimate clickability of items shown in P1 F P2 < F C1, P2 overestimate clickability of items shown in P2 Combined with the greedy placement algorithm. After retraining, whatever we placed in position P1 looks worse than it really is. Therefore we might move it down to position P2. After retraining, whatever we placed in position P2 looks better than it really is. Therefore we might move it up position P1.
32 Feedback loops in machine learning Information (signal) feedback loops are everywhere. They are central to adaptation and learning Norbert Wiener, Cybernetics, 1948 See (Bottou et al., JMLR 2013) for a possible treatment of causal loops.
33 Example 2 Team work Red Team Owns the code that selects content to recommend. Blue Team Owns the code that selects fonts and colors for all page components.
34 Red team The red team knows the bandit literature There are lots of potential recommendations to select from. One can only model the click probabilities of the recommendations that have been shown often enough in each context. Therefore the red team code performs ε- greedy exploration, showing a small proportion of random recommendations. Although exploration has a cost, such the system cannot discover the most relevant recommendations without it.
35 Blue team The blue team also knows machine learning. The job of the blue team is to emphasize things that the user will like. The blue team code runs a click probability model to spot what users like. The click probability model easily discovers that some of the recommendations generated by the red team are not very good. Therefore these recommendations will be shown with a smaller font.
36 Blue team The blue team also knows machine learning. The job of the blue team is to emphasize things that the user will like. The blue team code runs a click probability model to spot what users like. The click probability model easily discovers that some of the recommendations generated by the red team are not very good. Therefore these recommendations will be shown with a smaller font.
37 Red team The red team is very content! Analysis of the exploration traffic shows that there isn t much to improve. à they can safely reduce the exploration level! Whenever they try new features in the click model, randomized testing shows a reduction in performance à it is amazing, but it seems they got their model right the first time!
38 Red team The red team is very content! Analysis of the exploration traffic shows that there isn t much to improve. à they can safely reduce the exploration level! Whenever they try new features in the click model, randomized testing shows a reduction in performance à it is amazing, but it seems they got their model right the first time!
39 Paying attention A manager of the two teams should have paid more attention. The software architect should have paid more attention. Micro- management does not work because it does not scale
40 beyond two teams
41 Conclusion of part 1 Machine learning in large software systems Use trained models as software modules. à problematic because trained models offer weak contracts. Use learning algorithms as software modules. à problematic because the output of the learning algorithm depends on the training data which itself depends on every other module. Working around these difficulties could save billions
42 Our experimental paradigm is reaching its limits
43 Minsky and Papert 1968, F.A.Q. Are we dealing with an exact science (like mathematics) or an empirical science (like physics)?
44 Essential epistemology Exact sciences Experimental sciences Engineering Deals with Axioms & Theorems Facts & Theories Artifacts Truth is Forever Temporary It works. Examples Mathematics C.S. theory Physics Biology Lots
45 Essential epistemology Exact sciences Deals&with Exact& sciences Axioms&& Theorems Experimental sciences Facts&& Theories Engineering Artifacts Truth is Forever Temporary It&works. Examples Mathematics C.S. theory Physics Biology C.S. Exact sciences Deal with axioms and theorems. Axioms are neither true or false. Theorems are true once proven. They remain true forever. Examples o Mathematics o Theoretical C.S. (e.g., algorithms, type theory, ) o Theoretical Physics (e.g., string theory)
46 Essential epistemology Experimental sciences Deals&with Exact& sciences Axioms&& Theorems Experimental sciences Facts&& Theories Engineering Artifacts Truth is Forever Temporary It&works. Examples Mathematics C.S. theory Physics Biology The word true has a different meaning. C.S. Experimental sciences Deal with facts and theories. Facts are true when they can be replicated. Theories are true when they predict the facts. New facts can invalidate theories that were previously considered true (K. Popper) Examples o Physics, Biology, o Social sciences, Experimental psychology,
47 Essential epistemology Engineering Deals&with Exact& sciences Axioms&& Theorems Experimental sciences Facts&& Theories Engineering Artifacts Truth is Forever Temporary It&works. Engineering Deals with artifacts. Examples Mathematics C.S. theory Physics Biology C.S. A claim is true when the artifact works, (the artifact fulfils a contract.) Examples o Mechanical engineering, o Nuclear engineering, o Computer science.
48 Machine learning? What is machine learning? An engineering discipline? An exact science? An experimental science? Yes: applications abound. Yes: statistical/algorithmic learning theory. Yes: papers often report on experiments. This is all good
49 Machine learning? What is machine learning? An engineering discipline? An exact science? An experimental science? Yes: applications abound. Yes: statistical/algorithmic learning theory. Yes: most papers include experiments. This is all good as long as we don t mix the genres. My software works really well, therefore my theory is true. This is NP- complete, therefore this experiment is wrong.
50 ML as an experimental science Some problems of interest do not have a clear specification. They are not amenable to provable algorithms. Machine learning replaces the missing specification by lots of data. Important aspects of the data cannot be described by a compact mathematical statement (otherwise we have a specification!) Therefore experimentation is necessary!
51 ML as an experimental science Progress during the last few decades has been driven by a single experimental paradigm!
52 ML as an experimental science Progress during the last few decades has been driven by a single experimental paradigm! This is unusual for an experimental science! Physicists, biologists, experimental psychologists,..., must work a lot harder to design experiments and interpret their results.
53 Where does the data come from? How to improve the performance of a ML application? Get better algorithm? + Get better features? +++ Get better data? Data collection is hard work The data distribution must match the operational conditions. Rely on manual data curation to avoid dataset bias.
54 The impact of dataset bias Training/testing on biased datasets gives unrealistic results. E.g. : Torralba and Efros, Unbiased look at dataset bias, CVPR 2011.
55 Increased ambitions: Big Data Too much data for manual curation Big data exists because data collection is automated. Nobody checks that the data distribution matches the operational conditions of the system. à All large scale datasets are biased. Training/testing on a big dataset can give unrealistic results.
56 Increased ambitions: A.I. Statistical machine learning Building a classifier that works well under a certain distribution. Machine learning for artificial intelligence Building a classifier that recognizes a concept. Concepts exist independently of the pattern distribution. Training data will never cover the whole range of possible inputs. In fact, a system that recognizes a concept fulfils a stronger contract than a classifier that works well under a certain distribution.
57 Concepts Statistics Example: detection of action giving a phone call (Oquab et al., CVPR 2014)
58 Concepts Statistics Computer vision is not a statistical problem Car examples in ImageNet Is this less of a car because the context is wrong?
59 Concepts Statistics Convolutional networks can be fooled. (Nguyen et al, CVPR 2015)
60 Concepts Statistics Caption generation Eight independent papers in According to the BLEU score, some of these systems perform better than humans They met in Berkeley in January Evaluation is very difficult. Example of difficulty : caption generation vs retrieval From a talk of ( Zitnick 14 )
61 Conclusion of part 2 Training/testing only goes so far Training/testing is an unusually convenient experimental paradigm à it makes experimentation easier than in other sciences à it has played a role in the fast progress of ML. but may not fulfil our increased ambition (big data, AI.) à the end of an anomaly? Working around this could save decades.
62 Elements of a solution
63 1. Process challenges Challenges of a new kind We are used to face technical challenges. These are process challenges.
64 1. Process challenges Engineering process There is a rich literature about the software engineering process. What should be the machine learning engineering process?
65 1. Process challenges Scientific process We cannot solely rely on training/testing experiments. We can understand more things with ad hoc experiments. Example in question answering : the BABI tasks (Bordes et al., 2015) o questions that require combining one/two/more factoids. o questions that require spatial/temporal reasoning. o etc. What are the best practices in other experimental sciences?
66 1. Process challenges A constructive proposition Machine learning papers rarely show the limits of their approach. - For fear that reviewers will use them against their work. Reviewers should instead demand that authors discuss these limits. - Additional credits if they illustrate these limits with experiments. - This would make the papers more informative - and would therefore facilitate scientific progress.
67 2. ML with stronger contracts How to better resist distribution shifts? Optimize measure of coverage instead of average accuracy. - Selective classification (El Yaniv s, ) - KWIK (Li & Littman) - Conformal learning (Vovk, ) Very interesting properties when the data is Zipf distributed
68 2. ML with stronger contracts Zipf distributed patterns way enough data to train not enough data to train Queries sorted in frequency order
69 2. ML with stronger contracts Doubling the size of a Zipf distributed dataset. Diminishing returns for average accuracy improvements. 2x No diminishing returns on number of queries for which we can learn correct answers.
70 3 How to package the work of others Digital computers : software Learning machines : Trained module as a software component? Trained components only offer weak contracts. Training software? If you can get the training data and replicate the rig. Recent example : AlexNet. Task specific trained features Nearly as good.
71 Example: face recognition Interesting task: Recognizing the faces of 10 6 persons. How many labeled images per person can we obtain? Auxiliary task: Do these faces belong to the same person? Two faces in the same picture usually are different persons. Two faces in successive frames are often the same person. (Matt Miller, NEC, 2006)
72 Example: NLP tagging (Collobert, Weston, et al., )
73 Example: object recognition Dogs in ImageNet (~10 6 dogs) Dogs in Pascal VOC (only ~10 4 imgages)
74 Example: object recognition Several CVPR 2014 papers figure from (Oquab et al., 2014)
75 Conclusion
76 Two process challenges Machine learning disrupts software engineering. à Challenging our impact on real world applications. Our experimental paradigm is reaching its limits. à Challenging the speed of our scientific progress.
large-scale machine learning revisited Léon Bottou Microsoft Research (NYC)
large-scale machine learning revisited Léon Bottou Microsoft Research (NYC) 1 three frequent ideas in machine learning. independent and identically distributed data This experimental paradigm has driven
Intelligent Heuristic Construction with Active Learning
Intelligent Heuristic Construction with Active Learning William F. Ogilvie, Pavlos Petoumenos, Zheng Wang, Hugh Leather E H U N I V E R S I T Y T O H F G R E D I N B U Space is BIG! Hubble Ultra-Deep Field
Why is Internal Audit so Hard?
Why is Internal Audit so Hard? 2 2014 Why is Internal Audit so Hard? 3 2014 Why is Internal Audit so Hard? Waste Abuse Fraud 4 2014 Waves of Change 1 st Wave Personal Computers Electronic Spreadsheets
CSE 517A MACHINE LEARNING INTRODUCTION
CSE 517A MACHINE LEARNING INTRODUCTION Spring 2016 Marion Neumann Contents in these slides may be subject to copyright. Some materials are adopted from Killian Weinberger. Thanks, Killian! Machine Learning
Steven C.H. Hoi School of Information Systems Singapore Management University Email: [email protected]
Steven C.H. Hoi School of Information Systems Singapore Management University Email: [email protected] Introduction http://stevenhoi.org/ Finance Recommender Systems Cyber Security Machine Learning Visual
Learning to Process Natural Language in Big Data Environment
CCF ADL 2015 Nanchang Oct 11, 2015 Learning to Process Natural Language in Big Data Environment Hang Li Noah s Ark Lab Huawei Technologies Part 1: Deep Learning - Present and Future Talk Outline Overview
Mining the Software Change Repository of a Legacy Telephony System
Mining the Software Change Repository of a Legacy Telephony System Jelber Sayyad Shirabad, Timothy C. Lethbridge, Stan Matwin School of Information Technology and Engineering University of Ottawa, Ottawa,
Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski [email protected]
Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trajkovski [email protected] Ensembles 2 Learning Ensembles Learn multiple alternative definitions of a concept using different training
Find what matters. Information Alchemy Turning Your Building Data Into Money
Find what matters Information Alchemy Turning Your Building Data Into Money version 1.1 Feb 2012 CONTENTS Information Alchemy Transforming Data Into Value... 2 How Does My Building Really Perform?... 2
Machine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
Strategic Plan Proposal: Learning science by experiencing science: A proposal for new active learning courses in Psychology
Strategic Plan Proposal: Learning science by experiencing science: A proposal for new active learning courses in Psychology Contacts: Jacob Feldman, ([email protected], 848-445-1621) Eileen Kowler
What is Data Science? Data, Databases, and the Extraction of Knowledge Renée T., @becomingdatasci, November 2014
What is Data Science? { Data, Databases, and the Extraction of Knowledge Renée T., @becomingdatasci, November 2014 Let s start with: What is Data? http://upload.wikimedia.org/wikipedia/commons/f/f0/darpa
Simple and efficient online algorithms for real world applications
Simple and efficient online algorithms for real world applications Università degli Studi di Milano Milano, Italy Talk @ Centro de Visión por Computador Something about me PhD in Robotics at LIRA-Lab,
Sanjeev Kumar. contribute
RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 [email protected] 1. Introduction The field of data mining and knowledgee discovery is emerging as a
T O B C A T C A S E G E O V I S A T DETECTIE E N B L U R R I N G V A N P E R S O N E N IN P A N O R A MISCHE BEELDEN
T O B C A T C A S E G E O V I S A T DETECTIE E N B L U R R I N G V A N P E R S O N E N IN P A N O R A MISCHE BEELDEN Goal is to process 360 degree images and detect two object categories 1. Pedestrians,
Measurement and Metrics Fundamentals. SE 350 Software Process & Product Quality
Measurement and Metrics Fundamentals Lecture Objectives Provide some basic concepts of metrics Quality attribute metrics and measurements Reliability, validity, error Correlation and causation Discuss
Statistics for BIG data
Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before
Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski [email protected]
Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trakovski [email protected] Neural Networks 2 Neural Networks Analogy to biological neural systems, the most robust learning systems
Dan French Founder & CEO, Consider Solutions
Dan French Founder & CEO, Consider Solutions CONSIDER SOLUTIONS Mission Solutions for World Class Finance Footprint Financial Control & Compliance Risk Assurance Process Optimization CLIENTS CONTEXT The
INTRUSION PREVENTION AND EXPERT SYSTEMS
INTRUSION PREVENTION AND EXPERT SYSTEMS By Avi Chesla [email protected] Introduction Over the past few years, the market has developed new expectations from the security industry, especially from the intrusion
Introduction to Machine Learning Using Python. Vikram Kamath
Introduction to Machine Learning Using Python Vikram Kamath Contents: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. Introduction/Definition Where and Why ML is used Types of Learning Supervised Learning Linear Regression
8. KNOWLEDGE BASED SYSTEMS IN MANUFACTURING SIMULATION
- 1-8. KNOWLEDGE BASED SYSTEMS IN MANUFACTURING SIMULATION 8.1 Introduction 8.1.1 Summary introduction The first part of this section gives a brief overview of some of the different uses of expert systems
Using Retrocausal Practice Effects to Predict On-Line Roulette Spins. Michael S. Franklin & Jonathan Schooler UCSB, Department of Psychology.
Using Retrocausal Practice Effects to Predict On-Line Roulette Spins Michael S. Franklin & Jonathan Schooler UCSB, Department of Psychology Summary Modern physics suggest that time may be symmetric, thus
Data, Measurements, Features
Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are
Lecture 6: Classification & Localization. boris. [email protected]
Lecture 6: Classification & Localization boris. [email protected] 1 Agenda ILSVRC 2014 Overfeat: integrated classification, localization, and detection Classification with Localization Detection. 2 ILSVRC-2014
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,
Introduction to computer science
Introduction to computer science Michael A. Nielsen University of Queensland Goals: 1. Introduce the notion of the computational complexity of a problem, and define the major computational complexity classes.
Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.
Introduction to Hypothesis Testing CHAPTER 8 LEARNING OBJECTIVES After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative
Fall 2012 Q530. Programming for Cognitive Science
Fall 2012 Q530 Programming for Cognitive Science Aimed at little or no programming experience. Improve your confidence and skills at: Writing code. Reading code. Understand the abilities and limitations
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Erkan Er Abstract In this paper, a model for predicting students performance levels is proposed which employs three
CHAPTER 7 GENERAL PROOF SYSTEMS
CHAPTER 7 GENERAL PROOF SYSTEMS 1 Introduction Proof systems are built to prove statements. They can be thought as an inference machine with special statements, called provable statements, or sometimes
CPSC 211 Data Structures & Implementations (c) Texas A&M University [ 313]
CPSC 211 Data Structures & Implementations (c) Texas A&M University [ 313] File Structures A file is a collection of data stored on mass storage (e.g., disk or tape) Why on mass storage? too big to fit
Keywords the Most Important Item in SEO
This is one of several guides and instructionals that we ll be sending you through the course of our Management Service. Please read these instructionals so that you can better understand what you can
Why Study NP- hardness. NP Hardness/Completeness Overview. P and NP. Scaling 9/3/13. Ron Parr CPS 570. NP hardness is not an AI topic
Why Study NP- hardness NP Hardness/Completeness Overview Ron Parr CPS 570 NP hardness is not an AI topic It s important for all computer scienhsts Understanding it will deepen your understanding of AI
CHAPTER - 5 CONCLUSIONS / IMP. FINDINGS
CHAPTER - 5 CONCLUSIONS / IMP. FINDINGS In today's scenario data warehouse plays a crucial role in order to perform important operations. Different indexing techniques has been used and analyzed using
MACHINE LEARNING BASICS WITH R
MACHINE LEARNING [Hands-on Introduction of Supervised Machine Learning Methods] DURATION 2 DAY The field of machine learning is concerned with the question of how to construct computer programs that automatically
Learning is a very general term denoting the way in which agents:
What is learning? Learning is a very general term denoting the way in which agents: Acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);
Master's projects at ITMO University. Daniil Chivilikhin PhD Student @ ITMO University
Master's projects at ITMO University Daniil Chivilikhin PhD Student @ ITMO University General information Guidance from our lab's researchers Publishable results 2 Research areas Research at ITMO Evolutionary
Chapter 1: Health & Safety Management Systems (SMS) Leadership and Organisational Safety Culture
Chapter 1: Health & Safety Management Systems (SMS) Leadership and Organisational Safety Culture 3 29 Safety Matters! A Guide to Health & Safety at Work Chapter outline Leadership and Organisational Safety
From Particles To Electronic Trading. Simon Bevan
From Particles To Electronic Trading Simon Bevan May 13th, 2015 Introduction For the first few slides we will aim to give you a feeling of what high frequency trading means and the arguments for and against
Data Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
Data Science and Prediction*
Data Science and Prediction* Vasant Dhar Professor Editor-in-Chief, Big Data Co-Director, Center for Business Analytics, NYU Stern Faculty, Center for Data Science, NYU *Article in Communications of the
White Paper: 5GL RAD Development
White Paper: 5GL RAD Development After 2.5 hours of training, subjects reduced their development time by 60-90% A Study By: 326 Market Street Harrisburg, PA 17101 Luis Paris, Ph.D. Associate Professor
APPENDIX N. Data Validation Using Data Descriptors
APPENDIX N Data Validation Using Data Descriptors Data validation is often defined by six data descriptors: 1) reports to decision maker 2) documentation 3) data sources 4) analytical method and detection
Neural Networks in Data Mining
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V6 PP 01-06 www.iosrjen.org Neural Networks in Data Mining Ripundeep Singh Gill, Ashima Department
Experiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
Big datasets - promise or Big data, shmig data. D.A. Forsyth, UIUC
Big datasets - promise or Big data, shmig data D.A. Forsyth, UIUC Conclusion Q: What do big datasets tell us? A: Not much, if the emphasis is on size Collecting datasets is highly creative rather than
Improving Decision Making and Managing Knowledge
Improving Decision Making and Managing Knowledge Decision Making and Information Systems Information Requirements of Key Decision-Making Groups in a Firm Senior managers, middle managers, operational managers,
Core Curriculum to the Course:
Core Curriculum to the Course: Environmental Science Law Economy for Engineering Accounting for Engineering Production System Planning and Analysis Electric Circuits Logic Circuits Methods for Electric
Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging
Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging
Case Study: Direct Mail Subscriptions Tests
Case Study: Direct Mail Subscriptions Tests Fast, low-cost creative and price tests leading to a 70% jump in response Introduction Magazine publishers depend on direct mail to grow their subscriber base.
LONG BEACH CITY COLLEGE MEMORANDUM
LONG BEACH CITY COLLEGE MEMORANDUM DATE: May 5, 2000 TO: Academic Senate Equivalency Committee FROM: John Hugunin Department Head for CBIS SUBJECT: Equivalency statement for Computer Science Instructor
So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)
Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we
The Assumption(s) of Normality
The Assumption(s) of Normality Copyright 2000, 2011, J. Toby Mordkoff This is very complicated, so I ll provide two versions. At a minimum, you should know the short one. It would be great if you knew
Driving Insurance World through Science - 1 - Murli D. Buluswar Chief Science Officer
Driving Insurance World through Science - 1 - Murli D. Buluswar Chief Science Officer What is The Science Team s Mission? 2 What Gap Do We Aspire to Address? ü The insurance industry is data rich but ü
Bootstrapping Big Data
Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu
Computer Algorithms. NP-Complete Problems. CISC 4080 Yanjun Li
Computer Algorithms NP-Complete Problems NP-completeness The quest for efficient algorithms is about finding clever ways to bypass the process of exhaustive search, using clues from the input in order
CS4025: Pragmatics. Resolving referring Expressions Interpreting intention in dialogue Conversational Implicature
CS4025: Pragmatics Resolving referring Expressions Interpreting intention in dialogue Conversational Implicature For more info: J&M, chap 18,19 in 1 st ed; 21,24 in 2 nd Computing Science, University of
Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer
Machine Learning Chapter 18, 21 Some material adopted from notes by Chuck Dyer What is learning? Learning denotes changes in a system that... enable a system to do the same task more efficiently the next
Unit 19: Probability Models
Unit 19: Probability Models Summary of Video Probability is the language of uncertainty. Using statistics, we can better predict the outcomes of random phenomena over the long term from the very complex,
ARTIFICIAL INTELLIGENCE METHODS IN EARLY MANUFACTURING TIME ESTIMATION
1 ARTIFICIAL INTELLIGENCE METHODS IN EARLY MANUFACTURING TIME ESTIMATION B. Mikó PhD, Z-Form Tool Manufacturing and Application Ltd H-1082. Budapest, Asztalos S. u 4. Tel: (1) 477 1016, e-mail: [email protected]
Effect of Using Neural Networks in GA-Based School Timetabling
Effect of Using Neural Networks in GA-Based School Timetabling JANIS ZUTERS Department of Computer Science University of Latvia Raina bulv. 19, Riga, LV-1050 LATVIA [email protected] Abstract: - The school
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 [email protected]
Basic Testing Concepts and Terminology
T-76.5613 Software Testing and Quality Assurance Lecture 2, 13.9.2006 Basic Testing Concepts and Terminology Juha Itkonen SoberIT Contents Realities and principles of Testing terminology and basic concepts
Sales Performance Management Using Salesforce.com and Tableau 8 Desktop Professional & Server
Sales Performance Management Using Salesforce.com and Tableau 8 Desktop Professional & Server Author: Phil Gilles Sales Operations Analyst, Tableau Software March 2013 p2 Executive Summary Managing sales
A pixlogic White Paper 4984 El Camino Real Suite 205 Los Altos, CA 94022 T. 650-967-4067 [email protected] www.pixlogic.com
A pixlogic White Paper 4984 El Camino Real Suite 205 Los Altos, CA 94022 T. 650-967-4067 [email protected] www.pixlogic.com Intelligent License Plate Recognition for Security Applications Dec-2011 Contents
Active Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
Understanding the IEC61131-3 Programming Languages
profile Drive & Control Technical Article Understanding the IEC61131-3 Programming Languages It was about 120 years ago when Mark Twain used the phrase more than one way to skin a cat. In the world of
Fact sheet. Tractable. Automating visual recognition tasks with Artificial Intelligence
Fact sheet Tractable Automating visual recognition tasks with Artificial Intelligence Tractable enables computers to see better than humans We develop artificial neural networks capable of recognising
Module 1. Introduction to Software Engineering. Version 2 CSE IIT, Kharagpur
Module 1 Introduction to Software Engineering Lesson 2 Structured Programming Specific Instructional Objectives At the end of this lesson the student will be able to: Identify the important features of
0 Introduction to Data Analysis Using an Excel Spreadsheet
Experiment 0 Introduction to Data Analysis Using an Excel Spreadsheet I. Purpose The purpose of this introductory lab is to teach you a few basic things about how to use an EXCEL 2010 spreadsheet to do
Machine Learning and Data Mining -
Machine Learning and Data Mining - Perceptron Neural Networks Nuno Cavalheiro Marques ([email protected]) Spring Semester 2010/2011 MSc in Computer Science Multi Layer Perceptron Neurons and the Perceptron
Healthcare data analytics. Da-Wei Wang Institute of Information Science [email protected]
Healthcare data analytics Da-Wei Wang Institute of Information Science [email protected] Outline Data Science Enabling technologies Grand goals Issues Google flu trend Privacy Conclusion Analytics
How To Get A Computer Science Degree At Appalachian State
118 Master of Science in Computer Science Department of Computer Science College of Arts and Sciences James T. Wilkes, Chair and Professor Ph.D., Duke University [email protected] http://www.cs.appstate.edu/
Intelledox Designer WCA G 2.0
Intelledox Designer WCA G 2.0 Best Practice Guide Intelledox Designer WCAG 2.0 Best Practice Guide Version 1.0 Copyright 2011 Intelledox Pty Ltd All rights reserved. Intelledox Pty Ltd owns the Intelledox
Plotting Data with Microsoft Excel
Plotting Data with Microsoft Excel Here is an example of an attempt to plot parametric data in a scientifically meaningful way, using Microsoft Excel. This example describes an experience using the Office
Data quality in Accounting Information Systems
Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania
3. Mathematical Induction
3. MATHEMATICAL INDUCTION 83 3. Mathematical Induction 3.1. First Principle of Mathematical Induction. Let P (n) be a predicate with domain of discourse (over) the natural numbers N = {0, 1,,...}. If (1)
ORACLE ENTERPRISE DATA QUALITY PRODUCT FAMILY
ORACLE ENTERPRISE DATA QUALITY PRODUCT FAMILY The Oracle Enterprise Data Quality family of products helps organizations achieve maximum value from their business critical applications by delivering fit
Pedestrian Detection with RCNN
Pedestrian Detection with RCNN Matthew Chen Department of Computer Science Stanford University [email protected] Abstract In this paper we evaluate the effectiveness of using a Region-based Convolutional
Challenges of Cloud Scale Natural Language Processing
Challenges of Cloud Scale Natural Language Processing Mark Dredze Johns Hopkins University My Interests? Information Expressed in Human Language Machine Learning Natural Language Processing Intelligent
Big Data Text Mining and Visualization. Anton Heijs
Copyright 2007 by Treparel Information Solutions BV. This report nor any part of it may be copied, circulated, quoted without prior written approval from Treparel7 Treparel Information Solutions BV Delftechpark
Cross-Validation. Synonyms Rotation estimation
Comp. by: BVijayalakshmiGalleys0000875816 Date:6/11/08 Time:19:52:53 Stage:First Proof C PAYAM REFAEILZADEH, LEI TANG, HUAN LIU Arizona State University Synonyms Rotation estimation Definition is a statistical
Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios
Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios By: Michael Banasiak & By: Daniel Tantum, Ph.D. What Are Statistical Based Behavior Scoring Models And How Are
Direct Marketing of Insurance. Integration of Marketing, Pricing and Underwriting
Direct Marketing of Insurance Integration of Marketing, Pricing and Underwriting As insurers move to direct distribution and database marketing, new approaches to the business, integrating the marketing,
The Cyber Threat Profiler
Whitepaper The Cyber Threat Profiler Good Intelligence is essential to efficient system protection INTRODUCTION As the world becomes more dependent on cyber connectivity, the volume of cyber attacks are
Masters in Information Technology
Computer - Information Technology MSc & MPhil - 2015/6 - July 2015 Masters in Information Technology Programme Requirements Taught Element, and PG Diploma in Information Technology: 120 credits: IS5101
t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon
t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. [email protected] www.excelmasterseries.com
Consolidated Clinical Document Architecture and its Meaningful Use Nick Mahurin, chief executive officer, InfraWare, Terre Haute, Ind.
Consolidated Clinical Document Architecture and its Meaningful Use Nick Mahurin, chief executive officer, InfraWare, Terre Haute, Ind. Helping Doctors Regain Their Voice SM A Whitepaper for the Healthcare
Prediction of Stock Performance Using Analytical Techniques
136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University
Neural Network Applications in Stock Market Predictions - A Methodology Analysis
Neural Network Applications in Stock Market Predictions - A Methodology Analysis Marijana Zekic, MS University of Josip Juraj Strossmayer in Osijek Faculty of Economics Osijek Gajev trg 7, 31000 Osijek
Motivation and Contents Overview
Motivation and Contents Overview Software Engineering Winter Semester 2011/2012 Department of Computer Science cs.uni-salzburg.at Dr. Stefan Resmerita 2 Course Contents Goals Learning about commonly used
Using LSI for Implementing Document Management Systems Turning unstructured data from a liability to an asset.
White Paper Using LSI for Implementing Document Management Systems Turning unstructured data from a liability to an asset. Using LSI for Implementing Document Management Systems By Mike Harrison, Director,
