T Non-discriminatory Machine Learning
|
|
|
- Valentine Fisher
- 10 years ago
- Views:
Transcription
1 T Non-discriminatory Machine Learning Seminar 1 Indrė Žliobaitė Aalto University School of Science, Department of Computer Science Helsinki Institute for Information Technology (HIIT) University of Helsinki 16 September, 2015
2 Public attention
3 Digital universe Communication data Mobility data, traffic Transactional data Multimedia Internet of things In 2013, 2/3 of the digital universe bits were created or captured by consumers and workers enterprises had liability or responsibility for 85% of the digital universe IDC survey 2014 for EMC In 2013, 22% of this is candidate for analysis 5% is actually analyzed By % will be candidate for analysis
4
5
6 Obama reports 2014 on Big data Decisions informed by big data could have discriminatory effects even in the absence of discriminatory intent Policy recommendation: to expand technical expertise to stop discrimination
7 Policy making Finland: new non-discrimination act came into force in Jan 2015 New EU non-discrimination directive is in preparation expands discrimination grounds and areas; widens definitions Obama report 2014 on big data expands the scope of protection: public and private activities, ethnic origin, age, nationality, language, religion, belief, opinion, health, disability, sexual orientation or other personal characteristics decisions informed by big data could have discriminatory effects even in the absence of discriminatory intent Increasing attention to digital discrimination Fairness, transparency and accountability in machine learning workshops, projects, public statements
8 Why care? Human decision makers may discriminate occasionally Algorithms would discriminate systematically and continuously Algorithms are often considered to be inherently objective But models are as good as data and modeling assumptions Algorithms may capture human biases, and may exaggerate Why care? To protect vulnerable people? Law requires? As computer scientists we are held accountable for algorithm performance, and need to be able to control and explain what is happening
9 Research attention
10
11 book journal special issue (2)
12 Research background
13 AirBnB case
14 AirBnB case Harvard study: non-black hosts charge approximately 12% more than black hosts for the equivalent rental in New York city Out of control of the company: the crowd discriminates What if AirBnB learns a price recommender on this data??
15 Other examples Big data Personalized pricing, recommendations, personalized ads CV screening, salary estimation Personalized medicine Navigation and route planning Learning support (education), sports and welling More traditional applications Credit scoring, insurance Spam filtering Crime prediction, profiling University acceptance, funding decisions
16 Machine learning and discrimination Discrimination inferior treatment based on ascribed group rather than individual merits Machine learning enforcing constraints defined by legislation not judging what is morally wrong or right Direct vs. indirect discrimination twins test redlining
17 Can algorithms discriminate? Algorithms can discriminate when data is incorrect due to discriminatory decisions in the past population is changing over time data is incomplete (omitted variable bias) sampling bias Typically indirect discrimination Algorithms can discriminate even when the protected characteristic is not part of the equations
18 Machine learning and discrimination y polarized Discrimination inferior treatment based on ascribed group rather than individual merits X Predictive models y f(x) s
19 Source: "Home Owners' Loan Corporation Philadelphia redlining map". Licensed under Public Domain via Wikipedia
20 Solutions?
21 Removing protected characteristic y f(x,s) y f(x)
22 Sneetches Dr. Seuss,
23 Removing protected characteristic does not solve the problem if s is correlated with X desired: y f(x) what happens: y f(x,s*), s* f(x) X s y X s y X s y
24 Removing protected characteristic does not solve the problem if s is correlated with X desired: y f(x) what happens: y f(x,s*), s* f(x) X s y No problem X s Problem! y X s y No problem
25 Removing protected characteristic does not solve the problem if s is correlated with X desired: y f(x) what happens: y f(x,s*), s* f(x) X y No problem s Problem! X X y y No problem
26 Kamiran et al 2010
27 Computational discrimination research: discovery vs. prevention Discovery by statistics, economics communities since 80s, typically in mortgage lending, insurance or job admission/lay-offs typically based on regression (look at coefficients) or statistical hypothesis testing (equality of means) Prevention by machine learning/ data mining community relatively new topic, since mostly focused on classification so far a lot of challenges ahead
28 Measuring Very open topic! Basic measure D = p(+ s) p(+ not s) or p(+ s)/p(+ not s), or p(+ s)/p(+),... Taking into account explanatory features easy to measure for discovery (hypothesis testing), difficult for prevention p(+ X,s) p(+ X,not s) Taking into account different decision thresholds normalized measures Romei and Ruggieri 2014, Mancuhan and Clifton 2014, Žliobaitė 2015
29 Prevention solutions Preprocessing Modify input data X, s or y Resample input data Regularization Postprocessing Modify models Modify outputs
30 Modify input data Modify y - massaging Kamiran and Calders 2009
31 Modify input data Modify X massaging Any attributes in X that could be used to predict s are changed such that a fairness constraint is satisfied approach is similar to sanitizing datasets for privacy preservation Feldman et al 2014
32 Resample Preferential sampling Kamiran and Calders 2010
33 Regularization Data subset due to split Regular tree induction Decision tree Entropy wrt class label Non-discriminatory tree Entropy wrt protected characteristic Tree induction IGC - IGS Kamiran et a 2010
34 Postprocessing Modifying model Relabel tree leaves to remove the most discrimination with the least damage to the accuracy Kamiran et a 2010
35 Prevention solutions Preprocessing Modify input data X, s or y Resample input data Regulatization Postprocessing Modify models Modify outputs From legal perspective Decision manipulation very bad Data manipulation quite bad Protected characteristic should not be used in decision making
36 Challenges ahead Impact challenges What is the scope of potentially discriminatory applications? Businesses are reluctant to collaborate, afraid of negative publicity Public is not concerned thinking that algorithms are always objective Research challenges Defining the right discrimination measures and optimization criteria Translating legal requirements into mathematical constraints and back Transparency and interpretability of the solutions is critical stakeholders need to understand and trust the solutions
37 Thanks!
Data Mining. Toon Calders TU Eindhoven
The Dangers of Data Mining Toon Calders TU Eindhoven Motivation for Data Mining: the Data Flood Huge amounts of data are available in digital form Internet IP Traffic logs Scientific data Customer profiles
Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal
Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
Why do statisticians "hate" us?
Why do statisticians "hate" us? David Hand, Heikki Mannila, Padhraic Smyth "Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data
BOR 6335 Data Mining. Course Description. Course Bibliography and Required Readings. Prerequisites
BOR 6335 Data Mining Course Description This course provides an overview of data mining and fundamentals of using RapidMiner and OpenOffice open access software packages to develop data mining models.
Obtaining Value from Big Data
Obtaining Value from Big Data Course Notes in Transparency Format technology basics for data scientists Spring - 2014 Jordi Torres, UPC - BSC www.jorditorres.eu @JordiTorresBCN Data deluge, is it enough?
Understanding Characteristics of Caravan Insurance Policy Buyer
Understanding Characteristics of Caravan Insurance Policy Buyer May 10, 2007 Group 5 Chih Hau Huang Masami Mabuchi Muthita Songchitruksa Nopakoon Visitrattakul Executive Summary This report is intended
Data Mining for Fun and Profit
Data Mining for Fun and Profit Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. - Ian H. Witten, Data Mining: Practical Machine Learning Tools
Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer
Machine Learning Chapter 18, 21 Some material adopted from notes by Chuck Dyer What is learning? Learning denotes changes in a system that... enable a system to do the same task more efficiently the next
Transparency of Hospital Productivity Benchmarking
Transparency of Productivity Benchmarking (Research-in-Progress) S. Laine, Department of Computer Science and Engineering, Aalto University Laine, Sami, Niemi, Erkka (2013), Transparency of Productivity
Random Forest Based Imbalanced Data Cleaning and Classification
Random Forest Based Imbalanced Data Cleaning and Classification Jie Gu Software School of Tsinghua University, China Abstract. The given task of PAKDD 2007 data mining competition is a typical problem
Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1
Data Mining 1 Introduction 2 Data Mining methods Alfred Holl Data Mining 1 1 Introduction 1.1 Motivation 1.2 Goals and problems 1.3 Definitions 1.4 Roots 1.5 Data Mining process 1.6 Epistemological constraints
Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05
Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification
Data Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Data Mining: Overview. What is Data Mining?
Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,
Workshop Discussion Notes: Housing
Workshop Discussion Notes: Housing Data & Civil Rights October 30, 2014 Washington, D.C. http://www.datacivilrights.org/ This document was produced based on notes taken during the Housing workshop of the
Application of Predictive Analytics to Higher Degree Research Course Completion Times
Application of Predictive Analytics to Higher Degree Research Course Completion Times Application of Decision Theory to PhD Course Completions (2006 2013) Rachna 1 I Dhand, Senior Strategic Information
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
Response to Critiques of Mortgage Discrimination and FHA Loan Performance
A Response to Comments Response to Critiques of Mortgage Discrimination and FHA Loan Performance James A. Berkovec Glenn B. Canner Stuart A. Gabriel Timothy H. Hannan Abstract This response discusses the
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, [email protected] Abstract: Independent
Index Contents Page No. Introduction . Data Mining & Knowledge Discovery
Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.
Chapter 7: Data Mining
Chapter 7: Data Mining Overview Topics discussed: The Need for Data Mining and Business Value The Data Mining Process: Define Business Objectives Get Raw Data Identify Relevant Predictive Variables Gain
CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.
CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes
Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Applications Anya Yarygina Boris Novikov Introduction Data mining applications Data mining system products and research prototypes Additional themes on data mining Social
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
Stepwise Regression. Chapter 311. Introduction. Variable Selection Procedures. Forward (Step-Up) Selection
Chapter 311 Introduction Often, theory and experience give only general direction as to which of a pool of candidate variables (including transformed variables) should be included in the regression model.
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data
CODE FOR MANAGEMENT PRACTICES: EQUALISING OPPORTUNITIES PART 1: CREATION OF EMPLOYMENT RELATIONSHIP 1. JOB DESCRIPTION
CODE FOR MANAGEMENT : EQUALISING OPPORTUNITIES PART 1: CREATION OF EMPLOYMENT RELATIONSHIP 1. JOB DESCRIPTION Job descriptions must clearly define responsibilities and outputs. Job descriptions can however
Predicting borrowers chance of defaulting on credit loans
Predicting borrowers chance of defaulting on credit loans Junjie Liang ([email protected]) Abstract Credit score prediction is of great interests to banks as the outcome of the prediction algorithm
Comparison of Data Mining Techniques used for Financial Data Analysis
Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract
Data quality in Accounting Information Systems
Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification
EQUALITY AND DIVERSITY POLICY & PROCEDURE MICHAEL W HALSALL (SOLICITORS)
EQUALITY AND DIVERSITY POLICY & PROCEDURE MICHAEL W HALSALL (SOLICITORS) JANUARY 2010 Michael W Halsall Anti-Discrimination Policy Introduction Michael W. Halsall Solicitors serves a diverse client base.
Supervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
1 Maximum likelihood estimation
COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N
Data Mining is the process of knowledge discovery involving finding
using analytic services data mining framework for classification predicting the enrollment of students at a university a case study Data Mining is the process of knowledge discovery involving finding hidden
not possible or was possible at a high cost for collecting the data.
Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day
Benchmarking of different classes of models used for credit scoring
Benchmarking of different classes of models used for credit scoring We use this competition as an opportunity to compare the performance of different classes of predictive models. In particular we want
A Comparison of Variable Selection Techniques for Credit Scoring
1 A Comparison of Variable Selection Techniques for Credit Scoring K. Leung and F. Cheong and C. Cheong School of Business Information Technology, RMIT University, Melbourne, Victoria, Australia E-mail:
Anomaly detection. Problem motivation. Machine Learning
Anomaly detection Problem motivation Machine Learning Anomaly detection example Aircraft engine features: = heat generated = vibration intensity Dataset: New engine: (vibration) (heat) Density estimation
Data Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
Information Governance Strategy
Information Governance Strategy Document Status Draft Version: V2.1 DOCUMENT CHANGE HISTORY Initiated by Date Author Information Governance Requirements September 2007 Information Governance Group Version
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
Azure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
Sentiment analysis using emoticons
Sentiment analysis using emoticons Royden Kayhan Lewis Moharreri Steven Royden Ware Lewis Kayhan Steven Moharreri Ware Department of Computer Science, Ohio State University Problem definition Our aim was
Data Mining Applications in Higher Education
Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2
Protein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
Introduction. A. Bellaachia Page: 1
Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.
Introduction to Learning & Decision Trees
Artificial Intelligence: Representation and Problem Solving 5-38 April 0, 2007 Introduction to Learning & Decision Trees Learning and Decision Trees to learning What is learning? - more than just memorizing
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 [email protected]
Cross Validation. Dr. Thomas Jensen Expedia.com
Cross Validation Dr. Thomas Jensen Expedia.com About Me PhD from ETH Used to be a statistician at Link, now Senior Business Analyst at Expedia Manage a database with 720,000 Hotels that are not on contract
Digital Collections as Big Data. Leslie Johnston, Library of Congress Digital Preservation 2012
Digital Collections as Big Data Leslie Johnston, Library of Congress Digital Preservation 2012 Data is not just generated by satellites, identified during experiments, or collected during surveys. Datasets
Predict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, [email protected] Department of Electrical Engineering, Stanford University Abstract Given two persons
NSF Workshop on Big Data Security and Privacy
NSF Workshop on Big Data Security and Privacy Report Summary Bhavani Thuraisingham The University of Texas at Dallas (UTD) February 19, 2015 Acknowledgement NSF SaTC Program for support Chris Clifton and
Data Mining with R. Decision Trees and Random Forests. Hugh Murrell
Data Mining with R Decision Trees and Random Forests Hugh Murrell reference books These slides are based on a book by Graham Williams: Data Mining with Rattle and R, The Art of Excavating Data for Knowledge
How To Develop Software
Software Development Basics Dr. Axel Kohlmeyer Associate Dean for Scientific Computing College of Science and Technology Temple University, Philadelphia http://sites.google.com/site/akohlmey/ [email protected]
Introduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
Application of Predictive Model for Elementary Students with Special Needs in New Era University
Application of Predictive Model for Elementary Students with Special Needs in New Era University Jannelle ds. Ligao, Calvin Jon A. Lingat, Kristine Nicole P. Chiu, Cym Quiambao, Laurice Anne A. Iglesia
BIG DATA WITHIN THE LARGE ENTERPRISE 9/19/2013. Navigating Implementation and Governance
BIG DATA WITHIN THE LARGE ENTERPRISE 9/19/2013 Navigating Implementation and Governance Purpose of Today s Talk John Adler - Data Management Group Madina Kassengaliyeva - Think Big Analytics Growing data
Predicting the Stock Market with News Articles
Predicting the Stock Market with News Articles Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is
Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin
Data Mining for Customer Service Support Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin Traditional Hotline Services Problem Traditional Customer Service Support (manufacturing)
Data Mining Classification: Decision Trees
Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous
Title: Lending Club Interest Rates are closely linked with FICO scores and Loan Length
Title: Lending Club Interest Rates are closely linked with FICO scores and Loan Length Introduction: The Lending Club is a unique website that allows people to directly borrow money from other people [1].
Data Mining Methods: Applications for Institutional Research
Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014
Machine Learning. Mausam (based on slides by Tom Mitchell, Oren Etzioni and Pedro Domingos)
Machine Learning Mausam (based on slides by Tom Mitchell, Oren Etzioni and Pedro Domingos) What Is Machine Learning? A computer program is said to learn from experience E with respect to some class of
Machine Learning in Hospital Billing Management. 1. George Mason University 2. INOVA Health System
Machine Learning in Hospital Billing Management Janusz Wojtusiak 1, Che Ngufor 1, John M. Shiver 1, Ronald Ewald 2 1. George Mason University 2. INOVA Health System Introduction The purpose of the described
Weather forecast prediction: a Data Mining application
Weather forecast prediction: a Data Mining application Ms. Ashwini Mandale, Mrs. Jadhawar B.A. Assistant professor, Dr.Daulatrao Aher College of engg,karad,[email protected],8407974457 Abstract
Data Mining for Business Analytics
Data Mining for Business Analytics Lecture 2: Introduction to Predictive Modeling Stern School of Business New York University Spring 2014 MegaTelCo: Predicting Customer Churn You just landed a great analytical
Introduction to Pattern Recognition
Introduction to Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University [email protected] CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)
TIETS34 Seminar: Data Mining on Biometric identification
TIETS34 Seminar: Data Mining on Biometric identification Youming Zhang Computer Science, School of Information Sciences, 33014 University of Tampere, Finland [email protected] Course Description Content
Data Mining Introduction
Data Mining Introduction Bob Stine Dept of Statistics, School University of Pennsylvania www-stat.wharton.upenn.edu/~stine What is data mining? An insult? Predictive modeling Large, wide data sets, often
Learning is a very general term denoting the way in which agents:
What is learning? Learning is a very general term denoting the way in which agents: Acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);
The Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
Context Aware Predictive Analytics: Motivation, Potential, Challenges
Context Aware Predictive Analytics: Motivation, Potential, Challenges Mykola Pechenizkiy Seminar 31 October 2011 University of Bournemouth, England http://www.win.tue.nl/~mpechen/projects/capa Outline
TURKISH ORACLE USER GROUP
TURKISH ORACLE USER GROUP Data Mining in 30 Minutes Husnu Sensoy Global Maksimum Data & Information Tech. Founder VLDB Expert Agenda Who am I? Different problems of Data Mining In database data mining?!?
Healthcare Data Mining: Prediction Inpatient Length of Stay
3rd International IEEE Conference Intelligent Systems, September 2006 Healthcare Data Mining: Prediction Inpatient Length of Peng Liu, Lei Lei, Junjie Yin, Wei Zhang, Wu Naijun, Elia El-Darzi 1 Abstract
Statistics for BIG data
Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before
Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski [email protected]
Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trajkovski [email protected] Ensembles 2 Learning Ensembles Learn multiple alternative definitions of a concept using different training
Healthcare data analytics. Da-Wei Wang Institute of Information Science [email protected]
Healthcare data analytics Da-Wei Wang Institute of Information Science [email protected] Outline Data Science Enabling technologies Grand goals Issues Google flu trend Privacy Conclusion Analytics
High Productivity Data Processing Analytics Methods with Applications
High Productivity Data Processing Analytics Methods with Applications Dr. Ing. Morris Riedel et al. Adjunct Associate Professor School of Engineering and Natural Sciences, University of Iceland Research
An Introduction to Machine Learning and Natural Language Processing Tools
An Introduction to Machine Learning and Natural Language Processing Tools Presented by: Mark Sammons, Vivek Srikumar (Many slides courtesy of Nick Rizzolo) 8/24/2010-8/26/2010 Some reasonably reliable
Data Mining and Automatic Quality Assurance of Survey Data
Autumn 2011 Bachelor Project for: Ledian Selimaj (081088) Data Mining and Automatic Quality Assurance of Survey Data Abstract Data mining is an emerging field in the computer science world, which stands
Database Marketing, Business Intelligence and Knowledge Discovery
Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski
CRISP - DM. Data Mining Process. Process Standardization. Why Should There be a Standard Process? Cross-Industry Standard Process for Data Mining
Mining Process CRISP - DM Cross-Industry Standard Process for Mining (CRISP-DM) European Community funded effort to develop framework for data mining tasks Goals: Cross-Industry Standard Process for Mining
UNITED STATES JUDO REFEREES CODE OF ETHICS, STANDARDS AND CONDUCT
UNITED STATES JUDO REFEREES CODE OF ETHICS, STANDARDS AND CONDUCT TABLE OF CONTENTS INTRODUCTION... 2 GENERAL PRINCIPLES... 4 A: Competence... 4 B: Integrity... 4 C: Professional Responsibility... 4 D:
Foundations of Business Intelligence: Databases and Information Management
Foundations of Business Intelligence: Databases and Information Management Problem: HP s numerous systems unable to deliver the information needed for a complete picture of business operations, lack of
