Machine Learning at DIKU




Faculty of Science Machine Learning at DIKU Christian Igel Department of Computer Science igel@diku.dk Slide 1/12

Machine learning
Machine learning is a branch of computer science and applied statistics covering software that improves its performance at a given task based on sample data or experience.

Why machine learning?
Computer systems are required for tasks for which solutions cannot be specified in the traditional way, e.g., because the designer's knowledge is limited, and/or the sheer complexity and variability precludes an accurate description.
However, large amounts of data describing the task are often available or can be automatically obtained.
To take proper advantage of this information, we need systems that self-adapt and automatically improve based on sample data: systems that learn.
Machine learning turns data into knowledge.

Machine learning research at DIKU
We are concerned with the design and analysis of adaptive systems for pattern recognition (data mining, time series prediction), data modeling, and behaviour generation (decision making).
Our fields of expertise include state-of-the-art classification, regression, and density estimation techniques, efficient and robust learning algorithms for large-scale problems, and computational intelligence methods for non-linear optimisation, including vector optimisation and multi-criteria decision making.

DIKU researchers in learning systems
Machine Learning Lab: http://image.diku.dk/mllab
Image Group: http://www.diku.dk/forskning/billedgruppen
DIKU faculty doing machine learning, information retrieval, and pattern recognition: Corinna Cortes (head of Google Research New York, adjunct), Marleen De Bruijne, Sune Darkner, Aasa Feragen, Christian Igel (head of ML Lab), Francois Lauze, Christina Lioma, Mads Nielsen (head of Image Group), Marco Loog (TU Delft, adjunct), Søren Olsen, Jon Sporring, Kim Steenstrup Pedersen, ...

Important themes in our work
Autonomous learning: Technical systems should learn robustly and autonomously, e.g., without requiring an expert to select the learning algorithm and hyperparameters, an appropriate data representation, etc.
Scalability of adaptive systems: We need learning algorithms that can handle large amounts of data as well as generalise from few training examples.

Exemplary method: Support Vector Machines (SVMs)
[Figure: SVM illustration, labeled Φ.]
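The role a feature map Φ can play in an SVM may be illustrated with a minimal sketch in plain Python. This is not code from the talk: the toy data, the map `phi(x) = (x, x^2)`, the helper names, and the training parameters are all assumptions chosen for illustration. The trainer is a basic full-batch subgradient descent on the L2-regularised hinge loss, not a production SVM solver. The point is only that data which no linear rule can separate in the input space may become linearly separable after applying Φ.

```python
def phi(x):
    """Assumed explicit feature map: lift a scalar x to the pair (x, x^2)."""
    return (x, x * x)


def train_linear_svm(features, labels, lam=0.01, lr=0.05, epochs=2000):
    """Full-batch subgradient descent on the L2-regularised hinge loss.

    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.f + b))).
    """
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for f, y in zip(features, labels):
            margin = y * (sum(wj * fj for wj, fj in zip(w, f)) + b)
            if margin < 1.0:  # hinge loss is active for this point
                for j in range(dim):
                    grad_w[j] -= y * f[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def accuracy(w, b, features, labels):
    """Fraction of points whose predicted sign matches the label."""
    correct = 0
    for f, y in zip(features, labels):
        score = sum(wj * fj for wj, fj in zip(w, f)) + b
        pred = 1 if score > 0 else -1
        correct += pred == y
    return correct / len(labels)


# Toy 1-D data (assumed): inner points are positive, outer points negative,
# so no single threshold on x separates the two classes.
xs = [-3.0, -2.5, -2.0, -0.5, 0.0, 0.5, 2.0, 2.5, 3.0]
ys = [1 if abs(x) < 1.0 else -1 for x in xs]

raw = [(x,) for x in xs]        # input space
lifted = [phi(x) for x in xs]   # feature space

w_raw, b_raw = train_linear_svm(raw, ys)
w_phi, b_phi = train_linear_svm(lifted, ys)

acc_raw = accuracy(w_raw, b_raw, raw, ys)
acc_phi = accuracy(w_phi, b_phi, lifted, ys)
print("accuracy without feature map:", acc_raw)
print("accuracy with feature map:   ", acc_phi)
```

In practice the map is usually not computed explicitly; kernel SVMs work with inner products in the feature space instead. The explicit version is used here only because it keeps the sketch short.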

Scaling up SVMs
SVMs give excellent results in practice and are well understood theoretically, but how to make them applicable to big data?
We address this issue by
1. new optimization algorithms (Dogan, Glasmachers, Igel: Fast Training of Multi-class Support Vector Machines, submitted),
2. new (e.g., cascaded) learning architectures (Prasoon et al.: Cascaded classifier for large-scale data applied to automatic segmentation of articular cartilage. SPIE Medical Imaging, 2012),
3. parallelization.
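The general idea behind a cascaded architecture can be sketched as follows. This is a generic two-stage cascade written for illustration, not the architecture of Prasoon et al.: the function `cascade_predict`, the two scoring functions, the thresholds, and the sample points are all hypothetical. A cheap first stage decides the cases it is confident about, and only the remaining cases are passed to a more expensive second stage, which is where the savings on large data sets would come from.

```python
def cascade_predict(x, cheap_score, expensive_score,
                    reject_below=-1.0, accept_above=1.0):
    """Return (label, stage) for one input.

    Stage 1 decides confidently scored inputs; stage 2 handles the rest.
    The thresholds are assumed values, not tuned ones.
    """
    s = cheap_score(x)
    if s <= reject_below:
        return -1, 1
    if s >= accept_above:
        return 1, 1
    return (1 if expensive_score(x) > 0 else -1), 2


# Hypothetical scoring functions: the cheap one looks at a single feature,
# the expensive one stands in for a full classifier such as an SVM.
def cheap(x):
    return x[0]


def expensive(x):
    return x[0] + 2.0 * x[1]


samples = [(-3.0, 0.0), (2.5, 1.0), (0.2, -0.5), (-0.4, 0.9)]
results = [cascade_predict(x, cheap, expensive) for x in samples]

# Fraction of inputs that needed the expensive second stage.
stage2_fraction = sum(1 for _, stage in results if stage == 2) / len(results)
print(results, stage2_fraction)
```

The fraction of inputs reaching the second stage determines how much expensive computation is avoided, so the thresholds trade accuracy against cost.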

Example: Cartilage segmentation
[Figure: time (axis from 10000 to 60000) versus number of cores (0 to 16).]

Business example: Credit scoring
A credit score measures the creditworthiness of a client.
[Figure: flow diagram. Client applies for loan -> Application evaluation -> Good: client granted loan -> Loan evaluation; Bad: client declined loan.]
Figures in this section provided by Kasper Nybo Hansen.

Results from MSc thesis
[Figure: accuracy (axis from 0.76 to 0.88) of LDA, LOG, k-NN, RF, CART, C4.5, SVM, and Mod. RF. Labeled values: 0.846, 0.835, 0.833.]

When theory and practice meet...
Roth, Igel, Handmann: IJCIA 4, 2004
Winter et al.: IEEE TEC 12, 2008
Winter et al.: UMB 35, 2009
Markounikau, Igel, Jancke: PLoS Comp Biol 6, 2010
Mayr et al.: Analytical Chemistry 75, 2003
Pellecchia et al.: IEEE Intelligent Sys 20, 2005
Suttorp, Igel: Multi-objective Machine Learning Ch. 9, Springer, 2006
Igel et al.: IEEE/ACM TCBB 4, 2007
Mersch et al.: IJNS 17, 2007