Presentation by: Ahmad Alsahaf. Research collaborator at the Hydroinformatics lab - Politecnico di Milano MSc in Automation and Control Engineering

1 Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen, 9 October 2015. Presentation by: Ahmad Alsahaf, research collaborator at the Hydroinformatics lab, Politecnico di Milano; MSc in Automation and Control Engineering.

2-3 A Long History of Animal Breeding. First step in animal breeding: domestication.

4-5 Breeding for desirable traits: longevity, fertility, more wool, milk, eggs, meat, etc. Faster race horses, better hunting dogs, cuter kittens.

6-10 Early History
Selective breeding relied on observable traits and human intuition.
1900s: Rediscovery of Mendel's laws of inheritance (Gregor Mendel). The biometrician Karl Pearson and the rejection of Mendel's laws.
1930s: Animal Breeding Plans, the 1937 book by Lush. First application of statistics and quantitative genetics to animal breeding (cattle).
1940s: Artificial insemination became common practice in dairy cattle.

11 The Dairy Cattle Example. One of the sectors of the animal industry that has benefited most from selective breeding and from the use of data in it: pedigree records have been kept well; the traits are few and easily measurable (milk, protein and fat yields; feed efficiency); bulls deemed good can be fully utilized; artificial insemination technology is advanced.

12 The Holstein Friesian Dairy Cattle Breed

13 Genotype Vs. Phenotype

14 Progeny Testing. Test bull (genetic information not available); artificial insemination; the bull's milk-producing daughters; measure the quality of the milk; determine the economic value of the bull.

15 Progeny Testing. Test bull (genetic information now AVAILABLE: 50,000 to 70,000 genetic markers); artificial insemination; the bull's milk-producing daughters; measure the quality of the milk; determine the economic value of the bull.

16 Machine Learning Examples 1. Using classification models (supervised learning) to detect problems in artificial insemination. Grzesiak, Wilhelm, et al. "Detection of cows with insemination problems using selected classification models." Computers and Electronics in Agriculture 74.2 (2010).

17-19 Machine Learning Examples 1. Using classification models (supervised learning) to detect problems in artificial insemination.
Inputs, recorded in 1200 cows: lactation number, % HF genome, sex of the calf, age of cow, AI season, health metric, % of fat/protein in milk.
Input types: genetic information, nominal phenotypes, categorical phenotypes, environmental factors.
Models: linear classifiers, logistic regression, artificial neural networks, multivariate adaptive regression splines.
Output classes: good cow, bad cow.
Logistical and economic implications of the classification outcome: false positives vs. false negatives.
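The trade-off between false positives and false negatives can be made concrete with a small cost calculation. The sketch below is illustrative only: the labels, the function names and the two cost values are assumptions chosen for the example, not figures from the cited study.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="bad"):
    """Count TP, FP, FN and TN, treating `positive` as the class to detect."""
    counts = Counter()
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def misclassification_cost(counts, cost_fp, cost_fn):
    """Total cost when the two kinds of error are priced differently."""
    return counts["fp"] * cost_fp + counts["fn"] * cost_fn


# Hypothetical labels: "bad" marks a cow with insemination problems.
y_true = ["bad", "good", "good", "bad", "good", "bad"]
y_pred = ["bad", "bad", "good", "good", "good", "bad"]

counts = confusion_counts(y_true, y_pred)
# Assumed prices: missing a problem cow (FN) costs five times a false alarm (FP).
total_cost = misclassification_cost(counts, cost_fp=1.0, cost_fn=5.0)
print(dict(counts), total_cost)
```

Pricing the two error types separately is one simple way to compare classifiers when accuracy alone hides which mistakes are being made.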

20 Machine Learning Examples
2. Clustering dairy cows based on their phenotypic traits. "Principal component and clustering analysis of functional traits in Swiss dairy cattle." Turk. J. Vet. Anim. Sci. (2008).
3. Prediction of insemination outcome. Shahinfar, Saleh, et al. "Prediction of insemination outcomes in Holstein dairy cattle using alternative machine learning algorithms." Journal of Dairy Science (2014).
4. Predicting the lactation yield of dairy cows using multiple regression or neural networks. Grzesiak, W., et al. "A comparison of neural network and multiple regression predictions for 305-day lactation yield using partial lactation records." Canadian Journal of Animal Science (2003).
Phenotype-phenotype prediction studies (study number: data; ML methods):
1: 10 phenotypes and environmental factors; ANN, logistic regression.
2: 5 phenotypes; hierarchical clustering, PCA.
3: 26 phenotypes and environmental factors; naïve Bayes, decision trees.
4: 7 phenotypes; ANN, multiple regression.

21 Machine Learning with High Dimensional Genetic Data: Genome Wide Association Studies. A SNP (single nucleotide polymorphism) is a unit of genetic variation, or genetic marker. The goal is to associate a SNP (or several) with a phenotype, e.g. a disease. This is typically done by GWAS (Genome Wide Association Studies), which ask which SNPs (or other markers) occur frequently within a population that has the trait of interest.
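As a rough illustration of the single-marker idea, the sketch below scores each SNP with an allelic chi-square test that compares allele counts in animals with and without the trait. It assumes genotypes coded as 0, 1 or 2 copies of one allele; the SNP names and toy genotypes are hypothetical, and a real GWAS would also correct for multiple testing and population structure.

```python
def allele_counts(genotypes):
    """Return (counted allele, other allele) totals for genotypes coded 0/1/2."""
    counted = sum(genotypes)
    other = 2 * len(genotypes) - counted
    return counted, other


def allelic_chi_square(with_trait, without_trait):
    """Chi-square statistic (1 degree of freedom) on the 2x2 allele-count table."""
    observed = [allele_counts(with_trait), allele_counts(without_trait)]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:
                chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical toy data: five animals per group, two SNPs.
snps_with_trait = {"snp_a": [2, 2, 1, 2, 1], "snp_b": [1, 0, 1, 2, 1]}
snps_without_trait = {"snp_a": [0, 0, 1, 0, 1], "snp_b": [1, 1, 0, 2, 1]}

scores = {
    name: allelic_chi_square(snps_with_trait[name], snps_without_trait[name])
    for name in snps_with_trait
}
# SNPs with the largest statistic are the strongest candidates for association.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

Each SNP is tested on its own here, which is exactly the limitation the next slide raises: interactions between markers are invisible to a one-at-a-time test.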

22 Machine Learning with High Dimensional Genetic Data: Why Machine Learning? Quantitative traits (e.g. milk yield, disease, longevity) are controlled by multiple markers. Machine learning can associate multiple genetic markers with a phenotype AND find complex interactions between markers. Machine learning can also help in dealing with redundant and irrelevant variables.

23-24 Example: From genotype to Milk yield 1. Using Neural Networks. Gianola, Daniel, et al. "Predicting complex quantitative traits with Bayesian neural networks: a case study with Jersey cows and wheat." BMC Genetics 12.1 (2011): 87.
Input: n = 297 cows, p = 35,798 SNPs, i.e. a small n, large p problem.
Output: milk yield, protein yield, fat yield.
Dealing with dimensionality:
Bayesian regularized back propagation, commonly used to avoid overfitting in BP.
297 variables derived from the original 35,798: the genome-derived (SNP) relationships between the cows are used as inputs instead of the SNPs themselves, by constructing a matrix of genomic relationships that is analogous to a covariance matrix and is based on allele frequency in the population.
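One common way to build such a relationship matrix is to centre each SNP column by twice its allele frequency and scale the cross-products, which turns an n x p genotype matrix into an n x n matrix. The sketch below follows that construction as an assumption; the slides do not say which scaling the cited paper used, and the toy genotypes and function name are hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """Build an n x n relationship matrix from n x p genotypes coded 0/1/2.

    Assumed construction: G = Z Z^T / (2 * sum_j p_j * (1 - p_j)),
    where Z is the genotype matrix with 2 * p_j subtracted from column j.
    """
    n = len(genotypes)
    p = len(genotypes[0])
    # Allele frequency of each SNP, estimated from the animals themselves.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(p)]
    scale = 2 * sum(f * (1 - f) for f in freqs)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic; relationships are undefined")
    centered = [[row[j] - 2 * freqs[j] for j in range(p)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(centered[i], centered[k])) / scale
         for k in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: 3 animals, 4 SNPs.
genotypes = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
]
G = genomic_relationship_matrix(genotypes)
for row in G:
    print([round(value, 3) for value in row])
```

With 297 cows the result has 297 columns per animal regardless of how many SNPs went in, which is how the input dimension drops from 35,798 to 297.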

25 Example: From genotype to Milk yield 1. Using Neural Networks (Gianola et al., 2011). Results: effective number of parameters.

26 Example: From genotype to Milk yield 1. Using Neural Networks (Gianola et al., 2011). Results: mean squared error of the predictions.

27 Example: Genotype to Feed efficiency 1. Using Random Forests (Decision Trees). Yao, Chen, et al. "Random Forests approach for identifying additive and epistatic single nucleotide polymorphisms associated with residual feed intake in dairy cattle." Journal of Dairy Science (2013).
Input: 395 Holstein cows, 42,275 SNPs, i.e. a small n, large p problem.
Output: residual feed intake of the cow, adjusted for environmental and external factors.

28 Example: Genotype to Feed efficiency 1. Using Random Forests (Decision Trees) (Yao et al., 2013). Methods: decision trees. A decision tree is a predictive model with a tree structure based on if-else statements. At each node, pick the best split (the best question to ask).
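"Picking the best split" can be sketched for a regression target as follows: try every SNP and every threshold, and keep the question that leaves the smallest sum of squared errors in the two child nodes. Squared error is an assumed criterion here (the slides do not name one), and the toy data and function names are hypothetical.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y):
    """Return (score, feature, threshold) for the question 'X[feature] <= threshold?'."""
    best = None
    for j in range(len(X[0])):
        # Skip the largest value so that the right child is never empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = sum_squared_error(left) + sum_squared_error(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


# Hypothetical toy data: 6 animals, 2 SNPs coded 0/1/2, one numeric phenotype.
X = [[0, 2], [0, 0], [1, 1], [2, 1], [2, 0], [1, 2]]
y = [1.0, 1.1, 1.2, 3.1, 3.0, 0.9]

score, feature, threshold = best_split(X, y)
print(f"if SNP {feature} <= {threshold}: go left, else: go right (SSE {score:.3f})")
```

A full tree repeats this search inside each child node until a stopping rule is met.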

29 Example: Genotype to Feed efficiency 1. Using Random Forests (Decision Trees) (Yao et al., 2013). Methods: Random Forests algorithm (ensemble method). The output is the averaged outcome of all weak learners in the ensemble (decision trees).

30 Example: Genotype to Feed efficiency 1. Using Random Forests (Decision Trees) (Yao et al., 2013). Dealing with dimensionality: bootstrapping for each tree in the forest; at each node of each tree, choosing the best split out of a subset of the p variables, not all of them (100, 1000).
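The two randomization steps, plus the averaging of the ensemble, can be sketched as below. To keep the example short, one-split trees (decision stumps) stand in for full decision trees, so the random feature subset is drawn once per tree, which coincides with "once per node" only because each stump has a single node. All names, the toy data and the parameter values are assumptions for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def sum_squared_error(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(X, y, features):
    """Best single split among `features`: (feature, threshold, left_mean, right_mean)."""
    best = (sum_squared_error(y), None, None, mean(y), mean(y))
    for j in features:
        for t in sorted({row[j] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = sum_squared_error(left) + sum_squared_error(right)
            if score < best[0]:
                best = (score, j, t, mean(left), mean(right))
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    if feature is None:  # no useful split was found
        return left_mean
    return left_mean if row[feature] <= threshold else right_mean


def fit_forest(X, y, n_trees, n_candidate_features, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]               # bootstrap sample
        features = rng.sample(range(p), n_candidate_features)    # random SNP subset
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], features))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all weak learners."""
    return mean([predict_stump(stump, row) for stump in forest])


# Hypothetical toy data: SNP 0 drives the phenotype, SNPs 1 and 2 are noise.
X = [[0, 0, 1], [0, 1, 0], [0, 2, 2], [1, 0, 2], [1, 1, 1],
     [1, 2, 0], [2, 0, 0], [2, 1, 2], [2, 2, 1]]
y = [1.0, 1.1, 0.9, 2.0, 2.1, 1.9, 3.0, 3.1, 2.9]

forest = fit_forest(X, y, n_trees=50, n_candidate_features=2)
low = predict_forest(forest, [0, 1, 1])
high = predict_forest(forest, [2, 1, 1])
print(low, high)
```

Restricting each split to a small random subset of SNPs keeps every search cheap even when p is in the tens of thousands, and it decorrelates the trees so that averaging helps.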

31 Example: Genotype to Feed efficiency 1. Using Random Forests (Decision Trees) (Yao et al., 2013). Results and findings: ranking SNPs according to their importance to the phenotype output (the implicit feature-ranking capability of decision trees); identifying pairs of epistatic genes through the RF structure, as they will tend to fall into the same branches of trees (parent-child).

32 Dipartimento di Elettronica, Informazione e Bioingegneria. Master of Science in Automation and Control Engineering, Dec 2014. Supervisor: Andrea Castelletti. Co-supervisors: Stefano Galelli, Matteo Giuliani. Master thesis by: Ahmad Alsahaf.

33-34 What is Model-order reduction (Emulation Modelling)? Such that: the emulator is less computationally intensive than the PB model; its input-output behaviour accurately reproduces the PB model behaviour; the emulator is credible from the user's/analyst's point of view (physical interpretability).

35 Recursive Variable Selection: a feature selection algorithm. (Figure labels: state variables, exogenous inputs, control variables, output variable; >2%.)

36 PCA vs. Sparse PCA: coefficients heat map.

37 PCA vs. Sparse and Weighted PCA: emulator performance. (Figure: emulator performance (R) and explained variance against the number of principal components, for PCA, WPCA and SPCA.) Emulator structure: Extra-Trees (Geurts et al., 2006). Ref: Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3-42.


More information

Microsoft Azure Machine learning Algorithms

Microsoft Azure Machine learning Algorithms Microsoft Azure Machine learning Algorithms Tomaž KAŠTRUN @tomaz_tsql Tomaz.kastrun@gmail.com http://tomaztsql.wordpress.com Our Sponsors Speaker info https://tomaztsql.wordpress.com Agenda Focus on explanation

More information

EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d.

EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d. EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER ANALYTICS LIFECYCLE Evaluate & Monitor Model Formulate Problem Data Preparation Deploy Model Data Exploration Validate Models

More information

Factors for success in big data science

Factors for success in big data science Factors for success in big data science Damjan Vukcevic Data Science Murdoch Childrens Research Institute 16 October 2014 Big Data Reading Group (Department of Mathematics & Statistics, University of Melbourne)

More information

Statistical Analysis. NBAF-B Metabolomics Masterclass. Mark Viant

Statistical Analysis. NBAF-B Metabolomics Masterclass. Mark Viant Statistical Analysis NBAF-B Metabolomics Masterclass Mark Viant 1. Introduction 2. Univariate analysis Overview of lecture 3. Unsupervised multivariate analysis Principal components analysis (PCA) Interpreting

More information

1. About dairy cows. Breed of dairy cows

1. About dairy cows. Breed of dairy cows 1. About dairy cows Breed of dairy cows Holstein Holstein is a typical dairy cow, and 99% of dairy cows in Japan are Holsteins. They are originally from the Netherlands and Holstein region of Germany.

More information

Scope for the Use of Pregnancy Confirmation Data in Genetic Evaluation for Reproductive Performance

Scope for the Use of Pregnancy Confirmation Data in Genetic Evaluation for Reproductive Performance Scope for the Use of Pregnancy Confirmation Data in Genetic Evaluation for Reproductive Performance J. Jamrozik and G.J. Kistemaker Canadian Dairy Network The data on cow's pregnancy diagnostics has been

More information

Major Advances in Globalization and Consolidation of the Artificial Insemination Industry

Major Advances in Globalization and Consolidation of the Artificial Insemination Industry J. Dairy Sci. 89:1362 1368 American Dairy Science Association, 2006. Major Advances in Globalization and Consolidation of the Artificial Insemination Industry D. A. Funk ABS Global, Inc., DeForest, WI

More information

Data quality in Accounting Information Systems

Data quality in Accounting Information Systems Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania

More information

Data Mining Techniques Chapter 6: Decision Trees

Data Mining Techniques Chapter 6: Decision Trees Data Mining Techniques Chapter 6: Decision Trees What is a classification decision tree?.......................................... 2 Visualizing decision trees...................................................

More information

Milk protein genetic variation in Butana cattle

Milk protein genetic variation in Butana cattle Milk protein genetic variation in Butana cattle Ammar Said Ahmed Züchtungsbiologie und molekulare Genetik, Humboldt Universität zu Berlin, Invalidenstraβe 42, 10115 Berlin, Deutschland 1 Outline Background

More information

Ensemble Data Mining Methods

Ensemble Data Mining Methods Ensemble Data Mining Methods Nikunj C. Oza, Ph.D., NASA Ames Research Center, USA INTRODUCTION Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods

More information

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether

More information

Lecture 9: Introduction to Pattern Analysis

Lecture 9: Introduction to Pattern Analysis Lecture 9: Introduction to Pattern Analysis g Features, patterns and classifiers g Components of a PR system g An example g Probability definitions g Bayes Theorem g Gaussian densities Features, patterns

More information

Artificial Insemination (AI) in Cattle

Artificial Insemination (AI) in Cattle Artificial Insemination (AI) in Cattle Most dairy cows are bred by AI Less common in beef cattle Commonly, bulls are used for all breeding under pasture conditions Less commonly, bulls are used as clean-up

More information

Using multiple models: Bagging, Boosting, Ensembles, Forests

Using multiple models: Bagging, Boosting, Ensembles, Forests Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or

More information

School of Nursing. Presented by Yvette Conley, PhD

School of Nursing. Presented by Yvette Conley, PhD Presented by Yvette Conley, PhD What we will cover during this webcast: Briefly discuss the approaches introduced in the paper: Genome Sequencing Genome Wide Association Studies Epigenomics Gene Expression

More information

A Methodology for Predictive Failure Detection in Semiconductor Fabrication

A Methodology for Predictive Failure Detection in Semiconductor Fabrication A Methodology for Predictive Failure Detection in Semiconductor Fabrication Peter Scheibelhofer (TU Graz) Dietmar Gleispach, Günter Hayderer (austriamicrosystems AG) 09-09-2011 Peter Scheibelhofer (TU

More information

Sustainability of dairy cattle breeding systems utilising artificial insemination in less developed countries - examples of problems and prospects

Sustainability of dairy cattle breeding systems utilising artificial insemination in less developed countries - examples of problems and prospects Philipsson Sustainability of dairy cattle breeding systems utilising artificial insemination in less developed countries - examples of problems and prospects J. Philipsson Department of Animal Breeding

More information

Alison Van Eenennaam, Ph.D.

Alison Van Eenennaam, Ph.D. Is the Market Ready for Milk from Cloned Cows? 3/15/06 Alison Van Eenennaam, Ph.D. Cooperative Extension Specialist Animal Biotechnology and Genomics alvaneenennaam@ucdavis.edu ODI OUTLINE What is a clone?

More information

The All-Breed Animal Model Bennet Cassell, Extension Dairy Scientist, Genetics and Management

The All-Breed Animal Model Bennet Cassell, Extension Dairy Scientist, Genetics and Management publication 404-086 The All-Breed Animal Model Bennet Cassell, Extension Dairy Scientist, Genetics and Management Introduction The all-breed animal model is the genetic-evaluation system used to evaluate

More information

Data Mining Classification: Decision Trees

Data Mining Classification: Decision Trees Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous

More information

Staying good while playing God Looking after animal welfare when applying biotechnology

Staying good while playing God Looking after animal welfare when applying biotechnology Staying good while playing God Looking after animal welfare when applying biotechnology Peter Sandøe, Stine B. Christiansen and Christian Gamborg University of Copenhagen Animal breeding was, until the

More information

Data Mining in CRM & Direct Marketing. Jun Du The University of Western Ontario jdu43@uwo.ca

Data Mining in CRM & Direct Marketing. Jun Du The University of Western Ontario jdu43@uwo.ca Data Mining in CRM & Direct Marketing Jun Du The University of Western Ontario jdu43@uwo.ca Outline Why CRM & Marketing Goals in CRM & Marketing Models and Methodologies Case Study: Response Model Case

More information

What is the Cattle Data Base

What is the Cattle Data Base Farming and milk production in Denmark By Henrik Nygaard, Advisory Manager, hen@landscentret.dk Danish Cattle Federation, Danish Agricultural Advisory Centre, The national Centre, Udkaersvej 15, DK-8200

More information

Chapter 12 Discovering New Knowledge Data Mining

Chapter 12 Discovering New Knowledge Data Mining Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to

More information

An example of bioinformatics application on plant breeding projects in Rijk Zwaan

An example of bioinformatics application on plant breeding projects in Rijk Zwaan An example of bioinformatics application on plant breeding projects in Rijk Zwaan Xiangyu Rao 17-08-2012 Introduction of RZ Rijk Zwaan is active worldwide as a vegetable breeding company that focuses on

More information

Data Mining On Diabetics

Data Mining On Diabetics Data Mining On Diabetics Janani Sankari.M 1,Saravana priya.m 2 Assistant Professor 1,2 Department of Information Technology 1,Computer Engineering 2 Jeppiaar Engineering College,Chennai 1, D.Y.Patil College

More information

CAB TRAVEL TIME PREDICTI - BASED ON HISTORICAL TRIP OBSERVATION

CAB TRAVEL TIME PREDICTI - BASED ON HISTORICAL TRIP OBSERVATION CAB TRAVEL TIME PREDICTI - BASED ON HISTORICAL TRIP OBSERVATION N PROBLEM DEFINITION Opportunity New Booking - Time of Arrival Shortest Route (Distance/Time) Taxi-Passenger Demand Distribution Value Accurate

More information

Data Mining Analysis of HIV-1 Protease Crystal Structures

Data Mining Analysis of HIV-1 Protease Crystal Structures Data Mining Analysis of HIV-1 Protease Crystal Structures Gene M. Ko, A. Srinivas Reddy, Sunil Kumar, and Rajni Garg AP0907 09 Data Mining Analysis of HIV-1 Protease Crystal Structures Gene M. Ko 1, A.

More information

8. Machine Learning Applied Artificial Intelligence

8. Machine Learning Applied Artificial Intelligence 8. Machine Learning Applied Artificial Intelligence Prof. Dr. Bernhard Humm Faculty of Computer Science Hochschule Darmstadt University of Applied Sciences 1 Retrospective Natural Language Processing Name

More information

Data Mining mit der JMSL Numerical Library for Java Applications

Data Mining mit der JMSL Numerical Library for Java Applications Data Mining mit der JMSL Numerical Library for Java Applications Stefan Sineux 8. Java Forum Stuttgart 07.07.2005 Agenda Visual Numerics JMSL TM Numerical Library Neuronale Netze (Hintergrund) Demos Neuronale

More information

Beef - Key performance indicators. Mary Vickers

Beef - Key performance indicators. Mary Vickers Beef - Key performance indicators Mary Vickers Today Suckler herd KPIs Update on new project Responses KPIs for finishing systems What is a KPI? a business metric used to evaluate factors that are crucial

More information

Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors

Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors Classification k-nearest neighbors Data Mining Dr. Engin YILDIZTEPE Reference Books Han, J., Kamber, M., Pei, J., (2011). Data Mining: Concepts and Techniques. Third edition. San Francisco: Morgan Kaufmann

More information

How To Make A Credit Risk Model For A Bank Account

How To Make A Credit Risk Model For A Bank Account TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző csaba.fozo@lloydsbanking.com 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions

More information

Event driven trading new studies on innovative way. of trading in Forex market. Michał Osmoła INIME live 23 February 2016

Event driven trading new studies on innovative way. of trading in Forex market. Michał Osmoła INIME live 23 February 2016 Event driven trading new studies on innovative way of trading in Forex market Michał Osmoła INIME live 23 February 2016 Forex market From Wikipedia: The foreign exchange market (Forex, FX, or currency

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

Dr. G van der Veen (BVSc) Technical manager: Ruminants gerjan.vanderveen@zoetis.com

Dr. G van der Veen (BVSc) Technical manager: Ruminants gerjan.vanderveen@zoetis.com Dr. G van der Veen (BVSc) Technical manager: Ruminants gerjan.vanderveen@zoetis.com GENETICS NUTRITION MANAGEMENT Improved productivity and quality GENETICS Breeding programs are: Optimize genetic progress

More information

The Data Mining Process

The Data Mining Process Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

More information

Feature Subset Selection in E-mail Spam Detection

Feature Subset Selection in E-mail Spam Detection Feature Subset Selection in E-mail Spam Detection Amir Rajabi Behjat, Universiti Technology MARA, Malaysia IT Security for the Next Generation Asia Pacific & MEA Cup, Hong Kong 14-16 March, 2012 Feature

More information