Presentation by: Ahmad Alsahaf. Research collaborator at the Hydroinformatics lab - Politecnico di Milano MSc in Automation and Control Engineering
Transcription
1 Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen, 9 October 2015. Presentation by: Ahmad Alsahaf, research collaborator at the Hydroinformatics Lab, Politecnico di Milano; MSc in Automation and Control Engineering.
2 A Long History of Animal Breeding
3 A Long History of Animal Breeding. First step in animal breeding: domestication.
4 Breeding for desirable traits: longevity, fertility, more wool, milk, eggs, meat, etc.
5 Breeding for desirable traits: longevity, fertility, more wool, milk, eggs, meat, etc. Faster racehorses, better hunting dogs, cuter kittens.
6 Early History (timeline: 1900s to 1930s). Selective breeding relied on observable traits and human intuition.
7 Early History. Selective breeding relied on observable traits and human intuition. 1900s: rediscovery of Mendel's laws of inheritance (Gregor Mendel).
8 Early History. Selective breeding relied on observable traits and human intuition. 1900s: rediscovery of Mendel's laws of inheritance; the biometrician Karl Pearson and the rejection of Mendel's laws.
9 Early History. Selective breeding relied on observable traits and human intuition. 1900s: rediscovery of Mendel's laws of inheritance; the biometrician Karl Pearson and the rejection of Mendel's laws. 1930s: "Animal Breeding Plans", 1937 book by Lush; first application of statistics and quantitative genetics to animal breeding (cattle).
10 Early History. Selective breeding relied on observable traits and human intuition. 1900s: rediscovery of Mendel's laws of inheritance; the biometrician Karl Pearson and the rejection of Mendel's laws. 1930s: "Animal Breeding Plans", 1937 book by Lush; first application of statistics and quantitative genetics to animal breeding (cattle). 1940s: artificial insemination became common practice in dairy cattle.
11 The Dairy Cattle Example. One of the sectors of the animal industry that has benefited most from selective breeding and from the use of data in it. Pedigree records have been kept well. Few and easily measurable traits (milk/protein/fat yields, feed efficiency). Bulls deemed good can be fully utilized. Advanced artificial insemination technology.
12 The Holstein Friesian Dairy Cattle Breed
13 Genotype Vs. Phenotype
14 Progeny Testing. Test bull: genetic information not available. Bull's milk-producing daughters. Artificial insemination. Measure the quality of the milk. Determine the economic value of the bull.
15 Progeny Testing. Test bull: genetic information now AVAILABLE (50,000 to 70,000 genetic markers). Bull's milk-producing daughters. Artificial insemination. Measure the quality of the milk. Determine the economic value of the bull.
16 Machine Learning Examples 1. Using classification models (supervised learning) to detect problems in artificial insemination. Grzesiak, Wilhelm, et al. "Detection of cows with insemination problems using selected classification models." Computers and electronics in agriculture 74.2 (2010):
17 Machine Learning Examples. 1. Using classification models (supervised learning) to detect problems in artificial insemination. Inputs, recorded in 1,200 cows: lactation number, % HF genome, sex of the calf, age of cow, AI season, health metric, % of fat/protein in milk (nominal phenotypes, categorical phenotypes, environmental factors). Output classes: good cow, bad cow.
18 Machine Learning Examples. 1. Using classification models (supervised learning) to detect problems in artificial insemination. Inputs, recorded in 1,200 cows: lactation number, % HF genome, sex of the calf, age of cow, AI season, health metric, % of fat/protein in milk (nominal phenotypes, categorical phenotypes, environmental factors). Models: linear classifiers, logistic regression, artificial neural networks, multivariate adaptive regression splines. Output classes: good cow, bad cow.
19 Machine Learning Examples. 1. Using classification models (supervised learning) to detect problems in artificial insemination. Inputs, recorded in 1,200 cows: lactation number, % HF genome, sex of the calf, age of cow, AI season, health metric, % of fat/protein in milk (genetic information, nominal phenotypes, categorical phenotypes, environmental factors). Output classes: good cow, bad cow. Logistical and economic implications of the classification outcome: false positives vs. false negatives.
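The point about false positives versus false negatives can be made concrete with a small cost calculation. The sketch below is illustrative only: the labels, the choice of "cow with insemination problems" as the positive class, and the two cost values are assumptions, not figures from the cited study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1  # flagged as a problem cow, but actually fine
        elif truth == positive:
            fn += 1  # a problem cow that the model missed
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


def expected_cost(counts, cost_fp, cost_fn):
    """Total misclassification cost when the two error types cost differently."""
    return counts["fp"] * cost_fp + counts["fn"] * cost_fn


# Hypothetical labels: 1 = cow with insemination problems, 0 = no problems.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
# Assumed costs: missing a problem cow is five times worse than a false alarm.
cost = expected_cost(counts, cost_fp=1.0, cost_fn=5.0)
print(counts, cost)
```

Two classifiers with the same accuracy can have very different costs under such a weighting, which is one reason to compare models on more than accuracy.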
20 Machine Learning Examples.
2. Clustering dairy cows based on their phenotypic traits. "Principal component and clustering analysis of functional traits in Swiss dairy cattle." Turk. J. Vet. Anim. Sci. (2008).
3. Prediction of insemination outcome. Shahinfar, Saleh, et al. "Prediction of insemination outcomes in Holstein dairy cattle using alternative machine learning algorithms." Journal of Dairy Science (2014).
4. Predicting the lactation yield of dairy cows using multiple regression or neural networks. Grzesiak, W., et al. "A comparison of neural network and multiple regression predictions for 305-day lactation yield using partial lactation records." Canadian Journal of Animal Science (2003).
Phenotype-phenotype prediction studies:
# | Data | ML methods
1 | 10 phenotypes and environmental factors | ANN, logistic regression
2 | 5 phenotypes | Hierarchical clustering, PCA
3 | 26 phenotypes and environmental factors | Naïve Bayes, decision trees
4 | 7 phenotypes | ANN, multiple regression
21 Machine Learning with High Dimensional Genetic Data: Genome Wide Association Studies. A unit of genetic variation, or genetic marker: SNPs (single nucleotide polymorphisms). The goal is to associate an SNP (or several) with a phenotype, e.g. a disease. This is typically done by GWAS (genome-wide association studies): which SNPs (or other markers) occur frequently within a population that has the trait of interest?
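As a rough illustration of the single-marker idea behind GWAS, the sketch below scores each SNP by the absolute Pearson correlation between its genotype code (0, 1 or 2 copies of an allele) and the phenotype, then ranks the SNPs. This is a deliberately simplified stand-in: real GWAS pipelines use regression models with corrections for population structure and multiple testing. The toy data and the function names are made up for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # e.g. a monomorphic SNP carries no signal
    return sxy / math.sqrt(sxx * syy)


def rank_snps(genotypes, phenotype):
    """Rank SNP column indices by |correlation| with the phenotype.

    genotypes: one row per animal, one allele count (0/1/2) per SNP.
    """
    n_snps = len(genotypes[0])
    scores = []
    for j in range(n_snps):
        column = [row[j] for row in genotypes]
        scores.append((abs(pearson(column, phenotype)), j))
    return [j for _, j in sorted(scores, reverse=True)]


# Toy data: 6 animals, 3 SNPs; the phenotype is driven by SNP index 1.
genotypes = [
    [0, 2, 1],
    [1, 0, 1],
    [2, 1, 1],
    [0, 0, 1],
    [1, 2, 1],
    [2, 1, 1],
]
phenotype = [12.0, 8.0, 10.0, 8.0, 12.0, 10.0]

ranked = rank_snps(genotypes, phenotype)
print(ranked)
```

Scoring one marker at a time cannot capture interactions between markers, which is the limitation the next slide raises.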
22 Machine Learning with High Dimensional Genetic Data: Why Machine Learning? Quantitative traits (e.g. milk yield, disease, longevity) are controlled by multiple markers. Machine learning can associate multiple genetic markers with a phenotype AND find complex interactions between markers. Machine learning can facilitate dealing with redundant and irrelevant variables.
23 Example: From genotype to Milk yield 1. Using Neural Networks Gianola, Daniel, et al. "Predicting complex quantitative traits with Bayesian neural networks: a case study with Jersey cows and wheat." BMC genetics 12.1 (2011): 87. Input: n = 297 cows, p = 35,798 SNPs, i.e. a small n, large p problem. Output: milk yield, protein yield, fat yield.
24 Example: From genotype to Milk yield 1. Using Neural Networks Gianola, Daniel, et al. "Predicting complex quantitative traits with Bayesian neural networks: a case study with Jersey cows and wheat." BMC genetics 12.1 (2011): 87. Input: n = 297 cows, p = 35,798 SNPs, i.e. a small n, large p problem. Output: milk yield, protein yield, fat yield. Dealing with dimensionality: Bayesian regularized backpropagation, commonly used to avoid overfitting in BP. 297 variables derived from the original 35,798: using genome-derived (SNP) relationships between the cows as inputs instead of the SNPs themselves, by constructing a matrix of genomic relationships that is analogous to a covariance matrix and is based on allele frequencies in the population.
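One widely used way to build such a genomic relationship matrix is VanRaden's first method: centre each SNP column by twice its allele frequency and scale the cross-product, G = ZZ' / (2 Σ p_j (1 - p_j)). The sketch below assumes that construction; it is not confirmed here to be the exact variant used in the cited paper, and the toy genotypes are invented. With n animals the result is n x n, which is how 35,798 SNP inputs reduce to 297 relationship inputs per cow.

```python
def genomic_relationship_matrix(genotypes):
    """Genomic relationship matrix from 0/1/2 allele counts (VanRaden-style).

    genotypes: one row per animal, one allele count per SNP.
    Returns an n x n list of lists.
    """
    n = len(genotypes)
    p = len(genotypes[0])
    # Allele frequency of each SNP in this sample.
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(p)]
    # Centre each column by its expected allele count, 2 * p_j.
    z = [[row[j] - 2.0 * freqs[j] for j in range(p)] for row in genotypes]
    scale = 2.0 * sum(f * (1.0 - f) for f in freqs)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic; G is undefined")
    return [
        [sum(z[a][k] * z[b][k] for k in range(p)) / scale for b in range(n)]
        for a in range(n)
    ]


# Toy data: 3 animals, 4 SNPs.
genotypes = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
]
G = genomic_relationship_matrix(genotypes)
for row in G:
    print([round(v, 3) for v in row])
```

Each row of G can then serve as the input vector for one animal in place of its raw SNP genotypes.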
25 Example: From genotype to Milk yield 1. Using Neural Networks Gianola, Daniel, et al. "Predicting complex quantitative traits with Bayesian neural networks: a case study with Jersey cows and wheat." BMC genetics 12.1 (2011): 87. Results: Effective number of parameters
26 Example: From genotype to Milk yield 1. Using Neural Networks Gianola, Daniel, et al. "Predicting complex quantitative traits with Bayesian neural networks: a case study with Jersey cows and wheat." BMC genetics 12.1 (2011): 87. Results: Mean Squared Error of the predictions
27 Example: Genotype to Feed efficiency 1. Using Random Forests (Decision Trees) Yao, Chen, et al. "Random Forests approach for identifying additive and epistatic single nucleotide polymorphisms associated with residual feed intake in dairy cattle." Journal of dairy science (2013) Input: 395 Holstein cows, 42,275 SNPs, i.e. a small n, large p problem. Output: residual feed intake of the cow, adjusted for environmental and external factors.
28 Example: Genotype to Feed efficiency 1. Using Random Forests (Decision Trees) Yao, Chen, et al. "Random Forests approach for identifying additive and epistatic single nucleotide polymorphisms associated with residual feed intake in dairy cattle." Journal of dairy science (2013) Methods: Decision trees. A predictive model with a tree structure based on if-else statements. At each node, pick the best split (the best question to ask).
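"Pick the best split" can be sketched for a regression tree as an exhaustive search over features and thresholds, keeping the split that minimises the summed squared error of the two child nodes. The criterion, the function names and the toy data below are assumptions chosen for illustration; the cited study may have used a different splitting rule.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets):
    """Return (score, feature_index, threshold) of the best single split.

    A row goes left if row[feature] <= threshold, otherwise right.
    Returns None when no split separates the data.
    """
    best = None
    for j in range(len(rows[0])):
        for threshold in sorted(set(row[j] for row in rows)):
            left = [t for row, t in zip(rows, targets) if row[j] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[j] > threshold]
            if not left or not right:
                continue  # not a real split
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, threshold)
    return best


# Toy data: two SNPs coded 0/1/2; the trait jumps when SNP 0 equals 2.
rows = [[0, 1], [0, 2], [1, 0], [1, 1], [2, 0], [2, 2]]
targets = [1.0, 1.1, 1.0, 0.9, 3.0, 3.1]

score, feature, threshold = best_split(rows, targets)
print(feature, threshold)
```

A full tree applies the same search recursively to the left and right subsets until a stopping rule is met.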
29 Example: Genotype to Feed efficiency 1. Using Random Forests (Decision Trees) Yao, Chen, et al. "Random Forests approach for identifying additive and epistatic single nucleotide polymorphisms associated with residual feed intake in dairy cattle." Journal of dairy science (2013) Methods: Random Forests algorithm (ensemble method). The output is the averaged outcome of all weak learners in the ensemble (decision trees)
30 Example: Genotype to Feed efficiency 1. Using Random Forests (Decision Trees) Yao, Chen, et al. "Random Forests approach for identifying additive and epistatic single nucleotide polymorphisms associated with residual feed intake in dairy cattle." Journal of dairy science (2013) Dealing with dimensionality: bootstrapping for each tree in the forest. At each node of each tree, choosing the best split out of a subset of the p variables, not all of them (100, 1000).
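The two randomisation steps, a bootstrap sample per tree and a random subset of candidate variables per node, can be sketched with a heavily simplified forest in which every tree is a single split (a "stump"). Because a stump has only one node, drawing the feature subset once per tree is the same as drawing it per node here; a real Random Forest grows deep trees and redraws the subset at every node. All names, the toy data and the parameter values are illustrative assumptions.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_stump(rows, targets, feature_ids):
    """Best one-split tree using only the candidate features given."""
    best = None
    for j in feature_ids:
        for threshold in sorted(set(row[j] for row in rows)):
            left = [t for row, t in zip(rows, targets) if row[j] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[j] > threshold]
            if not left or not right:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, threshold,
                        sum(left) / len(left), sum(right) / len(right))
    if best is None:
        # No usable split in this sample: always predict the sample mean.
        mean = sum(targets) / len(targets)
        return (None, None, mean, mean)
    return best[1:]


def fit_forest(rows, targets, n_trees, mtry, seed=0):
    """Bootstrap the animals and subsample the features for each tree."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        features = rng.sample(range(p), mtry)       # candidate variables
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], features))
    return forest


def predict(forest, row):
    """Average the predictions of all trees in the ensemble."""
    preds = []
    for feature, threshold, left_mean, right_mean in forest:
        go_left = feature is None or row[feature] <= threshold
        preds.append(left_mean if go_left else right_mean)
    return sum(preds) / len(preds)


rows = [[0, 1], [0, 2], [1, 0], [1, 1], [2, 0], [2, 2]]
targets = [1.0, 1.1, 1.0, 0.9, 3.0, 3.1]
forest = fit_forest(rows, targets, n_trees=25, mtry=1)
prediction = predict(forest, [2, 0])
print(prediction)
```

Restricting each split to a small subset of the p variables keeps every tree cheap to grow and decorrelates the trees, which is what makes the approach workable when p is in the tens of thousands.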
31 Example: Genotype to Feed efficiency 1. Using Random Forests (Decision Trees) Yao, Chen, et al. "Random Forests approach for identifying additive and epistatic single nucleotide polymorphisms associated with residual feed intake in dairy cattle." Journal of dairy science (2013) Results and Findings: Ranking SNPs according to their importance to the phenotype output (implicit feature-ranking capability of decision trees). Identifying pairs of epistatic genes through the RF structure, as they will tend to fall into the same branches of trees (parent-child). [Figure: example tree with nodes A to E.]
32 Dipartimento di Elettronica, Informazione e Bioingegneria. Master of Science in Automation and Control Engineering, Dec 2014. Master thesis by: Ahmad Alsahaf. Supervisor: Andrea Castelletti. Co-supervisors: Stefano Galelli, Matteo Giuliani.
33 What is Model-order reduction (Emulation Modelling)? Such that: the emulator is less computationally intensive than the PB model; the input-output behaviour accurately reproduces the PB model behaviour; the emulator is credible from the user's/analyst's point of view (physically interpretable).
34 What is Model-order reduction (Emulation Modelling)? Such that: the emulator is less computationally intensive than the PB model; the input-output behaviour accurately reproduces the PB model behaviour; the emulator is credible from the user's/analyst's point of view (physical interpretability).
35 Recursive Variable Selection: a feature selection algorithm. [Figure labels: >2%; state variables; exogenous inputs; control variables; output variable.]
36 PCA vs Sparse PCA Coefficients heat map
37 PCA vs. Sparse and Weighted PCA: emulator performance. [Figure: emulator performance and explained variance for PCA, WPCA and SPCA against the number of principal components.] Emulator structure: Extra-Trees (Geurts et al., 2006). Ref: Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3-42.
Philipsson Sustainability of dairy cattle breeding systems utilising artificial insemination in less developed countries - examples of problems and prospects J. Philipsson Department of Animal Breeding
More informationAlison Van Eenennaam, Ph.D.
Is the Market Ready for Milk from Cloned Cows? 3/15/06 Alison Van Eenennaam, Ph.D. Cooperative Extension Specialist Animal Biotechnology and Genomics alvaneenennaam@ucdavis.edu ODI OUTLINE What is a clone?
More informationThe All-Breed Animal Model Bennet Cassell, Extension Dairy Scientist, Genetics and Management
publication 404-086 The All-Breed Animal Model Bennet Cassell, Extension Dairy Scientist, Genetics and Management Introduction The all-breed animal model is the genetic-evaluation system used to evaluate
More informationData Mining Classification: Decision Trees
Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous
More informationStaying good while playing God Looking after animal welfare when applying biotechnology
Staying good while playing God Looking after animal welfare when applying biotechnology Peter Sandøe, Stine B. Christiansen and Christian Gamborg University of Copenhagen Animal breeding was, until the
More informationData Mining in CRM & Direct Marketing. Jun Du The University of Western Ontario jdu43@uwo.ca
Data Mining in CRM & Direct Marketing Jun Du The University of Western Ontario jdu43@uwo.ca Outline Why CRM & Marketing Goals in CRM & Marketing Models and Methodologies Case Study: Response Model Case
More informationWhat is the Cattle Data Base
Farming and milk production in Denmark By Henrik Nygaard, Advisory Manager, hen@landscentret.dk Danish Cattle Federation, Danish Agricultural Advisory Centre, The national Centre, Udkaersvej 15, DK-8200
More informationChapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
More informationAn example of bioinformatics application on plant breeding projects in Rijk Zwaan
An example of bioinformatics application on plant breeding projects in Rijk Zwaan Xiangyu Rao 17-08-2012 Introduction of RZ Rijk Zwaan is active worldwide as a vegetable breeding company that focuses on
More informationData Mining On Diabetics
Data Mining On Diabetics Janani Sankari.M 1,Saravana priya.m 2 Assistant Professor 1,2 Department of Information Technology 1,Computer Engineering 2 Jeppiaar Engineering College,Chennai 1, D.Y.Patil College
More informationCAB TRAVEL TIME PREDICTI - BASED ON HISTORICAL TRIP OBSERVATION
CAB TRAVEL TIME PREDICTI - BASED ON HISTORICAL TRIP OBSERVATION N PROBLEM DEFINITION Opportunity New Booking - Time of Arrival Shortest Route (Distance/Time) Taxi-Passenger Demand Distribution Value Accurate
More informationData Mining Analysis of HIV-1 Protease Crystal Structures
Data Mining Analysis of HIV-1 Protease Crystal Structures Gene M. Ko, A. Srinivas Reddy, Sunil Kumar, and Rajni Garg AP0907 09 Data Mining Analysis of HIV-1 Protease Crystal Structures Gene M. Ko 1, A.
More information8. Machine Learning Applied Artificial Intelligence
8. Machine Learning Applied Artificial Intelligence Prof. Dr. Bernhard Humm Faculty of Computer Science Hochschule Darmstadt University of Applied Sciences 1 Retrospective Natural Language Processing Name
More informationData Mining mit der JMSL Numerical Library for Java Applications
Data Mining mit der JMSL Numerical Library for Java Applications Stefan Sineux 8. Java Forum Stuttgart 07.07.2005 Agenda Visual Numerics JMSL TM Numerical Library Neuronale Netze (Hintergrund) Demos Neuronale
More informationBeef - Key performance indicators. Mary Vickers
Beef - Key performance indicators Mary Vickers Today Suckler herd KPIs Update on new project Responses KPIs for finishing systems What is a KPI? a business metric used to evaluate factors that are crucial
More informationReference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors
Classification k-nearest neighbors Data Mining Dr. Engin YILDIZTEPE Reference Books Han, J., Kamber, M., Pei, J., (2011). Data Mining: Concepts and Techniques. Third edition. San Francisco: Morgan Kaufmann
More informationHow To Make A Credit Risk Model For A Bank Account
TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző csaba.fozo@lloydsbanking.com 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions
More informationEvent driven trading new studies on innovative way. of trading in Forex market. Michał Osmoła INIME live 23 February 2016
Event driven trading new studies on innovative way of trading in Forex market Michał Osmoła INIME live 23 February 2016 Forex market From Wikipedia: The foreign exchange market (Forex, FX, or currency
More informationWhy Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
More informationDr. G van der Veen (BVSc) Technical manager: Ruminants gerjan.vanderveen@zoetis.com
Dr. G van der Veen (BVSc) Technical manager: Ruminants gerjan.vanderveen@zoetis.com GENETICS NUTRITION MANAGEMENT Improved productivity and quality GENETICS Breeding programs are: Optimize genetic progress
More informationThe Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
More informationFeature Subset Selection in E-mail Spam Detection
Feature Subset Selection in E-mail Spam Detection Amir Rajabi Behjat, Universiti Technology MARA, Malaysia IT Security for the Next Generation Asia Pacific & MEA Cup, Hong Kong 14-16 March, 2012 Feature
More information