GOBII. Genomic & Open-source Breeding Informatics Initiative

1 GOBII Genomic & Open-source Breeding Informatics Initiative

2 My Background
BS Animal Science, University of Tennessee
MS Animal Breeding, University of Georgia
  Random regression models for longitudinal traits
PhD Statistical Genetics, University of Georgia
  Feature selection and prediction algorithms
Dow AgroSciences
  Quantitative Geneticist ( )
  Quantitative Genetics Group Leader ( )
  Development and implementation of global trial analysis system
  Development and implementation of genomic selection into NA corn breeding program

3 Genomic Data: More Data, More Information?
Genomic data is becoming increasingly cost-effective to generate.
High-volume, high-dimensional data
  Need effective data management tools
  Need analysis pipelines to turn data into information
Genomic information does not replace phenotypic information
  Must have quality multi-year and multi-environment data to take full advantage of genomic information
  Must be able to integrate genomic and phenotypic information
  Must have well-designed training datasets to achieve the needed prediction accuracies

4 Genomic Selection
$$R = \frac{i \, r \, \sigma_g}{L}$$
where i = selection intensity, r = selection accuracy, \sigma_g = genetic standard deviation, L = generation interval
[Diagram labels: Phenotype, Environment, Genotype; Train, Predict]
Potential Advantages of Genomic Selection (terms affected: i, \sigma_g, r, L)
  Early discarding, first-stage screening based on genomic information
  Incorporate genomic information into early-stage trials and multi-year evaluations
  Early recycling, reduce stages to variety release
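As a rough illustration of the equation on this slide, the sketch below computes the expected response per year, R = i * r * sigma_g / L, for two scenarios. The function name and every numeric value are assumptions made up for illustration; none of them come from the presentation.

```python
def response_per_year(i: float, r: float, sigma_g: float, L: float) -> float:
    """Breeder's equation: expected genetic gain per year.

    i       selection intensity
    r       selection accuracy
    sigma_g genetic standard deviation
    L       generation interval in years
    """
    if L <= 0:
        raise ValueError("generation interval L must be positive")
    return i * r * sigma_g / L


# Hypothetical scenario A: phenotypic selection, higher accuracy, long cycle.
phenotypic = response_per_year(i=1.4, r=0.7, sigma_g=1.0, L=5.0)

# Hypothetical scenario B: genomic selection, lower accuracy, shorter cycle.
genomic = response_per_year(i=1.4, r=0.5, sigma_g=1.0, L=2.0)

print(f"phenotypic: {phenotypic:.3f}  genomic: {genomic:.3f}")
```

With these made-up inputs, the shorter generation interval more than offsets the lower accuracy, which is the trade-off the slide points at.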

5 r: Accuracy
Key Drivers
  Genetic architecture and heritability
  Model
  Training population
  Data
When properly implemented, is genomic selection accurate enough to drive increased genetic gain? Yes*

6 Z. Lin et al. Crop & Pasture Science 2014

7 [Figure: histogram of accuracy; axes: Accuracy, Frequency]

8 Correlation =
  Discarding: lose ~0.5%
  Picking winners: advance ~33%

9 Correlation =
  Discarding: lose ~9%
  Picking winners: advance ~20%

10 Correlation =
  Discarding: lose ~21%
  Picking winners: advance ~8%
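The pattern across these three slides (lower correlation, more good lines lost when discarding, fewer winners picked) can be reproduced qualitatively with a small simulation. This is a sketch under stated assumptions: the correlation values, the 50% discard fraction, and the 10% "top lines" threshold are all hypothetical, since the transcript does not preserve the values or definitions used on the slides, so the output is not expected to match the slide percentages.

```python
import math
import random


def simulate(r, n=20000, discard_frac=0.5, top_frac=0.1, seed=1):
    """Return (lost, captured) for a prediction with accuracy r.

    lost:     share of the truly best lines that fall in the discarded group
    captured: share of the truly best lines that are among the top predictions
    All thresholds are hypothetical illustration values.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be a correlation between -1 and 1")
    rng = random.Random(seed)
    true_vals, pred_vals = [], []
    for _ in range(n):
        g = rng.gauss(0.0, 1.0)          # true genetic value
        e = rng.gauss(0.0, 1.0)          # independent noise
        true_vals.append(g)
        pred_vals.append(r * g + math.sqrt(1.0 - r * r) * e)  # corr(pred, g) = r

    n_top = int(n * top_frac)
    by_true = sorted(range(n), key=lambda k: true_vals[k], reverse=True)
    by_pred = sorted(range(n), key=lambda k: pred_vals[k], reverse=True)

    true_top = set(by_true[:n_top])
    discarded = set(by_pred[int(n * (1.0 - discard_frac)):])  # worst predictions
    picked = set(by_pred[:n_top])                             # best predictions

    lost = len(true_top & discarded) / n_top
    captured = len(true_top & picked) / n_top
    return lost, captured


for r in (0.8, 0.5, 0.2):  # hypothetical accuracies
    lost, captured = simulate(r)
    print(f"r={r}: lost {lost:.1%} when discarding, captured {captured:.1%} when picking")
```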

11 Modifying the Funnel
[Diagram labels: Training, Prediction; funnel stages: Early Stage Screening, Characterization, Release; terms: i, \sigma_g, L]
Widen the funnel
  Discard lines with low likelihood of success, or lacking key traits, based on genomic information
  Can increase lines screened without increasing yield trial plot load (heavier nursery plot load)
  Increase selection intensity
Shorten the funnel
  As accuracies of genomic predictions increase, it becomes possible to replace the first stage of screening with GS and make recycling decisions earlier
  Reduce the generation interval

12 Key Components
Breeding Strategy
Phenotypic Information Data Management (BMS)
Analysis Pipelines
Skilled Breeders
Genomic Information Data Management (GOBII)

13 GOBII
Mission: To work closely with CGIAR centers to develop open-source capabilities and enable the implementation of genomic and marker-assisted selection for staple crops in the developing world.
Vision: Effective deployment of genomic information in breeding programs has the potential to significantly increase genetic gain in key crop performance traits. This can lead to staple crop varieties with improved yields and better adaptation to growing conditions in South Asia and Sub-Saharan Africa, bringing us closer to providing a sustainable and reliable food supply.

14 Key Components
Breeding Strategy
Phenotypic Information Data Management (BMS)
Analysis Pipelines
Skilled Breeders
Genomic Information Data Management (GOBII)

15 Execution and Implementation
Many Transformative Efforts Fail
  Many failed initiatives have great strategies
  They fall apart in the execution
Need to have clearly defined objectives
  Define the most critical elements and focus on those (must-haves)
  Clearly defined deliverables aligned to those critical elements
Action
  Avoid planning paralysis
Engagement
Commitment

16 Initial Phase Strategy
Prioritize initial deliverables based on:
  Urgency of the need across CG centers
  Technical feasibility (look for low-hanging fruit)
  Dependencies on other deliverables
Leverage existing components to the fullest extent possible
Direct all user interaction through an API (focus on BRAPI), allowing the development team to switch out components on the back end with minimal user disruption
Quickly piecing together a system to meet users' immediate needs should buy time to develop a truly next-generation solution for Phase 2 implementation
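The idea of routing all user interaction through an API so that back-end components can be swapped might look roughly like the sketch below. Every class and method name here is hypothetical; this is neither GOBII code nor an actual BRAPI endpoint definition, only an illustration of the design principle.

```python
from typing import Dict, List, Protocol


class MarkerStore(Protocol):
    """Any back end that can return genotype calls for samples x markers."""

    def allele_matrix(self, samples: List[str], markers: List[str]) -> List[List[str]]:
        ...


class RowStore:
    """Hypothetical initial back end, keyed by sample and then marker."""

    def __init__(self, calls: Dict[str, Dict[str, str]]) -> None:
        self._calls = calls

    def allele_matrix(self, samples: List[str], markers: List[str]) -> List[List[str]]:
        return [[self._calls.get(s, {}).get(m, "N") for m in markers] for s in samples]


class ColumnStore:
    """Hypothetical replacement back end, keyed by marker and then sample."""

    def __init__(self, calls: Dict[str, Dict[str, str]]) -> None:
        self._calls = calls

    def allele_matrix(self, samples: List[str], markers: List[str]) -> List[List[str]]:
        return [[self._calls.get(m, {}).get(s, "N") for m in markers] for s in samples]


class GenotypeApi:
    """The only entry point clients see; the store behind it can change."""

    def __init__(self, store: MarkerStore) -> None:
        self._store = store

    def get_allele_matrix(self, samples: List[str], markers: List[str]) -> dict:
        data = self._store.allele_matrix(samples, markers)
        return {"samples": samples, "markers": markers, "data": data}


row_api = GenotypeApi(RowStore({"s1": {"m1": "A/A", "m2": "A/T"}, "s2": {"m1": "T/T"}}))
col_api = GenotypeApi(ColumnStore({"m1": {"s1": "A/A", "s2": "T/T"}, "m2": {"s1": "A/T"}}))
```

Both `row_api` and `col_api` answer the same call with the same response, so a client written against `GenotypeApi` would not notice the storage layer being replaced.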

17 [Architecture diagram: Sequence Data File Store; Meta Data DB; Pipeline: Genomic Variant Calls and Imputation; BRAPI; LIMS; Marker Variant DB; Client-Side Application and GUI; Field Trial Management System]

18 Work Packages
WP1 Breeding Workflow Mapping/Project Prioritizations
WP2 Data Warehouse/DataMart
WP3 Server Application: Data Analysis Pipelines
WP4 Genomic API/ETL
WP5 Client Application(s): Breeder Tools

19 Breeding Workflow Mapping/Project Prioritizations
Breeding processes and strategy for each breeding program
  Line development process and timelines
  Key decision points
  Key traits
  GS and MAS strategies
Understand marker workflows
  How marker data is pulled and filtered
  Common marker analyses
  Where markers are deployed in the breeding process
Set initial prioritizations
  Understand critical marker needs that are not being met with current systems

20 Data Warehouse/DataMart
Sequence Data
  Compressed FASTQ files
Meta Data
  Relational database linking sample information to compressed FASTQ files
  Sample and marker meta information
    Support basic BRAPI marker statistic calls
  Physical and linkage map information
    Support BRAPI genomic maps calls
Marker Calls
  Set up initial solution using currently implemented marker DBs
    Support BRAPI allele matrix call
  Select, mock up, and test large matrix store DB solutions: PostgreSQL/Citus, MonetDB, Cassandra, HBase, MongoDB
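A relational metadata store that links sample information to compressed FASTQ files could be sketched as follows, using an in-memory SQLite database. The table names, column names, sample names, and file paths are all hypothetical; the slides do not describe the actual schema.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny, hypothetical sample-to-FASTQ metadata database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sample (
            sample_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL,
            crop      TEXT NOT NULL
        );
        CREATE TABLE fastq_file (
            file_id   INTEGER PRIMARY KEY,
            sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
            path      TEXT NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO sample VALUES (?, ?, ?)",
        [(1, "line_A", "maize"), (2, "line_B", "rice"), (3, "line_C", "maize")],
    )
    conn.executemany(
        "INSERT INTO fastq_file VALUES (?, ?, ?)",
        [
            (1, 1, "line_A_R1.fastq.gz"),
            (2, 2, "line_B_R1.fastq.gz"),
            (3, 3, "line_C_R1.fastq.gz"),
        ],
    )
    return conn


def select_files(conn: sqlite3.Connection, crop: str) -> list:
    """Return FASTQ paths for all samples of one crop, via a metadata join."""
    rows = conn.execute(
        """
        SELECT f.path
        FROM fastq_file AS f
        JOIN sample AS s ON s.sample_id = f.sample_id
        WHERE s.crop = ?
        ORDER BY f.path
        """,
        (crop,),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()
maize_files = select_files(conn, "maize")
```

The same kind of query is what a metadata-driven file selection step would issue before handing files to a variant calling pipeline.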

21 Server Application
Variant Calling and Imputation Pipeline
  Leverage existing pipeline(s)
File Selection Tool
  Based on SQL queries of meta data
Analysis
  G matrix calculations (possibly using TASSEL implementations)
  Calculations of LD
  PCoA decompositions
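For readers unfamiliar with the G matrix, the sketch below shows one common formulation of a genomic relationship matrix (often attributed to VanRaden, 2008): G = ZZ' / (2 * sum of p_j * (1 - p_j)), where Z is the 0/1/2 genotype matrix centered by twice the allele frequency. This is an assumption about what the analysis step might compute; the slides only say the calculation would possibly reuse TASSEL implementations, and the toy genotypes are made up.

```python
from typing import List


def g_matrix(genotypes: List[List[int]]) -> List[List[float]]:
    """Genomic relationship matrix from allele dosages.

    genotypes[a][j] is the dosage (0, 1, or 2) of line a at marker j.
    """
    n = len(genotypes)
    m = len(genotypes[0])

    # Allele frequency per marker, estimated from the data themselves.
    p = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]

    denom = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    if denom == 0.0:
        raise ValueError("all markers are monomorphic; G is undefined")

    # Center each dosage by its expected value 2 * p_j.
    z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in genotypes]

    return [
        [sum(z[a][j] * z[b][j] for j in range(m)) / denom for b in range(n)]
        for a in range(n)
    ]


# Toy data: three lines, three markers (hypothetical).
toy = [
    [0, 1, 2],
    [2, 1, 0],
    [1, 1, 1],
]
G = g_matrix(toy)
```

In this toy case the first two lines carry opposite genotypes, so their off-diagonal relationship is negative, while the third line sits exactly at the mean and has zero relationship with everything.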

22 Genomic API/ETL
API
  BRAPI implementation via web interface
  Custom GOBII API calls when needed
ETL
  Mapping common queries to DB schemas
  Pull large blocks of data, filtering on sample and marker characteristics
  Pull lines carrying haplotypes of interest
Client Application(s)
  Visualizations
    PCoA
    Connection to Flapjack
    LD matrices and LD decay
  SNP calling pipeline
  File selection tool
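Pulling a block of data filtered on sample and marker characteristics could look something like the following. The filter fields (`population`, `chromosome`), the data layout, and the function name are assumptions chosen for illustration; the slides do not say which characteristics the ETL layer would filter on.

```python
from typing import Dict, List, Tuple


def pull_block(
    calls: Dict[Tuple[str, str], str],
    sample_meta: Dict[str, Dict[str, str]],
    marker_meta: Dict[str, Dict[str, str]],
    population: str,
    chromosome: str,
) -> Tuple[List[str], List[str], List[List[str]]]:
    """Return (samples, markers, matrix) for one population and chromosome.

    calls maps (sample, marker) to a genotype call; missing calls become "N".
    """
    samples = sorted(s for s, meta in sample_meta.items() if meta["population"] == population)
    markers = sorted(m for m, meta in marker_meta.items() if meta["chromosome"] == chromosome)
    matrix = [[calls.get((s, m), "N") for m in markers] for s in samples]
    return samples, markers, matrix


# Hypothetical metadata and calls.
sample_meta = {
    "s1": {"population": "elite"},
    "s2": {"population": "landrace"},
    "s3": {"population": "elite"},
}
marker_meta = {
    "m1": {"chromosome": "1"},
    "m2": {"chromosome": "2"},
    "m3": {"chromosome": "1"},
}
calls = {("s1", "m1"): "A/A", ("s1", "m3"): "G/G", ("s3", "m1"): "A/T"}

samples, markers, matrix = pull_block(calls, sample_meta, marker_meta, "elite", "1")
```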

23 Thank You


Sisense. Product Highlights. www.sisense.com Sisense Product Highlights Introduction Sisense is a business intelligence solution that simplifies analytics for complex data by offering an end-to-end platform that lets users easily prepare and analyze

More information

How to Enhance Traditional BI Architecture to Leverage Big Data

How to Enhance Traditional BI Architecture to Leverage Big Data B I G D ATA How to Enhance Traditional BI Architecture to Leverage Big Data Contents Executive Summary... 1 Traditional BI - DataStack 2.0 Architecture... 2 Benefits of Traditional BI - DataStack 2.0...

More information

Open source framework for data-flow visual analytic tools for large databases

Open source framework for data-flow visual analytic tools for large databases Open source framework for data-flow visual analytic tools for large databases D5.6 v1.0 WP5 Visual Analytics: D5.6 Open source framework for data flow visual analytic tools for large databases Dissemination

More information

Big Data Use Case. How Rackspace is using Private Cloud for Big Data. Bryan Thompson. May 8th, 2013

Big Data Use Case. How Rackspace is using Private Cloud for Big Data. Bryan Thompson. May 8th, 2013 Big Data Use Case How Rackspace is using Private Cloud for Big Data Bryan Thompson May 8th, 2013 Our Big Data Problem Consolidate all monitoring data for reporting and analytical purposes. Every device

More information

Predictive Analytics

Predictive Analytics Predictive Analytics How many of you used predictive today? 2015 SAP SE. All rights reserved. 2 2015 SAP SE. All rights reserved. 3 How can you apply predictive to your business? Predictive Analytics is

More information

Unified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia

Unified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model to support more types of processing

More information

Milk protein genetic variation in Butana cattle

Milk protein genetic variation in Butana cattle Milk protein genetic variation in Butana cattle Ammar Said Ahmed Züchtungsbiologie und molekulare Genetik, Humboldt Universität zu Berlin, Invalidenstraβe 42, 10115 Berlin, Deutschland 1 Outline Background

More information

Issues in Data Storage and Data Management in Large- Scale Next-Gen Sequencing

Issues in Data Storage and Data Management in Large- Scale Next-Gen Sequencing Issues in Data Storage and Data Management in Large- Scale Next-Gen Sequencing Matthew Trunnell Manager, Research Computing Broad Institute Overview The Broad Institute Major challenges Current data workflow

More information

Advanced In-Database Analytics

Advanced In-Database Analytics Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??

More information

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap 3 key strategic advantages, and a realistic roadmap for what you really need, and when 2012, Cognizant Topics to be discussed

More information

Acceleration for Personalized Medicine Big Data Applications

Acceleration for Personalized Medicine Big Data Applications Acceleration for Personalized Medicine Big Data Applications Zaid Al-Ars Computer Engineering (CE) Lab Delft Data Science Delft University of Technology 1" Introduction Definition & relevance Personalized

More information

Smarter Healthcare@IBM Research. Joseph M. Jasinski, Ph.D. Distinguished Engineer IBM Research

Smarter Healthcare@IBM Research. Joseph M. Jasinski, Ph.D. Distinguished Engineer IBM Research Smarter Healthcare@IBM Research Joseph M. Jasinski, Ph.D. Distinguished Engineer IBM Research Our researchers work on a wide spectrum of topics Basic Science Industry specific innovation Nanotechnology

More information

Course Catalog. www.airweb.org/academy

Course Catalog. www.airweb.org/academy www.airweb.org/academy Course Catalog 2015 Hosted by the Association for Institutional Research, Data and Decisions Academy courses provide self-paced, online professional development for institutional

More information

Prerequisites. Course Outline

Prerequisites. Course Outline MS-55040: Data Mining, Predictive Analytics with Microsoft Analysis Services and Excel PowerPivot Description This three-day instructor-led course will introduce the students to the concepts of data mining,

More information

Ellucian BPM Solutions Roadmap

Ellucian BPM Solutions Roadmap Ellucian BPM Solutions Roadmap Roadmap Framing and Confidentiality Ellucian s roadmaps provide a general overview of our anticipated future offerings. The information contained in Ellucian s roadmaps is

More information

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19 PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations

More information

Good Agile Testing Practices and Traits How does Agile Testing work?

Good Agile Testing Practices and Traits How does Agile Testing work? Agile Testing Best Practices Introduction The testing phase of software development sometimes gets the short shrift from developers and IT managers. Yet testing is the only way to determine whether an

More information

OpenChorus: Building a Tool-Chest for Big Data Science

OpenChorus: Building a Tool-Chest for Big Data Science OpenChorus: Building a Tool-Chest for Big Data Science Milind Bhandarkar Chief Scientist, Machine Learning Platforms EMC Greenplum 1 Agenda! Tools for Data Science! Data Science Workflow! Greenplum OpenChorus!

More information

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise

More information

Fluency With Information Technology CSE100/IMT100

Fluency With Information Technology CSE100/IMT100 Fluency With Information Technology CSE100/IMT100 ),7 Larry Snyder & Mel Oyler, Instructors Ariel Kemp, Isaac Kunen, Gerome Miklau & Sean Squires, Teaching Assistants University of Washington, Autumn 1999

More information

IDL. Get the answers you need from your data. IDL

IDL. Get the answers you need from your data. IDL Get the answers you need from your data. IDL is the preferred computing environment for understanding complex data through interactive visualization and analysis. IDL Powerful visualization. Interactive

More information

Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015

Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015 Pulsar Realtime Analytics At Scale Tony Ng April 14, 2015 Big Data Trends Bigger data volumes More data sources DBs, logs, behavioral & business event streams, sensors Faster analysis Next day to hours

More information

INRA's Big Data perspectives and implementation challenges. Pascal Neveu UMR MISTEA INRA - Montpellier

INRA's Big Data perspectives and implementation challenges. Pascal Neveu UMR MISTEA INRA - Montpellier INRA's Big Data perspectives and implementation challenges UMR MISTEA INRA - Montpellier Agronomic Sciences Raises integrated issues and challenges: How to adapt agriculture to climate change? How agriculture

More information

Marketing Automation Request for Proposal

Marketing Automation Request for Proposal Marketing Automation Request for Proposal Choosing the right marketing automation system isn t easy. This is why we created this sample RFP, consisting entirely of actual questions from real RFPs submitted

More information

STATISTICA Solutions for Financial Risk Management Management and Validated Compliance Solutions for the Banking Industry (Basel II)

STATISTICA Solutions for Financial Risk Management Management and Validated Compliance Solutions for the Banking Industry (Basel II) STATISTICA Solutions for Financial Risk Management Management and Validated Compliance Solutions for the Banking Industry (Basel II) With the New Basel Capital Accord of 2001 (BASEL II) the banking industry

More information