Databases, data mining and analysis pipelines, Part 5: BioMarts
Amel GHOUILA, PhD. LTCII, Institut Pasteur de Tunis (IPT). May
Plan: 1. Introduction; 2. BioMarts; 3. BioMart online; 4. The biomaRt package; 5. Data mining tools.
1. Introduction
Biological data management: managing biological data is a challenging task, because biological concepts are complex and not always well defined.
[Figure: Biological web databases]
2. BioMarts
What is BioMart? A BioMart is a data mart, that is, an indexing and extraction system. A data mart is a subset of a data warehouse. Motivation: in a large database, information is often not organized in a way that makes it easy for users to find what they need, and a data mart addresses this by exposing a focused subset of the data.
The BioMart project: a joint project between the European Bioinformatics Institute (EBI) and Cold Spring Harbor Laboratory (CSHL). The BioMart project (…) was initiated to address many challenges of biological data management. The BioMart software is based on two fundamental concepts: data-agnostic modelling and data federation.
Data-agnostic modelling: simplifies the difficult and time-consuming task of data modelling by using a predefined, query-optimized relational schema that can represent any kind of data.
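The following minimal Python/SQLite sketch shows what a predefined, query-optimized schema can look like. It imitates a BioMart-style star layout: a central "main" table with one row per gene, and a "dimension" table holding one-to-many annotations, joined on an indexed key. The table names follow the `dataset__content__main` / `dataset__content__dm` naming pattern used by BioMart marts. The dataset name `toy_gene`, the columns, and all rows are hypothetical placeholders.

```python
import sqlite3

# Hypothetical toy data in a BioMart-style layout: one "main" table (one row per
# gene) plus one "dimension" (dm) table for one-to-many annotations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE toy_gene__gene__main (
    gene_id_key INTEGER PRIMARY KEY,
    gene_symbol TEXT,
    chromosome_name TEXT
);
CREATE TABLE toy_gene__go__dm (
    gene_id_key INTEGER REFERENCES toy_gene__gene__main(gene_id_key),
    go_id TEXT
);
-- Indexing the join key keeps main-to-dimension lookups fast.
CREATE INDEX idx_go_key ON toy_gene__go__dm(gene_id_key);
""")
conn.executemany("INSERT INTO toy_gene__gene__main VALUES (?, ?, ?)",
                 [(1, "GENE_A", "1"), (2, "GENE_B", "X")])
# Placeholder GO identifiers, not real annotations.
conn.executemany("INSERT INTO toy_gene__go__dm VALUES (?, ?)",
                 [(1, "GO:0000001"), (1, "GO:0000002"), (2, "GO:0000003")])

def query(chromosome):
    """Filter on the main table, then pull an attribute from the dimension table."""
    sql = """
        SELECT m.gene_symbol, d.go_id
        FROM toy_gene__gene__main AS m
        JOIN toy_gene__go__dm AS d ON d.gene_id_key = m.gene_id_key
        WHERE m.chromosome_name = ?
        ORDER BY d.go_id
    """
    return conn.execute(sql, (chromosome,)).fetchall()

print(query("1"))
```

Every dataset is loaded into the same main/dimension pattern, so the same query code works whatever the data describe. That reuse is the point of data-agnostic modelling.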
Data federation: the organization of multiple disparate and distributed database systems into what appears to be a single, integrated virtual database. It makes it possible to access and cross-reference data from many data sources through a single user interface, without the need to collate the data into one location.
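The federation principle can be sketched in a few lines of Python. Here, two independent in-memory SQLite databases stand in for two separately hosted sources. The query function asks each source only for its own part and joins the answers on the client, so no data is collated into one location. The tables, gene names, and variant IDs are hypothetical. This illustrates the principle only, not BioMart's actual linking mechanism.

```python
import sqlite3

def make_source(ddl, insert_sql, rows):
    """Create an independent in-memory database standing in for one remote source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn

# Two hypothetical, separately hosted sources that share a gene symbol key.
genes = make_source(
    "CREATE TABLE genes (symbol TEXT, chromosome TEXT)",
    "INSERT INTO genes VALUES (?, ?)",
    [("GENE_A", "1"), ("GENE_B", "X"), ("GENE_C", "2")],
)
variants = make_source(
    "CREATE TABLE variants (variant_id TEXT, symbol TEXT)",
    "INSERT INTO variants VALUES (?, ?)",
    [("var1", "GENE_A"), ("var2", "GENE_B"), ("var3", "GENE_B")],
)

def federated_query(chromosome):
    """Query each source separately and join the results on the client side."""
    symbols = [s for (s,) in genes.execute(
        "SELECT symbol FROM genes WHERE chromosome = ?", (chromosome,))]
    if not symbols:
        return []
    placeholders = ",".join("?" * len(symbols))  # e.g. "?,?" for two symbols
    return variants.execute(
        f"SELECT symbol, variant_id FROM variants WHERE symbol IN ({placeholders}) "
        "ORDER BY variant_id",
        symbols,
    ).fetchall()

print(federated_query("X"))  # [('GENE_B', 'var2'), ('GENE_B', 'var3')]
```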
Advantages of the BioMart project:
- Data mining, or advanced search: BioMart adapts data-warehousing ideas to create a universal system for biological data management, and gives biologists the ability to create complex, customized datasets through a web interface.
- A new way of creating large multi-database repositories that avoids the need to store all the data in a single location.
- Unified access to disparate, geographically distributed data sources.
- It shows that large-scale projects involving next-generation sequencing (NGS) data can be managed efficiently in a distributed environment.
- Interactive use, with several levels of query optimization to manage large datasets efficiently.
[Figure: BioMart idea]
[Figure: Building BioMart databases]
[Figure: Fixed schema transformation]
[Figure: BioMart architecture]
3. BioMart online
Example: Ensembl MartView (http://…), reached from the Ensembl website via Tools, then BioMart. No programming required!
Example: Ensembl MartView. Possible queries include: all the genes of a given species; only the genes in a specific region of a chromosome; InterPro domains associated with a chromosome or with a region of a chromosome; Gene Ontology and expression vocabulary terms; multi-species queries such as orthologs and upstream regions; etc.
Example: Ensembl MartView. Steps: 1. Choose the dataset. 2. Set filters to define the set of genes. 3. Choose the output columns (attributes). 4. Export the results.
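The four MartView steps map onto the XML query format that BioMart web services accept:
- a `Dataset` element (step 1);
- `Filter` elements (step 2);
- `Attribute` elements (step 3);
- a `formatter` for the export (step 4).

The Python sketch below builds such a query using only the standard library. The endpoint URL and the filter and attribute names are assumptions based on Ensembl's public BioMart. Check them against `listFilters()`/`listAttributes()` or MartView's XML view before relying on them. The request is only assembled here, not sent.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumed endpoint of Ensembl's public BioMart web service.
MARTSERVICE_URL = "https://www.ensembl.org/biomart/martservice"
XML_PROLOG = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'

def build_query(dataset, filters, attributes, formatter="TSV"):
    """Build the body of a BioMart XML query from the four MartView steps."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": formatter,  # step 4: export format
        "header": "1",
        "uniqueRows": "1",
    })
    # Step 1: choose the dataset.
    dataset_el = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Step 2: filters define the set of genes.
    for name, value in filters.items():
        ET.SubElement(dataset_el, "Filter", {"name": name, "value": value})
    # Step 3: attributes are the output columns.
    for name in attributes:
        ET.SubElement(dataset_el, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")

xml_query = build_query(
    "hsapiens_gene_ensembl",
    {"chromosome_name": "X"},
    ["ensembl_gene_id", "hgnc_symbol", "start_position"],
)
# The full GET URL; it is only assembled here, not requested.
url = MARTSERVICE_URL + "?" + urllib.parse.urlencode({"query": XML_PROLOG + xml_query})
print(xml_query)
```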
Other databases with BioMart interfaces. Many databases have adopted BioMart: dbSNP (via Ensembl), HapMap, SequenceMart (Ensembl genome sequences), WormBase and Reactome.
BioMart user interfaces: MartView, a web-based interface that can query all the databases hosted on EBI's public BioMart server; MartExplorer; and the biomaRt R/Bioconductor package.
4. The biomaRt package
The biomaRt package: an R interface to BioMart databases, developed by Steffen Durinck (started in February 2005) and well suited for batch querying. Main sets of functions: shortcuts adapted to Ensembl for frequently asked queries (getGene, getGO, getOMIM); and generic queries, modelled after MQL (Mart Query Language), that can be used with any BioMart dataset.
Advantages of biomaRt: it can retrieve large amounts of data from various sources in a uniform way, without the need to know the underlying database schemas and without writing complex SQL queries.
Communication protocols: direct MySQL queries to BioMart database servers, or HTTP queries to BioMart web services.
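On the HTTP route, a client sends the XML query to the web service and receives plain text back. The sketch below makes two assumptions. First, the service accepts the query in a form-encoded `query` field. Second, it returns tab-separated rows with a header line, as requested by `formatter="TSV"` and `header="1"` in the query. `fetch_tsv` is a hypothetical helper. Only the parsing step is exercised here, on a made-up response.

```python
import csv
import io
import urllib.parse
import urllib.request

def fetch_tsv(martservice_url, xml_query, timeout=60):
    """Hypothetical helper: POST an XML query to a BioMart web service and return the text.
    Assumes the service reads the query from a form-encoded 'query' field."""
    data = urllib.parse.urlencode({"query": xml_query}).encode("utf-8")
    request = urllib.request.Request(martservice_url, data=data)  # data => POST
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")

def parse_tsv(text):
    """Turn a tab-separated response with a header line into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

# Offline usage on a made-up response body (no network access needed).
sample = "gene\tchromosome\nGENE_1\tX\nGENE_2\t1\n"
rows = parse_tsv(sample)
print(rows[0]["chromosome"])  # X
```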
Getting started with biomaRt: installation (from an R session)
> install.packages("BiocManager")
> BiocManager::install("biomaRt")
Getting started with biomaRt: loading the package and listing the available databases
> library(biomaRt)
> listMarts()   # lists all available BioMart databases
Selecting a database: useMart() selects a specific BioMart database to use. To use the Ensembl BioMart database:
> ensembl <- useMart("ensembl")
The chosen database must be a valid name returned by listMarts(). To check which datasets are available in the selected BioMart:
> listDatasets(ensembl)
Mining Ensembl data: first select the Ensembl BioMart, then choose a dataset to use:
> ens <- useMart("ensembl")
> hsap <- useDataset("hsapiens_gene_ensembl", mart = ens)
or, in a single step:
> hsap <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
Mining Ensembl data: the getGene() function queries the database for gene information. It accepts many forms of gene identifier (Entrez Gene, HUGO, Affymetrix identifiers) and returns the gene symbol, description, chromosome name, band, start position, end position and BioMart ID. Example:
> getGene(id = 100, type = "entrezgene_id", mart = hsap)
Mining Ensembl data: the getBM() function is more general than getGene(). It takes a list of filters for selecting genes or SNPs and a list of attributes to return from the database. Syntax:
getBM(attributes, filters = "", values = "", mart, checkFilters = TRUE, uniqueRows = TRUE)
Main arguments:
- attributes: the attributes you want to retrieve; listAttributes(hsap) shows the available attributes.
- filters: one or more filters that restrict the query; listFilters(hsap) shows all the filters available for a given dataset.
- values: the values for the filters (a vector, or a list with one vector per filter when several filters are used).
- mart: the mart object, with a dataset selected, created by useMart() or useDataset().
Example: retrieve the GO annotation for the following Illumina Human WG-6 v2 identifiers:
> illuminaids <- c("ILMN…", "ILMN…")
> goannot <- getBM(attributes = c("illumina_humanwg_6_v2", "go_id"), filters = "illumina_humanwg_6_v2", values = illuminaids, mart = hsap)
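To make the roles of `attributes`, `filters`, `values` and `uniqueRows` concrete, here is a toy Python model of getBM() semantics over an in-memory table. It is not how biomaRt works internally: biomaRt sends the query to a BioMart server. The function name `get_bm`, the table, and the probe and GO identifiers are hypothetical. A row is kept only if it satisfies every filter, and only the requested attribute columns are returned.

```python
def get_bm(table, attributes, filters, values, unique_rows=True):
    """Toy model of getBM() semantics over a list of dict rows.

    filters: one filter name, or a list of names. values: the allowed values for a
    single filter, or one list of allowed values per filter. A row is kept only
    if it passes every filter.
    """
    if isinstance(filters, str):
        filters, values = [filters], [values]
    seen, result = set(), []
    for row in table:
        if all(row[f] in allowed for f, allowed in zip(filters, values)):
            record = tuple(row[a] for a in attributes)  # keep only requested columns
            if unique_rows and record in seen:
                continue  # mimic uniqueRows = TRUE
            seen.add(record)
            result.append(record)
    return result

# Hypothetical probe-to-GO table, shaped like the Illumina example above.
table = [
    {"probe": "P1", "go_id": "GO:1", "symbol": "A"},
    {"probe": "P1", "go_id": "GO:1", "symbol": "A"},  # duplicate row
    {"probe": "P1", "go_id": "GO:2", "symbol": "A"},
    {"probe": "P2", "go_id": "GO:3", "symbol": "B"},
    {"probe": "P3", "go_id": "GO:4", "symbol": "C"},
]
print(get_bm(table, ["probe", "go_id"], "probe", ["P1", "P2"]))
```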
Other functions:
- getGO: GO ID and GO term.
- getOMIM: OMIM ID, disease and BioMart ID (OMIM, Online Mendelian Inheritance in Man, is a catalogue of human genes and genetic disorders).
- getINTERPRO: InterPro ID and description (InterPro is a database that integrates protein domain information).
- getSequence: retrieves sequences.
- getSNP
- getHomolog
Retrieving sequences: the sequence types available in Ensembl include exons, coding sequences, protein sequences, 3' UTRs and 5' UTRs.
Arguments of the getSequence() function:
- id: the identifier(s);
- type: the type of identifier used, e.g. "hgnc_symbol" or "affy_hg_u133_plus_2";
- seqType: the type of sequence to retrieve.
Example, the protein sequence of AGT:
> agt <- getSequence(id = "AGT", type = "hgnc_symbol", seqType = "peptide", mart = hsap)
Retrieve all the exons of CDH1:
> seq <- getSequence(id = "CDH1", type = "hgnc_symbol", seqType = "gene_exon", mart = hsap)
Combining marts and detecting homology: the getLDS() function combines two data marts (linked datasets), which is useful for homology detection.
Example: find the mouse equivalents of a particular Affymetrix probe set, or of the NOX1 gene:
> human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
> mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
> getLDS(attributes = c("hgnc_symbol", "chromosome_name", "start_position"), filters = "hgnc_symbol", values = "NOX1", mart = human, attributesL = c("chromosome_name", "start_position", "external_gene_name"), martL = mouse)
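Conceptually, getLDS() runs a query on one dataset, follows homology links into a second dataset, and concatenates attributes from both. The toy Python model below shows that flow. The function `get_lds`, the `homology` mapping, and all records are hypothetical. In this simplification only the first dataset is filtered, whereas getLDS() also accepts filters on the linked dataset (`filtersL`, `valuesL`).

```python
def get_lds(mart, attributes, filter_name, values, mart_l, attributes_l, homology):
    """Toy model of a linked-dataset query in the spirit of getLDS().

    mart, mart_l: two datasets as lists of dicts, each row with an "id" key.
    homology: hypothetical mapping from an id in `mart` to linked ids in `mart_l`.
    """
    index_l = {row["id"]: row for row in mart_l}
    result = []
    for row in mart:
        if row[filter_name] not in values:
            continue  # in this toy, only the first dataset is filtered
        for linked_id in homology.get(row["id"], []):
            other = index_l[linked_id]
            # One output row concatenates attributes from both datasets.
            result.append(tuple(row[a] for a in attributes)
                          + tuple(other[a] for a in attributes_l))
    return result

# Hypothetical placeholder records, not real Ensembl data.
human = [
    {"id": "H1", "hgnc_symbol": "GENE_A", "chromosome_name": "X"},
    {"id": "H2", "hgnc_symbol": "GENE_B", "chromosome_name": "7"},
]
mouse = [
    {"id": "M1", "external_gene_name": "Gene_a", "chromosome_name": "X"},
    {"id": "M2", "external_gene_name": "Gene_b", "chromosome_name": "5"},
]
homology = {"H1": ["M1"], "H2": ["M2"]}
print(get_lds(human, ["hgnc_symbol", "chromosome_name"], "hgnc_symbol", ["GENE_A"],
              mouse, ["external_gene_name", "chromosome_name"], homology))
```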
Useful links: bioinformatics resources
- OBRC, Online Bioinformatics Resources Collection: http://…
- Biostar: a high-quality question-and-answer website
- SEQanswers: a discussion and information site for next-generation sequencing
- http://omictools.com/: an informative directory for multi-omic data analysis
- Rosalind (http://rosalind.info/): a platform for learning bioinformatics through problem solving
- http://…: Guide to Selected Bioinformatics Internet Resources
5. Data mining tools
Tools needed for analysis: huge amounts of biological data are available, and we need to make sense of all the data generated. This calls for analysing the data to extract new knowledge (data mining) and for visualization tools.
Data mining. What is data mining? The extraction of knowledge from large amounts of data (Han and Kamber, 2006): the automatic process of discovering patterns in data. The patterns discovered must be meaningful.
[Figure: Data mining]
Data mining techniques: various machine learning techniques, together with data selection and cleaning processes, are used on different kinds of biological data to discover new knowledge that can be translated into clinical applications. Handling noisy and incomplete data and integrating various data sources are new challenges faced by biologists in the post-genome era.
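Data selection and cleaning can be as simple as discarding badly incomplete samples and then filling the remaining gaps. The sketch below uses one common, deliberately naive strategy: a missing-fraction threshold followed by column-mean imputation. The threshold value and the function name `clean` are illustrative choices, not a method prescribed by these slides.

```python
import math
import statistics

def clean(samples, max_missing_fraction=0.5):
    """Illustrative cleaning pass over a feature matrix (list of rows, None = missing).

    1. Selection: drop rows whose fraction of missing values exceeds the threshold.
    2. Imputation: replace the remaining missing values with the column mean.
    """
    kept = [row for row in samples
            if sum(v is None for v in row) / len(row) <= max_missing_fraction]
    if not kept:
        return []
    means = []
    for j in range(len(kept[0])):
        observed = [row[j] for row in kept if row[j] is not None]
        # A column with no observed value cannot be imputed; mark it as NaN.
        means.append(statistics.fmean(observed) if observed else math.nan)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in kept]

# Hypothetical measurements: the all-missing second sample is dropped.
print(clean([[1.0, None, 3.0], [None, None, None], [3.0, 4.0, None]]))
```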
[Figure: Data mining techniques]
Supervised learning. Definition: inferring a function from labelled training data. A supervised learning algorithm analyses the training data and produces an inferred function, which can then be used to map new examples.
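A minimal example of supervised learning is a nearest-centroid classifier. Training summarizes each labelled class by its mean feature vector, and the inferred function assigns a new example to the class with the closest centroid. The Python sketch below uses hypothetical two-dimensional features and class labels.

```python
import math

def train_nearest_centroid(examples, labels):
    """Learn one centroid (mean feature vector) per class from labelled data."""
    sums, counts = {}, {}
    for x, y in zip(examples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """The inferred function: map a new example to the class with the nearest centroid."""
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))

# Hypothetical labelled training data.
centroids = train_nearest_centroid([[0, 0], [0, 1], [5, 5], [6, 5]],
                                   ["class_a", "class_a", "class_b", "class_b"])
print(predict(centroids, [1, 0]))  # class_a
```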
[Figure: Supervised learning]
Unsupervised learning. Definition: trying to find hidden structure in unlabelled data. There are many approaches to unsupervised learning: k-means, neural networks, hierarchical clustering, hidden Markov models, etc.
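k-means, the first approach listed, fits in a few lines. The sketch below is Lloyd's algorithm:
1. Assign each point to its nearest centroid.
2. Move each centroid to the mean of its assigned points.
3. Repeat until nothing changes.

Initializing with the first k points is a simplification chosen for reproducibility. Library implementations such as R's `kmeans()` use random or smarter starts. The data points are hypothetical.

```python
import math

def kmeans(points, k, n_iter=100):
    """Lloyd's k-means. Centroids start at the first k points (deterministic, for illustration)."""
    centroids = [list(p) for p in points[:k]]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        labels = []
        for p in points:
            c = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[c].append(p)
            labels.append(c)
        # Update step: move each centroid to the mean of its points (keep it if empty).
        new_centroids = [
            [sum(coord) / len(members) for coord in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: the next assignment step would be identical
        centroids = new_centroids
    return labels, centroids

# Hypothetical unlabelled points forming two well-separated groups.
labels, centroids = kmeans([[0, 0], [10, 10], [0, 1], [1, 0], [10, 11], [11, 10]], k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```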
[Figure: Unsupervised learning]
[Figure: Unsupervised learning vs supervised learning]
Data mining tools:
- Weka: a collection of machine learning algorithms for data mining tasks. Weka contains tools for data pre-processing, classification, regression, clustering, association rules and visualization (http://…).
- R packages: various packages for decision trees, k-means, hierarchical clustering, visualization, etc. A listing of R packages for data mining is available at http://….
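Hierarchical clustering, one of the techniques these R packages cover, can also be sketched directly. The Python example below is agglomerative clustering with single linkage; in R, the closest counterpart is `hclust()` with `method = "single"`. It records the merge history, which is what a dendrogram draws. The function name and data are hypothetical, and the naive O(n^3) loop only suits small inputs.

```python
import math

def single_linkage(points, n_clusters):
    """Agglomerative hierarchical clustering with single linkage.

    Starts with every point in its own cluster and repeatedly merges the two
    clusters whose closest members are nearest, until n_clusters remain.
    Returns the clusters (sorted lists of point indices) and the merge history.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []  # (distance, cluster_a, cluster_b): the dendrogram, bottom-up
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((d, clusters[a], clusters[b]))
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters, merges

# Hypothetical points: two tight pairs and one outlier.
clusters, merges = single_linkage([[0, 0], [0, 1], [5, 5], [5, 6], [20, 20]], n_clusters=3)
print(clusters)  # [[0, 1], [2, 3], [4]]
```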
[Figure: Data mining tools]
You did it!
More informationGrant: LIFE08 NAT/GR/000539 Total Budget: 1,664,282.00 Life+ Contribution: 830,641.00 Year of Finance: 2008 Duration: 01 FEB 2010 to 30 JUN 2013
Coordinating Beneficiary: UOP Associated Beneficiaries: TEIC Project Coordinator: Nikos Fakotakis, Professor Wire Communications Laboratory University of Patras, Rion-Patras 26500, Greece Email: fakotaki@upatras.gr
More informationData Mining Fundamentals
Part I Data Mining Fundamentals Data Mining: A First View Chapter 1 1.11 Data Mining: A Definition Data Mining The process of employing one or more computer learning techniques to automatically analyze
More informationData Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1
Data Mining 1 Introduction 2 Data Mining methods Alfred Holl Data Mining 1 1 Introduction 1.1 Motivation 1.2 Goals and problems 1.3 Definitions 1.4 Roots 1.5 Data Mining process 1.6 Epistemological constraints
More informationData integration for metagenomics: current status and future plans
integration for metagenomics: current status and future plans Neil Wipat Computing Science University of Newcastle NERC Microbial Metagenomics Overview metamicrobase Current method of data integration
More informationBayesian networks - Time-series models - Apache Spark & Scala
Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly
More informationInner Classification of Clusters for Online News
Inner Classification of Clusters for Online News Harmandeep Kaur 1, Sheenam Malhotra 2 1 (Computer Science and Engineering Department, Shri Guru Granth Sahib World University Fatehgarh Sahib) 2 (Assistant
More informationConverting GenMAPP MAPPs between species using homology
Converting GenMAPP MAPPs between species using homology 1 Introduction and Background 2 1.1 Fundamental principles of the GenMAPP Gene Database 2 1.1.1 Gene Database data types 2 1.1.2 GenMAPP System Codes
More informationOracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc.
Oracle9i Data Warehouse Review Robert F. Edwards Dulcian, Inc. Agenda Oracle9i Server OLAP Server Analytical SQL Data Mining ETL Warehouse Builder 3i Oracle 9i Server Overview 9i Server = Data Warehouse
More informationPentaho Data Mining Last Modified on January 22, 2007
Pentaho Data Mining Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at www.pentaho.org
More informationMachine Learning Capacity and Performance Analysis and R
Machine Learning and R May 3, 11 30 25 15 10 5 25 15 10 5 30 25 15 10 5 0 2 4 6 8 101214161822 0 2 4 6 8 101214161822 0 2 4 6 8 101214161822 100 80 60 40 100 80 60 40 100 80 60 40 30 25 15 10 5 25 15 10
More informationData Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction
Data Mining and Exploration Data Mining and Exploration: Introduction Amos Storkey, School of Informatics January 10, 2006 http://www.inf.ed.ac.uk/teaching/courses/dme/ Course Introduction Welcome Administration
More informationHealthcare Measurement Analysis Using Data mining Techniques
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik
More informationDelivering Business Intelligence With Microsoft SQL Server 2005 or 2008 HDT922 Five Days
or 2008 Five Days Prerequisites Students should have experience with any relational database management system as well as experience with data warehouses and star schemas. It would be helpful if students
More informationSURVIVABILITY ANALYSIS OF PEDIATRIC LEUKAEMIC PATIENTS USING NEURAL NETWORK APPROACH
330 SURVIVABILITY ANALYSIS OF PEDIATRIC LEUKAEMIC PATIENTS USING NEURAL NETWORK APPROACH T. M. D.Saumya 1, T. Rupasinghe 2 and P. Abeysinghe 3 1 Department of Industrial Management, University of Kelaniya,
More informationBIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
More informationPredicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
More informationSemi-Supervised and Unsupervised Machine Learning. Novel Strategies
Brochure More information from http://www.researchandmarkets.com/reports/2179190/ Semi-Supervised and Unsupervised Machine Learning. Novel Strategies Description: This book provides a detailed and up to
More informationAn Introduction to Data Mining
An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
More informationWeb-Based Genomic Information Integration with Gene Ontology
Web-Based Genomic Information Integration with Gene Ontology Kai Xu 1 IMAGEN group, National ICT Australia, Sydney, Australia, kai.xu@nicta.com.au Abstract. Despite the dramatic growth of online genomic
More informationfrom Larson Text By Susan Miertschin
Decision Tree Data Mining Example from Larson Text By Susan Miertschin 1 Problem The Maximum Miniatures Marketing Department wants to do a targeted mailing gpromoting the Mythic World line of figurines.
More informationData Mining Governance for Service Oriented Architecture
Data Mining Governance for Service Oriented Architecture Ali Beklen Software Group IBM Turkey Istanbul, TURKEY alibek@tr.ibm.com Turgay Tugay Bilgin Dept. of Computer Engineering Maltepe University Istanbul,
More informationIntroduction to Pattern Recognition
Introduction to Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)
More informationSubject Description Form
Subject Description Form Subject Code Subject Title COMP417 Data Warehousing and Data Mining Techniques in Business and Commerce Credit Value 3 Level 4 Pre-requisite / Co-requisite/ Exclusion Objectives
More informationCOC131 Data Mining - Clustering
COC131 Data Mining - Clustering Martin D. Sykora m.d.sykora@lboro.ac.uk Tutorial 05, Friday 20th March 2009 1. Fire up Weka (Waikako Environment for Knowledge Analysis) software, launch the explorer window
More informationReplacing TaqMan SNP Genotyping Assays that Fail Applied Biosystems Manufacturing Quality Control. Begin
User Bulletin TaqMan SNP Genotyping Assays May 2008 SUBJECT: Replacing TaqMan SNP Genotyping Assays that Fail Applied Biosystems Manufacturing Quality Control In This Bulletin Overview This user bulletin
More informationEFFICIENT DATA PRE-PROCESSING FOR DATA MINING
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College
More informationMobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
More information