Bioinformatics for Informatics Vinicius Vielmo Cogo. Presented in UFSM, Santa Maria, RS, BR. April 08, 2014.

Size: px
Start display at page:

Download "Bioinformatics for Informatics Vinicius Vielmo Cogo. Presented in UFSM, Santa Maria, RS, BR. April 08, 2014."

Transcription

1 Bioinformatics for Informatics Vinicius Vielmo Cogo Presented in UFSM, Santa Maria, RS, BR. April 08, 2014.

2 Context The presenter: is not a bioinformatician is a computer scientist from Inf2006-UFSM is a PhD. student in the ULisboa is a researcher at the LaSIGE (Navigators group ) has as main research areas: Distributed systems Dependability Fault Tolerance Security But why are you going to present about bioinformatics? 2

3 Context Interesting area Many open problems Opportunities to acquire and apply knowledge Opportunities from projects Bachelor in Computer Science (UFSM) Master in Informatics (ULisboa) PhD. in Informatics (ULisboa) PET-CC FTH-GRID CloudFIT BiobankCloud PET-CC LSC Tclouds 3

4 Context is not an introduction to bioinformatics is not a study of is a different way to look to bioinformatics, from an informatics perspective 4

5 What is bioinformatics? 5

6 What is bioinformatics? 6

7 What is bioinformatics? is the study of life and living organisms 7

8 What is bioinformatics? 8

9 What is bioinformatics? Mathematics Astronomy Physics Biology Ethics Chemistry Informatics Statistics 9

10 What is bioinformatics? Mathematics Astronomy Physics Biology Ethics Chemistry Informatics Statistics 10

11 What is bioinformatics? Biology Informatics 11

12 What is bioinformatics? Biology Bioinformatics? Informatics 12

13 What is bioinformatics? Biology Bioinformatics Informatics Computational Biology Biological Computing Modeling biological virtual systems and organs. Searching the optimal path in a graph based on ants behavior 13

14 What is bioinformatics? Biology Bioinformatics Informatics Bioinformatics is an informatics discipline for storing, retrieving, organizing and analyzing biological data. 14

15 Motivation But what makes the biological data so special? 15

16 Motivation But what makes the biological data so special? Data complexity Size Quantity Meaning Incomplete and uncertain data 16

17 Outline 1. Size 2. Quantity 3. Meaning 17

18 1. Size

19 DNA TCTCATGCATACGCAGCAGGTCAGGCAT DNA is: A very large string Composed only by A, C, G, and T characters 19

20 Human DNA From human beings to DNA trillion pairs 46 2 molecules A human being Cells Nucleus Chromosomes DNA 210 types of cells 300 million die every minute 20

21 Human DNA TCTCATGCATACGCAGCAGGTCAGGCAT Human DNA: 3.2 B characters (bp, base-pairs) 21

22 Genome size Organism Genome size Picture Porcine circovirus type K Drosophila melanogaster 130 M Mus musculus 2.7 B Homo sapiens 3.2 B Zea mays 5 B Marbled lungfish 130 B 22

23 Printing a human genome Times New Roman 12 pt 2622 bp per page 1.2 M single-sided pages 2415 reams (500 sheets) 129 meters 129 m 93 m 30 m 23

24 Printing a human genome Leighton Pritchard 4.5 pt Double-sided pages 120 book volumes 1 bookshelf 24

25 Loading a human genome to memory Does it work? char[] humandna = char[ ]; 25

26 Loading a human genome to memory Does it work? char[] humandna = char[ ]; In most programming languages: No Most data structures are indexed by signed int 2.1 B chars = (2^31-1 = ) Solutions? 26

27 Loading a human genome to memory char[] humandna = char[ ]; Solutions: Unsigned int: (2^32-1) 64-bit indexes: (2^63-1) Matrices: humandna = char[100][ ] 27

28 Loading a human genome to memory char[] humandna = char[ ]; Solutions: Unsigned int: (2^32-1) 64-bit indexes: (2^63-1) Matrices: humandna = char[100][ ] Split in chromosomes: Map<Key, Value> Map<chrName, chrsequence> 28

29 Human chromosomes The human genome is not a contiguous DNA 29

30 Storing a human genome in a file Entire human genome 3.2 B chars 3 GB UTF-8 or ISO char = 1-byte (8-bits) 30

31 Storing a human genome in a file FASTA format Broadly used Accepts comments (>) Store also incomplete or small sequences > HG00339 chr1 ATGCATCGTACGCTCATCATGCAGC AGACTGCTAGCTGATGGTACGTCAT GCAGTCGTCAGTCAGTCAGTACATC > HG00339 chr2 TCGTACGCTCATCATGCAGCAGGTC AGTACCATGGTCAGACTCATTGCAA CTGCTAGCTGATGATCGGCATAGT > HG00339 chr22 CAGTACGTCAGTGCTAGCTAGCTAC GATCGATCGACTGATACGATC > HG00339 chry GGGTACTCATGCAGTCAGTCAGTTG CAGTCAGTCAGTCGCAGTCA 31

32 Storing a human genome in a file 1 char = 2-bits 770 MB Entire human genome 3.2 B chars A = 00 C = 01 G = 10 T = 11 32

33 Storing a human genome in a file 2-bit format Not human readable (normal humans) There are other characters than A, C, G and T E.g.: N for any. Nomenclature for incompletely specified bases in nucleic acid sequences IUPAC How to reduce even more?

34 Compressing a human genome file ZIP Compression 830 MB (in 10min) 34

35 Compressing a human genome file Referential Compression Humans are 99.9% genetically equals 7 MB (in 30s) 35

36 Sequencing a human genome Reads Genomes are not sequenced in one contiguous line Reads contains bp each Human genome reference Resulting aligned genome 36

37 Sequencing a human genome Sequenced Genome 180 GB > HG00339 read bp ATGCATCGTACGCTCATCATGCAGC +!''*((((***+))%%%++)(%%%%).1*** > HG00339 read bp TCGTACGCTCATCATGCAGCAGGTC + %%%).1****''))**55CCF>>>>>>CC data FASTQ format > HG00339 read bp CAGTACGTCAGTGCTAGCTAGCTAC + )(%%%%).1***''))**5CCCC?CCC65 37

38 Resuming Human genome = string of 3.2B chars Size of creatures does not determine genome size Normally in memory, a genome is divided into 23 chromosomes In a file: 3 GB in FASTA 770 MB in 2-bits 830 MB when compressed with ZIP 7 MB when using referential compression 180 GB in FASTQ 38

39 2. Quantity

40 Problem statement Huge Big data data 40

41 Quantity The number of sequenced genomes is exponentially increasing Caused by the decreasing on: time and cost 1 st Human Genome Currently Year Cost US$ 3 billion US$ Time 13 years (2001) < 1 hour 41

42 From first to next-generation sequencing Sanger-based chemistries and capillary-based instruments sequencing platforms Next-Generation sequencing platforms (NGS) 42

43 Cost per human genome 43

44 Cost x quantity 44

45 US$1000 (2014) HiSeq X Ten 45

46 NGS around the world 46

47 US$100 (?) 47

48 US$30 (?) 48

49 Projects gathering genomes 49

50 Physical storage in biobanks 50

51 Physical storage in biobanks 51

52 Digital storage in e-biobanks Biobanks store also health and behavior data Pressure to store also sequenced DNA Researchers need a lot of data (shared) 52

53 Digital storage in e-biobanks Biobanks will soon start using cloud computing Illumina Basespace BiobankCloud is creating a platform for biobanks efficiently and securely use cloud facilities 53

54 Neonatal heel prick Newborn screening Detect congenital diseases Hypotiroidism Cystic fibrosis Implemented all over the world Employed in demographics for birth rates What about newborn genome sequencing? 54

55 Genomes of a population How much disk space do we need to store all genomes of an entire population? 1-person 1 Population Santa Maria, RS Porto Alegre, RS Rio Grande do Sul São Paulo, SP Brazil Latin America World 270 K 1.5 M 10.7 M 11.3 M 201 M 589 M 7 B 55

56 Genomes of a population How much disk space do we need to store all genomes of an entire population? Population Compressed Size with 2-bits Size with 8-bits WGS 1-person 1 7 MB 770 MB 3 GB 180 GB Santa Maria, RS Porto Alegre, RS Rio Grande do Sul São Paulo, SP Brazil Latin America World 270 K 1.5 M 10.7 M 11.3 M 201 M 589 M 7 B B KB MB GB TB PB EB ZB YB 56

57 Genomes of a population How much disk space do we need to store all genomes of an entire population? Population Compressed Size with 2-bits Size with 8-bits WGS 1-person 1 7 MB 770 MB 3 GB 180 GB Santa Maria, RS 270 K 1.8 TB 200 TB 790 TB 46 PB Porto Alegre, RS 1.5 M 10 TB 1.1 PB 4.3 PB 260 PB Rio Grande do Sul 10.7 M 72 TB 7.7 PB 30 PB 1.8 EB São Paulo, SP 11.3 M 75 TB 8.1 PB 32 PB 1.9 EB Brazil 201 M 1.3 PB 144 PB 575 PB 34 EB Latin America 589 M 3.8 PB 422 PB 1.6 EB 100 EB World 7 B 45 PB 5 EB 20 EB 1.1 ZB B KB MB GB TB PB EB ZB YB 57

58 Genomes of a population Current storage capacity of a hard disk? 58

59 Genomes of a population Current storage capacity of a hard disk? 6 TB WD Ultrastar He6 (US$750) Compressed Size with 2-bits Size with 8-bits WGS 1-person 7 MB 770 MB 3 GB 180 GB Population 898 K x SM B KB MB GB TB PB EB ZB YB 59

60 Genomes of a population Current storage capacity of a data center? 60

61 Genomes of a population Current storage capacity of a data center? 1 PB 350 Seagate of 3TB (US$150 each) (US$50.000) Compressed Size with 2-bits Size with 8-bits WGS 1-person 7 MB 770 MB 3 GB 180 GB Population 153 M 1.3 M 350 K 5825 SM B KB MB GB TB PB EB ZB YB 61

62 Genomes of a population Current storage capacity of a data center? 500 PB BackBlaze, Inc. Compressed Size with 2-bits Size with 8-bits WGS 1-person 7 MB 770 MB 3 GB 180 GB Population 76 B 697 M 174 M 2.9 M 9x World Lat. Am. POA B KB MB GB TB PB EB ZB YB 62

63 Genomes of a population Current storage capacity of the entire world? 63

64 Genomes of a population Current storage capacity of the entire world? 8.2 ZB 295 EB in 2007 law Compressed Size with 2-bits Size with 8-bits WGS 1-person 7 MB 770 MB 3 GB 180 GB Population T 50 B 400x World 7x World B KB MB GB TB PB EB ZB YB 64

65 Genomes of a population Near future storage capacity of a data center? 50 ZB (or 1 YB?) NSA data center in Bluffdale, UT, US Compressed Size with 2-bits Size with 8-bits WGS 1-person 7 MB 770 MB 3 GB 180 GB Population B 40x World B KB MB GB TB PB EB ZB YB 65

66 Resuming Sequencing cost and time will continue to fall Huge data: size*quantity Storage capacity is increasing (not at the same pace as sequencing) 66

67 3. Meaning

68 Meaning GATTCGTACTGACTACGCAGCAGGT What does it do? What does it produces? How does it looks like? Is it related with a specific disease? Does it interact with other components? Does it interact with some medicine? 68

69 Meaning CATTCATCATGCATACGCAGCAGGT What does it mean individual? (Personalized medicine) population? (Public-health genomics) human kind? (Science) 69

70 DNA Lowest structure present in all living organisms High expectation for medicine DNA cannot tell everything about your future DNA is not the only variable causing diseases Behavior and environment interfere But, DNA still plays an important role 70

71 From DNA to meaning DNA 98% Noncoding 2% Coding Amino acids Genes Proteins 71

72 3.1. Comparing sequences Do you see any difference in these sequences? CATTCATCATGCATACGGACTGCAGCAGGT CATTCATTATGCATACGGACTGCAGCAGGT 72

73 3.1. Comparing sequences Do you see any difference in these sequences? CATTCATCATGCATACGGACTGCAGCAGGT CATTCATTATGCATACGGACTGCAGCAGGT 73

74 3.1. Comparing sequences Comparison of two or more sequences Similarity, similarity and similarity Similar sequences may have similar genes, proteins, structures and functions Percent identity as metric Considering small variations (SNPs, insertions, deletions) Search terms: Hamming distance, Edit distance (Levenshtein), sequence alignment, Needleman-Wunsch algorithm, Smith and Waterman algorithm, substitution matrices, BLAST algorithm, seedand-extend, suffix tree, homologue sequences 74

75 3.2. Assembling DNA fragments In which chromosome can we find this sequence? In which position? CATTCATCATGCATACGGACTGCAGCAGGT 75

76 NGS Reads 3.2. Assembling DNA fragments Human genome reference Resulting aligned genome 76

77 3.2. Assembling DNA fragments 77

78 3.2. Assembling DNA fragments Assembling DNA to create a contiguous genome It may be tricky DNA has many repetitive regions Orientation, repeats and sequencing errors Search terms: Alignment algorithms, de-novo assembly, sequence mapping, MAQ, SHRiMP, Bowtie, GNUMAP, MUMmer, BWA, Slider, SOAP3, SNAP 78

79 3.3. Constructing evolutionary trees The 1 st sequence is more similar to the 2 nd or the 3 rd? CATTACGGACGCATCATCATGCAGCAGGTG CATTACGGACGTATCATCATCCAGCAGGTG CATTACGGACGGATCATCATACAGCAGGTG 79

80 3.3. Constructing evolutionary trees 80

81 3.3. Constructing evolutionary trees All about evolution and again similarity Comparing sequences from different organisms Groups organisms by similarity They show how organisms evolve in time Multiple alignments may generate phylogenic trees Recursively compare sequences until find a common ancestor Search terms: Evolutionary similarity, multiple alignments, tree of life, phylogenetic tree, Clustal, T- coffee, Muscle, ProbCons, COBALT, UniProt database, JalView, Logos, PSI-BLAST 81

82 3.4. Detecting patterns in sequences Do you see any pattern in this sequence? CGTTACGGACGCATCATCATGCAGCAGGTG 82

83 3.4. Detecting patterns in sequences Do you see any pattern in this sequence? CGTTACGGACGCATCATCATGCAGCAGGTG 83

84 3.4. Detecting patterns in sequences Delimit parts of sequences that have a biological meaning Detect the begin and end of a gene Detect the begin and end of shapes Search terms: Conserved domains, motifs, regular expressions, Fuzzy regular expressions, Hidden Markov Models (HMMs), InterPro database, PROSITE, multiple alignments, grammars, PCR analysis 84

85 3.5. Determining structure from sequence Which is the structure of this sequence in real life? CATTACGGACGCATCATCATGCAGCAGGTG 85

86 3.5. Determining structure from sequence Which is the structure of this sequence in real life? Primary Secondary Tertiary 86

87 3.5. Determining structure from sequence Structure is related with function Structure: Primary Secondary Tertiary Quaternary Search terms: Conserved domains, motifs, folding, PDB, 3D transformations, protein alignment, intermolecular distances, intramolecular distance, RMSD, DALI, SSAP, graph theory, abinitio methods, fold recognition, threading, neural networks, machine learning, support vector machines, random forests, lowest energy algorithms, ROSETTA, CASP challenge 87

88 3.6. Inferring cell regulation Does the 1 st sequence interfere in the 2 nd? CATTACGGACGCATCATCATGCAGCAGGTG 88

89 3.6. Inferring cell regulation Genes interact with each other Non-coding regions of DNA are important for regulation Depending on geographic position of a cell and neighbors to turn on/off genes Search terms: Cell regulation, junk DNA, turn on/off genes, pathways, Bayesian networks, Support vector machines, clustering algorithms, reducing dimensionality, simulation and modeling, discrete and continuous models, epigenomics 89

90 3.7. Determining protein function What does this sequence? Does it define my hair color? CATTACGGACGCATCATCATGCAGCAGGTG 90

91 3.7. Determining protein function 91

92 3.7. Determining protein function Body measurement 92

93 3.7. Determining protein function Cardiovascular disorder 93

94 3.7. Determining protein function Most challenging topic Depend on all previous topics Human annotations for protein functions Search terms: Annotating DNA, ontologies, natural language processing (NLP), motifs, conserved domains, gene ontology, OBO, BioOntologies, UMLS, term-for-term analysis, data mining, directed acyclic graph, text mining, semantic web 94

95 3.8. Languages Workloads everywhere Step-by-step to achieve the final goal Optimization, parallel programming Python, Perl and Bash Java, C, C++ R Search terms: Workload, programming languages, script languages, dynamic programming, design pattersn, gang of four, regular expressions, performance evaluation 95

96 3.9. Other important topics Database design, development, management Natural language processing (NLP) Graphics and graphical interfaces Distributed systems Security 96

97 3.10. Privacy The individual right of maintaining private undisclosed information Genomes contain private information 97

98 3.10. Our approach Does it contains any private information? CATTCATCATGCATACGGACTGCAGCAGGT 98

99 3.10. Disclosure filter for genetic data 7M bp/s * Throughput with a single core. Scales until approximately 300 M bp/s in some cases. 99

100 3.11. Other problems ELSI Ethical Legal Social implications Genism Genetic discrimination Targeted attacks Loss of reputation Data leakage Privacy problems 100

101 Resuming Obtaining meaning from DNA is Difficult Complex Time consuming Painful Similarity is important Comparison with what is already known 101

102 Final remarks Bioinformatics: interesting area many open problems opportunities to acquire and apply knowledge opportunities from projects reachable for students from all semesters Informatics: improve biology area improved by working on biological data 102

103 CC and SI disciplines Disciplinas CC SI Cálculo A 1 1 Circuitos Digitais 1 1 Lógica e Algoritmo 1 1 * Laboratório de Programação I 1 1 * Introdução à CC e SI 1 1 Geometria Analítica 1 - Vectors, 3D representation, torsion angles Teoria Geral da Administração - 1 Estruturas de Dados 2 2 Trees, graphs, probabilistic data structures Organização de Computadores 2 2 Laboratório de Programação II 2 2 * Matemática Discreta 2 2 Álgebra Linear 2 - Língua Inglesa Instrumental I 2 - Teoria Econômica - 2 Introdução a Ciencia da Informação - 2 * 103

104 CC and SI disciplines Disciplinas CC SI Arquitetura de Computadores 3 3 Estatística 3 3 GWAS, microarray, HMMs Pesquisa e Ordenação de Dados 3 3 DAG, pairwise alignment Paradigmas de Programação 3 3 * Cálculo B 3 - Língua Inglesa Intrumental II 3 - Gestão de Pessoas A - 3 Fundamentos de Bancos de Dados 4 4 Designing DB for biological data Engenharia de Software 4 3 Gang of Four, Design patterns Sistemas Operacionais 4 4 Compression, running workloads, scheduling Métodos Numéricos e Computacionais 4 - Eletricidade e Magnetismo 4 - Projeto e Análise de Algoritmos 4 - Optimization Marketing - 4 Engenharia Econômica - 4 Gerência de Projetos de Software

105 CC and SI disciplines Disciplinas CC SI Qualidade de Software 5 5 Comunicação de Dados 5 - Optimizing data transfer Computação Gráfica 5-3D structure representation, torsion angles Linguagens Formais 5 - Lógica de Predicados 5 - Implementação de Bancos de Dados 5 - Database for biological data Projeto e Gerência de Banco de Dados - 5 Database for biological data Custo A - 5 Redes de Computadores 6 6 Computadores e Sociedade 6 4 ELSI of genomics, privacy Empreendedorismo 6 6 Implementação de Linguagens de Programação 6 - Teoria da Computação 6 - Projeto de Software I

106 CC and SI disciplines Disciplinas CC SI Sistemas Distribuídos 7 7 Distributed services and data access Metodologia de Pesquisa em Informática 7 7 Defining experiments and writing articles Inteligência Artificial 7 - Gene finding, secondary structure Compiladores 7 - Projeto de Software II - 7 Sistemas de Computação Móvel DCG DCG Mobile solutions for storage and processing Middleware para Sistemas Distribuídos DCG DCG Distributed protocols Interface Humano-Computador DCG 5 ELSI of genomics, interfaces for biologists Processamento Estruturado de Documentos Métodos Computacionais Aplicados à Educação DCG - DCG - Processamento de Imagens DCG - Microarrays, diagnosis annotation Empreendedorismo DCG - Fundamentos de Tolerância a Falhas DCG - 106

107 CC and SI disciplines Disciplinas CC SI Programação Paralela DCG - Parallelize bioinformatics algorithms Visão Computacional DCG - Programação de Jogos 3D DCG - Computação Gráfica Avançada DCG - Projeto de Sistemas Digitais DCG - Concepção de Circuitos Integrados DCG - Bases da Gestão Eletrônica de Documento - DCG LIMS, management of donor consents Desenvolvimento de Software Para A Web - DCG Web services and data access Linguagens de Marcação Extensíveis - DCG Ontologies Modelagens de Processos de Negócios - DCG Preparação para Maratona de Programação I - DCG Sistemas Colaborativos - DCG Mineração de Dados - DCG Gene finding, secondary struct., clustering Sistemas Inteligentes - DCG Análise e Projeto de Sistemas Orientados a Objetos - DCG * 107

108 Final remarks exciting problems to work on Donald Knuth,

109 Further reading Bioinformatics An introduction for Computer Scientists. Jacques Cohen / Bioinformatics: Genes, Proteins & Computers. Christine Orengo David Jones Janet Thornton Introduction to Bioinformatics. Arthur M. Lesk Bioinformatics for Dummies. Jean-Michel Claverie Cedric Notredame Thank you! Vinicius Vielmo Cogo 109

Agreement on Dual Degree Bachelor Program in Computer Science. Universidade Federal do Rio Grande do Sul. Technische Universität Berlin

Agreement on Dual Degree Bachelor Program in Computer Science. Universidade Federal do Rio Grande do Sul. Technische Universität Berlin Agreement on Dual Degree Bachelor Program in Computer Science between Universidade Federal do Rio Grande do Sul Instituto de Informática and Technische Universität Berlin School of Electrical Engineering

More information

MSc on Applied Computer Science

MSc on Applied Computer Science MSc on Applied Computer Science (Majors in Security and Computational Biology) Amílcar Sernadas, Ana Teresa Freitas, Arlindo Oliveira, Isabel Sá Correia May 18, 2006 1 Context The new model for high education

More information

Core Bioinformatics. Degree Type Year Semester. 4313473 Bioinformàtica/Bioinformatics OB 0 1

Core Bioinformatics. Degree Type Year Semester. 4313473 Bioinformàtica/Bioinformatics OB 0 1 Core Bioinformatics 2014/2015 Code: 42397 ECTS Credits: 12 Degree Type Year Semester 4313473 Bioinformàtica/Bioinformatics OB 0 1 Contact Name: Sònia Casillas Viladerrams Email: Sonia.Casillas@uab.cat

More information

Analysis of NGS Data

Analysis of NGS Data Analysis of NGS Data Introduction and Basics Folie: 1 Overview of Analysis Workflow Images Basecalling Sequences denovo - Sequencing Assembly Annotation Resequencing Alignments Comparison to reference

More information

BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS

BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS NEW YORK CITY COLLEGE OF TECHNOLOGY The City University Of New York School of Arts and Sciences Biological Sciences Department Course title:

More information

Bio-Informatics Lectures. A Short Introduction

Bio-Informatics Lectures. A Short Introduction Bio-Informatics Lectures A Short Introduction The History of Bioinformatics Sanger Sequencing PCR in presence of fluorescent, chain-terminating dideoxynucleotides Massively Parallel Sequencing Massively

More information

Version 5.0 Release Notes

Version 5.0 Release Notes Version 5.0 Release Notes 2011 Gene Codes Corporation Gene Codes Corporation 775 Technology Drive, Ann Arbor, MI 48108 USA 1.800.497.4939 (USA) +1.734.769.7249 (elsewhere) +1.734.769.7074 (fax) www.genecodes.com

More information

Bioinformatics Grid - Enabled Tools For Biologists.

Bioinformatics Grid - Enabled Tools For Biologists. Bioinformatics Grid - Enabled Tools For Biologists. What is Grid-Enabled Tools (GET)? As number of data from the genomics and proteomics experiment increases. Problems arise for the current sequence analysis

More information

Guidelines for Establishment of Contract Areas Computer Science Department

Guidelines for Establishment of Contract Areas Computer Science Department Guidelines for Establishment of Contract Areas Computer Science Department Current 07/01/07 Statement: The Contract Area is designed to allow a student, in cooperation with a member of the Computer Science

More information

Core Bioinformatics. Titulació Tipus Curs Semestre. 4313473 Bioinformàtica/Bioinformatics OB 0 1

Core Bioinformatics. Titulació Tipus Curs Semestre. 4313473 Bioinformàtica/Bioinformatics OB 0 1 Core Bioinformatics 2014/2015 Codi: 42397 Crèdits: 12 Titulació Tipus Curs Semestre 4313473 Bioinformàtica/Bioinformatics OB 0 1 Professor de contacte Nom: Sònia Casillas Viladerrams Correu electrònic:

More information

Doctor of Philosophy in Computer Science

Doctor of Philosophy in Computer Science Doctor of Philosophy in Computer Science Background/Rationale The program aims to develop computer scientists who are armed with methods, tools and techniques from both theoretical and systems aspects

More information

BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology http://tinyurl.com/bioinf525-w16

BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology http://tinyurl.com/bioinf525-w16 Course Director: Dr. Barry Grant (DCM&B, bjgrant@med.umich.edu) Description: This is a three module course covering (1) Foundations of Bioinformatics, (2) Statistics in Bioinformatics, and (3) Systems

More information

Guide for Bioinformatics Project Module 3

Guide for Bioinformatics Project Module 3 Structure- Based Evidence and Multiple Sequence Alignment In this module we will revisit some topics we started to look at while performing our BLAST search and looking at the CDD database in the first

More information

SGI. High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems. January, 2012. Abstract. Haruna Cofer*, PhD

SGI. High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems. January, 2012. Abstract. Haruna Cofer*, PhD White Paper SGI High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems Haruna Cofer*, PhD January, 2012 Abstract The SGI High Throughput Computing (HTC) Wrapper

More information

REGULATIONS FOR THE DEGREE OF BACHELOR OF SCIENCE IN BIOINFORMATICS (BSc[BioInf])

REGULATIONS FOR THE DEGREE OF BACHELOR OF SCIENCE IN BIOINFORMATICS (BSc[BioInf]) 820 REGULATIONS FOR THE DEGREE OF BACHELOR OF SCIENCE IN BIOINFORMATICS (BSc[BioInf]) (See also General Regulations) BMS1 Admission to the Degree To be eligible for admission to the degree of Bachelor

More information

14.10.2014. Overview. Swarms in nature. Fish, birds, ants, termites, Introduction to swarm intelligence principles Particle Swarm Optimization (PSO)

14.10.2014. Overview. Swarms in nature. Fish, birds, ants, termites, Introduction to swarm intelligence principles Particle Swarm Optimization (PSO) Overview Kyrre Glette kyrrehg@ifi INF3490 Swarm Intelligence Particle Swarm Optimization Introduction to swarm intelligence principles Particle Swarm Optimization (PSO) 3 Swarms in nature Fish, birds,

More information

UGENE Quick Start Guide

UGENE Quick Start Guide Quick Start Guide This document contains a quick introduction to UGENE. For more detailed information, you can find the UGENE User Manual and other special manuals in project website: http://ugene.unipro.ru.

More information

Lecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions

Lecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions SMA 50: Statistical Learning and Data Mining in Bioinformatics (also listed as 5.077: Statistical Learning and Data Mining ()) Spring Term (Feb May 200) Faculty: Professor Roy Welsch Wed 0 Feb 7:00-8:0

More information

Pairwise Sequence Alignment

Pairwise Sequence Alignment Pairwise Sequence Alignment carolin.kosiol@vetmeduni.ac.at SS 2013 Outline Pairwise sequence alignment global - Needleman Wunsch Gotoh algorithm local - Smith Waterman algorithm BLAST - heuristics What

More information

Curriculum Reform in Computing in Spain

Curriculum Reform in Computing in Spain Curriculum Reform in Computing in Spain Sergio Luján Mora Deparment of Software and Computing Systems Content Introduction Computing Disciplines i Computer Engineering Computer Science Information Systems

More information

Master's projects at ITMO University. Daniil Chivilikhin PhD Student @ ITMO University

Master's projects at ITMO University. Daniil Chivilikhin PhD Student @ ITMO University Master's projects at ITMO University Daniil Chivilikhin PhD Student @ ITMO University General information Guidance from our lab's researchers Publishable results 2 Research areas Research at ITMO Evolutionary

More information

PROC. CAIRO INTERNATIONAL BIOMEDICAL ENGINEERING CONFERENCE 2006 1. E-mail: msm_eng@k-space.org

PROC. CAIRO INTERNATIONAL BIOMEDICAL ENGINEERING CONFERENCE 2006 1. E-mail: msm_eng@k-space.org BIOINFTool: Bioinformatics and sequence data analysis in molecular biology using Matlab Mai S. Mabrouk 1, Marwa Hamdy 2, Marwa Mamdouh 2, Marwa Aboelfotoh 2,Yasser M. Kadah 2 1 Biomedical Engineering Department,

More information

Human Genome and Human Genome Project. Louxin Zhang

Human Genome and Human Genome Project. Louxin Zhang Human Genome and Human Genome Project Louxin Zhang A Primer to Genomics Cells are the fundamental working units of every living systems. DNA is made of 4 nucleotide bases. The DNA sequence is the particular

More information

Protein & DNA Sequence Analysis. Bobbie-Jo Webb-Robertson May 3, 2004

Protein & DNA Sequence Analysis. Bobbie-Jo Webb-Robertson May 3, 2004 Protein & DNA Sequence Analysis Bobbie-Jo Webb-Robertson May 3, 2004 Sequence Analysis Anything connected to identifying higher biological meaning out of raw sequence data. 2 Genomic & Proteomic Data Sequence

More information

Next Generation Sequencing

Next Generation Sequencing Next Generation Sequencing Technology and applications 10/1/2015 Jeroen Van Houdt - Genomics Core - KU Leuven - UZ Leuven 1 Landmarks in DNA sequencing 1953 Discovery of DNA double helix structure 1977

More information

Core Bioinformatics. Degree Type Year Semester

Core Bioinformatics. Degree Type Year Semester Core Bioinformatics 2015/2016 Code: 42397 ECTS Credits: 12 Degree Type Year Semester 4313473 Bioinformatics OB 0 1 Contact Name: Sònia Casillas Viladerrams Email: Sonia.Casillas@uab.cat Teachers Use of

More information

Professional Organization Checklist for the Computer Science Curriculum Updates. Association of Computing Machinery Computing Curricula 2008

Professional Organization Checklist for the Computer Science Curriculum Updates. Association of Computing Machinery Computing Curricula 2008 Professional Organization Checklist for the Computer Science Curriculum Updates Association of Computing Machinery Computing Curricula 2008 The curriculum guidelines can be found in Appendix C of the report

More information

Bachelor of Games and Virtual Worlds (Programming) Subject and Course Summaries

Bachelor of Games and Virtual Worlds (Programming) Subject and Course Summaries First Semester Development 1A On completion of this subject students will be able to apply basic programming and problem solving skills in a 3 rd generation object-oriented programming language (such as

More information

01219211 Software Development Training Camp 1 (0-3) Prerequisite : 01204214 Program development skill enhancement camp, at least 48 person-hours.

01219211 Software Development Training Camp 1 (0-3) Prerequisite : 01204214 Program development skill enhancement camp, at least 48 person-hours. (International Program) 01219141 Object-Oriented Modeling and Programming 3 (3-0) Object concepts, object-oriented design and analysis, object-oriented analysis relating to developing conceptual models

More information

Genome Explorer For Comparative Genome Analysis

Genome Explorer For Comparative Genome Analysis Genome Explorer For Comparative Genome Analysis Jenn Conn 1, Jo L. Dicks 1 and Ian N. Roberts 2 Abstract Genome Explorer brings together the tools required to build and compare phylogenies from both sequence

More information

Cancer Genomics: What Does It Mean for You?

Cancer Genomics: What Does It Mean for You? Cancer Genomics: What Does It Mean for You? The Connection Between Cancer and DNA One person dies from cancer each minute in the United States. That s 1,500 deaths each day. As the population ages, this

More information

Databases and mapping BWA. Samtools

Databases and mapping BWA. Samtools Databases and mapping BWA Samtools FASTQ, SFF, bax.h5 ACE, FASTG FASTA BAM/SAM GFF, BED GenBank/Embl/DDJB many more File formats FASTQ Output format from Illumina and IonTorrent sequencers. Quality scores:

More information

Next generation sequencing (NGS)

Next generation sequencing (NGS) Next generation sequencing (NGS) Vijayachitra Modhukur BIIT modhukur@ut.ee 1 Bioinformatics course 11/13/12 Sequencing 2 Bioinformatics course 11/13/12 Microarrays vs NGS Sequences do not need to be known

More information

Protein Protein Interaction Networks

Protein Protein Interaction Networks Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

More information

Focusing on results not data comprehensive data analysis for targeted next generation sequencing

Focusing on results not data comprehensive data analysis for targeted next generation sequencing Focusing on results not data comprehensive data analysis for targeted next generation sequencing Daniel Swan, Jolyon Holdstock, Angela Matchan, Richard Stark, John Shovelton, Duarte Mohla and Simon Hughes

More information

A Primer of Genome Science THIRD

A Primer of Genome Science THIRD A Primer of Genome Science THIRD EDITION GREG GIBSON-SPENCER V. MUSE North Carolina State University Sinauer Associates, Inc. Publishers Sunderland, Massachusetts USA Contents Preface xi 1 Genome Projects:

More information

Doctor of Philosophy in Informatics

Doctor of Philosophy in Informatics Doctor of Philosophy in Informatics 2014 Handbook Indiana University established the School of Informatics and Computing as a place where innovative multidisciplinary programs could thrive, a program where

More information

Eoulsan Analyse du séquençage à haut débit dans le cloud et sur la grille

Eoulsan Analyse du séquençage à haut débit dans le cloud et sur la grille Eoulsan Analyse du séquençage à haut débit dans le cloud et sur la grille Journées SUCCES Stéphane Le Crom (UPMC IBENS) stephane.le_crom@upmc.fr Paris November 2013 The Sanger DNA sequencing method Sequencing

More information

Introduction to Bioinformatics 3. DNA editing and contig assembly

Introduction to Bioinformatics 3. DNA editing and contig assembly Introduction to Bioinformatics 3. DNA editing and contig assembly Benjamin F. Matthews United States Department of Agriculture Soybean Genomics and Improvement Laboratory Beltsville, MD 20708 matthewb@ba.ars.usda.gov

More information

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources 1 of 8 11/7/2004 11:00 AM National Center for Biotechnology Information About NCBI NCBI at a Glance A Science Primer Human Genome Resources Model Organisms Guide Outreach and Education Databases and Tools

More information

Next Generation Sequencing: Technology, Mapping, and Analysis

Next Generation Sequencing: Technology, Mapping, and Analysis Next Generation Sequencing: Technology, Mapping, and Analysis Gary Benson Computer Science, Biology, Bioinformatics Boston University gbenson@bu.edu http://tandem.bu.edu/ The Human Genome Project took

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the

More information

Bachelor Curriculum in cooperation with

Bachelor Curriculum in cooperation with K 0/675 Bachelor Curriculum in cooperation with Přírodovědecká fakulta, PRF, Faculty of Science Jihočeská univerzita, University of South Bohemia in Budweis (České Budějovice) Česká republika, Czech Republic

More information

When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want

When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want 1 When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want to search other databases as well. There are very

More information

Bachelor Curriculum in cooperation with

Bachelor Curriculum in cooperation with K 0/675 Bachelor Curriculum in cooperation with Přírodovědecká fakulta, PRF, Faculty of Science Jihočeská univerzita, University of South Bohemia in Budweis (České Budějovice) Česká republika, Czech Republic

More information

Sanjeev Kumar. contribute

Sanjeev Kumar. contribute RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a

More information

An Interdepartmental Ph.D. Program in Computational Biology and Bioinformatics:

An Interdepartmental Ph.D. Program in Computational Biology and Bioinformatics: An Interdepartmental Ph.D. Program in Computational Biology and Bioinformatics: The Yale Perspective Mark Gerstein, Ph.D. 1,2, Dov Greenbaum 1, Kei Cheung, Ph.D. 3,4,5, Perry L. Miller, M.D., Ph.D. 3,4,6

More information

Syllabus of B.Sc. (Bioinformatics) Subject- Bioinformatics (as one subject) B.Sc. I Year Semester I Paper I: Basic of Bioinformatics 85 marks

Syllabus of B.Sc. (Bioinformatics) Subject- Bioinformatics (as one subject) B.Sc. I Year Semester I Paper I: Basic of Bioinformatics 85 marks Syllabus of B.Sc. (Bioinformatics) Subject- Bioinformatics (as one subject) B.Sc. I Year Semester I Paper I: Basic of Bioinformatics 85 marks Semester II Paper II: Mathematics I 85 marks B.Sc. II Year

More information

Introduction to NGS data analysis

Introduction to NGS data analysis Introduction to NGS data analysis Jeroen F. J. Laros Leiden Genome Technology Center Department of Human Genetics Center for Human and Clinical Genetics Sequencing Illumina platforms Characteristics: High

More information

Efficient Parallel Execution of Sequence Similarity Analysis Via Dynamic Load Balancing

Efficient Parallel Execution of Sequence Similarity Analysis Via Dynamic Load Balancing Efficient Parallel Execution of Sequence Similarity Analysis Via Dynamic Load Balancing James D. Jackson Philip J. Hatcher Department of Computer Science Kingsbury Hall University of New Hampshire Durham,

More information

Hadoopizer : a cloud environment for bioinformatics data analysis

Hadoopizer : a cloud environment for bioinformatics data analysis Hadoopizer : a cloud environment for bioinformatics data analysis Anthony Bretaudeau (1), Olivier Sallou (2), Olivier Collin (3) (1) anthony.bretaudeau@irisa.fr, INRIA/Irisa, Campus de Beaulieu, 35042,

More information

T cell Epitope Prediction

T cell Epitope Prediction Institute for Immunology and Informatics T cell Epitope Prediction EpiMatrix Eric Gustafson January 6, 2011 Overview Gathering raw data Popular sources Data Management Conservation Analysis Multiple Alignments

More information

Bioinformatics: course introduction

Bioinformatics: course introduction Bioinformatics: course introduction Filip Železný Czech Technical University in Prague Faculty of Electrical Engineering Department of Cybernetics Intelligent Data Analysis lab http://ida.felk.cvut.cz

More information

REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE (MSc[CompSc])

REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE (MSc[CompSc]) 244 REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE (MSc[CompSc]) (See also General Regulations) Any publication based on work approved for a higher degree should contain a reference

More information

CD-HIT User s Guide. Last updated: April 5, 2010. http://cd-hit.org http://bioinformatics.org/cd-hit/

CD-HIT User s Guide. Last updated: April 5, 2010. http://cd-hit.org http://bioinformatics.org/cd-hit/ CD-HIT User s Guide Last updated: April 5, 2010 http://cd-hit.org http://bioinformatics.org/cd-hit/ Program developed by Weizhong Li s lab at UCSD http://weizhong-lab.ucsd.edu liwz@sdsc.edu 1. Introduction

More information

New solutions for Big Data Analysis and Visualization

New solutions for Big Data Analysis and Visualization New solutions for Big Data Analysis and Visualization From HPC to cloud-based solutions Barcelona, February 2013 Nacho Medina imedina@cipf.es http://bioinfo.cipf.es/imedina Head of the Computational Biology

More information

Tutorial for Windows and Macintosh. Preparing Your Data for NGS Alignment

Tutorial for Windows and Macintosh. Preparing Your Data for NGS Alignment Tutorial for Windows and Macintosh Preparing Your Data for NGS Alignment 2015 Gene Codes Corporation Gene Codes Corporation 775 Technology Drive, Ann Arbor, MI 48108 USA 1.800.497.4939 (USA) 1.734.769.7249

More information

UF EDGE brings the classroom to you with online, worldwide course delivery!

UF EDGE brings the classroom to you with online, worldwide course delivery! What is the University of Florida EDGE Program? EDGE enables engineering professional, military members, and students worldwide to participate in courses, certificates, and degree programs from the UF

More information

Similarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003

Similarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003 Similarity Searches on Sequence Databases: BLAST, FASTA Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003 Outline Importance of Similarity Heuristic Sequence Alignment:

More information

How To Get A Masters Degree In Biomedical Informatics

How To Get A Masters Degree In Biomedical Informatics UNIVERSITY OF COLOMBO POSTGRADUATE INSTITUTE OF MEDICINE OF SRI LANKA Drafted and printed by the Board of Study in Multi Disciplinary Study Courses and the Postgraduate Institute of Medicine, University

More information

Biomedical Big Data and Precision Medicine

Biomedical Big Data and Precision Medicine Biomedical Big Data and Precision Medicine Jie Yang Department of Mathematics, Statistics, and Computer Science University of Illinois at Chicago October 8, 2015 1 Explosion of Biomedical Data 2 Types

More information

Network Protocol Analysis using Bioinformatics Algorithms

Network Protocol Analysis using Bioinformatics Algorithms Network Protocol Analysis using Bioinformatics Algorithms Marshall A. Beddoe Marshall_Beddoe@McAfee.com ABSTRACT Network protocol analysis is currently performed by hand using only intuition and a protocol

More information

REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE (MSc[CompSc])

REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE (MSc[CompSc]) 305 REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE (MSc[CompSc]) (See also General Regulations) Any publication based on work approved for a higher degree should contain a reference

More information

FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem

FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem Elsa Bernard Laurent Jacob Julien Mairal Jean-Philippe Vert September 24, 2013 Abstract FlipFlop implements a fast method for de novo transcript

More information

Data, Measurements, Features

Data, Measurements, Features Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are

More information

REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE (MSc[CompSc])

REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE (MSc[CompSc]) 299 REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE (MSc[CompSc]) (See also General Regulations) Any publication based on work approved for a higher degree should contain a reference

More information

An example of bioinformatics application on plant breeding projects in Rijk Zwaan

An example of bioinformatics application on plant breeding projects in Rijk Zwaan An example of bioinformatics application on plant breeding projects in Rijk Zwaan Xiangyu Rao 17-08-2012 Introduction of RZ Rijk Zwaan is active worldwide as a vegetable breeding company that focuses on

More information

CCR Biology - Chapter 9 Practice Test - Summer 2012

CCR Biology - Chapter 9 Practice Test - Summer 2012 Name: Class: Date: CCR Biology - Chapter 9 Practice Test - Summer 2012 Multiple Choice Identify the choice that best completes the statement or answers the question. 1. Genetic engineering is possible

More information

Usability in bioinformatics mobile applications

Usability in bioinformatics mobile applications Usability in bioinformatics mobile applications what we are working on Noura Chelbah, Sergio Díaz, Óscar Torreño, and myself Juan Falgueras App name Performs Advantajes Dissatvantajes Link The problem

More information

Learning outcomes. Knowledge and understanding. Competence and skills

Learning outcomes. Knowledge and understanding. Competence and skills Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges

More information

Data Processing of Nextera Mate Pair Reads on Illumina Sequencing Platforms

Data Processing of Nextera Mate Pair Reads on Illumina Sequencing Platforms Data Processing of Nextera Mate Pair Reads on Illumina Sequencing Platforms Introduction Mate pair sequencing enables the generation of libraries with insert sizes in the range of several kilobases (Kb).

More information

Lecture Outline. Introduction to Databases. Introduction. Data Formats Sample databases How to text search databases. Shifra Ben-Dor Irit Orr

Lecture Outline. Introduction to Databases. Introduction. Data Formats Sample databases How to text search databases. Shifra Ben-Dor Irit Orr Introduction to Databases Shifra Ben-Dor Irit Orr Lecture Outline Introduction Data and Database types Database components Data Formats Sample databases How to text search databases What units of information

More information

Current Motif Discovery Tools and their Limitations

Current Motif Discovery Tools and their Limitations Current Motif Discovery Tools and their Limitations Philipp Bucher SIB / CIG Workshop 3 October 2006 Trendy Concepts and Hypotheses Transcription regulatory elements act in a context-dependent manner.

More information

Algorithms in Computational Biology (236522) spring 2007 Lecture #1

Algorithms in Computational Biology (236522) spring 2007 Lecture #1 Algorithms in Computational Biology (236522) spring 2007 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours: Tuesday 11:00-12:00/by appointment TA: Ilan Gronau, Taub 700, tel 4894 Office

More information

High Performance Compu2ng Facility

High Performance Compu2ng Facility High Performance Compu2ng Facility Center for Health Informa2cs and Bioinforma2cs Accelera2ng Scien2fic Discovery and Innova2on in Biomedical Research at NYULMC through Advanced Compu2ng Efstra'os Efstathiadis,

More information

Intro to Bioinformatics

Intro to Bioinformatics Intro to Bioinformatics Marylyn D Ritchie, PhD Professor, Biochemistry and Molecular Biology Director, Center for Systems Genomics The Pennsylvania State University Sarah A Pendergrass, PhD Research Associate

More information

Nazneen Aziz, PhD. Director, Molecular Medicine Transformation Program Office

Nazneen Aziz, PhD. Director, Molecular Medicine Transformation Program Office 2013 Laboratory Accreditation Program Audioconferences and Webinars Implementing Next Generation Sequencing (NGS) as a Clinical Tool in the Laboratory Nazneen Aziz, PhD Director, Molecular Medicine Transformation

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information

200630 - FBIO - Fundations of Bioinformatics

200630 - FBIO - Fundations of Bioinformatics Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 2015 200 - FME - School of Mathematics and Statistics 1004 - UB - (ENG)Universitat de Barcelona MASTER'S DEGREE IN STATISTICS AND

More information

IC05 Introduction on Networks &Visualization Nov. 2009. <mathieu.bastian@gmail.com>

IC05 Introduction on Networks &Visualization Nov. 2009. <mathieu.bastian@gmail.com> IC05 Introduction on Networks &Visualization Nov. 2009 Overview 1. Networks Introduction Networks across disciplines Properties Models 2. Visualization InfoVis Data exploration

More information

Introduction to the Mathematics of Big Data. Philippe B. Laval

Introduction to the Mathematics of Big Data. Philippe B. Laval Introduction to the Mathematics of Big Data Philippe B. Laval Fall 2015 Introduction In recent years, Big Data has become more than just a buzz word. Every major field of science, engineering, business,

More information

Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center

Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center Computational Challenges in Storage, Analysis and Interpretation of Next-Generation Sequencing Data Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center Next Generation Sequencing

More information

Becker Muscular Dystrophy

Becker Muscular Dystrophy Muscular Dystrophy A Case Study of Positional Cloning Described by Benjamin Duchenne (1868) X-linked recessive disease causing severe muscular degeneration. 100 % penetrance X d Y affected male Frequency

More information

Pipeline Pilot Enterprise Server. Flexible Integration of Disparate Data and Applications. Capture and Deployment of Best Practices

Pipeline Pilot Enterprise Server. Flexible Integration of Disparate Data and Applications. Capture and Deployment of Best Practices overview Pipeline Pilot Enterprise Server Pipeline Pilot Enterprise Server (PPES) is a powerful client-server platform that streamlines the integration and analysis of the vast quantities of data flooding

More information

Master s Program in Information Systems

Master s Program in Information Systems The University of Jordan King Abdullah II School for Information Technology Department of Information Systems Master s Program in Information Systems 2006/2007 Study Plan Master Degree in Information Systems

More information

Final Project Report

Final Project Report CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes

More information

Searching Nucleotide Databases

Searching Nucleotide Databases Searching Nucleotide Databases 1 When we search a nucleic acid databases, Mascot always performs a 6 frame translation on the fly. That is, 3 reading frames from the forward strand and 3 reading frames

More information

Sequence Formats and Sequence Database Searches. Gloria Rendon SC11 Education June, 2011

Sequence Formats and Sequence Database Searches. Gloria Rendon SC11 Education June, 2011 Sequence Formats and Sequence Database Searches Gloria Rendon SC11 Education June, 2011 Sequence A is the primary structure of a biological molecule. It is a chain of residues that form a precise linear

More information

Statistics Graduate Courses

Statistics Graduate Courses Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

More information

Machine Learning. 01 - Introduction

Machine Learning. 01 - Introduction Machine Learning 01 - Introduction Machine learning course One lecture (Wednesday, 9:30, 346) and one exercise (Monday, 17:15, 203). Oral exam, 20 minutes, 5 credit points. Some basic mathematical knowledge

More information

Removing Sequential Bottlenecks in Analysis of Next-Generation Sequencing Data

Removing Sequential Bottlenecks in Analysis of Next-Generation Sequencing Data Removing Sequential Bottlenecks in Analysis of Next-Generation Sequencing Data Yi Wang, Gagan Agrawal, Gulcin Ozer and Kun Huang The Ohio State University HiCOMB 2014 May 19 th, Phoenix, Arizona 1 Outline

More information

Delivering the power of the world s most successful genomics platform

Delivering the power of the world s most successful genomics platform Delivering the power of the world s most successful genomics platform NextCODE Health is bringing the full power of the world s largest and most successful genomics platform to everyday clinical care NextCODE

More information

GenBank: A Database of Genetic Sequence Data

GenBank: A Database of Genetic Sequence Data GenBank: A Database of Genetic Sequence Data Computer Science 105 Boston University David G. Sullivan, Ph.D. An Explosion of Scientific Data Scientists are generating ever increasing amounts of data. Relevant

More information

Using MATLAB: Bioinformatics Toolbox for Life Sciences

Using MATLAB: Bioinformatics Toolbox for Life Sciences Using MATLAB: Bioinformatics Toolbox for Life Sciences MR. SARAWUT WONGPHAYAK BIOINFORMATICS PROGRAM, SCHOOL OF BIORESOURCES AND TECHNOLOGY, AND SCHOOL OF INFORMATION TECHNOLOGY, KING MONGKUT S UNIVERSITY

More information

Overview. Overarching observations

Overview. Overarching observations Overview Genomics and Health Information Technology Systems: Exploring the Issues April 27-28, 2011, Bethesda, MD Brief Meeting Summary, prepared by Greg Feero, M.D., Ph.D. (planning committee chair) The

More information

Organization and analysis of NGS variations. Alireza Hadj Khodabakhshi Research Investigator

Organization and analysis of NGS variations. Alireza Hadj Khodabakhshi Research Investigator Organization and analysis of NGS variations. Alireza Hadj Khodabakhshi Research Investigator Why is the NGS data processing a big challenge? Computation cannot keep up with the Biology. Source: illumina

More information

Learning is a very general term denoting the way in which agents:

Learning is a very general term denoting the way in which agents: What is learning? Learning is a very general term denoting the way in which agents: Acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);

More information

Importance of Statistics in creating high dimensional data

Importance of Statistics in creating high dimensional data Importance of Statistics in creating high dimensional data Hemant K. Tiwari, PhD Section on Statistical Genetics Department of Biostatistics University of Alabama at Birmingham History of Genomic Data

More information

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College

More information