Probabilistic and Statistical Methods in Bioinformatics

Course Director: Rovshan Sadygov, Ph.D. (Dept. of Biochemistry & Molecular Biology)

Schedule: 15-week course, 2 classes/week (30 classes in total), 1.0 hour/class. The format will include lectures, exercises, and programming sessions in R.

Target students: graduate students, medical students, and researchers who work with or process large-scale "omics" datasets.

Background recommendation:
- basic linear algebra: properties of and operations on matrices and vectors
- basic probability: binomial, Poisson, hypergeometric, and normal distributions
- familiarity with coding concepts in at least one programming language

Overview: Applications of high-throughput technologies to biological samples produce large amounts of data characterizing the current state of the samples. Statistical bioinformatics plays an important role in planning experiments to test specific hypotheses, analyzing the results, evaluating the statistical significance of the conclusions, and accepting or modifying the original hypothesis. Examples of data processing will be drawn from proteomics experiments and from standard databases available in R.

Learning objectives:
- learn the basics of statistical bioinformatics
- apply statistical methods to analyze large-scale biological datasets and appreciate random effects in biological observations
- learn how to develop statistical models to describe biological data
- evaluate statistical models

Syllabus

Part 1: Introduction to Probability

Class 1: Definition of Probability
Experiments and events. Naive definition of probability; properties of probability. Four basic sampling schemes: factorial, power, binomial, Bose-Einstein. Inclusion-exclusion theorem; the matching problem. Independence of events, conditional probability, Simpson's paradox. Bayes' rule.

Class 2: Random Variables and Their Distributions, Expectations
Random variables, discrete vs. continuous; PMF, PDF, CDF. Expectation of a random variable; indicator random variables. Expectations of Bernoulli, binomial, and geometric random variables. Hypergeometric distribution; proof of Vandermonde's identity.

Class 3: Discrete Random Variables
Gambler's ruin problem, difference equations. Linearity of expectation. Introduction to conditional probability, Polya's urn, Hardy's law. Functions of random variables. Moment generating functions.

Class 4: Continuous Random Variables
PDF; normal, Cauchy, exponential, beta, and gamma distributions. Joint, conditional, and marginal distributions. Covariance and correlation. Conditional expectation.

Class 5: Laws of Large Numbers
The central limit theorem. The law of large numbers. Using R to compute classical distributions. Equalities and inequalities of probability.

Class 6: Markov Chains
Transition matrix, transition graph, stationary distribution, irreducibility, recurrence, transient states, persistence, aperiodicity, ergodicity.

Part 2: Classical Statistics

Class 7: Estimation and Inference
Statistical inference. Prior and posterior distributions. Bayes estimators. Maximum likelihood estimators. Score statistic, information matrix, entropy. Sufficient statistics.

Class 8: Sampling Distributions of Estimators
Sampling distribution of a statistic. The chi-square, Wishart, and t distributions. Confidence intervals. Comparing the means of two normal distributions. The F distribution.

Class 9: Nonparametric Methods, Categorical Data
Goodness-of-fit tests. Contingency tables. Sign and rank tests. Kolmogorov-Smirnov test.

Class 10: Brief Introduction to Experimental Design
Fisher's exact test. Associative vs. causative experimental design. Causative case study: the polio vaccine. Associative case study: United States v. Kristen Gilbert.

Class 11-12: Linear Statistical Models
The method of least squares; regression. Statistical and Bayesian inference in linear regression. The general linear model and multiple regression. Analysis of variance. Generalized linear models.

Class 13: Review of Linear Algebra
Fundamental theorem of linear algebra. Spectral theorem, QR decomposition. Singular value decomposition.

Class 14: Algorithm Design
Dynamic programming. Linear programming. Metropolis-Hastings algorithm.

Class 15: Using R for Statistical Modeling
Introduction to R programming. Matrix algebra in R. Statistical modeling in R: generalized linear models, mixed effects.

Class 16: Midterm Exam

Part 3: Machine Learning

Class 16-17: Supervised Learning
Classification and learning theory. Supervised learning. Generative models: (a) non-probabilistic: Fisher's discriminant analysis; (b) probabilistic: linear and quadratic discriminant analysis, naive Bayes model. Discriminative models: logistic regression.

Class 18-19: Support Vector Machines
Convex optimization; Lagrangian, duality. Jensen's inequality. Vapnik-Chervonenkis dimension. Support vector machines.

Class 20-21: Hidden Markov Models
Probability of sequence occurrence. Forward algorithm, backward algorithm. Viterbi algorithm, Baum-Welch algorithm.

Class 22: Unsupervised Learning
Clustering; k-means clustering. Principal component analysis.

Class 23: Expectation Maximization
The expectation maximization algorithm: E-step and M-step.

Part 4: Bioinformatics of Proteomics and Genomics

Class 24-26: Applications in Mass Spectrometry/Proteomics
Probability models for matching sequences to mass spectra. Generating protein identifications: SVM, EM, and multinomial models. Multiple-hypothesis-testing corrections for p-values. Spectral counting for protein quantification; generalized linear models.

Class 27-29: Applications in Genomics
Data structures: hashing, lists. Genome sequencing. Sequence alignment.

Class 30: Final Exam

Recommended readings:
Classical probability:
a) W. Feller, An Introduction to Probability Theory and Its Applications, 3rd Edition.
b) S. Ross, Introduction to Probability Models, 7th Edition.
Classical statistics: M. H. DeGroot, M. J. Schervish, Probability and Statistics, 3rd Edition.
Linear algebra: G. Strang, Linear Algebra and Its Applications, 3rd Edition.
Machine learning:
a) T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, 2nd Edition.
b) C. M. Bishop, Pattern Recognition and Machine Learning.
Applications in bioinformatics:
a) W. J. Ewens, G. R. Grant, Statistical Methods in Bioinformatics.
b) P. Baldi, S. Brunak, Bioinformatics, 2nd Edition.
Learning R: M. J. Crawley, The R Book.

This course is an introduction to the ideas and tools of probability calculus, statistical methods, and machine learning techniques for the bioinformatics processing of large-scale biological datasets. It consists of four parts: basic probability calculus, statistical models, machine learning, and applications in proteomics and genomics. We will also provide the necessary introduction to linear algebra, R programming, and algorithm design techniques. The course builds the concepts of machine learning and statistical modeling from probabilistic and linear-algebraic foundations. Sample spaces, conditioning, Bayes' rule, random variables, distributions, expectation, and Markov chains will be covered in the first part of the course. Interesting examples such as the matching problem, variations of the birthday problem, gambler's ruin, Simpson's paradox, the St. Petersburg paradox, and Markov chain examples are discussed in the context of probability modeling. A linear-algebraic (matrix-based) view of modeling will be provided for linear least squares, support vector machines, and other statistical tools.
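The matching problem listed among the course examples (paired with the inclusion-exclusion theorem in Class 1) lends itself to a quick numerical check. The following is a minimal sketch in Python rather than the course's R; the function names, the seed, and the choice of n = 10 with 100,000 shuffles are illustrative assumptions. It compares the inclusion-exclusion formula P(at least one match) = sum over k = 1..n of (-1)^(k+1)/k! with a Monte Carlo simulation of shuffled cards. As n grows, both approach 1 - 1/e, roughly 0.632.

```python
import math
import random


def p_at_least_one_match(n: int) -> float:
    """Exact probability that a uniformly random permutation of n items has
    at least one fixed point, via inclusion-exclusion:
    sum_{k=1}^{n} (-1)^(k+1) / k!"""
    return sum((-1) ** (k + 1) / math.factorial(k) for k in range(1, n + 1))


def simulate_matching(n: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability: shuffle n cards labelled
    0..n-1 and count how often some card lands in its own position."""
    rng = random.Random(seed)
    cards = list(range(n))
    hits = 0
    for _ in range(trials):
        rng.shuffle(cards)  # in-place uniform random permutation
        if any(card == pos for pos, card in enumerate(cards)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n = 10
    print(f"exact (inclusion-exclusion): {p_at_least_one_match(n):.6f}")
    print(f"simulated (100,000 shuffles): {simulate_matching(n, 100_000):.6f}")
    print(f"limit 1 - 1/e:               {1 - math.exp(-1):.6f}")
```

The alternating series converges very quickly: for n = 10 the truncation error is below 1/11!, so the remaining gap between the exact and simulated values is Monte Carlo noise.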
The topics in probability calculus, statistics, and machine learning are then applied to examples of bioinformatics data processing. Specific applications are drawn from mass spectrometry-based proteomics, genome sequencing, and sequence alignment. Future opportunities and current limitations will be critically addressed. In addition to the regular lecture sessions, supplementary sessions may be scheduled to address issues related to R.
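Sequence alignment (Classes 27-29) is a standard application of dynamic programming (Class 14). As an illustration, here is a minimal Python sketch of Needleman-Wunsch global alignment. The scoring scheme (match +1, mismatch -1, linear gap penalty -2) and the example sequences are assumptions chosen for illustration, not taken from the course. The table entry F[i][j] holds the best score for aligning the first i characters of one sequence with the first j characters of the other. A traceback from the bottom-right cell then recovers one optimal alignment.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment of sequences a and b by dynamic programming.

    Returns (score, aligned_a, aligned_b), where '-' marks a gap.
    """
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        # Substitution score for aligning character x against character y.
        return match if x == y else mismatch

    # F[i][j] = best score for aligning a[:i] with b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # pair the two characters
                F[i - 1][j] + gap,                         # gap in b
                F[i][j - 1] + gap,                         # gap in a
            )

    # Traceback: at each cell, follow a move that reproduces its score.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    score, x, y = needleman_wunsch("GATTACA", "GCATGCU")
    print(score)
    print(x)
    print(y)
```

Each cell depends only on its left, upper, and upper-left neighbours, so the table is filled in O(nm) time and memory. Local alignment (Smith-Waterman) differs mainly in clamping cell scores at zero and starting the traceback from the highest-scoring cell.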


More information

Mathematics INDIVIDUAL PROGRAM INFORMATION 2014 2015. 866.Macomb1 (866.622.6621) www.macomb.edu

Mathematics INDIVIDUAL PROGRAM INFORMATION 2014 2015. 866.Macomb1 (866.622.6621) www.macomb.edu Mathematics INDIVIDUAL PROGRAM INFORMATION 2014 2015 866.Macomb1 (866.622.6621) www.macomb.edu Mathematics PROGRAM OPTIONS CREDENTIAL TITLE CREDIT HOURS REQUIRED NOTES Associate of Arts Mathematics 62

More information

Lecture 9: Introduction to Pattern Analysis

Lecture 9: Introduction to Pattern Analysis Lecture 9: Introduction to Pattern Analysis g Features, patterns and classifiers g Components of a PR system g An example g Probability definitions g Bayes Theorem g Gaussian densities Features, patterns

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology http://tinyurl.com/bioinf525-w16

BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology http://tinyurl.com/bioinf525-w16 Course Director: Dr. Barry Grant (DCM&B, bjgrant@med.umich.edu) Description: This is a three module course covering (1) Foundations of Bioinformatics, (2) Statistics in Bioinformatics, and (3) Systems

More information

Statistics W4240: Data Mining Columbia University Spring, 2014

Statistics W4240: Data Mining Columbia University Spring, 2014 Statistics W4240: Data Mining Columbia University Spring, 2014 Version: January 30, 2014. The syllabus is subject to change, so look for the version with the most recent date. Course Description Massive

More information

UNDERGRADUATE DEGREE DETAILS : BACHELOR OF SCIENCE WITH

UNDERGRADUATE DEGREE DETAILS : BACHELOR OF SCIENCE WITH QATAR UNIVERSITY COLLEGE OF ARTS & SCIENCES Department of Mathematics, Statistics, & Physics UNDERGRADUATE DEGREE DETAILS : Program Requirements and Descriptions BACHELOR OF SCIENCE WITH A MAJOR IN STATISTICS

More information

Dimension Reduction. Wei-Ta Chu 2014/10/22. Multimedia Content Analysis, CSIE, CCU

Dimension Reduction. Wei-Ta Chu 2014/10/22. Multimedia Content Analysis, CSIE, CCU 1 Dimension Reduction Wei-Ta Chu 2014/10/22 2 1.1 Principal Component Analysis (PCA) Widely used in dimensionality reduction, lossy data compression, feature extraction, and data visualization Also known

More information

PS 271B: Quantitative Methods II. Lecture Notes

PS 271B: Quantitative Methods II. Lecture Notes PS 271B: Quantitative Methods II Lecture Notes Langche Zeng zeng@ucsd.edu The Empirical Research Process; Fundamental Methodological Issues 2 Theory; Data; Models/model selection; Estimation; Inference.

More information

The Basics of Graphical Models

The Basics of Graphical Models The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures

More information

Statistics in Applications III. Distribution Theory and Inference

Statistics in Applications III. Distribution Theory and Inference 2.2 Master of Science Degrees The Department of Statistics at FSU offers three different options for an MS degree. 1. The applied statistics degree is for a student preparing for a career as an applied

More information

Machine Learning. 01 - Introduction

Machine Learning. 01 - Introduction Machine Learning 01 - Introduction Machine learning course One lecture (Wednesday, 9:30, 346) and one exercise (Monday, 17:15, 203). Oral exam, 20 minutes, 5 credit points. Some basic mathematical knowledge

More information

CS 688 Pattern Recognition Lecture 4. Linear Models for Classification

CS 688 Pattern Recognition Lecture 4. Linear Models for Classification CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(

More information

Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

More information

Lecture Notes 1. Brief Review of Basic Probability

Lecture Notes 1. Brief Review of Basic Probability Probability Review Lecture Notes Brief Review of Basic Probability I assume you know basic probability. Chapters -3 are a review. I will assume you have read and understood Chapters -3. Here is a very

More information

Graduate Certificate in Systems Engineering

Graduate Certificate in Systems Engineering Graduate Certificate in Systems Engineering Systems Engineering is a multi-disciplinary field that aims at integrating the engineering and management functions in the development and creation of a product,

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information

COLLEGE OF SCIENCE. John D. Hromi Center for Quality and Applied Statistics

COLLEGE OF SCIENCE. John D. Hromi Center for Quality and Applied Statistics ROCHESTER INSTITUTE OF TECHNOLOGY COURSE OUTLINE FORM COLLEGE OF SCIENCE John D. Hromi Center for Quality and Applied Statistics NEW (or REVISED) COURSE: COS-STAT-747 Principles of Statistical Data Mining

More information

NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS

NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS TEST DESIGN AND FRAMEWORK September 2014 Authorized for Distribution by the New York State Education Department This test design and framework document

More information

Statistical Analysis. NBAF-B Metabolomics Masterclass. Mark Viant

Statistical Analysis. NBAF-B Metabolomics Masterclass. Mark Viant Statistical Analysis NBAF-B Metabolomics Masterclass Mark Viant 1. Introduction 2. Univariate analysis Overview of lecture 3. Unsupervised multivariate analysis Principal components analysis (PCA) Interpreting

More information

Principles of Data Mining by Hand&Mannila&Smyth

Principles of Data Mining by Hand&Mannila&Smyth Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences

More information

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,

More information

Mathematics (MAT) MAT 061 Basic Euclidean Geometry 3 Hours. MAT 051 Pre-Algebra 4 Hours

Mathematics (MAT) MAT 061 Basic Euclidean Geometry 3 Hours. MAT 051 Pre-Algebra 4 Hours MAT 051 Pre-Algebra Mathematics (MAT) MAT 051 is designed as a review of the basic operations of arithmetic and an introduction to algebra. The student must earn a grade of C or in order to enroll in MAT

More information

Probability and statistics; Rehearsal for pattern recognition

Probability and statistics; Rehearsal for pattern recognition Probability and statistics; Rehearsal for pattern recognition Václav Hlaváč Czech Technical University in Prague Faculty of Electrical Engineering, Department of Cybernetics Center for Machine Perception

More information

Statistical Models in Data Mining

Statistical Models in Data Mining Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of

More information

Statistics Education of Practicing Engineers

Statistics Education of Practicing Engineers Statistics Education of Practicing Engineers Jorge Luis Romeu, Ph.D. Research Professor, Syracuse U. http://myprofile.cos.com/romeu jlromeu@syr.edu Conference in Honor of Dr. Dudewicz July 2008 Outline

More information

Learning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu

Learning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu Learning Gaussian process models from big data Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu Machine learning seminar at University of Cambridge, July 4 2012 Data A lot of

More information

Protein Protein Interaction Networks

Protein Protein Interaction Networks Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

More information

Master of Arts in Mathematics

Master of Arts in Mathematics Master of Arts in Mathematics Administrative Unit The program is administered by the Office of Graduate Studies and Research through the Faculty of Mathematics and Mathematics Education, Department of

More information