Advanced Functions and Modules

Similar documents
Python. Python. 1 Python. M.Ulvrova, L.Pouilloux (ENS LYON) Informatique L3 Automne / 25

CIS 192: Lecture 13 Scientific Computing and Unit Testing

ANSA and μeta as a CAE Software Development Platform

Computational Mathematics with Python

Assignment 2: Option Pricing and the Black-Scholes formula The University of British Columbia Science One CS Instructor: Michael Gelbart

Exercise 0. Although Python(x,y) comes already with a great variety of scientic Python packages, we might have to install additional dependencies:

Computational Mathematics with Python

Intro to scientific programming (with Python) Pietro Berkes, Brandeis University

Computational Mathematics with Python

MATH 140 Lab 4: Probability and the Standard Normal Distribution

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

ESCI 386 Scientific Programming, Analysis and Visualization with Python. Lesson 5 Program Control

Introduction to Python

Gamma Distribution Fitting

Didacticiel - Études de cas

Part VI. Scientific Computing in Python

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

CHAPTER 7 INTRODUCTION TO SAMPLING DISTRIBUTIONS

Analytic Modeling in Python

Performance Monitoring using Pecos Release 0.1

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

Scientific Programming in Python

Outline. hardware components programming environments. installing Python executing Python code. decimal and binary notations running Sage

CURVE FITTING LEAST SQUARES APPROXIMATION

MULTIPLE REGRESSION EXAMPLE

Simple linear regression

Question 2: How do you solve a matrix equation using the matrix inverse?

Survey Analysis: Options for Missing Data

SUMAN DUVVURU STAT 567 PROJECT REPORT

(!' ) "' # "*# "!(!' +,

LAYOUT OF THE KEYBOARD

Laboratory 3 Type I, II Error, Sample Size, Statistical Power

Exercises on using R for Statistics and Hypothesis Testing Dr. Wenjia Wang

SAS: A Mini-Manual for ECO 351 by Andrew C. Brod

Using R for Linear Regression

ln(p/(1-p)) = α +β*age35plus, where p is the probability or odds of drinking

CRASH COURSE PYTHON. Het begint met een idee

SAS Software to Fit the Generalized Linear Model

STATISTICA Formula Guide: Logistic Regression. Table of Contents

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE

Simulation Tools. Python for MATLAB Users I. Claus Führer. Automn Claus Führer Simulation Tools Automn / 65

Tellurium and libroadrunner in a Nutshell

Additional sources Compilation of sources:

What does the number m in y = mx + b measure? To find out, suppose (x 1, y 1 ) and (x 2, y 2 ) are two points on the graph of y = mx + b.

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition

5 Systems of Equations

Part 2: Analysis of Relationship Between Two Variables

MACHINE LEARNING IN HIGH ENERGY PHYSICS

Financial Econometrics MFE MATLAB Introduction. Kevin Sheppard University of Oxford

Chapter 3 RANDOM VARIATE GENERATION

Advanced analytics at your hands

Reading Raster Data with GDAL

The Analysis of Variance ANOVA

AP Statistics Solutions to Packet 2

EQUATIONS and INEQUALITIES

Python for Data Analysis and Visualiza4on. Fang (Cherry) Liu, Ph.D PACE Gatech July 2013

Scientific Programming with Python. Randy M. Wadkins, Ph.D. Asst. Prof. of Chemistry & Biochem.

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test

Beginner s Matlab Tutorial

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Introduction to Python for Text Analysis

VI. Introduction to Logistic Regression

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

Programming Languages & Tools

Predicting outcome of soccer matches using machine learning

Two Correlated Proportions (McNemar Test)

Chemical and Biological Engineering Calculations using Python 3. Jeffrey J. Heys

The Whys, Hows and Whats of the Noise Power Spectrum. Helge Pettersen, Haukeland University Hospital, NO

Introduction to the Practice of Statistics Fifth Edition Moore, McCabe

Math Quizzes Winter 2009

Algebra I Vocabulary Cards

from ztf_summerschool import source_lightcurve, barycenter_times %matplotlib inline

Evaluating Trading Systems By John Ehlers and Ric Way

Mathematics within the Psychology Curriculum

Improved metrics collection and correlation for the CERN cloud storage test framework

ALGEBRA. sequence, term, nth term, consecutive, rule, relationship, generate, predict, continue increase, decrease finite, infinite

Linear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S

Linear Algebra Review. Vectors

Mathematics Online Instructional Materials Correlation to the 2009 Algebra I Standards of Learning and Curriculum Framework

Python for Scientific Computing.

Multiple Linear Regression

The power of IBM SPSS Statistics and R together

Big Data Analytics with Spark and Oscar BAO. Tamas Jambor, Lead Data Scientist at Massive Analytic

How To Develop Software

Euler s Method and Functions

Machine Learning and Data Mining. Regression Problem. (adapted from) Prof. Alexander Ihler

Ordinal Regression. Chapter

Statistical Models in R

Modeling with Python

NEW YORK CITY COLLEGE OF TECHNOLOGY The City University of New York

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

How To Understand And Solve A Linear Programming Problem

AUTOMATIC EVALUATION OF COMPUTER PROGRAMS USING MOODLE S VIRTUAL PROGRAMMING LAB (VPL) PLUG- IN

Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation

Big Ideas in Mathematics

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups

Analysis Programs DPDAK and DAWN

5: Magnitude 6: Convert to Polar 7: Convert to Rectangular

SCIENTIFIC COMPUTING AND PROGRAMMING IN THE CLOUD USING OPEN SOURCE PLATFORMS: AN ILLUSTRATION USING WEIGHTED VOTING SYSTEMS

Transcription:

Advanced Functions and Modules CB2-101 Introduction to Scientific Computing November 19, 2015 Emidio Capriotti http://biofold.org/ Institute for Mathematical Modeling of Biological Systems Department of Biology

Errors and Exceptions In python errors can be divided in Syntax Errors and Exceptions. Syntax Errors are generate by the interpreter when it is not able to correctly parse the python script. Although the code is syntactically correct, errors can happen when the script is executed. These errors, detected during execution, are called Exceptions and are not unconditionally fatal. Python has Built-in Exceptions that allow to hand selected exceptions in the program.

Handeling Exceptions Exceptions are used when particular values the variables assume can generate errors. To handle Exception python implement the try/except statements Basic syntax of try/except statements try: do something except: do something else Example with ZeroDivisionError >>> def inverse(x):. return 1/float(x). >>> inverse(2) 0.5 >>> inverse(0) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<stdin>", line 2, in inverse ZeroDivisionError: float division by zero Else can be used after verifying possible exceptions.

sys and os Important modules for interacting with operating systems and to modify settings and exe commands are sys and os. Important methods and variables in sys module >>> import sys >>> print sys.path ['', /Library/Python/2.7/site-packages/seqmagick-0.6.0_dev-py2.7.egg',. >>> sys.stderr.write( ERROR: File not found\n. ) ERROR: File not found. $> python -c "import sys; print sys.argv" arg1 arg2 arg3 Important methods and variables in sys module >>> import os >>> print os.environ {'LOGNAME': 'emidio', 'USER': emidio,. >>> print os.getpid() 96708 >>> print os.getcwd() /home/cb2user >>> print os.path.isdir( /home/cb2user ) True >>> print os.path.isfile( /home/cb2user/.bashrc ) True

system and subprocess The subprocess modules allows to run programs on your machine. Run subprocess in a python script >>> import os >>> os.system( pwd ) /home/cb2user 0 >>> from commands import getstatusoutput >>> getstatusoutput( pwd ) (0, '/home/cb2user') >>> getstatusoutput( ppwd ) (32512, 'sh: ppwd: command not found ) >>> import subprocess >>> proc = subprocess.popen([ cmd, flag1, arg1, ] \ stdout=subprocess.pipe, stderr=subprocess.pipe) >>> stdout, stderr = proc.communicate()

sets Comparison of different sets is a common problem in different scientific fields. Managing sets in python using sets >>> set1=set([1,2,3,4,5]) >>> set2=set([2,3,4,7,8]) >>> set1.intersection(set2) set([2, 3, 4, 5]) >>> list(set1.intersection(set2)) [2, 3, 4, 5] >>> list(set1.union(set2)) [1, 2, 3, 4, 5, 7, 8] >>> list(set1.difference(set2)) [1] >>> list(set1.symmetric_difference(set2)) [8, 1, 7] >>> set1.add(6) set([1,2, 3, 4, 5, 6]) >>> new_set=set1.copy() >>> print new_set set([1,2, 3, 4, 5, 6])

itertools Itertools is a module to optimize the generation of combinations of elements in a set. How calculate possible combinations of elements? >>> import itertools >>> list(itertools.product([ A, B, C ],repeat=2)) [('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'B'), ('B', 'C'), ('C', 'A'), ('C', 'B'), ('C', 'C')] >>> list(itertools.permutations([ A, B, C ], 2)) [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')] >>> list(itertools.combinations([ A, B, C ], 2)) [('A', 'B'), ('A', 'C'), ('B', C')] >>> list(itertools.combinations_with_replacement([ A, B, C ], 2)) [('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'B'), ('B', 'C'), ('C', 'C')]

random To generate random variables and shuffle dataset python has the a random module. Random generates random number and shuffle sets >>> import random >>> random.random() >>> 0.254942802653793 >>> random.randint(1,100) >>> 25 >>> v=[1,2,3,4,5,6,7,8,9,10] >>> random.shuffle(v) >>> print v [5, 1, 3, 2, 7, 10, 6, 9, 4, 8] >>> random.gauss(20, 1) 18.685381502066623

Exercise 1 1. Write a script that generates n values that are normally distributed around the average value x with a standard deviation std. 2. Select average and sigma values on your choice. Using the first script generate 4 files with 10, 100, 1000 and 10000 numbers. Finally write a script that takes in input the previous file and calculates the average and the sigma values. 3. What are the differences from the chosen and calculates average and the sigma values?

regular expression In python has been implemented the re module which provides regular expression matching operations. The basic procedure for searching consists in the definition of the regular expression, the compilation process and the search. Simple regular expression search >>> import re >>> pattern=re.compile( \spf[0-9]{5}\s ) #define re and compile >>> text= File: PF00234\t\tPF00001\n\tProtein 1 PF00051\t PF22222' >>> re.findall(pattern,text) [' PF00234\t', '\tpf00001\n', ' PF00051\t', ' PF22222 ] >>> match=list(re.finditer(pattern,text)) >>> print match [<_sre.sre_match object at 0x105710920>,..] >>> def get_limits(x):... return x.start(), x.end() >>> maps(get_limits, match) [(5, 14), (14, 23), (33, 42), (42, 51)]

Exercise 2 In Pfam database a Protein kinase domain is indicated by code PF00069. In general proteins can contain multiple domains. Write a script or run a linux command to: 1. selects all the proteins that contains at least one PF00069 domain. 2. lists all the domains co-occurring in the same protein with PF00069 and calculates the frequency of co-occurrence without considering possible repetitions.

numpy (I) NumPy is the fundamental package for scientific computing with Python. It is a optimized tools for: N-dimensional array object sophisticated mathematical functions Linear algebra, Fourier transform, and random number capabilities Basic numpy functions >>> import numpy as np >>> np.array([1.,2.,3.,4.,5.,6.]) array([1., 2., 3., 4., 5., 6.]) >>> np.zeros([2,2]) array([[ 0., 0.], [ 0., 0.]]) >>> 10*np.log(np.array([1.,2.,3.,4.,5.,6.])) array([ 0., 6.9314718, 10.9861229, ]) >>> np.array([[1., 2.], [3., 4]])*np.array([[2., 2.], [2., 2]]) array([[ 2., 4.], [ 6., 8.]]) >>> np.dot(np.array([[1., 2.], [3., 4]])*np.array([[2., 2.], [2., 2]])) array([[ 6., 6.], [ 14., 14.]])

numpy (II) Numpy also includes statistical functions and linear algebra module Other numpy functions >>> import numpy as np >>> v=np.array([1.,2.,3.,4.,5.,6.]) >>> np.mean(v) 3.5 >>> np.std(v) 1.707825127659933 >>> import numpy.linalg as la >>> mat=np.array([[ 4., 12.],[ 10., 26.]]) >>> la.det(a) -15.999999999999998

The linregress function Import linregress from spicy.stats and calculate calculate the fitting curve >>> from scipy.stats import linregress >>> import numpy as np >>> x = np.array([1,2,3,4,5,6]) >>> y = np.array([1.1,2.2,2.9,3.98,5.2,6.1]) >>> reg=linregress(x,y) >>> print reg LinregressResult(slope=0.98285714285714287, intercept=0.060000000000000053, rvalue=0.99838143945702995, pvalue=3.9274872444222332e-06, stderr=0.027994168488950467) what happen if y = [3,0,5,1,7,4]. is this a better fitting? why? Y Y X X

Use matplotlib to plot Import matplotlib and plot the points and fitting curve. >>> import matplotlib.pyplot as plt >>> yp=reg[0]*x+reg[1] >>> plt.plot(x,y, o,x,yp, - ) >>> plt.show() Exercise: Write a python script that reads a file containing two columns of data (x,y) and calculate the linear regression curve and plots both the points an the regression curve. For this exercise download the data using the command wget from http://biofold.org/courses/docs/data_hw.txt

Contingency table In statistics, a contingency table is a type of table in a matrix format that displays the frequency distribution of the variables. They are heavily used in survey research, business intelligence, engineering and scientific research. Function 1 Function 2 Row Total Reference 7 6 13 Observation 1 12 13 Column Total 8 18 26

Fisher s test in python The fisher_exact function is contained in the spicy.stats module and takes in input a contingency matrix. The function returns odd ratio and p-value. >>> from scipy.stats import fisher_exact >>> import numpy as np >>> cm=np.array[[7,6],[1,12]] >>> ft=fisher_exact(cm) >>> print ft (0.071428571428571425, 0.030205949656750501) Exercise: Write a python script that reads two file containing a columns with alleles carried by each individual. Use Fisher s exact test verify if an allele is over represented in one of the populations. For this exercise download the data using the command wget from http://biofold.org/courses/docs/pop1_allele.txt http://biofold.org/courses/docs/pop2_allele.txt