Intro to scientific programming (with Python) Pietro Berkes, Brandeis University

Similar documents

CIS 192: Lecture 13 Scientific Computing and Unit Testing

Writing robust scientific code with testing (and Python) Pietro Berkes, Enthought UK

Scientific Programming in Python

Profiling, debugging and testing with Python. Jonathan Bollback, Georg Rieckh and Jose Guzman

Exercise 0. Although Python(x,y) comes already with a great variety of scientic Python packages, we might have to install additional dependencies:

Scientific Programming, Analysis, and Visualization with Python. Mteor 227 Fall 2015

CRASH COURSE PYTHON. Het begint met een idee

Advanced Topics: Biopython

Test-Driven Development

MACHINE LEARNING IN HIGH ENERGY PHYSICS

Python for Scientific Computing.

Python Programming: An Introduction to Computer Science

Survey of Unit-Testing Frameworks. by John Szakmeister and Tim Woods

We will learn the Python programming language. Why? Because it is easy to learn and many people write programs in Python so we can share.

Computing Concepts with Java Essentials

Python. Python. 1 Python. M.Ulvrova, L.Pouilloux (ENS LYON) Informatique L3 Automne / 25

An introduction to Python Programming for Research

Main Bullet #1 Main Bullet #2 Main Bullet #3

Computational Mathematics with Python

Computational Mathematics with Python

Introduction to Python

Simulation Tools. Python for MATLAB Users I. Claus Führer. Automn Claus Führer Simulation Tools Automn / 65

Java course - IAG0040. Unit testing & Agile Software Development

HPC Wales Skills Academy Course Catalogue 2015

Functional and LoadTest Strategies

Part VI. Scientific Computing in Python

Instructional Design Framework CSE: Unit 1 Lesson 1

Some programming experience in a high-level structured programming language is recommended.

6.170 Tutorial 3 - Ruby Basics

Analysis Programs DPDAK and DAWN

Python programming guide for Earth Scientists. Maarten J. Waterloo and Vincent E.A. Post

Assignment 2: Option Pricing and the Black-Scholes formula The University of British Columbia Science One CS Instructor: Michael Gelbart

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff

Python Loops and String Manipulation

Chemical and Biological Engineering Calculations using Python 3. Jeffrey J. Heys

Introduction to Python

Software Testing with Python

Programming Using Python

ANSA and μeta as a CAE Software Development Platform

CUDAMat: a CUDA-based matrix class for Python

Wrestling with Python Unit testing. Warren Viant

How To Develop Software

Self-review 9.3 What is PyUnit? PyUnit is the unit testing framework that comes as standard issue with the Python system.

Python programming Testing

Python Programming: An Introduction to Computer Science

Computational Mathematics with Python

An Introduction to APGL

Modeling with Python

WAYNESBORO AREA SCHOOL DISTRICT CURRICULUM INTRODUCTION TO COMPUTER SCIENCE (June 2014)

Outline. hardware components programming environments. installing Python executing Python code. decimal and binary notations running Sage

Introduction to Python

William E. Hart Carl Laird Jean-Paul Watson David L. Woodruff. Pyomo Optimization. Modeling in Python. ^ Springer

Programming Languages & Tools

CME 193: Introduction to Scientific Python Lecture 8: Unit testing, more modules, wrap up

SOFTWARE TESTING TRAINING COURSES CONTENTS

Test Driven Development in Python

Achieving business benefits through automated software testing. By Dr. Mike Bartley, Founder and CEO, TVS

ESCI 386 Scientific Programming, Analysis and Visualization with Python. Lesson 5 Program Control

Effective unit testing with JUnit

Software Engineering Techniques

Data Analytics at NERSC. Joaquin Correa NERSC Data and Analytics Services

Domains and Competencies

Module 10. Coding and Testing. Version 2 CSE IIT, Kharagpur

CSC 221: Computer Programming I. Fall 2011

Python CS1 as Preparation for C++ CS2

Python Testing with unittest, nose, pytest

AP Computer Science A - Syllabus Overview of AP Computer Science A Computer Facilities

Programming and Software Development CTAG Alignments

Objectives. Python Programming: An Introduction to Computer Science. Lab 01. What we ll learn in this class

NetworkX: Network Analysis with Python

AMATH 352 Lecture 3 MATLAB Tutorial Starting MATLAB Entering Variables

CS 1133, LAB 2: FUNCTIONS AND TESTING

Introduction to Automated Testing

Data Analysis with MATLAB The MathWorks, Inc. 1

Exercise 1: Python Language Basics

IERG 4080 Building Scalable Internet-based Services

Python and Google App Engine

Testing Tools Content (Manual with Selenium) Levels of Testing

Tentative NumPy Tutorial

Visual Data Mining. Motivation. Why Visual Data Mining. Integration of visualization and data mining : Chidroop Madhavarapu CSE 591:Visual Analytics

Continuous Integration

Introduction to MATLAB Gergely Somlay Application Engineer

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

Fail early, fail often, succeed sooner!

Postprocessing with Python

The Clean programming language. Group 25, Jingui Li, Daren Tuzi

[1] Learned how to set up our computer for scripting with python et al. [3] Solved a simple data logistics problem using natural language/pseudocode.

2+2 Just type and press enter and the answer comes up ans = 4

Advanced Functions and Modules

Syllabus for CS 134 Java Programming

Formal Software Testing. Terri Grenda, CSTE IV&V Testing Solutions, LLC

Introduction to the course, Eclipse and Python

Transcription:

Intro to scientific programming (with Python) Pietro Berkes, Brandeis University

Next 4 lessons: Outline Scientific programming: best practices Classical learning (Hoepfield network) Probabilistic learning (Restricted Boltzman Machine) Advanced probabilistic learning (Deep Belief Network)

Next 4 lessons: Outline Scientific programming: best practices Best practices in scientific programming Introduction to Python and numpy Test driven development in Python Hands-on session Classical learning (Hoepfield network) Probabilistic learning (Restricted Boltzman Machine) Advanced probabilistic learning (Deep Belief Network)

Programming needs of scientists Experimental study: [display stimuli for experiment] collect / store data manipulate and process data (statistics, ) visualize and prepare figures for publication Computational study: implement model run simulations on various datasets/parameters visualize and prepare figures for publication

Requirements for scientific programming Main requirement: code must be error free Scientist time, not computer time is the bottleneck being able to explore many different models and statistical analyses is more important than a very fast single approach Reproducibility and re-usability: easy to read, should not compile only on special architecture no need for somebody else to re-implement your algorithm

Best practices: The agile development cycle Write tests to check your code Write simplest version of the code Run tests and debug until all tests pass Optimize only at this point

Test-driven development Tests become part of the programming cycle and are automated Write test suite in parallel with your code External software runs the tests and provides reports and statistics testchoice ( main.testsequencefunctions)... ok testsample ( main.testsequencefunctions)... ok testshuffle ( main.testsequencefunctions)... ok ------------------------------------------------------- Ran 3 tests in 0.110s OK Write tests to check your code

Testing benefits Tests are the only way to trust your code Encourages better code and optimization: code can change, and consistency is assured by tests Faster development: Bugs are always pinpointed Avoids starting all over again when fixing one part of the code causes a bug somewhere else It might take you a while to get used to writing them, but it will pay off quite rapidly Write tests to check your code

What to test and how Test with hard-coded inputs for which you know the output: use simple but general cases E.g., test lower('text') -> 'text' with 'Hi there' : assertequal(lower('hi there'), 'hi there') test special or boundary cases 'another test' '?[{ 012' '' -> already lowercase -> lowercase undefined -> empty string Write tests to check your code

Numerical fuzzing Use deterministic test cases when possible In most numerical algorithm, this will cover only over-simplified situations; in some, it is impossible Fuzz testing: generate random input for which you know the answer E.g.: test a function that computes the variance with random data from a normal distribution Write tests to check your code

Testing learning algorithms Learning algorithms can get stuck in local maxima, the solution for general cases might not be known Turn your validation cases into tests Stability tests: start from final solution; verify that the algorithm stays there start from solution and add a small amount of noise to the parameters; verify that the algorithm converges back to the solution Generate data from the model with known parameters E.g., linear regression: generate data as y = a*x + b + noise for random a, b, and x, then test that the algorithm is able to recover a and b Write tests to check your code

Start simple Write small, testable chunks of code Write intention-revealing code Unnecessary features are not used but need to be tested and maintained Re-use external libraries (if well-tested) Do not try to write complex, efficient code at this point Write simplest version of the code

1. Isolate the bug How to handle bugs Test cases should already eliminate most possible causes Use a debugger, not print statements 2. Add a test that reproduces the bug to your test suite 3. Solve the bug 4. Run all tests and check that they pass Run tests and debug until all tests pass

How to optimize Usually, a small percentage of your code takes up most of the time Stop optimizing as soon as possible 1. Identify time-consuming parts of the code (use a profiler) 2. Only optimize those parts of the code 3. Keep running the tests to make sure that code is not broken Optimize only at this point

The agile development cycle (again) Write tests to check your code Write simplest version of the code Run tests and debug until all tests pass Optimize only at this point

Why Python? Very brief intro to Python much research is exploratory interpreted language high-level, dynamical language -> writing prototypes of ideas is easy and fast great support for numerical algorithms (numpy and scipy) and visualization (matplotlib) large standard library (file management, data types, ) large scientific community (fun, too) Matlab is a popular alternative, but it is expensive and makes it really difficult to apply the best practices mentioned before

Variables and data types [first: ipython command line; how to get help] integers, floats strings lists (slices, range) dictionaries [tuples, sets, files]

Control statements if-else statement (indentation matters) cycles: for statements (you can iterate over any sequence: lists, strings, dictionary keys,...) while statements

defining functions optional arguments docstrings Functions

Objects Objects are collections of variables (attributes) and functions (methods) create object through constructor -> initialize attributes interact through methods inheritance: sub-classes inherit methods Lists, strings, etc. are objects

Scripts vs. modules Scripts: a Python file that will be executed Module: a Python file that defines useful functions, objects, or constants (a Python library) import statement Python standard library

Numerical libraries NumPy is a Python extension module, written mostly in C, that defines the numerical array and matrix types and basic operations on them SciPy is another Python library that uses NumPy to do advanced math, signal processing, optimization, statistics and much more matplotlib is a Python library that facilitates publication-quality interactive plotting

numpy arrays arrays (multidimensional), shape array operations: sum, prod, mean, max, min, arange, linspace zeros, ones slicing, fancy indexing

Content of SciPy random Generate random numbers

plot (xlabel, title, ) imshow matplotlib

Test suites in Python: unittest unittest: standard Python testing library Each test case is a subclass of unittest.testcase Each test unit is a method of the class, whose name starts with test Each test unit checks one aspect of your code, and raises an exception if it does not work as expected

import unittest Anatomy of a TestCase Create new file, test_something.py: class FirstTestCase(unittest.TestCase): def test_truisms(self): """All methods beginning with test are executed""" self.asserttrue(true) self.assertfalse(false) def test_equality(self): """Docstrings are printed during executions of the tests in the Eclipse IDE""" self.assertequal(1, 1) if name == ' main ': unittest.main()

TestCase.assertSomething TestCase defines utility methods to check that some conditions are met, and raise an exception otherwise Check that statement is true/false: asserttrue('hi'.islower()) assertfalse('hi'.islower()) Check that two objects are equal: assertequal(2+1, 3) assertequal([2]+[1], [2, 1]) assertnotequal([2]+[1], [2, 1]) => fail => pass => pass => pass => fail

TestCase.assertSomething Check that two numbers are equal up to a given precision: assertalmostequal(x, y, places=7) places is the number of decimal places to use: assertalmostequal(1.121, 1.12, 2) assertalmostequal(1.121, 1.12, 3) => pass => fail

Testing with numpy arrays When testing numerical algorithms, numpy arrays have to be compared elementwise: class NumpyTestCase(unittest.TestCase): def test_equality(self): a = numpy.array([1, 2]) b = numpy.array([1, 2]) self.assertequal(a, b) E ====================================================================== ERROR: test_equality ( main.numpytestcase) ---------------------------------------------------------------------- Traceback (most recent call last): File "numpy_testing.py", line 8, in test_equality self.assertequal(a, b) File "/Library/Frameworks/Python.framework/Versions/6.1/lib/python2.6/unitt est.py", line 348, in failunlessequal if not first == second: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() ---------------------------------------------------------------------- Ran 1 test in 0.000s FAILED (errors=1)

Testing with numpy arrays numpy.testing defines appropriate function: numpy.testing.assert_array_equal(x, y) numpy.testing.assert_array_almost_equal(x, y, decimal=6) numpy.testing.assert_array_less(x, y) If you need to check more complex conditions: numpy.all(x): returns true if all elements of x are true numpy.any(x): returns true is any of the elements of x is true # test that all elements of x are larger than 1.0 asserttrue(all(x > 1.0))

Basic tests example Test with hard-coded inputs: use simple but general cases test special or boundary cases class LowerTestCase(unittest.TestCase): def test_lower(self): # each test case is a tuple of (input, expected_result) test_cases = [('HeLlO world', 'hello world'), ('hi', 'hi'), ('123 ([?', '123 ([?'), ('', '')] # test all cases for arg, expected in test_cases: output = arg.lower() self.assertequal(output, expected)

Numerical fuzzing example class VarianceTestCase(unittest.TestCase): def test_var(self): N, D = 100000, 5 # goal variances: [0.1, 0.45, 0.8, 1.15, 1.5] desired = numpy.linspace(0.1, 1.5, D) # test multiple times with random data for _ in range(20): # generate random, D-dimensional data x = numpy.random.randn(n, D) * numpy.sqrt(desired) variance = numpy.var(x, axis=0) numpy.testing.assert_array_almost_equal(variance, desired, 1)

Agile development Demo

Thanks! Exercises next. Where to get help: Python: http://docs.python.org/release/2.6/library/index.html Numpy docs: http://docs.scipy.org/doc/numpy/reference/index.html Matplotlib gallery: http://matplotlib.sourceforge.net/gallery.html More information on best practices: Software carpentry course by Greg Wilson http://software-carpentry.org Similar course by Tiziano Zito http://itb.biologie.hu-berlin.de/~zito/teaching/sc