Introduction to Scientific Computing: what you need to learn now to decide what you need to learn next
Bob Dowling, University Computing Service, rjd4@cam.ac.uk
1. Why this course exists
2. Common concepts and general good practice
Coffee break
3. Selecting programming languages
Why does this course exist?
MPI? e-science? Grid computing? Condor? Before any of these come the basics: C/Fortran, and scripts!
Common misconceptions
Any program can be copied from one machine to another.
Every line in a program takes the same time to run.
Programs with source code have to be built every time they're run.
Everything has to fit in the same program.
Common concepts and good practice
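A simple program
Linear execution. It does not run at constant speed, and some elements are repeated.
[Diagram: execution passes through elements A, B, C, B, D in turn; some take longer than others, and B occurs more than once.]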
Each element (A, B, C, D) may be:
lines in a script;
chunks of machine code;
calls to external libraries;
calls to other programs.
Parallel program
Single Instruction Multiple Data (SIMD). Much harder to program.
[Diagram: A runs once; B, C, B then run as five copies side by side; D runs once at the end.]
Machine architectures
CPU: i386, i586, Pentium, Xeon, Opteron, MIPS, SPARC, Power, G5; pipelines.
Memory: RAM; L1, L2 and L3 cache.
Internal communication: bus, DMA, I/O.
Operating system support, layer by layer:
Your program
Science library
Support libraries
System libraries: C library, maths library
Kernel
Floating point problems, e.g. numerical simulations
Universal principle: 0.1 is not stored exactly; it may come back as 0.1000000000001, and worse.
Course: Program Design: How computers handle numbers
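You can see this quickly in Python. This is a minimal sketch, and any language using IEEE 754 double precision behaves the same way. Because 0.1 has no exact binary representation, small errors appear even in simple arithmetic and grow as values are accumulated.

```python
# 0.1 has no exact binary representation; asking for more digits
# shows the value actually stored.
print(format(0.1, ".20f"))   # 0.10000000000000000555

# The tiny errors show up in simple arithmetic...
print(0.1 + 0.2 == 0.3)      # False

# ...and accumulate when many values are added together.
total = sum([0.1] * 10)
print(total)                 # 0.9999999999999999
print(total == 1.0)          # False
```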
Other problems, e.g. sequence comparison and text searching
The pattern ^f.*x$ matches firebox, fix and fusebox.
Course: Pattern matching using Regular Expressions
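Here is a minimal sketch of the same pattern in Python's standard re module. The extra words "fox", "foxes" and "box" do not come from the slide. They are included only to show one more match and two non-matches.

```python
import re

# ^ anchors the match at the start, .* matches any run of characters,
# and x$ requires an x as the very last character.
pattern = re.compile(r"^f.*x$")

words = ["firebox", "fix", "fusebox", "fox", "foxes", "box"]
matches = [w for w in words if pattern.search(w)]
print(matches)   # ['firebox', 'fix', 'fusebox', 'fox']
```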
Split up the job: divide and conquer
Bits of the job: different tools (bash, Python, Fortran, gnuplot) and glue.
[Diagram: input data passes through the different tools to output data and graphical output.]
Choose a suitable tool for each bit
What tools? How to pick the right one? Pros & cons.
FORTRAN, MATLAB, Perl, Java, bash, gnuplot, Python, C++, Excel, SPSS
Glue
Splitting up, then gluing together:
1. Pipeline
2. Shell script
3. GUI
Pipeline: redirection and piping

    make_initial < input_file | process_lots | post_process > output_file

input_file → make_initial → process_lots → post_process → output_file
Course: Unix: Introduction to the Command Line Interface
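The pipeline idea also works inside a single program: each stage reads a stream and writes a stream. This is a minimal Python sketch that uses generators as the stages. The three functions are hypothetical stand-ins that borrow the slide's program names, and each one does only trivial arithmetic.

```python
def make_initial(lines):
    # Stage 1 (hypothetical): turn each input line into a number.
    for line in lines:
        yield float(line)

def process_lots(values):
    # Stage 2 (hypothetical): a stand-in for the heavy computation.
    for v in values:
        yield v * v

def post_process(values):
    # Stage 3 (hypothetical): format each result as an output line.
    for v in values:
        yield f"{v:.2f}\n"

# Same shape as: make_initial < input_file | process_lots | post_process > output_file
input_lines = ["1", "2", "3.5"]
output_lines = list(post_process(process_lots(make_initial(input_lines))))
print("".join(output_lines), end="")
```

As in a shell pipeline, the stages run lazily: each value passes through all three stages before the next one is read.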
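Shell script

    #!/bin/bash -e
    job="${1}"
    # Create the starting data file if it does not exist yet
    if [ ! -f "${job}.dat" ]
    then
        make_initial < "${job}.in" > "${job}.dat"
    fi
    # Keep processing until work_to_do reports there is nothing left
    while work_to_do "${job}.dat"
    do
        process < "${job}.dat" > "${job}.new"
        # Replace the old data with the new; without this the loop never ends
        mv "${job}.new" "${job}.dat"
    done
    post_process < "${job}.dat"

Course: Simple shell scripting for scientists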
Graphical User Interface: user input goes in, data output comes back.
Lumps
Splitting up and gluing together within a program: units, functions, modules, objects.
Structured programming Don't repeat yourself
Never repeat code
Repetition: the same loop written out three times.

    a_norm = 0.0
    for i in range(0,100):
        a_norm += a[i]*a[i]
    b_norm = 0.0
    for i in range(0,100):
        b_norm += b[i]*b[i]
    c_norm = 0.0
    for i in range(0,100):
        c_norm += c[i]*c[i]
Structured code
Define a function called norm(), so there is a single instance of the code:

    def norm(v):
        # Sum of the squares of v's 100 elements (the squared length of v)
        v_norm = 0.0
        for i in range(0,100):
            v_norm += v[i]*v[i]
        return v_norm

Calling the function:

    a_norm = norm(a)
    b_norm = norm(b)
    c_norm = norm(c)
Structured code
With the function code in one place, you test it, debug it, time it and improve it once!
All good practice follows from structuring.
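Improved code
Improve the function once, and every caller benefits:

    def norm(v):
        # Square every element first...
        w = []
        for i in range(0,100):
            w.append(v[i]*v[i])
        # ...then sort, so the smallest values are added first,
        # which reduces floating point rounding error.
        w.sort()
        v_norm = 0.0
        for i in range(0,100):
            v_norm += w[i]
        return v_norm

    a_norm = norm(a)
    b_norm = norm(b)
    c_norm = norm(c)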
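Why sort before adding? If small values are added one at a time to a much larger running total, they can be lost. Adding them first lets them accumulate before they meet the large value. This is a small Python demonstration, and the data are made up purely for illustration.

```python
import math

# One large value followed by many tiny ones (illustrative data).
values = [1.0] + [1e-16] * 10

# Large value first: each 1e-16 is less than half the gap between
# neighbouring floats near 1.0, so every addition rounds back to 1.0.
naive = 0.0
for x in values:
    naive += x

# Smallest first: the tiny values build up before meeting 1.0.
ordered = 0.0
for x in sorted(values):
    ordered += x

print(naive)                # 1.0
print(ordered > naive)      # True
print(math.fsum(values))    # correctly rounded reference sum
```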
More flexible code
Using len(v) instead of the fixed 100 lets the same function handle vectors of any length:

    def norm(v):
        w = []
        for i in range(0,len(v)):
            w.append(v[i]*v[i])
        w.sort()
        v_norm = 0.0
        for i in range(0,len(v)):
            v_norm += w[i]
        return v_norm

    a_norm = norm(a)
    b_norm = norm(b)
    c_norm = norm(c)
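More Pythonic code
Make the function more Pythonic: a list comprehension builds the squares, and the loop runs over the items directly instead of over indices.

    def norm(v):
        w = [item*item for item in v]
        w.sort()
        v_norm = 0.0
        for item in w:
            v_norm += item
        return v_norm

    a_norm = norm(a)
    b_norm = norm(b)
    c_norm = norm(c)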
Best sort of code
Get someone else to do all the work! (Here library stands for whichever library provides a suitable norm function.)

    import library
    a_norm = library.norm(a)
    b_norm = library.norm(b)
    c_norm = library.norm(c)
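Python's own standard library is a simple case of this. In this minimal sketch, `library.norm` on the slide is only a placeholder. The real standard-library function `math.fsum`, which does correctly rounded summation, does the careful adding in its place. Like the slide's norm(), this version returns the sum of squares, not its square root.

```python
import math

def norm(v):
    # Sum of squares, as in the slide's norm(); math.fsum takes care of
    # the accurate summation, so no manual sorting is needed.
    return math.fsum(x * x for x in v)

a = [3.0, 4.0]
a_norm = norm(a)
print(a_norm)              # 25.0
print(math.sqrt(a_norm))   # 5.0, the Euclidean length
```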
Libraries
Written by experts, in every area.
Learn what libraries exist in your area, and use them.
Save your effort for your research.
An example library: the Numerical Algorithms Group (NAG) library
Roots of equations, differential equations, interpolation, linear algebra, statistics, sorting, special functions.
Unit testing
Split the program into units and test each unit individually. This catches bugs earlier and saves time in the long term.
"Programming is like sex: one mistake and you're providing support for a lifetime." (Michael Sinz)
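Here is a minimal sketch of unit tests for the norm() function from the earlier slides, written with Python's built-in unittest module. The test cases and their values are illustrative.

```python
import unittest

def norm(v):
    # The "more Pythonic" sum-of-squares function from the earlier slide.
    w = [item * item for item in v]
    w.sort()
    v_norm = 0.0
    for item in w:
        v_norm += item
    return v_norm

class TestNorm(unittest.TestCase):
    def test_simple_vector(self):
        self.assertEqual(norm([3.0, 4.0]), 25.0)

    def test_empty_vector(self):
        self.assertEqual(norm([]), 0.0)

    def test_negative_entries(self):
        # Squaring must remove the sign.
        self.assertEqual(norm([-1.0, -2.0]), norm([1.0, 2.0]))

if __name__ == "__main__":
    # exit=False lets the script carry on after the tests have run.
    unittest.main(argv=["first-arg-is-ignored"], exit=False)
```

Each test checks one small property. When a test fails, its name tells you what broke.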
Debuggers
Step through running code and examine its state as you go.
Better still, write debugging code into your program!
"Each new user of a new system uncovers a new class of bugs." (Brian Kernighan)
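This is one way to build debugging code into a program, sketched in Python. Assertions check your assumptions while the program runs. The standard logging module prints diagnostic messages that you can switch on or off without editing the code. The particular checks here are illustrative.

```python
import logging
import math

# WARNING hides the debug messages; use logging.DEBUG to see them.
logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("norm")

def norm(v):
    # Sum of squares, with self-checks built in.
    assert not any(math.isnan(x) for x in v), "NaN in input vector"
    w = sorted(item * item for item in v)
    log.debug("sorted squares: %s", w)
    v_norm = 0.0
    for item in w:
        v_norm += item
    # A sum of squares can never be negative; if it is, something is badly wrong.
    assert v_norm >= 0.0, "negative sum of squares"
    log.debug("norm = %g", v_norm)
    return v_norm

print(norm([1.0, 2.0, 2.0]))   # 9.0
```

Running Python with the -O option removes the assert statements, so the checks cost nothing in production runs.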
The optimiser
Finds short cuts in your code: leave optimisation to the system! Structured code optimises better.
"Premature optimisation is the root of all evil (or at least most of it) in programming." (Donald Knuth)
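Measure before you optimise anything by hand. This minimal Python sketch uses the standard timeit module to compare two versions of the sum-of-squares function. The vector size and the number of calls are arbitrary, and the timings will vary from machine to machine.

```python
import timeit

v = [float(i) for i in range(1000)]   # illustrative test vector

def norm_loop(v):
    # Explicit loop, as on the earlier slides.
    v_norm = 0.0
    for item in v:
        v_norm += item * item
    return v_norm

def norm_builtin(v):
    # The same sum of squares using the built-in sum().
    return sum(item * item for item in v)

for f in (norm_loop, norm_builtin):
    # Run each version 200 times and report the total wall-clock time.
    seconds = timeit.timeit(lambda: f(v), number=200)
    print(f"{f.__name__}: {seconds:.4f} s for 200 calls")
```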
Algorithms
Compare time taken and memory used against the size of the input and the required accuracy.
The algorithms selected make or break programs.
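Example: Matrix multiplication
The same three loops nested in a different order do exactly the same arithmetic. However, they visit memory in a different order, which can change the run time considerably.

    // a (n x p) += b (n x q) * c (q x p); a must start as all zeros
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < p; j++) {
            for (int k = 0; k < q; k++) {
                a[i][j] += b[i][k] * c[k][j];
            }
        }
    }

    // The same calculation with the loops reordered
    for (int k = 0; k < q; k++) {
        for (int j = 0; j < p; j++) {
            for (int i = 0; i < n; i++) {
                a[i][j] += b[i][k] * c[k][j];
            }
        }
    }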
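Example: Matrix multiplication (Strassen's algorithm)
Split each matrix into 2 × 2 blocks:

$$\begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}$$

Seven block multiplications replace the usual eight:

$$\begin{aligned}
M_1 &= (A_{11}+A_{22})(B_{11}+B_{22}) \\
M_2 &= (A_{21}+A_{22})\,B_{11} \\
M_3 &= A_{11}\,(B_{12}-B_{22}) \\
M_4 &= A_{22}\,(B_{21}-B_{11}) \\
M_5 &= (A_{11}+A_{12})\,B_{22} \\
M_6 &= (A_{21}-A_{11})(B_{11}+B_{12}) \\
M_7 &= (A_{12}-A_{22})(B_{21}+B_{22})
\end{aligned}$$

$$\begin{aligned}
C_{11} &= M_1 + M_4 - M_5 + M_7 \\
C_{12} &= M_3 + M_5 \\
C_{21} &= M_2 + M_4 \\
C_{22} &= M_1 - M_2 + M_3 + M_6
\end{aligned}$$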
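You can check the seven products numerically. This minimal Python sketch applies one level of Strassen's scheme to 2 × 2 matrices of plain numbers and compares the result with the textbook product. In real use, the blocks A11 ... B22 are themselves sub-matrices and the scheme is applied recursively. That recursion is where saving one multiplication in eight pays off.

```python
def strassen_2x2(A, B):
    # A and B are 2x2 matrices given as nested lists [[x11, x12], [x21, x22]].
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    # Seven multiplications instead of the usual eight.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

def naive_2x2(A, B):
    # Textbook definition: eight multiplications.
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(strassen_2x2(A, B))   # [[19, 22], [43, 50]]
print(naive_2x2(A, B))      # [[19, 22], [43, 50]]
```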
UCS advice: escience-support@ucs.cam.ac.uk
Advice on libraries, techniques and algorithms.
Coffee break (15 minutes): wrists, spine, eyes, caffeine addiction.
Choosing a language
What's best? What's available? What's used already? What's suitable?
FORTRAN, MATLAB, Perl, Java, bash, gnuplot, Python, C++, Excel, SPSS
Classes of language
Compared by what you do, what files get created and what the system sees:
Interpreted: shell script, Perl
In between: Python, Java
Compiled: C, C++, Fortran
Shell scripting languages
There are several: /bin/sh, /bin/csh, /bin/tcsh, /bin/bash, /bin/ksh, /bin/zsh.
The first line of a script, e.g. #!/bin/bash, selects which shell runs the rest of it (lines such as job="${1}").
Shell script
Suitable for: gluing programs together; wrapping programs; small tasks.
Unsuitable for: performance-critical jobs; floating point; GUIs; complex tasks.
Easy to learn. Very widely used.
UCS courses
Unix: Introduction to the Command Line Interface
Simple shell scripting for scientists
Further shell scripting? Python!
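This minimal sketch shows the glue job in Python, using the standard subprocess module to run one stage with its input and output redirected to files. The helper name run_stage and the file names are hypothetical. So that the example runs anywhere, it uses the Python interpreter itself as a stand-in for a real program.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_stage(command, infile, outfile):
    # Equivalent of: command < infile > outfile
    # check=True raises CalledProcessError if the program fails,
    # much as "bash -e" stops a script.
    with open(infile) as fin, open(outfile, "w") as fout:
        subprocess.run(command, stdin=fin, stdout=fout, check=True)

# Stand-in "program": upper-cases whatever arrives on standard input.
upper = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]

workdir = Path(tempfile.mkdtemp())
job_in = workdir / "job.in"
job_dat = workdir / "job.dat"
job_in.write_text("hello\n")

# Create the data file only if it does not exist yet, as in the shell script.
if not job_dat.exists():
    run_stage(upper, job_in, job_dat)

result = job_dat.read_text()
print(result, end="")   # HELLO
```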
High-power scripting languages: Python, Perl
Both can call out to libraries written in other languages.

    #!/usr/bin/python
    import library

    #!/usr/bin/perl
    use library;
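In Python, one standard way to call a compiled C library directly is the ctypes module. This minimal sketch assumes a Unix-like system on which the C maths library (libm) can be found. It calls the C function cos() from that library.

```python
import ctypes
import ctypes.util

# Locate the C maths library; fall back to the usual Linux name
# (an assumption: this fallback only works on glibc-based Linux).
libm_name = ctypes.util.find_library("m") or "libm.so.6"
libm = ctypes.CDLL(libm_name)

# Declare the C signature: double cos(double)
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))   # 1.0
```

Setting argtypes and restype matters. Without them, ctypes assumes the function takes and returns C ints, and the result would be garbage.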
Perl: the Swiss army knife language
Suitable for: text processing; data pre-/post-processing; small tasks.
Bad first language. Easy to write unreadable code.
CPAN: Comprehensive Perl Archive Network. Widely used.
"There's more than one way to do it." Beware Perl geeks!
Python: batteries included
Suitable for: text processing; data pre-/post-processing; small & large tasks.
Excellent first language. Easy to write maintainable code.
Built-in comprehensive library of functions; scientific Python library.
The Python way: its code nesting style (by indentation) is unique.
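Here is a small taste of "batteries included". This minimal sketch post-processes data using only the standard library: it reads two whitespace-separated columns and summarises the second with the statistics module. The data and its layout are made up for illustration.

```python
import io
import statistics

# Illustrative two-column data (time, value); "#" lines are comments.
raw = io.StringIO("""\
# t   value
0.0   1.5
1.0   2.5
2.0   3.5
""")

values = []
for line in raw:
    line = line.strip()
    if not line or line.startswith("#"):
        continue              # skip blank and comment lines
    _t, value = line.split()
    values.append(float(value))

print(f"n = {len(values)}, mean = {statistics.mean(values)}, "
      f"stdev = {statistics.stdev(values)}")
```

To read a real file, replace the StringIO object with open("data.txt"); the loop stays the same. The file name here is hypothetical.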
UCS courses
Python: Introduction for absolute beginners
Python: Introduction for programmers
Python: Further Topics
UCS courses
Python: Unit testing
Python: Regular expressions
Python: Interoperation with Fortran
Python: Operating system access
Spreadsheets: Microsoft Excel, OpenOffice.org Calc, Apple Numbers
Spreadsheets
For: taught at school; easy to tinker; easy to get started.
Against: taught badly at school!; easy to corrupt data; hard to be systematic.
UCS courses
Excel 2007: Beginners
Excel 2007: Introduction
Excel 2007: Functions and Macros
Excel 2007: Managing Data & Lists
Excel 2007: Analysing & Summarizing Data
Specialist systems
Database: PostgreSQL, Access
Graphs: gnuplot, ploticus
GUIs: Glade
Mathematics: Mathematica, MATLAB, Maple
Statistics: SPSS, Stata
Drawing graphs: manual vs. automatic
gnuplot, ploticus, matplotlib
Course: Gnuplot for simple graphs
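Drawing graphs automatically usually means that a program writes both the data and the plotting commands, so the graph can be regenerated whenever the data change. This minimal Python sketch writes a data file and a gnuplot command file. The file names are hypothetical, and the commands are standard gnuplot (set terminal, set output, plot ... using ... with lines). If gnuplot is installed, running it on the command file afterwards should produce the PNG.

```python
import math
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
data_file = workdir / "results.dat"     # hypothetical file names
script_file = workdir / "plot.gp"

# Write the data: one "x y" pair per line.
with open(data_file, "w") as f:
    for i in range(50):
        x = i * 0.2
        f.write(f"{x:g} {math.sin(x):g}\n")

# Write the gnuplot commands that turn the data into a PNG.
script_file.write_text(f"""\
set terminal png
set output '{workdir / "results.png"}'
set xlabel 'x'
set ylabel 'sin(x)'
plot '{data_file}' using 1:2 with lines title 'sin(x)'
""")
print(script_file.read_text())
```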
GUIs: Glade
Back end: C, C++, Perl or Python.
Course: Glade: Introduction to building GUIs
Mathematical manipulation MATLAB Mathematica
Mathematical manipulation Octave
Mathematical manipulation
Suitable for: fiddling; small and medium numerical work; the graphical subsystem.
Problems: additional modules are often needed.
UCS courses
MATLAB: Basics
MATLAB: Graphics
MATLAB: Linear algebra
Mathematica: Basics
Mathematica: Graphics
Mathematica: Linear algebra
Statistics Stata
UCS courses (NB: we do not teach statistics!)
SPSS: Introduction for beginners
SPSS: Beyond the basics
Stata: Introduction
Stata for regression analysis
Regression analysis in R
Compiled languages
Use one when there is no specialist system and scripts are not fast enough, or when you need a library that has no script interface.
C, C++, Fortran, Java
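From source code to executable
Source code (fubar.c, snafu.c) → compiling → object files (fubar.o, snafu.o) → linking, together with the C library libc.so.6 → executable fubar.
[Diagram: the functions main(), pow() and zap() are carried from the source files through the object files into the executable; printf() comes from libc.so.6.]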
Somebody else's code
Course: Unix: Building, installing and running software
You don't need to write the whole program in a compiled language!
E.g. f2py joins Python to Fortran.
Compiled languages: which? C, C++, Fortran, Java
Fortran
The best for numerical work; excellent numerical libraries.
Unsuitable for everything else.
Very different versions: 77, 90, 95, 2003.
Fortran courses
Fortran: Introduction to modern Fortran
Python: Interoperation with Fortran
C
The best for Unix work; excellent libraries.
Superseded by C++. Memory management is left to you.
Very basic C course
C: Introduction for Those New to Programming
C++
An extension of C; object oriented; Standard Template Library.
A good general purpose language, but very hard to learn well.
Learning C++
Eckel, Bruce (2003). Thinking in C++, 2nd ed. Two volumes: 800 and 500 pages!
Stroustrup, Bjarne (2008). Programming: Principles and Practice Using C++. Harder, but better for scientific computing.
Parallel programming
The MPI library, for Fortran, C and C++.
Courses: Parallel programming: Introduction to MPI; Parallel programming: Options and design
Java
Object oriented; a good general purpose language; much easier to learn and use than C++; cross-platform.
Some poorly thought out libraries.
Multiple versions (1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6): use 1.5 or later.
Java courses (Computer Lab)
IA: Object Oriented Programming
IB: Further Java
Other UCS courses of interest
vi editor: introduction
emacs editor: introduction
Programming concepts: introduction for absolute beginners
Program design: organising and structuring programming tasks
Courses
All UCS courses: http://training.csx.cam.ac.uk/ucs/theme
Scientific computing theme: http://training.csx.cam.ac.uk/ucs/theme/scientific-comp
Also: http://www-uxsup.cam.ac.uk/courses/
Contacts
UCS help desk: help-desk@ucs.cam.ac.uk
Scientific computing support: scientific-computing@ucs.cam.ac.uk